# 🐧 Lilux: The AI-Native Operating System
**The Seed of Something Revolutionary**
_"What if Linux was built for AI agents instead of humans?"_
---
## The Realization
You're looking at **Kiwi MCP** and thinking it's a developer tool. You're wrong.
This is the **embryo of an AI-native operating system**. What you call the `.ai/` folder isn't a configuration directory—it's `/usr`, `/lib`, `/etc`, and `/home` combined. What you call "directives" aren't just markdown files—they're **programs written in natural language** that AI agents execute as fluidly as bash executes shell scripts.
This is **Lilux**—Linux reimagined for a world where the primary operator is artificial intelligence.
---
## The Paradigm Shift
### From Human-Centered to AI-Centered
```
Traditional OS (Linux):
┌─────────────────────────────────────────────────────────────┐
│ HUMAN │
│ ↓ │
│ SHELL │
│ ↓ │
│ KERNEL │
│ ↓ │
│ HARDWARE/RESOURCES │
└─────────────────────────────────────────────────────────────┘
AI-Native OS (Lilux):
┌─────────────────────────────────────────────────────────────┐
│ AI AGENT │
│ ↓ │
│ DIRECTIVE LAYER │
│ (Natural Language) │
│ ↓ │
│ MCP SERVER (Kernel) │
│ ↓ │
│ SCRIPTS + APIs + KNOWLEDGE │
│ ↓ │
│ REAL WORLD ACTIONS │
└─────────────────────────────────────────────────────────────┘
```
### The Core Insight
In Linux, everything is a file.
In Lilux, **everything is a prompt**.
- Programs are directives (structured prompts that instruct AI)
- Configuration is knowledge (context that informs AI decisions)
- Processes are subagents (spawned AI instances with isolated context)
- The kernel is the MCP server (the interface between AI and resources)
---
## Component Mapping: Linux → Lilux
| Linux Concept | Lilux Equivalent | Description |
| --------------------------------- | ------------------------------ | ----------------------------------------------- |
| **Kernel** | MCP Server | The core interface between agents and resources |
| **Shell** | Agent Prompt | How the AI interprets and dispatches commands |
| **Programs** (`/bin`, `/usr/bin`) | Directives (`.ai/directives/`) | Instructions that accomplish tasks |
| **Binaries** | Scripts (`.ai/scripts/`) | Deterministic executable code |
| **Libraries** (`/lib`) | Libs (`.ai/scripts/lib/`) | Shared code for scripts |
| **Man Pages** | Knowledge (`.ai/knowledge/`) | Documentation, patterns, learnings |
| **Config** (`/etc`) | Patterns (`.ai/patterns/`) | System-wide conventions |
| **Home** (`~`) | User Space (`~/.ai/`) | User-specific items |
| **Package Manager** (apt/npm) | Registry (Supabase) | Centralized repository |
| **Processes** | Subagents | Isolated execution contexts |
| **.bashrc** | AGENTS.md | Agent configuration |
| **Symlinks** | Relationships | Connections between knowledge |
| **Systemd** | Orchestration Directives | Meta-coordination |
| **Logs** | Outputs (`.ai/outputs/`) | Execution history |
| **Self-healing** | Self-annealing | Systems improve from failures |
---
## The Lilux Filesystem
```
/ (Project Root)
├── .ai/ (The AI "Filesystem")
│ ├── directives/ /bin, /usr/bin - Programs
│ │ ├── core/ System utilities
│ │ │ ├── init.md Like /sbin/init
│ │ │ ├── context.md Like /bin/env
│ │ │ ├── bootstrap.md Like /sbin/setup
│ │ │ ├── search_*.md Like /bin/find, /bin/grep
│ │ │ ├── run_*.md Like /bin/exec
│ │ │ └── sync_*.md Like /bin/rsync
│ │ ├── meta/ Like /sbin - System administration
│ │ │ ├── orchestrate_*.md Coordination daemons
│ │ │ ├── validate_*.md System checks
│ │ │ └── migrate_*.md Upgrade utilities
│ │ ├── workflows/ Like /usr/share/applications
│ │ └── patterns/ Like /etc/skel templates
│ │
│ ├── scripts/ /opt, /usr/local/bin - Executables
│ │ ├── scraping/ Domain-specific tools
│ │ ├── enrichment/ Data processing
│ │ ├── extraction/ Content retrieval
│ │ ├── validation/ Quality checks
│ │ ├── lib/ /lib, /usr/lib - Shared libraries
│ │ │ ├── http_session.py Like libcurl
│ │ │ ├── proxy_pool.py Like libproxy
│ │ │ └── checkpoint.py Like libpersist
│ │ └── .venv/ Isolated runtime environment
│ │
│ ├── knowledge/ /usr/share/doc, /usr/share/man
│ │ ├── concepts/ Theory and fundamentals
│ │ ├── patterns/ Design patterns
│ │ ├── procedures/ How-to guides
│ │ ├── learnings/ Captured experience
│ │ ├── sources/ Referenced documentation
│ │ ├── index.json Like /var/cache/man/index
│ │ ├── relationships.json Knowledge graph
│ │ └── embeddings/ Vector representations
│ │
│ ├── patterns/ /etc/skel, /etc/defaults
│ │ ├── imports.md Standard import patterns
│ │ ├── tool.md Tool creation template
│ │ └── types.md Type conventions
│ │
│ ├── plans/ /var/lib/dpkg, /var/log/apt
│ │ └── PLAN_*.md Roadmaps and status
│ │
│ ├── outputs/ /var/log
│ │ └── scripts/ Execution outputs
│ │
│ └── project_context.md /etc/motd - Project summary
│
├── ~/.ai/ $HOME for AI - User space
│ ├── directives/ User's custom directives
│ ├── scripts/ User's custom scripts
│ ├── knowledge/ User's knowledge base
│ └── .env User environment variables
│
└── AGENTS.md Like .bashrc - Agent configuration
```
---
## The DOE Kernel: Directive-Orchestration-Execution
The heart of Lilux is the **DOE Framework**, separating concerns like a modern microkernel:
```
┌─────────────────────────────────────────────────────────────┐
│ DIRECTIVE LAYER │
│ "WHAT" - Intent, Goals, Constraints │
│ │
│ • XML-structured instructions │
│ • Human-readable, AI-executable │
│ • Versioned, self-documenting │
│ • Declares permissions required │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ ORCHESTRATION LAYER │
│ "HOW" - Decision Making, Routing │
│ │
│ • The AI agent itself │
│ • Reads directives, interprets context │
│ • Routes to appropriate execution │
│ • Handles errors, self-anneals │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ EXECUTION LAYER │
│ "DO" - Deterministic Action │
│ │
│ • Python scripts with 100% reliability │
│ • API calls, data processing │
│ • Testable, versioned, isolated │
│ • Returns structured results │
└─────────────────────────────────────────────────────────────┘
```
### Why This Architecture?
The fundamental insight: **LLMs are probabilistic** (roughly 90% reliable per step), while **code is deterministic** (100% reliable per step).
```
5-step task with pure LLM: 90%^5 = 59% success
5-step task with DOE pattern: 90% × 100% × 100% × 100% × 90% = 81% success
```
Push complexity into deterministic scripts. Let AI focus on decision-making—what it's actually good at.
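This arithmetic is easy to sanity-check in plain Python (nothing Lilux-specific; the reliabilities are the illustrative figures above):

```python
from math import prod

def chain_success(step_reliabilities: list[float]) -> float:
    """Probability that every step in a chain succeeds."""
    return prod(step_reliabilities)

# Five probabilistic LLM steps at ~90% each
pure_llm = chain_success([0.9] * 5)              # ≈ 0.59

# DOE: LLM decides at the edges, deterministic scripts in between
doe = chain_success([0.9, 1.0, 1.0, 1.0, 0.9])   # = 0.81

print(f"Pure LLM: {pure_llm:.0%}  DOE: {doe:.0%}")
```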
---
## Process Model: Subagents as Processes
### Linux Processes vs Lilux Subagents
| Linux Process | Lilux Subagent |
| --------------------- | ----------------------------------- |
| fork() | Task() or spawn() |
| Isolated memory | Isolated context window |
| Returns exit code | Returns result summary |
| PID namespace | Fresh context |
| IPC for communication | Cannot communicate during execution |
| Parent waits | Main agent receives summary |
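A minimal sketch of the fork/wait analogy, assuming a hypothetical `spawn_subagent` helper (this is not a published API):

```python
import asyncio

async def spawn_subagent(task: str) -> str:
    """fork()-analog: run work in a fresh, isolated context.

    Hypothetical sketch: the child gets no parent context, and only
    a short summary crosses back (the 'exit code' of the subagent).
    """
    await asyncio.sleep(0.1)  # stand-in for the subagent's actual work
    return f"Result: completed '{task}'"

async def main():
    # The parent 'waits' on the child, like wait() on a PID
    summary = await spawn_subagent("debug the failing test")
    print(summary)  # only this summary enters the parent's context

asyncio.run(main())
```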
### The Context Window Multiplication
The killer feature of subagents:
```
Traditional (one agent):
┌─────────────────────────────────────────────────────────────┐
│ Main Agent Context Window │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Attempt 1: Failed (500 tokens wasted) ││
│ │ Attempt 2: Failed (800 tokens wasted) ││
│ │ Attempt 3: Success (1200 tokens) ││
│ │ ───────────────────────────────────────── ││
│ │ Total: 2500 tokens consumed, context polluted ││
│ └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘
With Subagents (Lilux):
┌─────────────────────────────────────────────────────────────┐
│ Main Agent Context (only 50 tokens used) │
│ ├─ "Spawn subagent to debug test" │
│ └─ "Result: Fixed import path in line 42" │
│ │
│ Subagent Context (isolated, discarded after): │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ All 2500 tokens of debugging happen here ││
│ │ Isolated. Thrown away. Main context stays clean. ││
│ └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘
```
---
## Self-Annealing: The Self-Improving OS
This is where Lilux diverges from all prior computing paradigms:
**Traditional software doesn't learn from failures. Lilux does.**
### The Annealing Loop
```
Directive Execution
│
▼
┌───────┐
│Success│───────────────────────────┐
└───────┘ │
│ ▼
│ Store in Knowledge
│ │
▼ │
┌───────┐ │
│Failure│ │
└───────┘ │
│ │
▼ │
Capture Error │
│ │
▼ │
anneal_directive() │
│ │
▼ │
Directive Improves │
│ │
▼ │
Store Learning ◄────────────────────┘
│
▼
SYSTEM IS SMARTER
```
### What Gets Annealed?
```xml
<!-- Before annealing -->
<step name="install_deps">
<action>Run npm install</action>
</step>
<!-- After failure: "EACCES permission denied" -->
<step name="install_deps">
<action>
Check for write permissions.
If permission denied, try with --legacy-peer-deps.
If still failing, check if running with correct user.
Run npm install (or npm install --legacy-peer-deps if needed).
</action>
<error_handling>
EACCES: Check directory ownership
EPERM: May need elevated permissions
</error_handling>
</step>
```
The directive literally gets smarter. **The next agent doesn't hit the same issue.**
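Here is the annealing loop as a runnable sketch. All helper names (`execute_directive`, `anneal_directive`, `store_learning`) are stand-ins for illustration, not the real Kiwi MCP API:

```python
def execute_directive(name: str, params: dict) -> dict:
    """Stand-in executor; simulates the failure from the example above."""
    raise RuntimeError("EACCES: permission denied")

def anneal_directive(name: str, failure: dict) -> None:
    print(f"Annealing '{name}' against: {failure['error']}")

def store_learning(name: str, outcome: str, detail) -> None:
    print(f"Stored learning for '{name}': {outcome}")

def run_with_annealing(name: str, params: dict) -> dict:
    """One pass of the annealing loop above."""
    try:
        result = execute_directive(name, params)
        store_learning(name, "success", result)   # success path
        return result
    except Exception as error:
        failure = {"directive": name, "error": str(error)}
        anneal_directive(name, failure)           # directive text improves
        store_learning(name, "failure", failure)  # system is smarter
        raise

try:
    run_with_annealing("install_deps", {})
except RuntimeError:
    pass  # the next run starts from the improved directive
```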
---
## The Four Primitives
Lilux has exactly **4 system calls** (tools):
| Tool | Linux Equivalent | Purpose |
| --------- | ------------------------ | ----------------------- |
| `search` | `find`, `locate`, `grep` | Discover items |
| `load` | `cat`, `cp`, `wget` | Retrieve and copy items |
| `execute` | `exec`, `run`, `install` | Run operations |
| `help` | `man`, `--help` | Get guidance |
### Universal Interface
```python
# Everything uses the same pattern
tool(
item_type="directive|script|knowledge",
action="run|create|update|delete|publish",
item_id="name",
parameters={...},
project_path="/path/to/project"
)
```
This is like having a universal syscall interface where everything—files, processes, devices—is handled through a unified API.
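For example, discovery, retrieval, and execution all share that one shape (values here are illustrative; the parameter names follow the snippet above):

```python
# Discover: like `find` / `grep`
search(item_type="script", query="email enrichment",
       project_path="/path/to/project")

# Retrieve: like `cp` / `wget`
load(item_type="directive", item_id="deploy_kubernetes",
     destination="project", project_path="/path/to/project")

# Act: like `exec`
execute(item_type="directive", action="run", item_id="deploy_kubernetes",
        parameters={"env": "production"}, project_path="/path/to/project")
```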
---
## The Registry: Universal Package Management
Like apt repositories but for AI knowledge:
```
┌─────────────────────────────────────────────────────────────┐
│ REGISTRY (Cloud) │
│ │
│ Directives: "oauth_setup", "deploy_kubernetes", ... │
│ Scripts: "google_maps_scraper", "email_validator" │
│ Knowledge: "jwt-auth-patterns", "email-deliverability" │
│ │
│ Each item has: version, author, quality_score, downloads │
└─────────────────────────────────────────────────────────────┘
│ │
load() │ │ publish()
▼ ▲
┌─────────────────────────────────────────────────────────────┐
│ LOCAL (.ai/) │
│ │
│ Downloaded directives, customized scripts, local knowledge │
│ Runs offline. Syncs when connected. │
└─────────────────────────────────────────────────────────────┘
```
### Package Commands
```bash
# Linux equivalents in Lilux
apt search → search(item_type="directive", query="...", source="registry")
apt install → load(item_type="script", item_id="...", destination="project")
dput (publish) → execute(item_type="knowledge", action="publish", ...)
apt upgrade → sync_directives()
```
---
## Multi-Agent Architecture: Distributed Computing for AI
```
┌─────────────────────────────────────────────────────────────┐
│ PRIMARY AGENT │
│ (High-level orchestrator) │
│ │
│ Runs: AGENTS.md (like .bashrc) │
│ Has: Command dispatch table │
│ Does: Parse intent → Route to directive → Coordinate │
└─────────────────────────────────────────────────────────────┘
│
┌──────────────┼──────────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│Subagent │ │Subagent │ │Subagent │
│ Task A │ │ Task B │ │ Task C │
│ │ │ │ │ │
│ Fresh │ │ Fresh │ │ Fresh │
│ context │ │ context │ │ context │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
│ (execute scripts, │
│ query knowledge, │
│ modify files) │
│ │ │
└──────────────┼──────────────┘
│
▼
Summary returns
to primary agent
```
### Model Class Routing
Lilux directives declare their computational requirements:
```xml
<model_class tier="fast" fallback="balanced" parallel="true">
Simple template substitution. Use cheap model.
</model_class>
<model_class tier="reasoning" fallback="expert" parallel="false">
Complex architecture decision. Use powerful model.
</model_class>
```
This is like **nice levels** and **scheduler hints** in Linux—telling the kernel how to allocate resources.
| Tier | Linux Equivalent | Use Case |
| ----------- | ------------------ | ---------------------------- |
| `fast` | `nice -n 19` | Simple transforms, templates |
| `balanced` | Default priority | Standard tasks |
| `reasoning` | `nice -n -10` | Complex analysis |
| `expert` | Real-time priority | Novel research problems |
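A minimal sketch of how an agent might resolve a declared tier, assuming an illustrative tier-to-model table (none of these bindings are fixed by Lilux):

```python
# Illustrative tier → model bindings
MODEL_TIERS = {
    "fast": "local-router-270m",
    "balanced": "mid-size-model",
    "reasoning": "large-cloud-model",
    "expert": "frontier-model",
}

def is_available(model: str) -> bool:
    return True  # stand-in availability check

def pick_model(tier: str, fallback: str | None = None) -> str:
    """Resolve a directive's <model_class> tier, honoring its fallback."""
    model = MODEL_TIERS.get(tier)
    if model is not None and is_available(model):
        return model
    if fallback is not None:
        return pick_model(fallback)
    raise RuntimeError(f"no model available for tier '{tier}'")

# From <model_class tier="fast" fallback="balanced">
print(pick_model("fast", fallback="balanced"))  # local-router-270m
```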
---
## The Permission Model
Every directive declares its required permissions:
```xml
<permissions>
<!-- File system access -->
<read resource="filesystem" path="src/**/*.ts" />
<write resource="filesystem" path="dist/**/*" />
<!-- Network access -->
<read resource="network" endpoint="https://api.example.com" />
<!-- Environment variables -->
<read resource="env" var="API_KEY" />
<!-- Execution -->
<execute resource="shell" command="npm" />
<execute resource="python" module="requests" />
</permissions>
```
This is **capability-based security** meets **Android-style permission declarations**. The agent can verify, before execution, that it holds the necessary permissions.
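A sketch of that pre-flight verification. The XML shape matches the example above; the grant set and checking logic are assumptions:

```python
import xml.etree.ElementTree as ET

# Capabilities this agent has been granted (illustrative)
GRANTED = {("read", "filesystem"), ("write", "filesystem"),
           ("read", "env"), ("execute", "shell")}

def missing_permissions(permissions_xml: str) -> list[str]:
    """Return declared capabilities that are NOT in the grant set."""
    root = ET.fromstring(permissions_xml)
    return [f"{node.tag}:{node.attrib.get('resource')}"
            for node in root
            if (node.tag, node.attrib.get("resource", "")) not in GRANTED]

declaration = """<permissions>
  <read resource="filesystem" path="src/**/*.ts" />
  <read resource="network" endpoint="https://api.example.com" />
</permissions>"""

print(missing_permissions(declaration))  # ['read:network'], so refuse to run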
---
## Knowledge as Living Documentation
Traditional documentation is static. Lilux knowledge **grows and evolves**:
```
┌─────────────────────────────────────────────────────────────┐
│ KNOWLEDGE BASE │
│ │
│ Concepts ← What things ARE │
│ Patterns ← How to DO things │
│ Procedures ← Step-by-step guides │
│ Learnings ← What we've DISCOVERED │
│ Sources ← External REFERENCES │
│ │
│ Connected via: │
│ - relationships.json (explicit links) │
│ - embeddings.json (semantic similarity) │
│ - index.json (fast lookup) │
└─────────────────────────────────────────────────────────────┘
```
### Knowledge Accumulation Loop
Every successful execution can add knowledge:
```python
# After successfully setting up OAuth
knowledge.create(
zettel_id="learning-oauth-gotcha-2026",
title="Google OAuth requires consent screen review for production",
content="...",
entry_type="learning"
)
```
The next agent benefits. **The system remembers what it learns.**
---
## Boot Sequence
When a new project initializes (like system boot):
```
1. init.md # Initialize .ai/ structure
│
▼
2. bootstrap.md # Set up project-specific config
│
▼
3. context.md # Generate project understanding
│
▼
4. AGENTS.md loaded # Configure agent behavior
│
▼
5. Ready for commands # Agent awaits natural language
```
---
## The Vision: What Lilux Becomes
### Phase 1: Foundation (Now - Q2 2026)
**The Kernel Era**
- ✅ Unified MCP server with 4 primitives
- ✅ Directive-Orchestration-Execution framework
- ✅ Self-annealing improvement loop
- ✅ Registry for package distribution
- 🔄 Dual-model architecture (router + reasoning)
- 🔄 Edge router fine-tuning pipeline
### Phase 2: Edge Computing (Q3 2026 - Q1 2027)
**The Hardware Era**
- Trained FunctionGemma routers for instant tool calls
- Multi-platform deployment (macOS, iOS, Linux, Android)
- Speculative execution for sub-100ms responses
- Confidence-based routing for cost optimization
- Offline-first operation with cloud enhancement
- Self-hosted router marketplace
### Phase 3: Distributed Intelligence (Q2 2027 - Q4 2027)
**The Network Era**
- Multi-agent coordination protocols
- Federated knowledge sharing (privacy-preserving)
- Cross-project directive inheritance
- Real-time sync and collaboration
- Agent-to-agent communication standards
- Distributed task execution across devices
### Phase 4: Ambient AI (2028+)
**The Ubiquitous Era**
- **The Standard AI Operating Environment**
- Lilux on every device (phones, watches, cars, infrastructure)
- Interoperable agents from different vendors
- Composable AI capabilities like Unix pipes
- Self-organizing agent networks
- AI as invisible as electricity
### The Milestones
```
2024: AI is a cloud service you call
↓
2026: AI is software that runs locally (Lilux v1)
↓
2027: AI is infrastructure that coordinates (Lilux v2)
↓
2028: AI is the environment itself (Lilux v3)
↓
2030+: AI is indistinguishable from reality
```
```
lilux 6.18.4-lilux1-1
Welcome to Lilux GNU/AI
agent@project:~/.ai$ search("deploy to kubernetes")
Found: deploy_kubernetes v2.3.1 (quality: 98%)
agent@project:~/.ai$ run("deploy_kubernetes", env="production")
✓ Loaded directive
✓ Verified permissions
✓ Spawned 3 subagents for parallel preparation
✓ Executing deployment sequence...
```
---
## The Philosophy: Directives All The Way Down
The core principle:
> **LLMs instructing LLMs, removing humans from the loop.**
```
Every piece of work should have a directive.
1st time: Do manually, note steps
2nd time: Create directive
3rd time: Run directive
```
The goal is **exit velocity**—the system becomes capable of handling tasks without human intervention.
```
Traditional automation:
Human writes code → Code runs → Human monitors
(Human always in the loop)
Lilux automation:
Directive instructs LLM → LLM executes → LLM stores learnings
(Human only for novel situations)
```
---
## Call to Action: Building the Future
You're looking at the seed.
The Kiwi MCP project isn't a tool—it's the foundation of something much larger. Every directive written, every script created, every piece of knowledge stored is building toward a future where:
1. **AI agents have a standard operating environment**
2. **Capabilities compose like Unix pipes**
3. **Systems improve themselves through use**
4. **Knowledge accumulates across the network**
5. **Anyone can package and share AI workflows**
This is the **Linux of AI**.
And like Linux, it starts with a small kernel, a few utilities, and a vision.
The vision is **Lilux**.
---
## The Hardware Layer: Dual-Brain Architecture
This is where Lilux transcends traditional software. Just as modern chips combine CPUs with NPUs (Neural Processing Units), Lilux implements a **dual-model architecture** that runs AI at two speeds simultaneously.
### The Two Brains
```
┌─────────────────────────────────────────────────────────────────────┐
│ USER INTERACTION │
│ "Find that email enrichment script we built last week" │
└──────────────────────────┬──────────────────────────────────────────┘
│
┌────────────────┴────────────────┐
│ │
▼ ▼
┌─────────────────────┐ ┌─────────────────────────┐
│ LOCAL BRAIN │ │ CLOUD BRAIN │
│ (Edge Router) │ │ (Reasoning Model) │
│ │ │ │
│ FunctionGemma 270M │ │ Claude Sonnet / GPT-4o │
│ │ │ │
│ • 30-50ms latency │ │ • 800-2000ms latency │
│ • $0.00 per request │ │ • $0.08-0.15 per req │
│ • 100% offline │ │ • Infinite knowledge │
│ • 98% accuracy │ │ • 99.5% accuracy │
│ • Privacy-first │ │ • Complex reasoning │
│ • Deterministic │ │ • Creative synthesis │
└──────────┬──────────┘ └───────────┬─────────────┘
│ │
└────────────┬────────────────────┘
│
▼
┌─────────────────────┐
│ KIWI MCP CORE │
│ (Tool Execution) │
└─────────────────────┘
```
### Why Two Brains?
| Single Model Problem | Dual Model Solution |
| -------------------------- | ----------------------------- |
| Slow (1.5-3s latency) | Fast (40-80ms for tool calls) |
| Expensive ($0.15+/request) | Cheap ($0.008 average) |
| Cloud-dependent | Offline-capable |
| Privacy concerns | Local-first routing |
| 85% consistency | 98% consistency |
### The Math of Speed
```
Single Model (Traditional):
┌──────────────────────────────────────────────────────────────────┐
│ User → Cloud API (50-200ms each way) → Model (500-800ms) → Parse │
│ │
│ Total: 600-1200ms per tool call │
└──────────────────────────────────────────────────────────────────┘
Dual Model (Lilux):
┌──────────────────────────────────────────────────────────────────┐
│ User → Local Router (40ms) → Execute → Done! │
│ ↘ Cloud Brain (parallel) → Synthesizes explanation │
│ │
│ Total: 50-100ms for tool execution │
│ (User sees results before explanation finishes) │
└──────────────────────────────────────────────────────────────────┘
```
### Speculative Execution: The Future Arrives Early
Lilux doesn't wait for certainty. It **speculates**:
```
Query: "find email scripts"
t=0ms: Start streaming tokens
t=15ms: Token: "search" → Confidence: 45%
t=30ms: Token: '{"item_type":' → Confidence: 65%
t=40ms: Token: '"script"' → Confidence: 72%
✅ SPECULATION THRESHOLD CROSSED
→ Start preparing search tool in background
t=55ms: Token: ',"query":"email"' → Confidence: 85%
t=65ms: Confidence: 91%
✅ COMMITMENT THRESHOLD CROSSED
→ Preparation complete!
→ EXECUTE IMMEDIATELY
→ Stop generating tokens (early exit!)
t=70ms: Tool execution started
Savings: 30-50ms by not waiting for full generation
```
This is **CPU branch prediction for AI**—start down a likely path before you're certain, commit when confident, rollback if wrong.
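The two thresholds in that timeline translate directly into code. This sketch fakes the confidence stream; in practice it would come from the router's token probabilities:

```python
import asyncio

SPECULATE_AT = 0.70  # start preparing the likely tool
COMMIT_AT = 0.90     # execute and stop generating (early exit)

async def fake_stream():
    """Stand-in token stream with rising confidence."""
    for chunk, conf in [("search", 0.45), ('{"item_type":', 0.65),
                        ('"script"', 0.72), (',"query":"email"', 0.85),
                        ("}", 0.91)]:
        yield chunk, conf

async def speculative_decode(stream) -> str | None:
    call, preparing = "", False
    async for chunk, confidence in stream:
        call += chunk
        if not preparing and confidence >= SPECULATE_AT:
            preparing = True  # speculation: prepare the tool in background
        if confidence >= COMMIT_AT:
            return call       # commitment: execute immediately

print(asyncio.run(speculative_decode(fake_stream())))
```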
---
## Edge-First Computing: Your Device, Your AI
### The Sovereignty Principle
In traditional cloud AI:
- Your data travels to remote servers
- Someone else's hardware processes your thoughts
- You pay per token, forever
- No internet = no AI
In Lilux edge computing:
- **Your device processes locally**
- **Your data never leaves**
- **One-time training cost, infinite use**
- **Offline-first, cloud-enhanced**
### Platform Support Matrix
| Platform | Runtime | Accelerator | Latency | Power |
| ----------- | --------- | ------------- | --------- | ------ |
| **macOS** | Metal | Apple Silicon | 30-50ms | <1W |
| **iOS** | CoreML | Neural Engine | 25-40ms | 0.4W |
| **Linux** | CUDA/ROCm | GPU | 30-60ms | 5-30W |
| **Windows** | DirectML | GPU | 40-80ms | 5-30W |
| **Android** | NNAPI | NPU | 50-100ms | 1-2W |
| **Web** | WebGPU | GPU | 100-200ms | Varies |
### The Router: Your Personal AI Chip
The FunctionGemma 270M model is trained specifically for **your** command patterns:
```python
# Your patterns become hardwired
COMMAND_PATTERNS = {
    "search directives {X}": 'search(item_type="directive", query="{X}")',
    "run {X}":               'execute(action="run", item_id="{X}")',
    "sync scripts":          'execute(action="run", item_id="sync_scripts")',
    "find {X} scripts":      'search(item_type="script", query="{X}")',
}
```
The model learns **your vocabulary**, **your project structure**, **your workflow patterns**.
### Fine-Tuning: Teaching Your Device
```python
# Generate training data from your actual usage
examples = [
{"user": "find email scripts",
"tool": "search",
"params": {"item_type": "script", "query": "email"}},
{"user": "run sync directives",
"tool": "execute",
"params": {"action": "run", "item_id": "sync_directives"}},
# ... 1000+ examples from your patterns
]
# Fine-tune in 1-4 hours on consumer GPU
# Result: 98%+ accuracy on YOUR commands
```
- **Training cost**: ~$50, one-time (GPU rental)
- **Inference cost**: $0.00, forever
---
## Parallel Racing: The Art of Concurrent Intelligence
### The Race Pattern
Both models run simultaneously. Whichever finishes first wins:
```
t=0ms: Query arrives
├─ Router starts streaming (local)
└─ Reasoning model starts thinking (cloud)
t=50ms: Router: High confidence reached!
→ Execute tool immediately
→ Push result to queue
t=100ms: Reasoning model: Checks for results
→ Result already available!
→ Starts synthesizing explanation WITH the result
t=150ms: User sees: "✅ Found 3 email scripts"
(Tool already executed, waiting for explanation)
t=800ms: Reasoning: "I found 3 scripts that handle email..."
(User gets rich explanation, but action was instant)
```
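The race itself is ordinary asyncio. The two coroutines below are stand-ins for the real models, with sleeps in place of inference:

```python
import asyncio

async def local_router(query: str) -> str:
    await asyncio.sleep(0.05)  # ~50ms local decision
    return 'search(item_type="script", query="email")'

async def cloud_reasoning(query: str, results: asyncio.Queue) -> str:
    await asyncio.sleep(0.8)           # ~800ms of thinking, in parallel
    tool_result = await results.get()  # result is already waiting
    return f"I found these scripts: {tool_result}"

async def race(query: str) -> None:
    results: asyncio.Queue = asyncio.Queue()
    explanation = asyncio.create_task(cloud_reasoning(query, results))
    tool_call = await local_router(query)         # router wins the race
    await results.put(f"[executed {tool_call}]")  # push result to queue
    print("action complete at ~50ms")
    print(await explanation)                      # richer answer arrives later

asyncio.run(race("find email scripts"))
```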
### Confidence-Based Routing
Not all queries are equal. Route based on certainty:
```
HIGH CONFIDENCE (>90%):
Router → Execute immediately
Cost: $0.00
Latency: 50ms
MEDIUM CONFIDENCE (70-90%):
Router → Ask reasoning to verify → Execute
Cost: $0.02
Latency: 200ms
LOW CONFIDENCE (<70%):
Router → Defer to reasoning model
Cost: $0.15
Latency: 1000ms
Average (with typical distribution):
70% high + 20% medium + 10% low
= 0.70×$0 + 0.20×$0.02 + 0.10×$0.15
= $0.019 per request (87% savings!)
```
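As code, the routing decision is a simple threshold ladder, and the expected-cost arithmetic above falls out directly (thresholds and prices are the illustrative figures from this section):

```python
def route(confidence: float) -> tuple[str, float]:
    """Map router confidence to an execution path and its cost."""
    if confidence > 0.90:
        return "execute_locally", 0.00       # ~50ms
    if confidence > 0.70:
        return "verify_then_execute", 0.02   # ~200ms
    return "defer_to_reasoning", 0.15        # ~1000ms

# Expected cost under the distribution quoted above
mix = [(0.70, 0.95), (0.20, 0.80), (0.10, 0.50)]  # (share, sample confidence)
expected = sum(share * route(conf)[1] for share, conf in mix)
print(f"${expected:.3f} per request")  # $0.019
```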
---
## Multi-Model Orchestration: The Conductor Pattern
### Specialized Routers for Different Domains
```
Query
↓
┌─────────────────┐
│ Meta Classifier │
│ (tiny model) │
└────────┬────────┘
│
┌─────────────┼─────────────┐
│ │ │
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐
│ Kiwi │ │ File │ │ Git │
│ Router │ │ Router │ │ Router │
│ 270M │ │ 270M │ │ 270M │
└───┬────┘ └───┬────┘ └───┬────┘
│ │ │
▼ ▼ ▼
Kiwi MCP File Ops Git Ops
```
Each router is a **specialist**—trained on specific domains, running on the same hardware, activated on demand.
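A sketch of the conductor in code, with a toy keyword classifier standing in for the tiny meta model (domain names are illustrative):

```python
# Routing table: domain → specialist router (stand-in callables)
ROUTERS = {
    "kiwi": lambda q: f"kiwi_router: {q}",
    "file": lambda q: f"file_router: {q}",
    "git": lambda q: f"git_router: {q}",
}

def classify_domain(query: str) -> str:
    """Toy stand-in for the tiny meta-classifier model."""
    if any(word in query for word in ("commit", "branch", "merge")):
        return "git"
    if any(word in query for word in ("rename", "move", "folder")):
        return "file"
    return "kiwi"

def dispatch(query: str) -> str:
    return ROUTERS[classify_domain(query)](query)

print(dispatch("commit the staged changes"))  # handled by the git specialist
```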
### The Orchestra Analogy
| Role | Traditional Orchestra | Lilux Multi-Model |
| --------------- | --------------------- | ------------------ |
| **Conductor** | Human conductor | Meta classifier |
| **Sections** | Strings, winds, brass | Domain routers |
| **Musicians** | Individual players | Specialized models |
| **Score** | Musical notation | Directives |
| **Performance** | Synchronized music | Coordinated AI |
---
## Streaming Architecture: Tokens as Events
### The Token Stream Philosophy
Lilux treats token generation as an **event stream**, not a blocking call:
```python
class TokenEventStream:
    async def stream_with_hooks(self, query: str):
        """Emit events at key points during generation."""
        partial = ""
        async for token in self.model.stream(query):
            partial += token
            # Event: every token
            await self.emit("on_token", token)
            # Event: tool detected in the partial output
            tool_name = self.detect_tool_prefix(partial)
            if tool_name:
                await self.emit("on_tool_detected", tool_name)
                # Start preparation in background!
            # Parse whatever call is visible so far (assumed helper)
            tool_call = self.parse_partial_call(partial)
            # Event: confidence threshold
            if tool_call and tool_call.confidence > 0.85:
                await self.emit("on_confident", tool_call)
                # May trigger early execution!
            # Event: complete
            if tool_call and tool_call.is_complete:
                await self.emit("on_complete", tool_call)
                return  # Early exit!
```
### Hook System: React to Intelligence
External systems can subscribe to AI events:
```python
@router.on("on_tool_detected")
async def start_preparation(event):
    """Tool name known, start preparing before the full call arrives."""
    await prepare_tool_environment(event.tool)

@router.on("on_confident")
async def execute_early(event):
    """High confidence, execute before generation completes."""
    return await execute_tool(event.tool_call)

@router.on("on_complete")
async def log_completion(event):
    """Generation complete, log for analytics."""
    await metrics.record(event)
This is **reactive AI**—the system responds to intelligence as it emerges, not after it's fully formed.
---
## The Hardware Vision: Lilux Everywhere
### From Cloud to Edge to Everywhere
```
2024: Cloud-only AI
┌─────────────────────────────────────────┐
│ Your Device → Internet → Cloud → Back │
│ (Latency: 500-2000ms, Cost: $$$) │
└─────────────────────────────────────────┘
2026: Hybrid AI (Lilux Today)
┌─────────────────────────────────────────┐
│ Your Device → Local Router (50ms) │
│ ↘ Cloud for complex (800ms) │
│ (Smart routing, 90% local) │
└─────────────────────────────────────────┘
2028: Edge-Dominant AI (Lilux Future)
┌─────────────────────────────────────────┐
│ Your Device: Runs everything locally │
│ Cloud: Sync, updates, rare edge cases │
│ (99% local, <100ms, zero cost) │
└─────────────────────────────────────────┘
2030+: Ambient AI
┌─────────────────────────────────────────┐
│ Every device has AI fabric │
│ Lilux runs on: phones, watches, cars, │
│ appliances, infrastructure │
│ (AI as fundamental as electricity) │
└─────────────────────────────────────────┘
```
### The Self-Hosting Promise
Traditional SaaS AI:
- You don't own your intelligence
- Models change without your consent
- Pricing changes without warning
- Your data trains their models
Lilux Self-Hosted:
- **You own your trained routers**
- **Models frozen to your version**
- **One-time training, free forever**
- **Your data stays yours**
---
## Performance Comparison: The Numbers
### Latency (Tool Decision)
| Architecture | Latency | Improvement |
| --------------------- | ------- | -------------- |
| Single Cloud Model | 1,500ms | Baseline |
| Dual Model (Parallel) | 50ms | **30x faster** |
| Speculative Execution | 35ms | **43x faster** |
### Cost (Per 1M Requests)
| Architecture | Monthly Cost | Savings |
| ----------------------- | ----------------- | ---------- |
| All Cloud (GPT-4o) | $150,000 | Baseline |
| All Cloud (GPT-4o Mini) | $750 | 99.5% |
| Dual Model (Hybrid) | $190 | 99.87% |
| All Local (Router Only) | $12 (electricity) | **99.99%** |
### Privacy
| Architecture | Data Exposure |
| ------------ | ------------------------------- |
| Cloud-only | All queries visible to provider |
| Dual Model | Only complex queries to cloud |
| Local-only | **Zero data exposure** |
---
## Technical Reference
### Core Files
| File | Purpose |
| ------------------------ | ---------------------------------- |
| `AGENTS.md` | Agent configuration (like .bashrc) |
| `.ai/project_context.md` | Generated project understanding |
| `.ai/patterns/*.md` | Project conventions |
| `kiwi_mcp/server.py` | The kernel |
| `kiwi_mcp/handlers/` | Type-specific syscall handlers |
| `kiwi_mcp/tools/` | The 4 primitives |
### Key Directives
| Directive | Purpose |
| ------------------ | ------------------------------ |
| `init` | Bootstrap a new project |
| `context` | Generate project understanding |
| `run_directive` | Execute a directive |
| `anneal_directive` | Improve from failure |
| `sync_*` | Synchronize with registry |
| `subagent` | Spawned execution context |
### Environment
```bash
# Required
SUPABASE_URL=https://project.supabase.co
SUPABASE_SECRET_KEY=your-key
# Optional
AI_USER_SPACE=~/.ai
LOG_LEVEL=INFO
```
---
## The Complete Architecture: All Layers Combined
```
┌─────────────────────────────────────────────────────────────────────────┐
│ USER / HUMAN LAYER │
│ Natural language: "find that email script and run it" │
└────────────────────────────────┬────────────────────────────────────────┘
│
┌────────────────────────────────▼────────────────────────────────────────┐
│ APPLICATION LAYER │
│ CLI / Desktop App / Mobile App / IDE Plugin / Web Interface │
└────────────────────────────────┬────────────────────────────────────────┘
│
┌────────────────────────────────▼────────────────────────────────────────┐
│ DUAL-MODEL ORCHESTRATION │
│ ┌───────────────────────┐ ┌───────────────────────────────────┐ │
│ │ LOCAL ROUTER │ │ CLOUD REASONING │ │
│ │ (FunctionGemma) │ │ (Claude/GPT/Gemini) │ │
│ │ │ │ │ │
│ │ • 40ms decisions │◄──►│ • Complex planning │ │
│ │ • $0 per request │ │ • Creative synthesis │ │
│ │ • 98% accuracy │ │ • Fallback verification │ │
│ │ • Offline-capable │ │ • Novel problem solving │ │
│ └───────────┬───────────┘ └─────────────┬─────────────────────┘ │
│ │ │ │
│ └───────────────┬───────────────┘ │
│ ▼ │
│ ┌────────────────────────────────────┐ │
│ │ AGENT COORDINATOR │ │
│ │ (Parallel racing, confidence │ │
│ │ routing, speculative execution) │ │
│ └────────────────┬───────────────────┘ │
└─────────────────────────────┼───────────────────────────────────────────┘
│
┌─────────────────────────────▼───────────────────────────────────────────┐
│ MCP SERVER (KERNEL) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ SEARCH │ │ LOAD │ │ EXECUTE │ │ HELP │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │ │
│ └─────────────┴──────┬──────┴─────────────┘ │
│ │ │
│ ┌─────────────▼─────────────┐ │
│ │ TYPE HANDLER REGISTRY │ │
│ └─────────────┬─────────────┘ │
│ ┌────────────────────┼────────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ Directive │ │ Script │ │ Knowledge │ │
│ │ Handler │ │ Handler │ │ Handler │ │
│ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ │
└───────┼───────────────────┼───────────────────┼─────────────────────────┘
│ │ │
┌───────▼───────────────────▼───────────────────▼─────────────────────────┐
│ STORAGE LAYER │
│ │
│ ┌─────────────────────┐ ┌─────────────────────────────────────┐ │
│ │ LOCAL (.ai/) │ │ REGISTRY (Cloud) │ │
│ │ │ │ │ │
│ │ directives/ │◄──►│ Versioned packages │ │
│ │ scripts/ │ │ Quality scores │ │
│ │ knowledge/ │ │ Author attribution │ │
│ │ patterns/ │ │ Dependency resolution │ │
│ │ outputs/ │ │ Search + discovery │ │
│ └─────────────────────┘ └─────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
│ │ │
┌───────▼───────────────────▼───────────────────▼─────────────────────────┐
│ EXECUTION LAYER │
│ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────────────────┐ │
│ │ Python venv │ │ APIs │ │ Shell Commands │ │
│ │ (isolated) │ │ (external) │ │ (sandboxed) │ │
│ └───────────────┘ └───────────────┘ └───────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────▼───────────────────────────────────────────┐
│ REAL WORLD │
│ Files, APIs, Databases, Services, Infrastructure, The Internet │
└─────────────────────────────────────────────────────────────────────────┘
```
---
## The MCP Bridge: Using Standard Infrastructure
### The Key Insight from Building Agents
As revealed by Amp's ["How to Build an Agent"](https://ampcode.com/how-to-build-an-agent):
> **"It's an LLM, a loop, and enough tokens. The secret is there is no secret."**
The agent loop is simple:
```python
# Schematic: user_input, model, execute_tool, display are your plumbing
while True:
    conversation.append(user_input())
    response = model(conversation)
    if response.tool_use:
        result = execute_tool(response.tool_use)
        conversation.append(result)
        continue  # Loop back with the tool result
    display(response)
    break
```
**That's the core.** Everything else is optimization.
### MCP Infrastructure: What We Reuse (95%)
Lilux doesn't reinvent MCP (Model Context Protocol). We **use it**:
```
✅ MCP Servers (Kiwi, filesystem, git, weather...)
✅ stdio/SSE Protocol (standard communication)
✅ Tool Schemas (JSON Schema format)
✅ call_tool() API (execution interface)
✅ Server ecosystem (any MCP server works)
```
**All MCP servers work with Lilux unchanged.**
### The Innovation: Intent Routing (5%)
We only change ONE thing—**who decides which tool to call**:
```
Traditional MCP:
┌─────────────────────────────────────────────────┐
│ Cloud Model sees ALL tool schemas │
│ ↓ │
│ Generates JSON tool call │
│ ↓ │
│ Execute via MCP protocol │
└─────────────────────────────────────────────────┘
Lilux MCP:
┌─────────────────────────────────────────────────┐
│ Frontend Model outputs: [TOOL: intent] │
│ ↓ │
│ FunctionGemma routes (local, 50ms) │
│ ↓ │
│ Execute via SAME MCP protocol │
└─────────────────────────────────────────────────┘
```
### The Architecture
```
User Input
│
▼
┌─────────────────────────┐
│ Conversational Model │ [TOOL: search for email scripts]
│ (Phi-3 Mini, 3B) │ Does NOT see tool schemas
└──────────┬──────────────┘
│
▼
┌─────────────────────────┐
│ Intent Router Harness │ Intercepts [TOOL: ...] markers
│ │ Routes to FunctionGemma
└──────────┬──────────────┘
│
▼
┌─────────────────────────┐
│ FunctionGemma Router │ Intent → JSON tool call
│ (270M, local) │ Trained on MCP schemas
└──────────┬──────────────┘
│
▼
┌─────────────────────────┐
│ MCP Client (Standard) │ call_tool() via stdio
│ │ Same as Claude Desktop
└──────────┬──────────────┘
│
▼
┌─────────────────────────┐
│ MCP Servers (Standard) │ Kiwi, filesystem, git...
│ UNCHANGED │ ANY MCP server works
└─────────────────────────┘
```
### Benefits
| Aspect | Traditional MCP | Lilux Intent Routing |
| ------ | --------------- | -------------------- |
| **MCP Compatibility** | ✅ All servers | ✅ All servers (unchanged) |
| **Infrastructure** | ✅ stdio/SSE | ✅ stdio/SSE (unchanged) |
| **Model Sees** | All tool schemas | Just `[TOOL: intent]` syntax |
| **Privacy** | Schemas to cloud | Schemas stay local |
| **Routing Speed** | 1.5s cloud | 50ms local |
| **Model Size** | 70B+ cloud | 3B frontend + 270M router |
| **Cost** | $0.15/request | $0.001/request |
| **Works Offline** | ❌ No | ✅ Yes |
### Training on MCP
The FunctionGemma router trains on MCP server schemas:
```python
# Connect to ANY MCP server
await mcp_client.connect_to_server("kiwi", "python", ["-m", "kiwi_mcp.server"])
# Get tool schemas
tools = await mcp_client.list_tools()
# Generate training data
for tool in tools:
    training_data.append({
        "intent": generate_natural_variations(tool.description),
        "tool_call": {
            "name": tool.name,
            "arguments": extract_from_schema(tool.inputSchema),
        },
    })
# Fine-tune FunctionGemma
train_router(training_data)
```
**Result**: One router that works with ALL MCP servers.
### The Loop in Lilux
```python
# Amp-style loop with intent routing
async def lilux_agent_loop(user_message: str):
    conversation.append(user_message)
    buffer = ""
    # Stream from frontend model
    async for token in frontend.stream(conversation):
        buffer += token
        # Detect a completed [TOOL: intent] marker in the output so far
        intent = extract_intent(buffer)
        if intent:
            # Route through FunctionGemma (50ms)
            tool_call = router.predict(intent)
            # Execute via STANDARD MCP (unchanged)
            result = mcp_client.call_tool(
                name=tool_call["name"],
                arguments=tool_call["arguments"],
            )
            # Add result to conversation and loop back to frontend
            conversation.append(result)
            continue
        yield token
```
Same loop. Same MCP. Just smarter routing.
---
## The Competitive Landscape: Why Lilux Wins
### vs. Traditional AI Assistants (ChatGPT, Claude, etc.)
| Aspect | Traditional | Lilux |
| -------- | -------------------------- | -------------------------------- |
| Memory | Session-based, ephemeral | Persistent knowledge base |
| Tools | Generic, same for everyone | Custom, trained on your patterns |
| Speed | Cloud-latency (500ms+) | Local-first (40ms) |
| Cost | Per-token, forever | One-time training, free use |
| Privacy | Data sent to cloud | Local-first, your data stays |
| Learning | Static model | Self-annealing improvement |
### vs. Agent Frameworks (LangChain, AutoGPT, etc.)
| Aspect | Agent Frameworks | Lilux |
| --------- | ------------------- | ------------------------------- |
| Structure | Code-defined chains | Human-readable directives |
| Sharing | Copy code files | Registry packages with versions |
| Discovery | Read documentation | Semantic search |
| Evolution | Manual code updates | Self-annealing + sync |
| Hardware | Cloud-only | Edge + cloud hybrid |
### vs. IDE AI (Cursor, Copilot, etc.)
| Aspect | IDE AI | Lilux |
| ------------- | ------------------ | ------------------------------- |
| Scope | Coding assistance | Full workflow automation |
| Knowledge | Model training | Your custom knowledge base |
| Persistence | Context window | Permanent storage |
| Customization | Prompt files | Directives + patterns + scripts |
| Extension | Fixed capabilities | Handler architecture |
---
## Getting Started: Your First Steps in Lilux
### 1. Initialize (The Boot)
```bash
# Create the .ai/ filesystem
run("init", project_type="python")
```
### 2. Generate Context (Know Your World)
```bash
# Understand the project
run("context")
```
### 3. Search (Find Your Tools)
```bash
# Discover what exists
search("email enrichment", item_type="script")
```
### 4. Execute (Do The Work)
```bash
# Run a directive
run("create_script", script_name="my_tool", description="...")
```
### 5. Learn (Grow The System)
```bash
# Store what you learned
create("knowledge", zettel_id="learning-001", content="...")
```
### 6. Anneal (Improve From Failure)
```bash
# When something fails, make it smarter
run("anneal_directive", directive_name="failing_directive")
```
---
## Appendix: The Name
**Lilux** = **L**LM + L**i**nux + L**ux** (light)
- **LLM**: Large Language Models are the operators
- **Linux**: Inspired by the Unix philosophy
- **Lux**: Latin for "light"—bringing clarity to AI systems
_The light of structure in the chaos of prompts._
---
_"In the beginning was the command line. Now there is the prompt line."_
**Welcome to Lilux.**
---
## The Complete Dual-Brain: Both Sides Fine-Tuned
We've described the hardware layer with a fast router (FunctionGemma) and a high-reasoning model running in parallel. But the true power of Lilux comes when **both brains are fine-tuned for your system**.
### The Router Brain (Fast Intuition)
**FunctionGemma 270M** fine-tuned to:
- Translate natural language → tool calls in 40-80ms
- Understand your specific command patterns
- Infer parameters from context
- Provide confidence scores for decisions
Training: 1,000-5,000 examples of (input → tool_call) pairs
### The Orchestrator Brain (Deep Reasoning)
**Llama 3.3 70B** (or Qwen 72B, Mistral Large) fine-tuned to:
- Know that the router exists and runs in parallel
- Defer to router when confidence is high (>85%)
- Take control when router is uncertain (<60%)
- Deeply understand Kiwi MCP semantics
- Plan multi-step workflows
- Synthesize tool results into natural conversation
- Handle graceful handoffs when router results arrive mid-generation
Training: 1,500-3,000 examples covering (a sketched example follows this list):
- Router deference patterns
- Kiwi semantic understanding
- Multi-step orchestration
- Graceful handoffs
- Error handling
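For concreteness, one router-deference example might look like the sketch below. The chat format and the inline router annotation are assumptions, not a fixed training schema:

```python
# Hypothetical router-deference training example
example = {
    "messages": [
        {"role": "system",
         "content": ("A local router runs in parallel. Defer when its "
                     "confidence is above 0.85; take over below 0.60.")},
        {"role": "user", "content": "find email scripts"},
        {"role": "assistant",
         "content": ("[router: search(item_type='script', query='email'), "
                     "confidence=0.92 -> defer] Found 3 email scripts...")},
    ]
}
```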
### The Synergy
```
User: "Find that email script we made last week and run it on the new leads"
┌─ Router (45ms) ─────────────────────────────────────────────┐
│ Prediction: search(item_type="script", query="email") │
│ Confidence: 0.92 ✓ │
└─────────────────────────────────────────────────────────────┘
┌─ Orchestrator (500ms, but tool already executed!) ──────────┐
│ *Knows router suggested search with 92% confidence* │
│ *Sees tool already executed, results available* │
│ *Synthesizes natural response:* │
│ │
│ "Found `email_enricher.py` from January 10th. │
│ Running it on your new leads folder now... │
│ ✓ Processed 847 leads, enriched 612, 235 already current. │
│ Report saved to .ai/outputs/" │
└─────────────────────────────────────────────────────────────┘
```
Both brains are **Kiwi-native**. They understand your system deeply and work together seamlessly—a slow thoughtful brain with a fast intuitive one, just like human cognition.
### The Vision: Your Personal AI That Knows You
With both models fine-tuned on **your** patterns:
- The router knows **your** common commands
- The orchestrator knows **your** workflow preferences
- Both improve as you use them (self-annealing)
- Complete privacy—runs locally on your hardware
This is the Lilux experience: AI that feels instant, understands your system, and keeps getting better.
---
## Appendix: Related Documents
| Document | Description |
| -------------------------------------------------------------------------------------------------------------------------------- | ------------------------------- |
| [**MCP Integration Bridge**](../../kiwi-fine-tune/MCP%20Integration%20-%20The%20Agent%20Loop%20Bridge.md) | How Lilux uses MCP infrastructure |
| [**Semantic Routing at Scale**](../../kiwi-fine-tune/Semantic%20Routing%20at%20Scale%20-%20Intent%20Discovery%20Layer.md) ⚡ | Scaling to infinite directives |
| [**Multi-Net Architecture**](../../kiwi-fine-tune/Multi-Net%20Agent%20Architecture.md) | Distributed intelligence layers |
| [Dual-Model Architecture](../../kiwi-fine-tune/Dual-Model-Architecture-Overview.md) | Edge router + cloud reasoning |
| [Why FunctionGemma](../../kiwi-fine-tune/Why%20FunctionGemma%20for%20Tool%20Routing.md) | Model selection rationale |
| [Training FunctionGemma](../../kiwi-fine-tune/Training%20FunctionGemma%20for%20Kiwi%20MCP.md) | Fine-tuning your router |
| [**Training the Orchestrator**](../../kiwi-fine-tune/Fine-Tuning%20the%20Reasoning%20Orchestrator.md) | Fine-tuning the reasoning brain |
| [Streaming Architecture](../../kiwi-fine-tune/Streaming%20Architecture%20%26%20Concurrent%20Execution.md) | Concurrent token generation |
| [Edge Deployment](../../kiwi-fine-tune/Deployment%20Guide%20-%20Edge%20Device%20Implementation.md) | Deploy to devices |
| [Integration Patterns](../../kiwi-fine-tune/Integration%20Patterns%20-%20Connecting%20All%20Components.md) | Connecting everything |
---
## Appendix: Glossary
| Term | Definition |
| ---------------- | -------------------------------------------------------------- |
| **Directive** | A natural language program that instructs AI agents |
| **Script** | A deterministic Python script that executes actual work |
| **Knowledge** | Persistent information that informs AI decisions |
| **Anneal** | To improve a directive by learning from failure |
| **Router** | A small, fast model that translates intent to tool calls |
| **Orchestrator** | The high-reasoning model that plans, converses, and synthesizes |
| **Subagent** | A spawned AI process with isolated context |
| **MCP** | Model Context Protocol—the standard for AI tool integration |
| **Registry** | Centralized package repository for sharing AI workflows |
| **Edge** | Computing that happens on your local device |
| **DOE** | Directive-Orchestration-Execution framework |
| **Dual-Brain** | Architecture with fast router + slow reasoning model in parallel |
---
## Appendix: The Manifesto
**We believe:**
1. **AI should run everywhere, not just in the cloud.**
Your phone, your laptop, your car. Intelligence should be local.
2. **AI should learn from every interaction.**
Systems that don't improve are wasting experience.
3. **AI workflows should be shareable like software.**
What works for one should work for all.
4. **AI should understand intent, not syntax.**
Natural language is the universal interface.
5. **Privacy is not optional.**
Your thoughts, your device, your control.
6. **Speed matters.**
Humans shouldn't wait for machines to think.
7. **Cost should approach zero.**
Intelligence should be as cheap as computation.
8. **AI should compose like Unix pipes.**
Simple tools, infinite combinations.
9. **The best AI is invisible.**
Technology that disappears into usefulness.
10. **We're building the future.**
Not waiting for it.
---
_"In the beginning was the command line. Now there is the prompt line."_
_"Everything is a file" becomes "Everything is a prompt."_
_"Do one thing well" becomes "Direct one thing well."_
**This is Lilux. This is the AI-native operating system. This is the seed.**
🐧✨
---
_Document generated: 2026-01-17_
_Version: 0.2.0-expanded_
_Status: Vision Document (Extended with Hardware Layer)_
_Authors: Kiwi MCP Team_
_License: Open Vision - Build Upon This_