Skip to main content
Glama
Replicant-Partners

Congo River Compositional Intelligence

YAGO-INTEGRATION.md11.6 kB
# YAGO Knowledge Base Integration **Status:** ✅ Implemented (Phase 1) **Version:** YAGO 4.5 Core Subset This document describes how YAGO (Yet Another Great Ontology) is integrated into Congo River as the base/default ontology for the neuro-symbolic reasoning system. --- ## Overview YAGO 4.5 provides **50+ million entities** and **2+ billion facts** from Wikidata, with 95% accuracy. Congo River uses YAGO as the foundational world knowledge layer, enabling: - **Grounded reasoning**: LLM queries backed by factual knowledge - **Compositional intelligence**: User concepts compose with world knowledge - **Proof traces**: Reasoning chains traceable to YAGO facts - **Neuro-symbolic queries**: Hybrid LLM + knowledge graph operations --- ## Architecture ``` ┌─────────────────────────────────────────────────┐ │ Congo River MCP Server │ │ │ │ ┌────────────────────────────────────────┐ │ │ │ Neuro-Symbolic Query Tool │ │ │ │ (Combines LLM + Knowledge Graph) │ │ │ └──────────────┬──────────────────────────┘ │ │ │ │ │ ┌──────────────▼──────────────┐ │ │ │ Knowledge Graph (triples) │ │ │ │ │ │ │ │ ┌─────────────────────────┐ │ │ │ │ │ User-Generated Triples │ │ │ │ │ │ context: 'user' │ │ │ │ │ └─────────────────────────┘ │ │ │ │ │ │ │ │ ┌─────────────────────────┐ │ │ │ │ │ YAGO Base Ontology │ │ │ │ │ │ context: 'yago-*' │ │ │ │ │ │ - yago-schema │ │ │ │ │ │ - yago-taxonomy │ │ │ │ │ │ - yago-facts │ │ │ │ │ └─────────────────────────┘ │ │ │ └──────────────────────────────┘ │ └─────────────────────────────────────────────────┘ │ ┌─────────────▼─────────────┐ │ PostgreSQL + pgvector │ │ - triples table │ │ - yago_imports tracking │ │ - yago_entity_map │ └────────────────────────────┘ ``` --- ## Quick Start ### 1. Download YAGO 4.5 Core Files ```bash cd /home/mdz-axolotl/ClaudeCode/congo-river-mcp # Download YAGO 4.5 core subset (schema, taxonomy, facts) npm run yago:download ``` **Downloads:** - `yago-schema.ttl` - Property definitions and constraints - `yago-taxonomy.ttl` - Class hierarchy (schema.org) - `yago-facts.ttl` - Entities with English Wikipedia pages **Note:** Files are downloaded to `data/yago/` directory. ### 2. Import into Database ```bash # Import all files (schema, taxonomy, facts) npm run yago:import # Or import specific subset npm run yago:import -- --subset taxonomy # Or test without importing (dry run) npm run yago:import:dry-run ``` ### 3. One-Command Setup ```bash # Download + Import in one command npm run yago:setup ``` --- ## Database Schema ### Core Tables **`triples`** - Main RDF storage (existing) ```sql CREATE TABLE triples ( subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL, context TEXT, -- 'yago-schema', 'yago-taxonomy', 'yago-facts' source TEXT, -- 'yago' confidence FLOAT, -- YAGO confidence scores created_at TIMESTAMP, metadata JSONB ); ``` **`yago_imports`** - Import tracking ```sql CREATE TABLE yago_imports ( id UUID PRIMARY KEY, version TEXT, -- '4.5' subset TEXT, -- 'taxonomy', 'schema', 'facts' file_name TEXT, context TEXT, -- Context used in triples table triple_count INTEGER, import_duration_ms INTEGER, success BOOLEAN, created_at TIMESTAMP ); ``` **`yago_entity_map`** - Entity metadata ```sql CREATE TABLE yago_entity_map ( id UUID PRIMARY KEY, yago_uri TEXT UNIQUE, yago_label TEXT, entity_type TEXT, wikipedia_title TEXT, concept_node_id UUID ); ``` ### Utility Functions ```sql -- Get import summary SELECT * FROM get_yago_summary(); -- Find entities by label SELECT * FROM find_yago_entities('consciousness', 10); -- View YAGO statistics SELECT * FROM yago_stats; -- Query YAGO triples only SELECT * FROM yago_triples WHERE subject LIKE '%Entity%'; ``` --- ## Usage Examples ### From Congo River MCP Tools Once YAGO is imported, it's automatically available to all reasoning tools: #### 1. **Triple Decomposition** (adds to YAGO knowledge) ```typescript triple_decomposition({ concept: "Consciousness relates to neural activity", store_in_db: true, context: "user" // Separate from YAGO facts }) ``` #### 2. **Graph Query** (queries YAGO + user knowledge) ```typescript graph_query({ query: "Find all properties of consciousness", contexts: ["yago-facts", "user"] // Query both contexts }) ``` #### 3. **Neuro-Symbolic Query** ⭐ (the showcase) ```typescript neuro_symbolic_query({ query: "How does consciousness relate to brain structures in neuroscience?", include_proof: true }) // This queries YAGO for neuroscience facts + uses LLM reasoning // Returns answer WITH proof trace showing YAGO sources ``` #### 4. **Proof Search** (uses YAGO as premises) ```typescript proof_search({ goal: "Consciousness involves the brain", premises: ["auto"], // Automatically pulls from YAGO method: "forward_chaining" }) ``` ### From SQL/Database ```sql -- Count YAGO triples by context SELECT context, COUNT(*) as count FROM triples WHERE source = 'yago' GROUP BY context; -- Find specific entities SELECT subject, predicate, object FROM triples WHERE context = 'yago-facts' AND subject LIKE '%http://yago-knowledge.org/resource/Consciousness%' LIMIT 10; -- Get YAGO import history SELECT subset, triple_count, import_duration_ms / 1000.0 as duration_seconds, success, import_end_time FROM yago_imports ORDER BY import_end_time DESC; ``` --- ## Context Separation Strategy Congo River uses **context fields** to separate knowledge sources: | Context | Purpose | Source | Mutable | |---------|---------|--------|---------| | `yago-schema` | Property definitions, constraints | YAGO 4.5 | No | | `yago-taxonomy` | Class hierarchy (schema.org) | YAGO 4.5 | No | | `yago-facts` | World knowledge (entities, facts) | YAGO 4.5 | No | | `user` | User-generated concepts/triples | `triple_decomposition` tool | Yes | | `derived` | Inferred knowledge from reasoning | `proof_search` tool | Yes | **Why separate?** - Query specific knowledge sources - Prevent pollution of YAGO base knowledge - Track provenance of facts - Enable selective updates --- ## Import Options ### Command-Line Options ```bash # Import specific subset npm run yago:import -- --subset taxonomy npm run yago:import -- --subset schema npm run yago:import -- --subset facts npm run yago:import -- --subset all # Import specific file npm run yago:import -- --file yago-taxonomy.ttl # Dry run (parse but don't insert) npm run yago:import:dry-run # With verbose output npm run yago:import -- --verbose ``` ### Programmatic Import ```typescript import { YAGOImporter } from './scripts/import-yago.js'; const importer = new YAGOImporter(); await importer.import({ subset: 'taxonomy', dryRun: false, verbose: true }); ``` --- ## Performance Considerations ### Import Performance - **Batch size**: 1000 triples per insert - **Parallel inserts**: Up to 10 concurrent connections - **Expected time**: - Taxonomy: ~30 seconds - Schema: ~1 minute - Facts: Varies by size (can be hours for full dataset) ### Query Performance YAGO integration includes optimized indexes: ```sql -- Fast lookups on subject/predicate/object CREATE INDEX idx_triples_spo ON triples(subject, predicate, object); -- Fast context filtering CREATE INDEX idx_triples_context ON triples(context); -- Fast source filtering CREATE INDEX idx_triples_source ON triples(source); ``` **Query Tips:** - Always filter by `context` when possible - Use `LIMIT` for large result sets - Consider `yago_triples` view for YAGO-only queries --- ## Troubleshooting ### Download Issues ```bash # Manual download if script fails cd data/yago curl -L -O https://yago-knowledge.org/data/yago4.5/yago-schema.ttl curl -L -O https://yago-knowledge.org/data/yago4.5/yago-taxonomy.ttl curl -L -O https://yago-knowledge.org/data/yago4.5/yago-facts.ttl ``` ### Import Errors **"Database connection failed"** - Check `.env` has `CLOUD_DB_URL` or `DATABASE_URL` - Verify database is accessible **"Migration errors"** - Migrations may already be applied (safe to ignore warnings) - Check `schema_version` table: `SELECT * FROM schema_version;` **"Out of memory"** - Import subsets separately instead of `--subset all` - Reduce `BATCH_SIZE` in `import-yago.ts` ### Verification ```bash # Check import status psql $CLOUD_DB_URL -c "SELECT * FROM yago_imports;" # Count triples by context psql $CLOUD_DB_URL -c " SELECT context, COUNT(*) as count FROM triples WHERE source = 'yago' GROUP BY context;" # Get summary psql $CLOUD_DB_URL -c "SELECT * FROM get_yago_summary();" ``` --- ## Future Enhancements ### Phase 2: Extended YAGO Integration - [ ] **Full YAGO 4.5 dataset** (beyond English Wikipedia) - [ ] **YAGO 4** (2B triples from full Wikidata) - [ ] **Temporal facts** (time-stamped knowledge) - [ ] **Entity embeddings** (pgvector integration) - [ ] **Incremental updates** (track YAGO changes) ### Phase 3: Advanced Features - [ ] **YAGO entity search** (full-text search on labels) - [ ] **SPARQL endpoint** (standard RDF query interface) - [ ] **Entity linking** (map user concepts to YAGO entities) - [ ] **Confidence propagation** (track certainty through reasoning) --- ## Related Documentation - **YAGO Official Site**: https://yago-knowledge.org - **YAGO 4.5 Downloads**: https://yago-knowledge.org/downloads/yago-4-5 - **Schema.org**: https://schema.org (YAGO's property vocabulary) - **Congo River README**: `README.md` - **Database Schema**: `src/db/schema.sql` - **Migration**: `src/db/migrations/002-yago-integration.sql` --- ## Support For issues or questions: 1. Check import logs: `SELECT * FROM yago_imports WHERE success = false;` 2. Review YAGO stats: `SELECT * FROM yago_stats;` 3. Test with dry run: `npm run yago:import:dry-run` 4. Open issue on GitHub (if applicable) --- **🌊 YAGO gives Congo River its massive tributaries - world knowledge flowing into compositional intelligence.**

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Replicant-Partners/Congo'

If you have feedback or need assistance with the MCP directory API, please join our Discord server