Congo River Compositional Intelligence

Congo
docs

YAGO-INTEGRATION.md•11.4 KiB

# YAGO Knowledge Base Integration **Status:** ✅ Implemented (Phase 1) **Version:** YAGO 4.5 Core Subset This document describes how YAGO (Yet Another Great Ontology) is integrated into Congo River as the base/default ontology for the neuro-symbolic reasoning system. --- ## Overview YAGO 4.5 provides **50+ million entities** and **2+ billion facts** from Wikidata, with 95% accuracy. Congo River uses YAGO as the foundational world knowledge layer, enabling: - **Grounded reasoning**: LLM queries backed by factual knowledge - **Compositional intelligence**: User concepts compose with world knowledge - **Proof traces**: Reasoning chains traceable to YAGO facts - **Neuro-symbolic queries**: Hybrid LLM + knowledge graph operations --- ## Architecture ``` ┌─────────────────────────────────────────────────┐ │ Congo River MCP Server │ │ │ │ ┌────────────────────────────────────────┐ │ │ │ Neuro-Symbolic Query Tool │ │ │ │ (Combines LLM + Knowledge Graph) │ │ │ └──────────────┬──────────────────────────┘ │ │ │ │ │ ┌──────────────▼──────────────┐ │ │ │ Knowledge Graph (triples) │ │ │ │ │ │ │ │ ┌─────────────────────────┐ │ │ │ │ │ User-Generated Triples │ │ │ │ │ │ context: 'user' │ │ │ │ │ └─────────────────────────┘ │ │ │ │ │ │ │ │ ┌─────────────────────────┐ │ │ │ │ │ YAGO Base Ontology │ │ │ │ │ │ context: 'yago-*' │ │ │ │ │ │ - yago-schema │ │ │ │ │ │ - yago-taxonomy │ │ │ │ │ │ - yago-facts │ │ │ │ │ └─────────────────────────┘ │ │ │ └──────────────────────────────┘ │ └─────────────────────────────────────────────────┘ │ ┌─────────────▼─────────────┐ │ PostgreSQL + pgvector │ │ - triples table │ │ - yago_imports tracking │ │ - yago_entity_map │ └────────────────────────────┘ ``` --- ## Quick Start ### 1. Download YAGO 4.5 Core Files ```bash cd /home/mdz-axolotl/ClaudeCode/congo-river-mcp # Download YAGO 4.5 core subset (schema, taxonomy, facts) npm run yago:download ``` **Downloads:** - `yago-schema.ttl` - Property definitions and constraints - `yago-taxonomy.ttl` - Class hierarchy (schema.org) - `yago-facts.ttl` - Entities with English Wikipedia pages **Note:** Files are downloaded to `data/yago/` directory. ### 2. Import into Database ```bash # Import all files (schema, taxonomy, facts) npm run yago:import # Or import specific subset npm run yago:import -- --subset taxonomy # Or test without importing (dry run) npm run yago:import:dry-run ``` ### 3. One-Command Setup ```bash # Download + Import in one command npm run yago:setup ``` --- ## Database Schema ### Core Tables **`triples`** - Main RDF storage (existing) ```sql CREATE TABLE triples ( subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL, context TEXT, -- 'yago-schema', 'yago-taxonomy', 'yago-facts' source TEXT, -- 'yago' confidence FLOAT, -- YAGO confidence scores created_at TIMESTAMP, metadata JSONB ); ``` **`yago_imports`** - Import tracking ```sql CREATE TABLE yago_imports ( id UUID PRIMARY KEY, version TEXT, -- '4.5' subset TEXT, -- 'taxonomy', 'schema', 'facts' file_name TEXT, context TEXT, -- Context used in triples table triple_count INTEGER, import_duration_ms INTEGER, success BOOLEAN, created_at TIMESTAMP ); ``` **`yago_entity_map`** - Entity metadata ```sql CREATE TABLE yago_entity_map ( id UUID PRIMARY KEY, yago_uri TEXT UNIQUE, yago_label TEXT, entity_type TEXT, wikipedia_title TEXT, concept_node_id UUID ); ``` ### Utility Functions ```sql -- Get import summary SELECT * FROM get_yago_summary(); -- Find entities by label SELECT * FROM find_yago_entities('consciousness', 10); -- View YAGO statistics SELECT * FROM yago_stats; -- Query YAGO triples only SELECT * FROM yago_triples WHERE subject LIKE '%Entity%'; ``` --- ## Usage Examples ### From Congo River MCP Tools Once YAGO is imported, it's automatically available to all reasoning tools: #### 1. **Triple Decomposition** (adds to YAGO knowledge) ```typescript triple_decomposition({ concept: "Consciousness relates to neural activity", store_in_db: true, context: "user" // Separate from YAGO facts }) ``` #### 2. **Graph Query** (queries YAGO + user knowledge) ```typescript graph_query({ query: "Find all properties of consciousness", contexts: ["yago-facts", "user"] // Query both contexts }) ``` #### 3. **Neuro-Symbolic Query** ⭐ (the showcase) ```typescript neuro_symbolic_query({ query: "How does consciousness relate to brain structures in neuroscience?", include_proof: true }) // This queries YAGO for neuroscience facts + uses LLM reasoning // Returns answer WITH proof trace showing YAGO sources ``` #### 4. **Proof Search** (uses YAGO as premises) ```typescript proof_search({ goal: "Consciousness involves the brain", premises: ["auto"], // Automatically pulls from YAGO method: "forward_chaining" }) ``` ### From SQL/Database ```sql -- Count YAGO triples by context SELECT context, COUNT(*) as count FROM triples WHERE source = 'yago' GROUP BY context; -- Find specific entities SELECT subject, predicate, object FROM triples WHERE context = 'yago-facts' AND subject LIKE '%http://yago-knowledge.org/resource/Consciousness%' LIMIT 10; -- Get YAGO import history SELECT subset, triple_count, import_duration_ms / 1000.0 as duration_seconds, success, import_end_time FROM yago_imports ORDER BY import_end_time DESC; ``` --- ## Context Separation Strategy Congo River uses **context fields** to separate knowledge sources: | Context | Purpose | Source | Mutable | |---------|---------|--------|---------| | `yago-schema` | Property definitions, constraints | YAGO 4.5 | No | | `yago-taxonomy` | Class hierarchy (schema.org) | YAGO 4.5 | No | | `yago-facts` | World knowledge (entities, facts) | YAGO 4.5 | No | | `user` | User-generated concepts/triples | `triple_decomposition` tool | Yes | | `derived` | Inferred knowledge from reasoning | `proof_search` tool | Yes | **Why separate?** - Query specific knowledge sources - Prevent pollution of YAGO base knowledge - Track provenance of facts - Enable selective updates --- ## Import Options ### Command-Line Options ```bash # Import specific subset npm run yago:import -- --subset taxonomy npm run yago:import -- --subset schema npm run yago:import -- --subset facts npm run yago:import -- --subset all # Import specific file npm run yago:import -- --file yago-taxonomy.ttl # Dry run (parse but don't insert) npm run yago:import:dry-run # With verbose output npm run yago:import -- --verbose ``` ### Programmatic Import ```typescript import { YAGOImporter } from './scripts/import-yago.js'; const importer = new YAGOImporter(); await importer.import({ subset: 'taxonomy', dryRun: false, verbose: true }); ``` --- ## Performance Considerations ### Import Performance - **Batch size**: 1000 triples per insert - **Parallel inserts**: Up to 10 concurrent connections - **Expected time**: - Taxonomy: ~30 seconds - Schema: ~1 minute - Facts: Varies by size (can be hours for full dataset) ### Query Performance YAGO integration includes optimized indexes: ```sql -- Fast lookups on subject/predicate/object CREATE INDEX idx_triples_spo ON triples(subject, predicate, object); -- Fast context filtering CREATE INDEX idx_triples_context ON triples(context); -- Fast source filtering CREATE INDEX idx_triples_source ON triples(source); ``` **Query Tips:** - Always filter by `context` when possible - Use `LIMIT` for large result sets - Consider `yago_triples` view for YAGO-only queries --- ## Troubleshooting ### Download Issues ```bash # Manual download if script fails cd data/yago curl -L -O https://yago-knowledge.org/data/yago4.5/yago-schema.ttl curl -L -O https://yago-knowledge.org/data/yago4.5/yago-taxonomy.ttl curl -L -O https://yago-knowledge.org/data/yago4.5/yago-facts.ttl ``` ### Import Errors **"Database connection failed"** - Check `.env` has `CLOUD_DB_URL` or `DATABASE_URL` - Verify database is accessible **"Migration errors"** - Migrations may already be applied (safe to ignore warnings) - Check `schema_version` table: `SELECT * FROM schema_version;` **"Out of memory"** - Import subsets separately instead of `--subset all` - Reduce `BATCH_SIZE` in `import-yago.ts` ### Verification ```bash # Check import status psql $CLOUD_DB_URL -c "SELECT * FROM yago_imports;" # Count triples by context psql $CLOUD_DB_URL -c " SELECT context, COUNT(*) as count FROM triples WHERE source = 'yago' GROUP BY context;" # Get summary psql $CLOUD_DB_URL -c "SELECT * FROM get_yago_summary();" ``` --- ## Future Enhancements ### Phase 2: Extended YAGO Integration - [ ] **Full YAGO 4.5 dataset** (beyond English Wikipedia) - [ ] **YAGO 4** (2B triples from full Wikidata) - [ ] **Temporal facts** (time-stamped knowledge) - [ ] **Entity embeddings** (pgvector integration) - [ ] **Incremental updates** (track YAGO changes) ### Phase 3: Advanced Features - [ ] **YAGO entity search** (full-text search on labels) - [ ] **SPARQL endpoint** (standard RDF query interface) - [ ] **Entity linking** (map user concepts to YAGO entities) - [ ] **Confidence propagation** (track certainty through reasoning) --- ## Related Documentation - **YAGO Official Site**: https://yago-knowledge.org - **YAGO 4.5 Downloads**: https://yago-knowledge.org/downloads/yago-4-5 - **Schema.org**: https://schema.org (YAGO's property vocabulary) - **Congo River README**: `README.md` - **Database Schema**: `src/db/schema.sql` - **Migration**: `src/db/migrations/002-yago-integration.sql` --- ## Support For issues or questions: 1. Check import logs: `SELECT * FROM yago_imports WHERE success = false;` 2. Review YAGO stats: `SELECT * FROM yago_stats;` 3. Test with dry run: `npm run yago:import:dry-run` 4. Open issue on GitHub (if applicable) --- **🌊 YAGO gives Congo River its massive tributaries - world knowledge flowing into compositional intelligence.**

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Replicant-Partners/Congo'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

YAGO-INTEGRATION.md•11.4 KiB