# PT-MCP Implementation Plan
## YAGO 4.5 & Schema.org Integration
> **"Where am I now?"** - Based on proven patterns from the Ludwig neurosymbolic system
## Executive Summary
PT-MCP will integrate YAGO 4.5 and Schema.org using **battle-tested patterns** discovered in the Ludwig system (`/home/mdz-axolotl/ClaudeCode/Ludwig/`). Ludwig provides production-ready code for:
- YAGO entity resolution with SPARQL
- Schema.org property mapping
- RDF triple storage and querying
- Confidence-based auto-linking
- Semantic enrichment workflows
## Architecture: Three-Layer Semantic Stack
```
┌─────────────────────────────────────────────────────┐
│  PT-MCP Server (Model Context Protocol)             │
├─────────────────────────────────────────────────────┤
│  Layer 1: Code Analysis (✅ Implemented)            │
│  - File structure & language detection              │
│  - Entry point identification                       │
│  - Package dependency analysis                      │
└─────────────────────────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────┐
│  Layer 2: Semantic Enrichment (🚧 To Implement)    │
│  ┌─────────────────────┬───────────────────────┐    │
│  │ YAGO Resolver       │ Schema.org Mapper     │    │
│  │ - Entity linking    │ - Type classification │    │
│  │ - Fact retrieval    │ - Property extraction │    │
│  │ - SPARQL queries    │ - JSON-LD generation  │    │
│  └─────────────────────┴───────────────────────┘    │
└─────────────────────────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────┐
│  Layer 3: Knowledge Graph (🚧 To Implement)         │
│  - Triple store (SQLite)                            │
│  - RDF relationships                                │
│  - Ontology cache                                   │
│  - Query interface                                  │
└─────────────────────────────────────────────────────┘
```
## Phase 1: Foundation (Week 1-2)
### 1.1 Dependencies
**Add to `package.json`**:
```json
{
  "dependencies": {
    "rdflib": "^2.2.34",
    "n3": "^1.17.2",
    "sparqljs": "^3.7.1",
    "sparql-http-client": "^2.4.1",
    "jsonld": "^8.3.1",
    "better-sqlite3": "^9.2.2"
  }
}
```
### 1.2 Database Schema
**Create: `src/database/schema.sql`**
```sql
-- Core entities from codebase analysis
CREATE TABLE entities (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL,
  type TEXT NOT NULL,  -- 'package', 'class', 'function', 'framework', etc.
  source_file TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  metadata JSON
);
CREATE INDEX idx_entities_name ON entities(name);
CREATE INDEX idx_entities_type ON entities(type);

-- YAGO mappings (one row per entity; several entities may share a YAGO URI,
-- so yago_uri is indexed but not UNIQUE)
CREATE TABLE yago_mappings (
  entity_id INTEGER PRIMARY KEY,
  yago_uri TEXT NOT NULL,
  yago_type TEXT,      -- Schema.org type from YAGO
  confidence REAL CHECK(confidence >= 0 AND confidence <= 1),
  facts JSON,          -- YAGO facts as key-value pairs
  cached_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (entity_id) REFERENCES entities(id)
);
CREATE INDEX idx_yago_uri ON yago_mappings(yago_uri);

-- Schema.org annotations
CREATE TABLE schema_annotations (
  entity_id INTEGER PRIMARY KEY,
  schema_type TEXT NOT NULL,  -- e.g., 'WebApplication', 'SoftwareSourceCode'
  properties JSON,            -- Schema.org properties
  context_url TEXT DEFAULT 'https://schema.org',
  FOREIGN KEY (entity_id) REFERENCES entities(id)
);

-- Ontology cache (from YAGO taxonomy)
CREATE TABLE ontology_classes (
  class_uri TEXT PRIMARY KEY,
  label TEXT,
  description TEXT,
  parent_class TEXT,
  source TEXT DEFAULT 'yago',  -- 'yago' or 'schema.org'
  cached_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- SPARQL query cache
CREATE TABLE sparql_cache (
  query_hash TEXT PRIMARY KEY,
  query TEXT NOT NULL,
  result JSON NOT NULL,
  cached_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  expires_at TIMESTAMP
);
CREATE INDEX idx_sparql_expires ON sparql_cache(expires_at);
```
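
The `sparql_cache` table keys results by `query_hash` and expires them through `expires_at`, but the plan does not say how either value is derived. A minimal sketch of one option, assuming a SHA-256 digest over the whitespace-normalized query text and a TTL given in milliseconds. The names `normalizeQuery`, `hashQuery`, and `buildCacheRow` are hypothetical and not taken from Ludwig:

```typescript
import { createHash } from "node:crypto";

/** Row shape matching the sparql_cache table (cached_at is left to its default). */
export interface SparqlCacheRow {
  query_hash: string;
  query: string;
  result: string;     // JSON-encoded result rows
  expires_at: string; // ISO 8601 timestamp
}

/**
 * Collapse runs of whitespace so that formatting differences do not
 * produce distinct cache keys.
 * Assumption: whitespace inside string literals is not significant
 * for the queries being cached.
 */
export function normalizeQuery(sparql: string): string {
  return sparql.trim().replace(/\s+/g, " ");
}

/** Hex-encoded SHA-256 of the normalized query text. */
export function hashQuery(sparql: string): string {
  return createHash("sha256").update(normalizeQuery(sparql)).digest("hex");
}

/** Build a cache row that expires ttlMs milliseconds after `now`. */
export function buildCacheRow(
  sparql: string,
  rows: unknown[],
  ttlMs: number,
  now: Date = new Date()
): SparqlCacheRow {
  return {
    query_hash: hashQuery(sparql),
    query: sparql,
    result: JSON.stringify(rows),
    expires_at: new Date(now.getTime() + ttlMs).toISOString(),
  };
}
```

Because `expires_at` is written here as an ISO 8601 string, expiry checks would need to compare it against an ISO timestamp supplied by the application rather than against SQLite's `datetime('now')`, which uses a different text format.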
### 1.3 Directory Structure
```
src/
├── database/
│   ├── schema.sql           # Database schema
│   ├── connection.ts        # SQLite connection manager
│   └── migrations.ts        # Schema migrations
├── services/
│   ├── yago-resolver.ts     # YAGO entity resolution
│   ├── schema-mapper.ts     # Schema.org mapping
│   ├── sparql-client.ts     # SPARQL query execution
│   ├── rdf-parser.ts        # RDF/Turtle parsing
│   └── ontology-cache.ts    # Taxonomy caching
├── tools/
│   ├── enrich-context.ts    # NEW: Context enrichment tool
│   └── query-knowledge.ts   # NEW: Knowledge graph queries
└── types/
    ├── yago.ts              # YAGO types
    └── schema-org.ts        # Schema.org types
```
## Phase 2: YAGO Integration (Week 3-4)
### 2.1 YAGO Resolver Service
**Create: `src/services/yago-resolver.ts`**
```typescript
import { SPARQLClient } from './sparql-client.js';
import { Database } from '../database/connection.js';

export interface YAGOEntity {
  uri: string;
  label: string;
  type: string; // Schema.org type
  description?: string;
  facts: Record<string, string[]>;
}

export class YAGOResolver {
  private sparql: SPARQLClient;
  private db: Database;
  // 30 days; keep in sync with the '-30 days' window in getCachedMapping()
  private cacheTTL = 30 * 24 * 60 * 60 * 1000;

  constructor(db: Database) {
    this.db = db;
    this.sparql = new SPARQLClient({
      endpoint: 'https://yago-knowledge.org/sparql',
      fallback: 'https://query.wikidata.org/sparql'
    });
  }

  /**
   * Resolve entity name to YAGO URI(s)
   * Based on Ludwig's yago_client.py
   */
  async resolveEntity(name: string): Promise<YAGOEntity[]> {
    // Check cache first
    const cached = this.getCachedMapping(name);
    if (cached) return cached;

    // SPARQL query pattern from Ludwig; the name is escaped before interpolation
    const query = `
      PREFIX schema: <http://schema.org/>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      SELECT ?entity ?label ?type ?description
      WHERE {
        ?entity rdfs:label "${this.escapeLiteral(name)}"@en .
        ?entity rdfs:label ?label .
        ?entity a ?type .
        OPTIONAL { ?entity schema:description ?description }
        FILTER(STRSTARTS(STR(?type), "http://schema.org/"))
      }
      LIMIT 10
    `;
    const results = await this.sparql.query(query);
    const entities = results.map(r => this.parseEntity(r));

    // Cache results
    this.cacheEntities(name, entities);
    return entities;
  }

  /**
   * Get entity facts (properties and values)
   * Based on Ludwig's get_entity_facts()
   */
  async getEntityFacts(entityUri: string): Promise<Record<string, string[]>> {
    // Reject characters that are not allowed inside a SPARQL IRI reference
    if (/[<>"{}|\\^`\s]/.test(entityUri)) {
      throw new Error(`Invalid entity URI: ${entityUri}`);
    }
    const query = `
      PREFIX schema: <http://schema.org/>
      SELECT ?property ?value
      WHERE {
        <${entityUri}> ?property ?value .
        FILTER(STRSTARTS(STR(?property), "http://schema.org/"))
      }
    `;
    const results = await this.sparql.query(query);

    // Group by property (keys are full property URIs)
    const facts: Record<string, string[]> = {};
    for (const row of results) {
      const prop = row.property;
      const value = row.value;
      if (!facts[prop]) facts[prop] = [];
      facts[prop].push(value);
    }
    return facts;
  }

  /**
   * Link entity with confidence scoring
   * Confidence: 1.0 = exact match, 0.7 = partial match, 0.5 = no label overlap.
   * Only candidates scoring >= 0.9 (in practice, exact matches) are auto-linked.
   */
  async linkEntity(entityId: number, name: string): Promise<void> {
    const candidates = await this.resolveEntity(name);
    for (const candidate of candidates) {
      const confidence = this.calculateConfidence(name, candidate);
      if (confidence >= 0.9) {
        // resolveEntity() leaves facts empty, so fetch them before storing
        const facts = await this.getEntityFacts(candidate.uri);
        // Auto-link high confidence
        await this.db.execute(`
          INSERT INTO yago_mappings (entity_id, yago_uri, yago_type, confidence, facts)
          VALUES (?, ?, ?, ?, ?)
          ON CONFLICT(entity_id) DO UPDATE SET
            yago_uri = excluded.yago_uri,
            yago_type = excluded.yago_type,
            confidence = excluded.confidence,
            facts = excluded.facts,
            cached_at = CURRENT_TIMESTAMP
        `, [entityId, candidate.uri, candidate.type, confidence, JSON.stringify(facts)]);
        break; // Take first high-confidence match
      }
    }
  }

  private calculateConfidence(name: string, entity: YAGOEntity): number {
    const nameLower = name.toLowerCase();
    const labelLower = entity.label.toLowerCase();
    if (nameLower === labelLower) return 1.0;
    if (labelLower.includes(nameLower) || nameLower.includes(labelLower)) return 0.7;
    return 0.5;
  }

  /** Escape backslashes, quotes, and line breaks so a name cannot break out of the SPARQL literal */
  private escapeLiteral(value: string): string {
    return value
      .replace(/\\/g, '\\\\')
      .replace(/"/g, '\\"')
      .replace(/\n/g, '\\n')
      .replace(/\r/g, '\\r');
  }

  private getCachedMapping(name: string): YAGOEntity[] | null {
    // Check cache with TTL
    const result = this.db.query(`
      SELECT * FROM yago_mappings ym
      JOIN entities e ON e.id = ym.entity_id
      WHERE e.name = ?
        AND ym.cached_at > datetime('now', '-30 days')
    `, [name]);
    if (result.length > 0) {
      return result.map(r => ({
        uri: r.yago_uri,
        label: r.name,
        type: r.yago_type,
        facts: r.facts ? JSON.parse(r.facts) : {}
      }));
    }
    return null;
  }

  private cacheEntities(name: string, entities: YAGOEntity[]): void {
    // Implementation of cache storage
  }

  private parseEntity(row: any): YAGOEntity {
    return {
      uri: row.entity,
      label: row.label,
      type: row.type,
      description: row.description,
      facts: {}
    };
  }
}
```
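
To see how the scoring rule interacts with the 0.9 auto-link threshold, the rule can be pulled out into a standalone function. This is only a sketch that mirrors `calculateConfidence`; `scoreLabel`, `pickAutoLink`, and `AUTO_LINK_THRESHOLD` are illustrative names, not part of the plan:

```typescript
// Threshold used by linkEntity() in the resolver above
export const AUTO_LINK_THRESHOLD = 0.9;

/** Case-insensitive label score: exact = 1.0, substring overlap = 0.7, otherwise 0.5. */
export function scoreLabel(name: string, label: string): number {
  const a = name.toLowerCase();
  const b = label.toLowerCase();
  if (a === b) return 1.0;
  if (a.includes(b) || b.includes(a)) return 0.7;
  return 0.5;
}

/** Return the first candidate that clears the auto-link threshold, if any. */
export function pickAutoLink<T extends { label: string }>(
  name: string,
  candidates: readonly T[]
): T | undefined {
  return candidates.find(c => scoreLabel(name, c.label) >= AUTO_LINK_THRESHOLD);
}
```

With these tiers, only a case-insensitive exact match clears the threshold: a name such as `React` against a label `React Native` scores 0.7 and is not linked. Since `resolveEntity` matches `rdfs:label` exactly, candidates from that query would be expected to score 1.0, so the lower tiers only come into play if the query is later relaxed to fuzzy matching.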
### 2.2 SPARQL Client
**Create: `src/services/sparql-client.ts`**
```typescript
// Uses the global fetch and AbortSignal.timeout available in Node.js 18+,
// so no node-fetch import (or dependency) is needed.

export interface SPARQLClientConfig {
  endpoint: string;
  fallback?: string;
  timeout?: number; // milliseconds
}

export class SPARQLClient {
  private config: SPARQLClientConfig & { timeout: number };

  constructor(config: SPARQLClientConfig) {
    this.config = {
      ...config,
      timeout: config.timeout ?? 30000
    };
  }

  async query(sparql: string): Promise<any[]> {
    try {
      return await this.executeQuery(this.config.endpoint, sparql);
    } catch (error) {
      if (this.config.fallback) {
        // `error` is `unknown` under strict TypeScript, so narrow before reading .message
        const message = error instanceof Error ? error.message : String(error);
        console.warn(`Primary endpoint failed, trying fallback: ${message}`);
        return await this.executeQuery(this.config.fallback, sparql);
      }
      throw error;
    }
  }

  private async executeQuery(endpoint: string, sparql: string): Promise<any[]> {
    const params = new URLSearchParams({
      query: sparql,
      format: 'json'
    });
    const response = await fetch(`${endpoint}?${params}`, {
      method: 'GET',
      headers: {
        'Accept': 'application/sparql-results+json'
      },
      signal: AbortSignal.timeout(this.config.timeout)
    });
    if (!response.ok) {
      throw new Error(`SPARQL query failed: ${response.status} ${response.statusText}`);
    }
    const data: any = await response.json();

    // Flatten each binding ({ var: { type, value } }) into a plain row ({ var: value })
    return data.results.bindings.map((b: any) => {
      const row: any = {};
      for (const [key, value] of Object.entries(b)) {
        row[key] = (value as any).value;
      }
      return row;
    });
  }
}
```
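
The `bindings.map` step in `executeQuery` flattens the W3C SPARQL JSON results format, in which each variable is bound to an object carrying `type` and `value` fields, into plain rows. A self-contained sketch of that step with explicit types instead of `any`; the sample response is made up for illustration and `flattenBindings` is a hypothetical helper name:

```typescript
// One bound term in the SPARQL 1.1 Query Results JSON format
interface SparqlTerm {
  type: "uri" | "literal" | "bnode";
  value: string;
  "xml:lang"?: string;
  datatype?: string;
}

interface SparqlJsonResults {
  head: { vars: string[] };
  results: { bindings: Array<Record<string, SparqlTerm>> };
}

/** Reduce each binding to { variable: value }, dropping type and language metadata. */
export function flattenBindings(data: SparqlJsonResults): Array<Record<string, string>> {
  return data.results.bindings.map(binding => {
    const row: Record<string, string> = {};
    for (const [variable, term] of Object.entries(binding)) {
      row[variable] = term.value;
    }
    return row;
  });
}

// Illustrative response only: the URIs below are placeholders, not real YAGO data.
const sampleResponse: SparqlJsonResults = {
  head: { vars: ["entity", "label", "type", "description"] },
  results: {
    bindings: [
      {
        entity: { type: "uri", value: "http://example.org/resource/Example" },
        label: { type: "literal", value: "Example", "xml:lang": "en" },
        type: { type: "uri", value: "http://schema.org/SoftwareApplication" },
      },
    ],
  },
};

export const sampleRows = flattenBindings(sampleResponse);
```

Variables left unbound by an `OPTIONAL` clause, such as `?description` here, are simply absent from the binding, so the flattened row has no `description` key and callers should treat it as optional.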
## Phase 3: Schema.org Integration (Week 5-6)
### 3.1 Schema.org Mapper
**Create: `src/services/schema-mapper.ts`**
```typescript
/**
 * Maps codebase entities to Schema.org types
 * Based on Ludwig's schema_mapper.py
 */
export const CODEBASE_TO_SCHEMA: Record<string, string> = {
  // Application types
  'web-app': 'schema:WebApplication',
  'mobile-app': 'schema:MobileApplication',
  'api': 'schema:WebAPI',
  // Schema.org defines no SoftwareLibrary type; SoftwareSourceCode is the closest fit
  'library': 'schema:SoftwareSourceCode',
  'package': 'schema:SoftwareSourceCode',
  'framework': 'schema:SoftwareApplication',
  // Document types
  'documentation': 'schema:TechArticle',
  'tutorial': 'schema:HowTo',
  'readme': 'schema:TechArticle',
  'guide': 'schema:HowTo',
  // Code elements
  'source-file': 'schema:SoftwareSourceCode',
  // Schema.org defines no SoftwareTest type; test suites are treated as source code
  'test-suite': 'schema:SoftwareSourceCode',
  'database': 'schema:Dataset',
};

export const PROPERTY_MAPPINGS: Record<string, string> = {
  'dependencies': 'schema:softwareRequirements',
  'version': 'schema:softwareVersion',
  'authors': 'schema:author',
  'maintainers': 'schema:maintainer',
  'license': 'schema:license',
  'description': 'schema:description',
  'url': 'schema:url',
  'repository': 'schema:codeRepository',
  'programmingLanguage': 'schema:programmingLanguage',
  'runtimePlatform': 'schema:runtimePlatform',
};

export class SchemaMapper {
  /**
   * Map entity to Schema.org type
   */
  mapType(entityType: string): string {
    return CODEBASE_TO_SCHEMA[entityType] || 'schema:SoftwareApplication';
  }

  /**
   * Generate Schema.org JSON-LD annotation
   */
  generateAnnotation(entity: any, analysis: any): object {
    const schemaType = this.mapType(entity.type);
    const annotation: any = {
      '@context': 'https://schema.org',
      '@type': schemaType.replace('schema:', ''),
      'name': entity.name,
    };

    // Map properties
    if (analysis.packageInfo) {
      const pkg = analysis.packageInfo;
      if (pkg.version) annotation['softwareVersion'] = pkg.version;
      if (pkg.description) annotation['description'] = pkg.description;
      if (pkg.license) annotation['license'] = pkg.license;
      // Dependencies as software requirements
      if (pkg.dependencies) {
        annotation['softwareRequirements'] = Object.keys(pkg.dependencies);
      }
    }

    // Programming languages from analysis
    if (analysis.languages) {
      annotation['programmingLanguage'] = Object.keys(analysis.languages);
    }
    return annotation;
  }

  /**
   * Bidirectional property mapping
   */
  toPTMCP(schemaProperty: string): string {
    for (const [key, value] of Object.entries(PROPERTY_MAPPINGS)) {
      if (value === schemaProperty) return key;
    }
    return schemaProperty;
  }

  toSchema(ptmcpProperty: string): string {
    return PROPERTY_MAPPINGS[ptmcpProperty] || ptmcpProperty;
  }
}
```
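
The reverse lookup in the mapper scans `PROPERTY_MAPPINGS` on every call. If reverse lookups become frequent, one option is to precompute the inverse table once. A sketch using a subset of the mappings above; `invertMapping` and `SCHEMA_TO_PTMCP` are hypothetical names, and the approach assumes the property mapping stays one-to-one:

```typescript
// Subset of PROPERTY_MAPPINGS, repeated here so the sketch is self-contained
const PROPERTY_MAPPINGS: Record<string, string> = {
  dependencies: "schema:softwareRequirements",
  version: "schema:softwareVersion",
  license: "schema:license",
  repository: "schema:codeRepository",
};

/** Build value -> key, failing loudly if two keys share a target. */
export function invertMapping(mapping: Record<string, string>): Map<string, string> {
  const inverse = new Map<string, string>();
  for (const [key, value] of Object.entries(mapping)) {
    if (inverse.has(value)) {
      throw new Error(`Mapping is not invertible: duplicate target ${value}`);
    }
    inverse.set(value, key);
  }
  return inverse;
}

const SCHEMA_TO_PTMCP = invertMapping(PROPERTY_MAPPINGS);

/** Schema.org property -> PT-MCP property; unknown names pass through unchanged. */
export function toPTMCP(schemaProperty: string): string {
  return SCHEMA_TO_PTMCP.get(schemaProperty) ?? schemaProperty;
}

/** PT-MCP property -> Schema.org property; unknown names pass through unchanged. */
export function toSchema(ptmcpProperty: string): string {
  return PROPERTY_MAPPINGS[ptmcpProperty] ?? ptmcpProperty;
}
```

The same inversion would not work for `CODEBASE_TO_SCHEMA`, where several codebase types share one Schema.org type, which is why the sketch checks for duplicate targets.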
## Phase 4: New MCP Tools (Week 7-8)
### 4.1 Enrich Context Tool
**Create: `src/tools/enrich-context.ts`**
```typescript
import { YAGOResolver } from '../services/yago-resolver.js';
import { SchemaMapper } from '../services/schema-mapper.js';
import { Database } from '../database/connection.js';

interface EnrichContextArgs {
  path: string;
  analysis_result?: any;
  enrichment_level?: 'minimal' | 'standard' | 'comprehensive';
  include_yago?: boolean;
  include_schema?: boolean;
}

export async function enrichContext(args: EnrichContextArgs) {
  const {
    path,
    analysis_result,
    enrichment_level = 'standard',
    include_yago = true,
    include_schema = true
  } = args;

  // Get codebase analysis if not provided
  let analysis = analysis_result;
  if (!analysis) {
    const analyzeCodebase = await import('./analyze-codebase.js');
    const result = await analyzeCodebase.analyzeCodebase({ path });
    analysis = JSON.parse(result.content[0].text);
  }

  const db = new Database();
  const yagoResolver = new YAGOResolver(db);
  const schemaMapper = new SchemaMapper();

  // Extract entities from analysis
  const entities = extractEntities(analysis);

  // YAGO enrichment
  const yagoEnrichment = [];
  if (include_yago) {
    for (const entity of entities) {
      const yagoEntities = await yagoResolver.resolveEntity(entity.name);
      for (const yagoEntity of yagoEntities) {
        const facts = await yagoResolver.getEntityFacts(yagoEntity.uri);
        yagoEnrichment.push({
          entity: entity.name,
          yago_uri: yagoEntity.uri,
          type: yagoEntity.type,
          facts: facts
        });
      }
    }
  }

  // Schema.org annotation
  let schemaAnnotation = null;
  if (include_schema) {
    schemaAnnotation = schemaMapper.generateAnnotation(
      { name: analysis.packageInfo?.name || 'Unknown', type: 'web-app' },
      analysis
    );
  }

  return {
    content: [{
      type: 'text',
      text: JSON.stringify({
        codebase_context: analysis,
        knowledge_graph: {
          yago_entities: yagoEnrichment,
          schema_annotations: schemaAnnotation
        },
        enrichment_level,
        timestamp: new Date().toISOString()
      }, null, 2)
    }]
  };
}

function extractEntities(analysis: any): Array<{name: string, type: string}> {
  const entities: Array<{name: string, type: string}> = [];

  // Extract from package dependencies
  if (analysis.packageInfo?.dependencies) {
    for (const dep of Object.keys(analysis.packageInfo.dependencies)) {
      entities.push({ name: dep, type: 'library' });
    }
  }

  // Extract from detected languages
  if (analysis.languages) {
    for (const lang of Object.keys(analysis.languages)) {
      entities.push({ name: lang, type: 'programming-language' });
    }
  }
  return entities;
}
```
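
The enrichment loop awaits one SPARQL round trip per entity and another per candidate, which could be slow for projects with many dependencies. One possible mitigation is to resolve entities with a small, fixed concurrency. A sketch of such a helper; `mapWithConcurrency` and the choice of limit are hypothetical and not part of the plan:

```typescript
/**
 * Run `worker` over `items` with at most `limit` calls in flight,
 * returning results in input order.
 */
export async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  if (limit < 1) throw new RangeError("limit must be at least 1");
  const results: R[] = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims items one at a time until none are left.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => run());
  await Promise.all(runners);
  return results;
}
```

In `enrichContext`, this could replace the outer `for` loop with a call such as `mapWithConcurrency(entities, 4, e => yagoResolver.resolveEntity(e.name))`. A rejected worker rejects the whole call, so per-entity error handling would need to live inside the worker if partial results are wanted.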
### 4.2 Register New Tool
**Update: `src/index.ts`**
Add to the `ListToolsRequestSchema` handler:
```typescript
{
  name: "enrich_context",
  description: "Enrich codebase context with YAGO knowledge graph and Schema.org annotations",
  inputSchema: {
    type: "object",
    properties: {
      path: { type: "string", description: "Root directory path" },
      analysis_result: { type: "object", description: "Previous analysis result (optional)" },
      enrichment_level: {
        type: "string",
        enum: ["minimal", "standard", "comprehensive"],
        description: "Level of semantic enrichment",
        default: "standard"
      },
      include_yago: { type: "boolean", description: "Include YAGO entities", default: true },
      include_schema: { type: "boolean", description: "Include Schema.org annotations", default: true }
    },
    required: ["path"]
  }
}
```
## Success Metrics
1. **Entity Resolution**: >90% of common packages/frameworks linked to YAGO
2. **Query Performance**: <2s for SPARQL queries (with caching)
3. **Cache Hit Rate**: >80% for repeated entities
4. **Schema Coverage**: Support 20+ codebase types initially
5. **Fact Accuracy**: >95% of YAGO facts are relevant to context
## Testing Strategy
```typescript
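// Example test (integration: queries the live SPARQL endpoint)
describe('YAGOResolver', () => {
  it('should resolve React to YAGO entity', async () => {
    const resolver = new YAGOResolver(db);
    const entities = await resolver.resolveEntity('React');
    expect(entities.length).toBeGreaterThan(0);
    // resolveEntity() filters on Schema.org types, so the type is a full schema.org URI
    expect(entities[0].type).toContain('http://schema.org/');
    // resolveEntity() leaves facts empty; they are fetched separately, keyed by property URI
    const facts = await resolver.getEntityFacts(entities[0].uri);
    expect(Object.keys(facts).length).toBeGreaterThan(0);
  });
});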
```
## Next Steps
1. ✅ Review Ludwig code patterns
2. 📋 Implement SPARQL client with fallback
3. 📋 Create YAGO resolver with caching
4. 📋 Build Schema.org mapper
5. 📋 Add enrich_context MCP tool
6. 📋 Write comprehensive tests
7. 📋 Optimize query performance
8. 📋 Document usage examples
---
**Status**: Ready for implementation
**Estimated Time**: 8 weeks
**Risk Level**: Low (proven patterns from Ludwig)