TogoMCP

togo-mcp
docs

MIE_file_specs.md

MIE_file_specs.md•21.6 KiB

# MIE File Specification v1.0 ## 1. Overview ### 1.1 Purpose Metadata Interoperability Exchange (MIE) files provide compact, comprehensive documentation for RDF databases, enabling researchers to effectively query databases without exhaustive documentation bloat. ### 1.2 Design Philosophy **Essential over Exhaustive**: Documentation must be compact, clear, and complete—sufficient for effective querying without unnecessary content. ### 1.3 Format - **File Format**: YAML - **Encoding**: UTF-8 - **Extension**: `.yaml` - **Naming Convention**: `mie/[dbname].yaml` ## 2. File Structure ### 2.1 Required Sections MIE files MUST contain exactly nine sections in the following order: 1. `schema_info` - Database metadata 2. `shape_expressions` - ShEx schemas 3. `sample_rdf_entries` - Example RDF data 4. `sparql_query_examples` - Tested queries 5. `cross_references` - External database links 6. `architectural_notes` - Design patterns 7. `data_statistics` - Quantitative metrics 8. `anti_patterns` - Common mistakes 9. `common_errors` - Error scenarios ### 2.2 Section Dependencies - All sections are REQUIRED - Sections MUST appear in the specified order - No additional top-level sections permitted ## 3. Section Specifications ### 3.1 schema_info #### 3.1.1 Purpose Provides essential metadata about the RDF database. #### 3.1.2 Required Fields ```yaml schema_info: title: string # REQUIRED: Database name description: string # REQUIRED: 3-5 sentences covering: # - What it contains # - ALL main entity types # - Use cases # - Key features endpoint: uri # REQUIRED: SPARQL endpoint URL base_uri: uri # REQUIRED: Base namespace URI graphs: array<uri> # REQUIRED: List of named graph URIs version: # REQUIRED: Version metadata mie_version: string # REQUIRED: MIE spec version (e.g., "1.0") mie_created: date # REQUIRED: ISO 8601 format (YYYY-MM-DD) data_version: string # REQUIRED: Database version/release update_frequency: string # REQUIRED: Update schedule license: # REQUIRED: Licensing information data_license: string # REQUIRED: License name license_url: uri # REQUIRED: License URL access: # REQUIRED: Access constraints rate_limiting: string # REQUIRED: Query rate limits max_query_timeout: string # REQUIRED: Timeout duration backend: string # OPTIONAL: Triple store type (e.g., "Virtuoso") ``` #### 3.1.3 Constraints - `description` MUST be 3-5 sentences - `description` MUST document ALL major entity types - All URIs MUST be valid and accessible - `mie_created` MUST use ISO 8601 date format ### 3.2 shape_expressions #### 3.2.1 Purpose Defines ShEx (Shape Expressions) schemas for all entity types in the database. #### 3.2.2 Format ```yaml shape_expressions: | PREFIX declarations <EntityShape1> { property declarations } <EntityShape2> { property declarations } ``` #### 3.2.3 Requirements - MUST cover ALL major entity types discovered in the database - Comments MUST be minimal (only for non-obvious properties) - MUST use standard ShEx syntax - MUST include relevant PREFIX declarations #### 3.2.4 Constraints - No excessive commenting (comment only non-obvious properties) - Shape names MUST be descriptive (e.g., `<ProteinShape>`, `<CompoundShape>`) - MUST represent actual data patterns from the database ### 3.3 sample_rdf_entries #### 3.3.1 Purpose Provides representative RDF examples demonstrating data patterns. #### 3.3.2 Structure ```yaml sample_rdf_entries: - title: string # REQUIRED: Descriptive title description: string # REQUIRED: 1-2 sentences rdf: string # REQUIRED: Actual RDF from database ``` #### 3.3.3 Requirements - MUST contain EXACTLY 5 examples - Examples MUST cover diverse categories: 1. Core entity type 2. Related entity type 3. Sequence/molecular data 4. Cross-reference example 5. Geographic/temporal data (if applicable) - Each `description` MUST be 1-2 sentences - RDF MUST be actual data from the database (not fabricated) #### 3.3.4 Constraints - Total count: EXACTLY 5 examples - Description length: 1-2 sentences (not more) - RDF syntax MUST be valid Turtle or N-Triples ### 3.4 sparql_query_examples #### 3.4.1 Purpose Provides tested, working SPARQL queries demonstrating database usage. #### 3.4.2 Structure ```yaml sparql_query_examples: - title: string # REQUIRED: What the query does description: string # REQUIRED: Context and purpose question: string # REQUIRED: Natural language question complexity: enum # REQUIRED: basic | intermediate | advanced sparql: string # REQUIRED: Tested SPARQL query ``` #### 3.4.3 Requirements - MUST contain EXACTLY 7 queries with the following distribution: - 2 queries with `complexity: basic` - 3 queries with `complexity: intermediate` - 2 queries with `complexity: advanced` - MUST include at least one query with keyword filtering - MUST include at least one query with biological annotations (if applicable) - MUST NOT include cross-reference queries (those belong in `cross_references`) - ALL queries MUST be tested and confirmed working #### 3.4.4 Constraints - Total count: EXACTLY 7 queries - Complexity distribution: 2/3/2 (basic/intermediate/advanced) - All queries MUST execute without errors - Queries MUST use appropriate LIMIT clauses #### 3.4.5 Complexity Guidelines - **Basic**: Simple SELECT, single entity type, basic filters - **Intermediate**: Multiple entity types, OPTIONAL patterns, aggregations - **Advanced**: Complex joins, nested queries, sophisticated filtering ### 3.5 cross_references #### 3.5.1 Purpose Documents external database linkages organized by RDF pattern. #### 3.5.2 Structure ```yaml cross_references: - pattern: string # REQUIRED: RDF property pattern description: string # REQUIRED: How links work databases: # REQUIRED: Organized by category category_name: - database_name: coverage # Format: "Database (~XX%)" or similar sparql: string # REQUIRED: Representative query ``` #### 3.5.3 Requirements - Group by RDF pattern (e.g., `rdfs:seeAlso`, `owl:sameAs`, `skos:exactMatch`) - List ALL external databases found - Include coverage estimates where possible - Provide working SPARQL query for each pattern #### 3.5.4 Constraints - Do NOT create separate entries for each individual database - Organize by pattern, then categorize databases - Include coverage percentages when available - All SPARQL queries MUST be tested ### 3.6 architectural_notes #### 3.6.1 Purpose Documents design patterns, performance characteristics, and data quality issues. #### 3.6.2 Structure ```yaml architectural_notes: schema_design: - bullet point # REQUIRED: Design patterns performance: - bullet point # REQUIRED: Optimization tips data_integration: - bullet point # REQUIRED: Cross-reference patterns data_quality: - bullet point # REQUIRED: Data quirks and issues ``` #### 3.6.3 Requirements - MUST use YAML bullet format (not prose paragraphs) - MUST include all four subsections - Each subsection MUST have at least one bullet point - Content MUST be actionable and relevant to query writing #### 3.6.4 Constraints - No prose paragraphs - Bullets MUST be concise (1-2 sentences each) - Focus on information that helps with querying ### 3.7 data_statistics #### 3.7.1 Purpose Provides quantitative metrics about database contents and performance. #### 3.7.2 Structure ```yaml data_statistics: total_[entity_type]: integer # REQUIRED: Entity counts coverage: # REQUIRED: Property completeness property_name: string # Format: "~XX%" or ">XX%" cardinality: # REQUIRED: Relationship metrics avg_[relationship]: number # Average cardinality performance_characteristics: # REQUIRED: Query performance - observation # Tested observations data_quality_notes: # OPTIONAL: Data issues - note # Quality concerns ``` #### 3.7.3 Requirements - MUST include entity counts for all major types - MUST include coverage statistics for important properties - MUST include cardinality metrics for key relationships - MUST include performance observations from actual testing - MAY include data quality notes if relevant #### 3.7.4 Constraints - All statistics MUST be based on actual queries - Coverage percentages should be approximate ranges - Performance characteristics MUST be reproducible ### 3.8 anti_patterns #### 3.8.1 Purpose Documents common mistakes with corrected versions. #### 3.8.2 Structure ```yaml anti_patterns: - title: string # REQUIRED: Mistake description problem: string # REQUIRED: Why it's wrong wrong_sparql: string # REQUIRED: Incorrect query correct_sparql: string # REQUIRED: Fixed query explanation: string # REQUIRED: What changed ``` #### 3.8.3 Requirements - MUST contain 2-3 examples - Each MUST show both wrong and correct versions - Both queries SHOULD be tested (wrong should fail/timeout, correct should work) - Focus on mistakes that researchers actually make #### 3.8.4 Constraints - Count: 2-3 examples (not more, not less) - Both `wrong_sparql` and `correct_sparql` MUST be provided - Explanation MUST be clear and educational ### 3.9 common_errors #### 3.9.1 Purpose Documents error scenarios with causes and solutions. #### 3.9.2 Structure ```yaml common_errors: - error: string # REQUIRED: Error type/message causes: # REQUIRED: List of causes - cause # At least one cause solutions: # REQUIRED: List of solutions - solution # At least one solution example_fix: string # OPTIONAL: Before/after code ``` #### 3.9.3 Requirements - MUST contain 2-3 error scenarios - Each MUST have at least one cause and one solution - Focus on errors researchers actually encounter - MAY include example fixes if helpful #### 3.9.4 Constraints - Count: 2-3 scenarios - Each MUST have actionable solutions - Causes MUST be specific and accurate ## 4. Discovery Requirements ### 4.1 Systematic Discovery Process Before creating an MIE file, creators MUST: 1. Check for existing documentation (`get_MIE_file()`, `get_shex()`) 2. Query ontology graphs for ALL entity types 3. Explore multiple URI patterns 4. Sample instances for each discovered entity type 5. Verify findings across different query patterns ### 4.2 Avoiding Bias - MUST NOT rely solely on first 50 results - MUST query ontology graphs before sampling data - MUST NOT assume timeouts mean "data doesn't exist" - MUST explore multiple query strategies ### 4.3 Verification All SPARQL queries in the MIE file MUST be: - Actually executed against the database - Confirmed to return valid results - Tested with appropriate LIMIT values ## 5. Compliance Checking ### 5.1 Existing MIE File Evaluation When an existing MIE file is found, evaluate against: #### Structure & Format - [ ] Valid YAML syntax - [ ] All 9 required sections present - [ ] Sections in correct order - [ ] Version/license/access metadata in schema_info #### Sample RDF Entries - [ ] Exactly 5 examples - [ ] Covers diverse categories - [ ] Each has 1-2 sentence description #### SPARQL Query Examples - [ ] Exactly 7 queries - [ ] Correct complexity distribution (2/3/2) - [ ] Includes keyword filtering query - [ ] Includes biological annotations query (if applicable) - [ ] No cross-reference queries - [ ] All tested and working #### Shape Expressions - [ ] Minimal comments (only non-obvious) - [ ] Covers ALL major entity types #### Other Sections - [ ] Cross-references organized by pattern - [ ] All external databases listed - [ ] Architectural notes in YAML bullets - [ ] Statistics include counts, coverage, cardinality - [ ] 2-3 anti-patterns with both versions - [ ] 2-3 common errors with solutions #### Compliance Threshold - **≥90% pass**: Update existing file - **<90% pass**: Create new file from scratch ## 6. Quality Assurance ### 6.1 Pre-Finalization Checklist #### Discovery - [ ] Queried ontology graphs for all entity types - [ ] Explored multiple URI patterns - [ ] Documented ALL major entity types - [ ] Verified findings with multiple query strategies #### Structure - [ ] Valid YAML with all 9 required sections - [ ] Sections in correct order - [ ] Schema_info includes version/license/access - [ ] ShEx has minimal comments, covers all types - [ ] Exactly 5 diverse RDF examples - [ ] Exactly 7 SPARQL queries (2/3/2 distribution) - [ ] Includes keyword filtering query - [ ] Includes biological annotations query - [ ] Cross-references organized by pattern - [ ] Architectural notes in YAML bullets #### Quality - [ ] All SPARQL queries tested and work - [ ] 2-3 anti-patterns with wrong/correct versions - [ ] 2-3 common errors with solutions - [ ] Statistics: counts, coverage, cardinality, performance - [ ] Everything concise—no unnecessary content - [ ] No sampling bias in documentation - [ ] No premature conclusions ### 6.2 Content Standards - Descriptions are actionable and query-focused - No redundant or excessive information - All statistics based on actual measurements - All examples use real data from the database - Documentation enables effective querying ## 7. Best Practices ### 7.1 Writing Style - **Concise**: If it doesn't help query writing, omit it - **Clear**: Use simple, direct language - **Complete**: Cover all entity types and patterns - **Correct**: All queries tested and verified ### 7.2 SPARQL Queries - Always use appropriate LIMIT clauses - Test with different LIMIT values if queries timeout - Include comments for complex patterns - Use meaningful variable names ### 7.3 Coverage - Document ALL entity types, not just common ones - Include rare but important patterns - Note data quality issues that affect querying - Provide workarounds for known limitations ## 8. Common Pitfalls ### 8.1 Discovery Phase - ❌ Sampling bias: First 50 results don't represent entire database - ❌ Premature conclusions: Timeout ≠ "data doesn't exist" - ❌ Incomplete coverage: Only documenting obvious entity types - ❌ Missing error guidance: Not testing what fails ### 8.2 Documentation Phase - ❌ Excessive comments in ShEx - ❌ Wrong number of examples (must be exactly 5 and 7) - ❌ Untested SPARQL queries - ❌ Cross-reference queries in SPARQL examples section - ❌ Prose paragraphs in architectural_notes - ❌ Fabricated RDF examples ### 8.3 Quality Phase - ❌ Not verifying all queries work - ❌ Missing anti-patterns or common errors - ❌ Incomplete statistics - ❌ Invalid YAML syntax ## 9. Success Criteria An MIE file is considered complete and compliant when: 1. **Valid YAML** with all 9 sections in correct order 2. **Complete discovery**: All entity types documented 3. **Tested queries**: All 7 SPARQL queries work correctly 4. **Correct counts**: Exactly 5 RDF examples, exactly 7 queries, 2-3 anti-patterns, 2-3 errors 5. **Comprehensive shapes**: ShEx covers ALL major entity types 6. **Proper organization**: Cross-references by pattern, notes in bullets 7. **Quality metrics**: Counts, coverage, cardinality, performance included 8. **Error prevention**: Anti-patterns and common errors documented 9. **Metadata complete**: Version, license, and access information provided 10. **Actionable content**: Everything helps with query writing ## 10. Version History | Version | Date | Changes | |---------|------|---------| | 1.0 | [Date] | Initial specification | ## 11. References ### 11.1 Related Standards - **ShEx**: Shape Expressions Language (https://shex.io/) - **SPARQL**: SPARQL 1.1 Query Language (W3C Recommendation) - **YAML**: YAML Ain't Markup Language (https://yaml.org/) - **RDF**: Resource Description Framework (W3C Recommendation) ### 11.2 Tools - `get_sparql_endpoints()` - Get available SPARQL endpoints - `get_graph_list(dbname)` - List named graphs - `get_sparql_example(dbname)` - Get example query - `run_sparql(dbname, query)` - Execute SPARQL - `get_shex(dbname)` - Retrieve ShEx schema - `get_MIE_file(dbname)` - Retrieve existing MIE - `save_MIE_file(dbname, content)` - Save MIE file ## 12. Appendix A: Complete Template ```yaml schema_info: title: [DATABASE_NAME] description: | [3-5 sentences covering: contents, ALL entity types, use cases, features] endpoint: https://rdfportal.org/example/sparql base_uri: http://example.org/ graphs: - http://example.org/dataset - http://example.org/ontology version: mie_version: "1.0" mie_created: "YYYY-MM-DD" data_version: "Release YYYY.MM" update_frequency: "Monthly" license: data_license: "License name" license_url: "https://..." access: rate_limiting: "100 queries/min" max_query_timeout: "60 seconds" backend: "Virtuoso" shape_expressions: | PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> <EntityShape> { a [ schema:Type ] ; schema:property xsd:string ; schema:optional xsd:string ? } sample_rdf_entries: - title: [Descriptive title] description: [1-2 sentences] rdf: | [Real RDF from database] - title: [Descriptive title] description: [1-2 sentences] rdf: | [Real RDF from database] - title: [Descriptive title] description: [1-2 sentences] rdf: | [Real RDF from database] - title: [Descriptive title] description: [1-2 sentences] rdf: | [Real RDF from database] - title: [Descriptive title] description: [1-2 sentences] rdf: | [Real RDF from database] sparql_query_examples: - title: [What it does] description: [Context] question: [Natural language] complexity: basic sparql: | [Tested query] - title: [What it does] description: [Context] question: [Natural language] complexity: basic sparql: | [Tested query] - title: [What it does] description: [Context] question: [Natural language] complexity: intermediate sparql: | [Tested query] - title: [What it does] description: [Context] question: [Natural language] complexity: intermediate sparql: | [Tested query] - title: [What it does] description: [Context] question: [Natural language] complexity: intermediate sparql: | [Tested query] - title: [What it does] description: [Context] question: [Natural language] complexity: advanced sparql: | [Tested query] - title: [What it does] description: [Context] question: [Natural language] complexity: advanced sparql: | [Tested query] cross_references: - pattern: rdfs:seeAlso description: | [How external links work] databases: category_name: - Database1 (~XX%) - Database2 (~YY%) sparql: | [Representative query] architectural_notes: schema_design: - [Entity relationships] - [Design patterns] performance: - [Optimization tips] - [Query hints] data_integration: - [Cross-reference patterns] - [External links] data_quality: - [Data quirks] - [Known issues] data_statistics: total_entity_type1: count total_entity_type2: count coverage: property1_coverage: "~XX%" property2_coverage: ">YY%" cardinality: avg_relationship1: X.X avg_relationship2: Y.Y performance_characteristics: - "Query type A: <1s for N results" - "Query type B: timeout at LIMIT 10000" data_quality_notes: - "Issue description" anti_patterns: - title: "Common mistake 1" problem: "Why it's wrong" wrong_sparql: | # Bad query correct_sparql: | # Fixed query explanation: "What changed" - title: "Common mistake 2" problem: "Why it's wrong" wrong_sparql: | # Bad query correct_sparql: | # Fixed query explanation: "What changed" common_errors: - error: "Error type 1" causes: - "Cause 1" - "Cause 2" solutions: - "Solution 1" - "Solution 2" example_fix: | # Before/after (optional) - error: "Error type 2" causes: - "Cause 1" solutions: - "Solution 1" ``` ## 13. Appendix B: Validation Rules ### Structural Validation 1. File MUST be valid YAML 2. All 9 sections MUST be present 3. Sections MUST be in specified order 4. All required fields MUST be populated ### Content Validation 1. `sample_rdf_entries` count = 5 2. `sparql_query_examples` count = 7 3. SPARQL complexity distribution = 2 basic, 3 intermediate, 2 advanced 4. `anti_patterns` count = 2 or 3 5. `common_errors` count = 2 or 3 6. All dates in ISO 8601 format 7. All URIs are valid ### Query Validation 1. All SPARQL queries execute without syntax errors 2. All SPARQL queries return results (or documented as intentionally empty) 3. All queries have appropriate LIMIT clauses 4. No queries timeout (or timeout is documented) ### Coverage Validation 1. All major entity types have ShEx shapes 2. Cross-references document all external databases found 3. Statistics include all major entity types 4. Anti-patterns address common real mistakes --- **Document Status**: Specification v1.0 **Compliance**: REQUIRED for all MIE files **Principle**: Compact, Complete, Clear, Correct, Actionable

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/arkinjo/togo-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

MIE_file_specs.md•21.6 KiB