# RDF Portal Guide v15.0

## Core Principle

**MIE Files → Explore RDF → Comprehensive SPARQL → TogoID Only If Needed**

**MIE files** (Metadata Interoperability Exchange) contain ShEx schemas, RDF examples, and SPARQL templates - essential for understanding graph structures before querying.

Most cross-references exist in anchor RDF databases. Explore first, convert only when necessary.

- **70-95% coverage** achievable
- **2x faster** than tool-heavy approaches
- **Higher transparency** for research

---

## Workflow (10-20 minutes)

1. **MIE File Analysis** [CRITICAL] - Understand graph structure, properties, examples
2. **RDF Exploration** [MANDATORY] - Discover available cross-references
3. **Comprehensive SPARQL** [CORE] - Get ALL data in ONE query
4. **Execution** [CORE] - Run query, extract IDs
5. **TogoID** [IF NEEDED] - Only for missing databases
6. **Final Report** - Consolidate all findings

**Documentation:** Create ONE artifact at Step 1, then update it progressively after each step for a complete analysis record.

**Optional:** Use OLS for publication-grade keyword standardization

---

## Step 1: Analyze MIE File ⭐

### Get MIE File FIRST

**MIE files are CRITICAL** - they contain the graph structure, property definitions, and example queries.

```python
# Always start here!
mie_content = get_MIE_file(dbname="uniprot")

# Create initial artifact with MIE analysis
# Artifact name: rdf_analysis_results.md
```

**Create artifact: `rdf_analysis_results.md`**

```markdown
# RDF Portal Analysis Results
*This document is progressively updated at each step*

---

## Step 1: MIE File Analysis

### Database: uniprot

### MIE File Contents
[MIE content displayed here - includes ShEx schema, RDF examples, SPARQL examples]

### Key Elements Identified
- ShEx schema properties
- Cross-reference predicates (e.g., rdfs:seeAlso)
- Example SPARQL queries
- Data structure patterns
```

### Available Databases

```python
# List all available databases
databases = list_databases()

# Update artifact with database list
```

**Update artifact to add:**

```markdown
### Available Databases
[Database list displayed here]

**Common anchors identified:**
- uniprot, chembl, pubchem, pdb, reactome
- mesh, go, taxonomy, wikidata
```

---

## Step 2: Explore RDF Cross-References ⭐

### Discover Available Cross-References

```sparql
# What databases are linked?
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?refType (COUNT(DISTINCT ?protein) as ?count)
WHERE {
  ?protein a up:Protein ;
           up:reviewed 1 ;
           rdfs:seeAlso ?ref .
  BIND(REPLACE(STR(?ref), "^(https?://[^/]+).*", "$1") AS ?refType)
}
GROUP BY ?refType
ORDER BY DESC(?count)
LIMIT 30
```

```python
# Execute discovery query
xref_results = run_sparql(dbname="uniprot", sparql_query=discovery_query)

# Update artifact with discovery results
```

**Update artifact to add:**

````markdown
---

## Step 2: Cross-Reference Discovery

### Discovery Query
```sparql
[Discovery query displayed here]
```

### Results
[Query results displayed here - showing available databases and counts]

### Sample Entity Analysis
**Entity examined:** P04637 (p53)

[Sample cross-references displayed here]

### Findings Summary

**Available in RDF (Use SPARQL):**
- ✅ PDB
- ✅ Reactome
- ✅ ChEMBL
- ✅ Ensembl

**Missing from RDF (Consider TogoID):**
- ❌ DrugBank
- ❌ KEGG

**Recommendation:** Use comprehensive SPARQL for available databases. Test TogoID coverage for missing databases.
````

---

## Step 3: Comprehensive SPARQL Query ⭐

### Design Pattern: ONE Query with OPTIONAL

```sparql
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?protein ?label ?pdb ?reactome ?chembl ?ensembl
WHERE {
  # Core properties
  # NOTE: ?label is not bound by any pattern in this template; add a label
  # pattern taken from the MIE file, or drop ?label from SELECT.
  ?protein a up:Protein ;
           up:reviewed 1 ;
           up:organism <http://purl.uniprot.org/taxonomy/9606> .

  # REQUIRED: Anchor cross-reference
  ?protein rdfs:seeAlso ?pdb .
  FILTER(CONTAINS(STR(?pdb), "rdf.wwpdb.org"))

  # OPTIONAL: Additional databases
  OPTIONAL {
    ?protein rdfs:seeAlso ?reactome .
    FILTER(CONTAINS(STR(?reactome), "reactome"))
  }

  OPTIONAL {
    ?protein rdfs:seeAlso ?chembl .
    FILTER(CONTAINS(STR(?chembl), "chembl"))
  }

  OPTIONAL {
    ?protein rdfs:seeAlso ?ensembl .
    FILTER(CONTAINS(STR(?ensembl), "ensembl"))
  }

  # Content filters
  FILTER([your keyword filters])
}
LIMIT 100
```

```python
# Execute comprehensive query
comprehensive_query = """[Your SPARQL query here]"""
results = run_sparql(dbname="uniprot", sparql_query=comprehensive_query)

# Update artifact with query and results
```

**Update artifact to add:**

````markdown
---

## Step 3: Comprehensive SPARQL Query

### Query
```sparql
[Comprehensive SPARQL query displayed here]
```

### Results
[Query results displayed here]

### Coverage Analysis
- Total entities retrieved: [count]
- Entities with PDB: [count] ([percentage]%)
- Entities with Reactome: [count] ([percentage]%)
- Entities with ChEMBL: [count] ([percentage]%)
- Entities with Ensembl: [count] ([percentage]%)
````

**Key Points:**
- Use `OPTIONAL` for cross-refs to avoid excluding entities
- Get everything in ONE query, not multiple queries
- Filter by organism early for performance

---

## Step 4: Execution

Execute the SPARQL query and parse results to collect all cross-references for each entity.

```python
# Parse results and extract cross-reference IDs
import json

# NOTE: extract_id is assumed to be a user-defined helper (it is not defined
# in this guide) that returns the trailing identifier of a URI.

# Assuming results is a JSON response
parsed_results = json.loads(results)

# Extract and organize IDs by database
# NOTE: a protein with several cross-references spans several result rows;
# this loop keeps only the last row seen for each protein.
id_mapping = {}
for binding in parsed_results.get('results', {}).get('bindings', []):
    protein_id = extract_id(binding.get('protein', {}).get('value', ''))

    id_mapping[protein_id] = {
        'uniprot': protein_id,
        'pdb': extract_id(binding.get('pdb', {}).get('value', '')) if 'pdb' in binding else None,
        'reactome': extract_id(binding.get('reactome', {}).get('value', '')) if 'reactome' in binding else None,
        'chembl': extract_id(binding.get('chembl', {}).get('value', '')) if 'chembl' in binding else None,
        'ensembl': extract_id(binding.get('ensembl', {}).get('value', '')) if 'ensembl' in binding else None
    }

# Update artifact with extracted IDs
```

**Update artifact to add:**

````markdown
---

## Step 4: Extracted Cross-Reference IDs

### Summary
- Total proteins: [count]
- Proteins with PDB: [count]
- Proteins with Reactome: [count]
- Proteins with ChEMBL: [count]
- Proteins with Ensembl: [count]

### ID Mappings (JSON)
```json
[JSON formatted mappings displayed here]
```

### ID Mappings (TSV)
```
UniProt	PDB	Reactome	ChEMBL	Ensembl
[TSV formatted data displayed here]
```
````

---

## Step 5: TogoID (Only If Needed)

### When to Use TogoID

**Use ONLY if:**
- Database NOT in anchor RDF (<20% SPARQL coverage)
- TogoID coverage is better than SPARQL
- Need multi-hop conversion

**Test first:**

```python
# Test coverage before full conversion
sample_ids = "P04637,P00533,P21802"  # Sample
test_result = countId(
    ids=sample_ids,
    source="uniprot",
    target="drugbank"
)
coverage = test_result["target"] / test_result["source"]

# Update artifact with coverage test
```

**Update artifact to add:**

```markdown
---

## Step 5: TogoID Analysis

### Coverage Test
- Source: uniprot
- Target: drugbank
- Sample IDs: [IDs listed here]
- Source count: [count]
- Target count: [count]
- Coverage: [percentage]%

### Decision
[Good
coverage decision or limitation documented here]
```

```python
if coverage > 0.5:
    # Good coverage - use TogoID
    all_uniprot_ids = ",".join(id_mapping.keys())
    converted = convertId(
        ids=all_uniprot_ids,
        route="uniprot,drugbank",
        report="pair"
    )

    # Update artifact with conversion results
```

**If good coverage, update artifact to add:**

```markdown
### Conversion Results
- Route: uniprot → drugbank
- Total IDs submitted: [count]
- Successfully converted: [count]
- Conversion rate: [percentage]%

### Converted IDs
[Conversion results displayed here]
```

**If poor coverage, update artifact to add:**

```markdown
### Limitation Documented
- Database: drugbank
- Coverage: [percentage]%
- Conclusion: TogoID coverage insufficient. Consider alternative approaches or accept data limitation.
```

---

## Step 6: Final Report

After completing all steps, finalize the artifact with a comprehensive summary.

```python
# Update artifact with final summary
```

**Update artifact to add:**

```markdown
---

## Final Report

### Project Overview
- Analysis Date: [date]
- Anchor Database: uniprot
- Target Organism: Homo sapiens (9606)
- Methodology: RDF Portal Guide v15.0

### Workflow Summary

**Step 1: MIE Analysis** ✅
- MIE file retrieved and analyzed
- ShEx schema documented
- RDF examples reviewed

**Step 2: RDF Exploration** ✅
- Cross-references discovered in RDF
- Sample entities examined
- Available databases identified

**Step 3: Comprehensive SPARQL** ✅
- Single comprehensive query executed
- Results: [count] proteins retrieved
- Coverage per database documented

**Step 4: ID Extraction** ✅
- Cross-reference IDs extracted
- Mappings available in JSON and TSV formats

**Step 5: TogoID** [✅ or N/A]
- Coverage test performed
- [Conversions completed or not needed]

### Final Coverage Summary

| Database | Source | Count | Coverage |
|----------|--------|-------|----------|
| PDB | SPARQL | [count] / [total] | [percentage]% |
| Reactome | SPARQL | [count] / [total] | [percentage]% |
| ChEMBL | SPARQL | [count] / [total] | [percentage]% |
| Ensembl | SPARQL | [count] / [total] | [percentage]% |

### Methodology Notes
- MIE-first approach for understanding graph structure
- Comprehensive SPARQL for efficient data retrieval
- TogoID used only when necessary
- Single artifact progressively updated for complete documentation

### Reproducibility
All queries, results, and methodologies documented in this artifact.
Analysis can be reproduced by following the documented steps.

---

**Analysis Complete** ✅
```

---

## Decision Tree

```
Need multiple databases?
    ↓
Get MIE file for anchor database
    ↓
Create artifact & document MIE analysis
    ↓
Study ShEx schema & examples
    ↓
Explore anchor RDF cross-references
    ↓
Update artifact with findings
    ↓
Databases in RDF (>80% coverage)?
    ├─ YES → Use comprehensive SPARQL, SKIP TogoID
    │        Update artifact with results
    └─ NO → Test TogoID coverage
        ├─ >50% → Use TogoID
        │         Update artifact with conversion
        └─ <50% → Document limitation
                  Update artifact with notes
    ↓
Finalize artifact with summary report
```

---

## Common Patterns

### UniProt Anchor (Proteins)

**In RDF:** PDB (20-30%), Reactome (15-25%), ChEMBL (10-20%), Ensembl (95%), GO (80%)

**May need TogoID:** DrugBank, KEGG

### ChEMBL/PubChem (Compounds)

**In RDF:** UniProt, PubChem, ChEBI

**May need TogoID:** DrugBank, KEGG

### Ensembl (Genes)

**In RDF:** UniProt, GO, Reactome

**May need TogoID:** OMIM, HGNC

---

## Critical Warnings

- ❌ **Don't skip MIE file analysis** - Critical for understanding graph structure
- ❌ **Don't assume TogoID is needed** - Most cross-refs are in RDF
- ❌ **Don't skip RDF exploration** - Leads to unnecessary tools
- ❌ **Don't make all cross-refs REQUIRED** - Use OPTIONAL patterns
- ❌ **Don't create multiple queries** - Use ONE comprehensive query

- ✅ **Always get MIE file first**
- ✅ **Create ONE artifact and update it progressively**
- ✅ **Always explore anchor RDF** before converting
- ✅ **Test TogoID coverage before using**
- ✅ **Document your methodology**
- ✅ **Compare SPARQL vs TogoID coverage**

---

## Coverage Expectations

| Quality | SPARQL Direct | +TogoID | Multi-DB Total |
|---------|---------------|---------|----------------|
| **Field-Leading** | 80-95% | +5-10% | 85-100% |
| **Publication** | 60-80% | +10-20% | 70-90% |
| **Proof-of-Concept** | 40-60% | +15-25% | 55-75% |

---

## Key Takeaway

> **"Start with MIE files to understand graph structure. Most cross-references already exist in anchor RDF. Explore first, convert only when necessary. Document everything in ONE progressively updated artifact for complete reproducibility."**

- 📚 MIE files reveal structure & examples
- ⚡ 2x faster analysis
- 🎯 Better coverage (+5-10%)
- 🔍 More transparent
- ✅ Higher reproducibility
- 📝 ONE artifact, progressively updated through all steps
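
---

## Example: ID Extraction and Coverage (Sketch)

Step 4 calls an `extract_id` helper that this guide never defines, and the Step 3 coverage figures need one entry per protein even when a protein appears in several result rows. The following is a minimal sketch under stated assumptions, not the guide's own implementation: `extract_id`, `collect_xrefs`, `coverage`, and `XREF_VARS` are hypothetical names, the input is assumed to follow the standard SPARQL JSON results layout (`results.bindings`), and the sample URIs are illustrative only.

```python
import re
from collections import defaultdict

# Variable names taken from the Step 3 SELECT clause.
XREF_VARS = ("pdb", "reactome", "chembl", "ensembl")


def extract_id(uri):
    """Hypothetical helper: return the last '/' or '#' segment of a URI, or None."""
    if not uri:
        return None
    return re.split(r"[/#]", uri.rstrip("/#"))[-1] or None


def collect_xrefs(sparql_json):
    """Group cross-reference IDs per protein, merging repeated rows into sets."""
    mapping = defaultdict(lambda: {name: set() for name in XREF_VARS})
    for binding in sparql_json.get("results", {}).get("bindings", []):
        protein_id = extract_id(binding.get("protein", {}).get("value", ""))
        if protein_id is None:
            continue
        entry = mapping[protein_id]  # created even if no cross-refs are bound
        for name in XREF_VARS:
            xref_id = extract_id(binding.get(name, {}).get("value", ""))
            if xref_id is not None:
                entry[name].add(xref_id)
    return dict(mapping)


def coverage(mapping):
    """Fraction of proteins with at least one cross-reference, per database."""
    total = len(mapping)
    return {
        name: (sum(1 for e in mapping.values() if e[name]) / total if total else 0.0)
        for name in XREF_VARS
    }


# Illustrative input only (not real query output).
sample = {"results": {"bindings": [
    {"protein": {"value": "http://purl.uniprot.org/uniprot/P04637"},
     "pdb": {"value": "http://rdf.wwpdb.org/pdb/1TUP"}},
    {"protein": {"value": "http://purl.uniprot.org/uniprot/P04637"},
     "pdb": {"value": "http://rdf.wwpdb.org/pdb/2OCJ"},
     "ensembl": {"value": "http://rdf.ebi.ac.uk/resource/ensembl/ENSG00000141510"}},
    {"protein": {"value": "http://purl.uniprot.org/uniprot/P00533"},
     "pdb": {"value": "http://rdf.wwpdb.org/pdb/1IVO"}},
]}}

id_mapping = collect_xrefs(sample)
print(coverage(id_mapping))
```

Collecting IDs into sets keeps every cross-reference when a protein has several rows, and the per-database fractions can be multiplied by 100 to fill the `[percentage]%` placeholders in the artifact. Real cross-reference URIs may not end in the bare identifier, so the splitting rule should be checked against the examples in the MIE file.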
