Map identifiers between databases.
SYNTAX: biobtree_map(terms="ID", chain=">>source>>target")
- Chain MUST start with ">>"
- Source MUST match input ID type
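A minimal sketch of a pre-flight check for the two syntax rules above. The helper name is_valid_chain is hypothetical and not part of biobtree. The pattern assumes dataset names use only lowercase letters, digits, and underscores, and it deliberately rejects chains that carry [filter] expressions.

```python
import re

# Assumption: dataset names are lowercase letters, digits, and underscores.
# At least two ">>name" steps are required: one source and one target.
_CHAIN_RE = re.compile(r"^(>>[a-z0-9_]+){2,}$")


def is_valid_chain(chain: str) -> bool:
    """Return True if chain looks like '>>source>>target[>>more...]'."""
    return bool(_CHAIN_RE.match(chain))


print(is_valid_chain(">>ensembl>>uniprot"))  # True
print(is_valid_chain("ensembl>>uniprot"))    # False: missing leading ">>"
```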
ID TYPE → SOURCE:
- ENSG* → >>ensembl
- P*/Q*/O* → >>uniprot
- CHEMBL* → >>chembl_molecule
- GO:* → >>go
- MONDO:* → >>mondo
- HP:* → >>hpo
- HGNC:* or gene symbols → >>hgnc
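The prefix table above could be applied in code roughly as follows. This is a heuristic sketch, not biobtree's own detection logic: guess_source is a hypothetical helper, the UniProt pattern is a simplified six-character accession check, and anything unmatched is assumed to be a gene symbol.

```python
import re

# Rules taken from the ID TYPE list above; first match wins.
_PREFIX_RULES = [
    (re.compile(r"^ENSG\d+"), "ensembl"),
    (re.compile(r"^CHEMBL\d+$"), "chembl_molecule"),
    (re.compile(r"^GO:\d+$"), "go"),
    (re.compile(r"^MONDO:\d+$"), "mondo"),
    (re.compile(r"^HP:\d+$"), "hpo"),
    (re.compile(r"^HGNC:\d+$"), "hgnc"),
    # Simplified pattern for six-character O/P/Q accessions such as P04637.
    (re.compile(r"^[OPQ][0-9][A-Z0-9]{3}[0-9]$"), "uniprot"),
]


def guess_source(term: str) -> str:
    """Guess the source dataset for an identifier or gene symbol."""
    term = term.strip()
    for pattern, source in _PREFIX_RULES:
        if pattern.match(term):
            return source
    # Assumption: anything else is treated as a gene symbol.
    return "hgnc"


print(">>" + guess_source("P04637"))  # >>uniprot
print(">>" + guess_source("TP53"))    # >>hgnc
```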
SOME DRUG EXPLORATION PATHS:
- >>chembl_molecule>>chembl_target>>uniprot (drug targets)
- >>pubchem>>pubchem_activity>>uniprot (bioactivity)
- >>gtopdb_ligand>>gtopdb_interaction>>gtopdb>>uniprot (curated pharmacology with affinity data)
- >>ensembl>>reactome>>chebi (pathway chemicals - when no direct targets)
- Discover more paths via biobtree_entry xrefs and the EDGES list below
WARNING - GO terms with high xref_count (>100):
- Don't map GO → proteins → drugs (too many results)
- Instead: search the drug class for the condition → verify that its targets are annotated with this GO term
DISEASE GENE PATTERNS:
- >>mondo>>gencc>>hgnc (curated)
- >>mondo>>clinvar>>hgnc (variant-based)
DISEASE → DRUG PATTERNS:
- >>mesh>>chembl_molecule (MeSH disease/condition → drugs with indications)
- >>mondo>>clinical_trials>>chembl_molecule (disease → trial drugs)
DISCOVERY APPROACH:
- Use biobtree_entry to see xrefs (what's connected)
- Use EDGES below to see where each dataset leads
- Build chains based on what connections exist for YOUR entity
RETURNS: mapped identifiers with dataset and name
EDGES (what connects to what):
ensembl: uniprot, go, transcript, exon, ortholog, paralog, hgnc, entrez, refseq, bgee, gwas, gencc, antibody, scxa
hgnc: ensembl, uniprot, entrez, gencc, pharmgkb_gene, msigdb, clinvar, mim, refseq, alphafold, collectri, gwas, dbsnp, hpo, cellphonedb
entrez: ensembl, uniprot, refseq, go, biogrid, pubchem_activity, ctd_gene_interaction
refseq: ensembl, entrez, taxonomy, ccds, uniprot, mirdb
mirdb: refseq
transcript: ensembl, exon, ufeature
uniprot: ensembl, alphafold, interpro, pfam, pdb, ufeature, intact, string, string_interaction, biogrid, biogrid_interaction, chembl_target, go, reactome, rhea, swisslipids, bindingdb, antibody, pubchem_activity, cellphonedb, jaspar, signor, diamond_similarity, esm2_similarity
alphafold: uniprot
interpro: uniprot, go, interproparent, interprochild
chembl_molecule: mesh, chembl_activity, chembl_target, pubchem, chebi, clinical_trials
chembl_activity: chembl_molecule, chembl_assay, bao
chembl_assay: chembl_activity, chembl_target, chembl_document, bao
chembl_target: chembl_assay, uniprot, chembl_molecule
pubchem: chembl_molecule, chebi, hmdb, pubchem_activity, pubmed, patent_compound, bindingdb, ctd, pharmgkb
pubchem_activity: pubchem, ensembl, uniprot
chebi: pubchem, rhea, intact
swisslipids: uniprot, go, chebi, uberon, cl
lipidmaps: chebi, pubchem
dbsnp: hgnc, clinvar, pharmgkb_variant, alphamissense, spliceai
clinvar: hgnc, mondo, hpo, dbsnp, orphanet
alphamissense: uniprot, transcript
gwas: gwas_study, efo, dbsnp, hgnc, mondo
gwas_study: gwas, efo, mondo
mondo: gencc, clinvar, efo, mesh, hpo, clinical_trials, antibody, cellxgene, cellxgene_celltype, orphanet, mondoparent, mondochild, gwas, gwas_study
gencc: mondo, hpo, hgnc, ensembl
clinical_trials: mondo, chembl_molecule
pharmgkb: hgnc, dbsnp, mesh, pharmgkb_gene, pharmgkb_variant, pharmgkb_clinical, pharmgkb_guideline, pharmgkb_pathway
pharmgkb_variant: pharmgkb_clinical, hgnc, mesh, dbsnp
pharmgkb_gene: hgnc, entrez, ensembl, pharmgkb
pharmgkb_clinical: dbsnp, hgnc, mesh, pharmgkb_variant
pharmgkb_guideline: hgnc, pharmgkb
pharmgkb_pathway: hgnc, pharmgkb
ctd: mesh, ctd_gene_interaction, ctd_disease_association, pubchem
ctd_gene_interaction: ctd, entrez, taxonomy, pubmed
ctd_disease_association: ctd, mesh, mim, pubmed
intact: uniprot, chebi, rnacentral
string: uniprot, string_interaction
string_interaction: string, uniprot
biogrid: entrez, uniprot, refseq, taxonomy
bgee: ensembl, uberon, cl, taxonomy, bgee_evidence
bgee_evidence: bgee, uberon, cl
cellxgene: cl, uberon, mondo, efo, taxonomy
cellxgene_celltype: cl, uberon, mondo
scxa: cl, uberon, taxonomy, ensembl, scxa_gene_experiment
scxa_expression: ensembl, scxa, scxa_gene_experiment
scxa_gene_experiment: ensembl, scxa, scxa_expression, cl
rnacentral: uniprot, ensembl, intact, hgnc, refseq, ena
reactome: ensembl, uniprot, chebi, go, reactomeparent, reactomechild
rhea: chebi, uniprot, go
go: ensembl, uniprot, reactome, msigdb, swisslipids, bgee, interpro, goparent, gochild
hpo: clinvar, gencc, mondo, msigdb, orphanet, mim, hmdb, hgnc, hpoparent, hpochild
efo: gwas, mondo, cellxgene, efoparent, efochild
uberon: bgee, cellxgene, cellxgene_celltype, swisslipids, uberonparent, uberonchild
cl: bgee, cellxgene, cellxgene_celltype, scxa, scxa_gene_experiment, clparent, clchild
taxonomy: ensembl, uniprot, bgee, biogrid, ctd_gene_interaction, taxparent, taxchild
mesh: pharmgkb, ctd, ctd_disease_association, pubchem, mondo, chembl_molecule, meshparent, meshchild
eco: ecoparent, ecochild
antibody: ensembl, uniprot, mondo, pdb
msigdb: hgnc, entrez, go, hpo
orphanet: hpo, uniprot, mondo, hgnc, clinvar, mim, mesh
mim: clinvar, hpo, mondo, uniprot, ctd_disease_association
hmdb: pubchem, hpo, chebi, uniprot
collectri: hgnc # transcription factor → target gene interactions
esm2_similarity: uniprot # protein structural similarity
diamond_similarity: uniprot # protein sequence similarity
cellphonedb: uniprot, ensembl, hgnc, pubmed # ligand-receptor pairs for cell-cell communication
spliceai: hgnc
pdb: uniprot, go, interpro, pfam, taxonomy, pubmed
fantom5_promoter: ensembl, hgnc, entrez, uniprot, uberon, cl
fantom5_enhancer: ensembl, uberon, cl
fantom5_gene: ensembl, hgnc, entrez
jaspar: uniprot, pubmed, taxonomy
encode_ccre: taxonomy
bao: chembl_activity, chembl_assay, baoparent, baochild
brenda: uniprot, pubmed, brenda_kinetics, brenda_inhibitor
brenda_kinetics: brenda
brenda_inhibitor: brenda
gtopdb: uniprot, hgnc, gtopdb_ligand, gtopdb_interaction # drug targets (GPCRs, ion channels, enzymes)
gtopdb_ligand: pubchem, chebi, chembl_molecule, gtopdb_interaction # ligands/drugs with binding data
gtopdb_interaction: gtopdb, gtopdb_ligand, pubmed # target-ligand binding with affinity values
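One way to turn the EDGES list into a chain is a breadth-first search over it. The sketch below uses only a small subset of the edges above, with some neighbor lists truncated, and find_chain is a hypothetical helper. An edge in EDGES only says the datasets can be connected; whether a given entity actually has that xref still needs checking with biobtree_entry.

```python
from collections import deque

# Subset of the EDGES list above; neighbor lists are truncated for brevity.
EDGES = {
    "mondo": ["gencc", "clinvar", "efo", "mesh", "hpo", "clinical_trials"],
    "gencc": ["mondo", "hpo", "hgnc", "ensembl"],
    "clinvar": ["hgnc", "mondo", "hpo", "dbsnp", "orphanet"],
    "clinical_trials": ["mondo", "chembl_molecule"],
    "hgnc": ["ensembl", "uniprot", "entrez", "gencc"],
    "chembl_target": ["chembl_assay", "uniprot", "chembl_molecule"],
}


def find_chain(source, target, edges=EDGES):
    """Return the shortest '>>a>>b>>c' chain from source to target, or None."""
    if source == target:
        return None  # a chain needs a distinct source and target
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt in parent:
                continue  # already reached by a path that is no longer
            parent[nxt] = node
            if nxt == target:
                # Walk the parent links back to the source.
                path = []
                cur = nxt
                while cur is not None:
                    path.append(cur)
                    cur = parent[cur]
                return ">>" + ">>".join(reversed(path))
            queue.append(nxt)
    return None


print(find_chain("mondo", "uniprot"))
print(find_chain("mondo", "chembl_molecule"))
```

Breadth-first search is used here because shorter chains tend to produce fewer intermediate results; it returns the first shortest path it finds, not every possible path.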
FILTER SYNTAX: >>dataset[field operator value]
OPERATORS:
==        equals            >>dataset[field=="value"]
!=        not equals        >>dataset[field!="value"]
>         greater than      >>dataset[field>value]
<         less than         >>dataset[field<value]
>=        greater or equal  >>dataset[field>=value]
<=        less or equal     >>dataset[field<=value]
contains  string match      >>dataset[field.contains("value")]
LOGICAL OPERATORS:
&&  AND  >>dataset[field1>5 && field2<10]
||  OR   >>dataset[field=="A" || field=="B"]
!   NOT  >>dataset[!field] or >>dataset[!(field=="value")]
TYPE RULES:
- FLOAT: use decimal point (70.0 not 70)
- INT: no decimal (2 not 2.0)
- STRING: quote values ("Pathogenic", "PHASE3")
- BOOL: true/false (no quotes)
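The type rules above can be enforced when building filters from Python values. This is a sketch under assumptions: filter_literal and make_filter are hypothetical helpers, quote escaping inside strings is not covered by this document so such values are rejected, and exponent-form floats are rejected for the same reason.

```python
import math

_COMPARISON_OPS = {"==", "!=", ">", "<", ">=", "<="}


def filter_literal(value) -> str:
    """Render a Python value as a filter literal following the TYPE RULES."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)  # INT: no decimal point
    if isinstance(value, float):
        text = repr(value)  # FLOAT: repr(70.0) is "70.0"
        if not math.isfinite(value) or "e" in text.lower():
            raise ValueError(f"unsupported float literal: {text}")
        return text
    if isinstance(value, str):
        if '"' in value:
            raise ValueError("quote escaping is not covered here")
        return f'"{value}"'  # STRING: double-quoted
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def make_filter(dataset: str, field: str, op: str, value) -> str:
    """Build one '>>dataset[field op value]' step."""
    if op not in _COMPARISON_OPS:
        raise ValueError(f"unsupported operator: {op}")
    return f">>{dataset}[{field}{op}{filter_literal(value)}]"


print(make_filter("clinical_trials", "phase", "==", "PHASE3"))
print(make_filter("gtopdb_ligand", "approved", "==", True))
```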
EXAMPLES:
>>chembl_molecule[highestDevelopmentPhase==4] # approved drugs
>>chembl_molecule[highestDevelopmentPhase>=3] # Phase 3+
>>clinical_trials[phase=="PHASE3"]
>>go[type=="biological_process"]
>>clinvar[germline_classification=="Pathogenic"]
>>reactome[name.contains("signaling")]
>>gtopdb[type=="gpcr"] # GPCR targets
>>gtopdb[type=="ion_channel"] # ion channel targets
>>gtopdb_ligand[approved==true] # approved drugs only
>>gtopdb_interaction[endogenous==true] # endogenous ligand interactions
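Putting the pieces together, the arguments for a biobtree_map call could be assembled as below. build_map_args is a hypothetical helper, the MONDO ID is only an example input, and attaching a filter to a step inside a multi-step chain is an assumption: the examples above show filters on single datasets only.

```python
def build_map_args(term, steps):
    """Assemble the terms/chain arguments for biobtree_map.

    steps is a list whose items are either a dataset name or a
    (dataset, condition) tuple; the condition goes inside [...].
    """
    parts = []
    for step in steps:
        if isinstance(step, tuple):
            dataset, condition = step
            parts.append(f">>{dataset}[{condition}]")
        else:
            parts.append(f">>{step}")
    return {"terms": term, "chain": "".join(parts)}


# Disease → trial drugs, keeping only Phase 3+ molecules.
args = build_map_args(
    "MONDO:0005148",
    ["mondo", "clinical_trials", ("chembl_molecule", "highestDevelopmentPhase>=3")],
)
print(args["chain"])
```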