# RAG Extraction Pipeline
Build a knowledge graph from document extractions, storing entities and relationships with per-source confidence scores.
## Setup
```python
from hypabase import Hypabase
hb = Hypabase("knowledge.db")
```
## Extract and store
Simulate extracting facts from three documents with different confidence levels:
```python
# High-quality academic paper
with hb.context(source="doc_arxiv_2401", confidence=0.92):
    hb.edge(["transformer", "attention", "nlp"], type="concept_link")
    hb.edge(["bert", "transformer", "pretraining"], type="builds_on")

# Blog post (lower confidence)
with hb.context(source="doc_blog_post", confidence=0.75):
    hb.edge(["transformer", "gpu", "training"], type="requires")
    hb.edge(["attention", "memory", "scaling"], type="tradeoff")

# Textbook with moderate confidence
with hb.context(source="doc_textbook_ch5", confidence=0.5):
    hb.edge(["rnn", "lstm", "attention"], type="evolution")
```
Each extraction batch gets its own source and confidence: the provenance context manager stamps both onto every edge created inside its `with` block, so the individual `hb.edge` calls stay free of bookkeeping.
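To confirm the provenance landed, read it back off a stored edge. This spot-check uses the `source` filter and the edge attributes (`source`, `confidence`) that appear in the query examples below:
```python
# Spot-check: edges stored under the blog-post context carry its metadata
e = hb.edges(source="doc_blog_post")[0]
print(e.source, e.confidence)  # doc_blog_post 0.75
```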
## Query patterns
### Entity retrieval
Find all relationships involving a concept:
```python
edges = hb.edges(containing=["transformer"])
# Returns 3 edges: concept_link, builds_on, requires
```
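Each returned edge exposes its relationship type and member nodes (the same `type` and `node_ids` attributes used by the formatting code later in this guide), so inspecting results is straightforward:
```python
for e in edges:
    print(f"{e.type}: {e.node_ids}")
# concept_link: ['transformer', 'attention', 'nlp']
# builds_on: ['bert', 'transformer', 'pretraining']
# requires: ['transformer', 'gpu', 'training']
# (result ordering is not guaranteed)
```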
### Source filtering
Retrieve facts from a specific document:
```python
edges = hb.edges(source="doc_arxiv_2401")
# Returns 2 edges from the arxiv paper
```
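These filters should compose. A sketch, assuming `hb.edges` accepts `containing` and `source` in the same call (not shown elsewhere in this guide, so treat it as an assumption):
```python
# Hypothetical combined filter: one concept, scoped to one document
arxiv_transformer = hb.edges(containing=["transformer"], source="doc_arxiv_2401")
# Expected: the concept_link and builds_on edges, not the blog post's requires edge
```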
### Confidence-based retrieval
Only include high-quality extractions in your RAG context:
```python
high_quality = hb.edges(min_confidence=0.8)
# Returns 2 edges (arxiv paper), excludes blog post and textbook
```
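A practical refinement is a tiered fallback: prefer high-confidence facts, and relax the threshold only when too few survive. This sketch uses only the `min_confidence` filter shown above; the thresholds are illustrative:
```python
def edges_with_fallback(hb, strict=0.8, relaxed=0.6, min_results=3):
    """Prefer high-confidence edges; relax the cutoff if too few are found."""
    edges = hb.edges(min_confidence=strict)
    if len(edges) < min_results:
        edges = hb.edges(min_confidence=relaxed)
    return edges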
### Multi-hop discovery
Find paths between concepts across documents:
```python
paths = hb.paths("bert", "nlp")
# bert → transformer → nlp (two chained edges, both from doc_arxiv_2401)
```
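The structure of a returned path isn't specified above; assuming each path is an ordered sequence of node IDs, rendering them for a prompt might look like:
```python
for path in paths:
    print(" → ".join(path))  # e.g. bert → transformer → nlp
```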
### N-ary fact preservation
A single edge stores the 3-way concept link:
```python
concept_links = hb.edges(type="concept_link")
assert len(concept_links[0].node_ids) == 3
# ["transformer", "attention", "nlp"] — not three separate pairs
```
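If a downstream consumer only understands pairwise triples, expand the n-ary edge at read time rather than flattening it at storage time, so the original grouping is preserved. A plain-Python sketch:
```python
from itertools import combinations

def to_pairs(edge):
    """Expand one n-ary edge into pairwise triples for pairwise-only consumers."""
    return [(a, edge.type, b) for a, b in combinations(edge.node_ids, 2)]

to_pairs(concept_links[0])
# [('transformer', 'concept_link', 'attention'),
#  ('transformer', 'concept_link', 'nlp'),
#  ('attention', 'concept_link', 'nlp')]
```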
## Integration with LLM extraction
A typical pipeline:
```python
def extract_and_store(document_id, text, hb):
    """Extract facts from text with an LLM and store them in Hypabase."""
    # Your LLM extraction logic here.
    # Returns: [{"entities": [...], "type": "...", "confidence": ...}, ...]
    extractions = llm_extract(text)

    with hb.context(source=document_id, confidence=0.85):
        with hb.batch():  # Single save for all extractions
            for fact in extractions:
                hb.edge(
                    fact["entities"],
                    type=fact["type"],
                    confidence=fact.get("confidence"),  # per-fact score overrides the context default
                )
```
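Running it over a corpus is then a loop (the `documents` mapping and the `llm_extract` call above are placeholders for your own loader and extractor):
```python
documents = {
    "doc_arxiv_2401": "...",  # raw text, loaded however you like
    "doc_blog_post": "...",
}
for doc_id, text in documents.items():
    extract_and_store(doc_id, text, hb)
```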
## RAG retrieval function
```python
def retrieve_context(query_entities, hb, min_confidence=0.7):
    """Retrieve structured relationships for RAG context."""
    edges = hb.edges(
        containing=query_entities,
        min_confidence=min_confidence,
    )

    # Format each edge as a line the LLM can read directly
    facts = []
    for e in edges:
        facts.append(
            f"{e.type}: {' + '.join(e.node_ids)} "
            f"(source={e.source}, confidence={e.confidence})"
        )
    return "\n".join(facts)
```
This gives your LLM structured, provenance-tracked relationships as context.
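In use, the returned string drops straight into a prompt:
```python
context = retrieve_context(["transformer", "attention"], hb)
prompt = (
    "Answer using only these extracted facts:\n"
    f"{context}\n\n"
    "Question: How do transformers relate to attention?"
)
```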