RAGStack-Lambda


README.md•5.06 KiB

# Library Reference

Public API for `lib/ragstack_common/`. Internal functions omitted.

## Documentation Structure

Library documentation is organized by functional category:

### Core

- **[CONFIGURATION.md](./CONFIGURATION.md)** - Configuration management (`config.py`)
- **[STORAGE.md](./STORAGE.md)** - S3 utilities and file operations (`storage.py`, `sources.py`)
- **[UTILITIES.md](./UTILITIES.md)** - Helper functions and data models (`logging_utils.py`, `auth.py`, `demo_mode.py`, `image.py`, `appsync.py`, `constants.py`, `models.py`)

### Document Processing

- **[OCR.md](./OCR.md)** - OCR services and Bedrock client (`ocr.py`, `bedrock.py`)
- **[TEXT_EXTRACTORS.md](./TEXT_EXTRACTORS.md)** - Text extraction for HTML, CSV, JSON, XML, EML, EPUB, DOCX, XLSX (`text_extractors/`)
- **[MEDIA.md](./MEDIA.md)** - Audio/video transcription and segmentation (`transcribe_client.py`, `media_segmenter.py`)

### Metadata & Retrieval

- **[METADATA.md](./METADATA.md)** - Metadata extraction, normalization, and filtering (`metadata_extractor.py`, `metadata_normalizer.py`, `key_library.py`, `filter_generator.py`, `filter_examples.py`)
- **[RETRIEVAL.md](./RETRIEVAL.md)** - Knowledge Base retrieval and ingestion (`multislice_retriever.py`, `ingestion.py`)

### Web Scraping

- **[SCRAPER.md](./SCRAPER.md)** - Web scraping jobs and configuration (`scraper/`)

## Quick Reference

### Configuration Management

```python
from ragstack_common.config import ConfigurationManager

config = ConfigurationManager()
value = config.get_parameter("chat_primary_model")
```

### S3 Operations

```python
from ragstack_common.storage import read_s3_text, write_s3_text, parse_s3_uri

content = read_s3_text("s3://bucket/key")
bucket, key = parse_s3_uri("s3://bucket/key")
```

### OCR Processing

```python
from ragstack_common.ocr import OcrService

service = OcrService(backend="textract")
document = service.process_document(document)
```

### Metadata Extraction

```python
from ragstack_common.metadata_extractor import MetadataExtractor

extractor = MetadataExtractor()
metadata = extractor.extract_metadata(text, document_id)
```

### Knowledge Base Retrieval

```python
from ragstack_common.multislice_retriever import MultiSliceRetriever

retriever = MultiSliceRetriever()
results = retriever.retrieve(query, kb_id, ds_id)
```

### Text Extraction

```python
from ragstack_common.text_extractors import extract_text

result = extract_text(content_bytes, filename)
markdown = result.markdown
```

### Media Processing

```python
from ragstack_common.transcribe_client import TranscribeClient
from ragstack_common.media_segmenter import MediaSegmenter

client = TranscribeClient()
job_name = client.start_transcription_job(doc_id, input_uri, output_bucket)
result = client.wait_for_completion(job_name)

segmenter = MediaSegmenter(segment_duration=30)
segments = segmenter.segment_transcript(words, total_duration)
```

### Web Scraping

```python
from ragstack_common.scraper import ScrapeJob, ScrapeConfig, ScrapeScope

config = ScrapeConfig(
    max_pages=100,
    max_depth=3,
    scope=ScrapeScope.HOSTNAME
)
```

## Environment Variables

| Variable | Module | Purpose |
|----------|--------|---------|
| `AWS_REGION` | Most modules | AWS region for services |
| `CONFIGURATION_TABLE_NAME` | config.py | DynamoDB config table |
| `METADATA_KEY_LIBRARY_TABLE` | key_library.py, filter_examples.py | Metadata key storage |
| `GRAPHQL_ENDPOINT` | appsync.py | AppSync API endpoint for subscriptions |

## Data Models

### Document

Main document entity with processing status tracking.

```python
from ragstack_common.models import Document, Status, Page

doc = Document(
    document_id="doc-123",
    filename="example.pdf",
    input_s3_uri="s3://bucket/input/example.pdf",
    status=Status.UPLOADED
)
```

### Status Enums

- `Status`: Document processing states
- `OcrBackend`: OCR backend types (TEXTRACT, BEDROCK, TEXT_EXTRACTION)
- `ImageStatus`: Image processing states
- `ScrapeStatus`: Scrape job states

## Error Handling

### Media Processing Exceptions

```python
from ragstack_common.exceptions import (
    MediaProcessingError,
    TranscriptionError,
    UnsupportedMediaFormatError
)
```

## Constants

```python
from ragstack_common.constants import (
    MAX_QUERY_LENGTH,
    PRESIGNED_URL_EXPIRY,
    DEFAULT_PAGE_SIZE,
    SUPPORTED_IMAGE_TYPES
)
```

## Best Practices

1. **Configuration**: Use `ConfigurationManager` for all settings - changes apply immediately without redeployment
2. **S3 Operations**: Always use utility functions instead of boto3 directly for consistent error handling
3. **Metadata**: Enable `update_library=False` when extracting metadata in read-only contexts
4. **Logging**: Use `safe_log_event()` to mask sensitive data before CloudWatch logging
5. **Retries**: Bedrock and ingestion functions have built-in exponential backoff
6. **Media**: Check file format support with `TranscribeClient` before processing

## See Also

- [Configuration Guide](../CONFIGURATION.md) - User-facing configuration options
- [API Reference](../API_REFERENCE.md) - GraphQL API documentation
- [Architecture](../ARCHITECTURE.md) - System design and data flow
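
## Appendix: Retry Backoff Sketch

The Best Practices list notes that Bedrock and ingestion functions have built-in exponential backoff. This reference does not give the library's retry settings, so the block below is only a minimal sketch of the general technique. `retry_with_backoff` and all of its parameters are hypothetical names for illustration and are not part of `ragstack_common`.

```python
import random
import time


def retry_with_backoff(func, max_attempts=5, base_delay=0.5, max_delay=8.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying on failure with exponentially growing delays.

    Hypothetical helper for illustration only; not the library's implementation.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Upper bound doubles each attempt (base, 2*base, 4*base, ...),
            # capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random amount between 0 and that bound.
            sleep(random.uniform(0, delay))
```

Code that calls a throttled service directly could wrap the call as `retry_with_backoff(lambda: some_call())`, where `some_call` stands in for your own function. The `sleep` parameter is injectable so the delays can be stubbed out in unit tests.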
