RAGStack-Lambda

Overview Schema Related Servers Score Discussions

STORAGE.md•5.92 KiB

# Storage Module S3 utilities and file operations for document management. ## storage.py ```python def parse_s3_uri(s3_uri: str) -> tuple[str, str] # (bucket, key) def read_s3_text(s3_uri: str, encoding: str = "utf-8") -> str def read_s3_binary(s3_uri: str) -> bytes def write_s3_text(s3_uri: str, content: str, content_type: str = "text/plain") -> None def delete_s3_object(s3_uri: str) -> None def generate_presigned_url(s3_uri: str, expiration: int = 3600) -> str def write_metadata_to_s3(s3_uri: str, metadata: dict) -> None def extract_filename_from_s3_uri(s3_uri: str) -> str def get_file_type_from_filename(filename: str) -> str def is_valid_uuid(value: str) -> bool ``` ## Overview Utility functions for S3 operations with consistent error handling and URI parsing. ## Usage ### Parse S3 URI ```python from ragstack_common.storage import parse_s3_uri bucket, key = parse_s3_uri("s3://my-bucket/path/to/file.txt") # bucket: "my-bucket" # key: "path/to/file.txt" ``` ### Read Text Files ```python from ragstack_common.storage import read_s3_text # UTF-8 (default) content = read_s3_text("s3://bucket/file.txt") # Custom encoding content = read_s3_text("s3://bucket/file.txt", encoding="latin-1") ``` ### Read Binary Files ```python from ragstack_common.storage import read_s3_binary image_bytes = read_s3_binary("s3://bucket/image.png") pdf_bytes = read_s3_binary("s3://bucket/document.pdf") ``` ### Write Text Files ```python from ragstack_common.storage import write_s3_text write_s3_text( "s3://bucket/output.txt", "Hello, world!", content_type="text/plain" ) # Markdown write_s3_text( "s3://bucket/output.md", "# Heading\n\nContent", content_type="text/markdown" ) ``` ### Delete Objects ```python from ragstack_common.storage import delete_s3_object delete_s3_object("s3://bucket/file-to-remove.txt") ``` ### Generate Presigned URLs ```python from ragstack_common.storage import generate_presigned_url # Default 1-hour expiry url = generate_presigned_url("s3://bucket/private-file.pdf") # Custom expiry (5 minutes) url = generate_presigned_url("s3://bucket/temporary.txt", expiration=300) ``` **Returns:** HTTPS URL for temporary public access ### Write Metadata ```python from ragstack_common.storage import write_metadata_to_s3 metadata = { "topic": "genealogy", "date_range": "1900-1950", "location": "chicago" } write_metadata_to_s3("s3://bucket/doc/metadata.json", metadata) ``` ### File Type Detection ```python from ragstack_common.storage import get_file_type_from_filename file_type = get_file_type_from_filename("document.pdf") # "pdf" file_type = get_file_type_from_filename("image.jpg") # "image" file_type = get_file_type_from_filename("video.mp4") # "video" ``` **Returns:** `pdf`, `image`, `video`, `audio`, `text`, `unknown` ### Extract Filename ```python from ragstack_common.storage import extract_filename_from_s3_uri filename = extract_filename_from_s3_uri("s3://bucket/path/file.pdf") # Returns: "file.pdf" ``` ### Validate UUID ```python from ragstack_common.storage import is_valid_uuid is_valid_uuid("550e8400-e29b-41d4-a716-446655440000") # True is_valid_uuid("not-a-uuid") # False ``` ## sources.py ```python def extract_source_url_from_content(content_text: str) -> str | None def extract_image_caption_from_content(content_text: str) -> str | None def extract_filename_from_frontmatter(content_text: str) -> str | None def construct_image_uri_from_content_uri(content_s3_uri: str, content_text: str | None = None) -> str | None ``` ### Extract Source URL ```python from ragstack_common.sources import extract_source_url_from_content content = """--- source_url: https://example.com/article --- Content here""" url = extract_source_url_from_content(content) # Returns: "https://example.com/article" ``` **Use case:** Extract original URL from scraped markdown content ### Extract Image Caption ```python from ragstack_common.sources import extract_image_caption_from_content content = """--- caption: A beautiful sunset over mountains ---""" caption = extract_image_caption_from_content(content) # Returns: "A beautiful sunset over mountains" ``` ### Extract Filename ```python from ragstack_common.sources import extract_filename_from_frontmatter content = """--- filename: document.pdf ---""" filename = extract_filename_from_frontmatter(content) # Returns: "document.pdf" ``` ### Construct Image URI ```python from ragstack_common.sources import construct_image_uri_from_content_uri # From caption content URI content_uri = "s3://bucket/images/img-123/caption/content.txt" image_uri = construct_image_uri_from_content_uri(content_uri) # Returns: "s3://bucket/images/img-123/image.jpg" # With content text for image_id extraction content_text = """--- image_id: img-456 ---""" image_uri = construct_image_uri_from_content_uri(content_uri, content_text) # Returns: "s3://bucket/images/img-456/image.jpg" ``` **Use case:** Convert Knowledge Base caption URIs to actual image file URIs for thumbnails ## Error Handling All functions raise exceptions on errors: ```python from ragstack_common.storage import read_s3_text import botocore.exceptions try: content = read_s3_text("s3://bucket/missing-file.txt") except botocore.exceptions.ClientError as e: if e.response['Error']['Code'] == 'NoSuchKey': print("File not found") elif e.response['Error']['Code'] == 'AccessDenied': print("Permission denied") else: raise ``` ## Best Practices 1. **URI Format**: Always use `s3://bucket/key` format, not `https://` URLs 2. **Encoding**: Specify encoding explicitly for non-UTF-8 text files 3. **Content Type**: Set correct content_type for write operations (affects browser behavior) 4. **Presigned URLs**: Use short expiry times for sensitive documents 5. **Error Handling**: Catch `ClientError` and check error codes for specific handling ## See Also - [constants.py](./UTILITIES.md#constants) - File type constants - [models.py](./UTILITIES.md#models) - Document data models

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/HatmanStack/RAGStack-Lambda'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

STORAGE.md•5.92 KiB