RAGStack-Lambda

CHANGELOG.md•11.1 KiB

# Changelog ## [2.3.2] - 2026-02-06 ### Added - **MCP Registry submission**: Prepared ragstack-mcp for the official Model Context Protocol Registry - Added `mcp-name` verification marker for PyPI package validation - Created `server.json` with registry schema, environment variable definitions, and transport config - Registry name: `io.github.hatmanstack/ragstack` - Bumped ragstack-mcp to v0.1.3 ## [2.3.1] - 2026-02-04 ### Added - **AdditionalCorsOrigins parameter**: Parent stacks can now pass additional CORS origins for S3 data bucket access - Enables uploads from Amplify-hosted or other external frontends when RAGStack is deployed as nested stack - Accepts comma-separated URLs (e.g., `https://main.d123.amplifyapp.com,https://example.com`) - **Custom Cognito email templates**: Admin invite and verification emails now include branding and dashboard URL - Invite email includes stack name, dashboard URL, username, and temporary password - Verification email includes stack name and verification code - Email delayed until UI build completes (when BuildDashboard=true) so dashboard link works immediately ### Changed - **`chat_allow_document_access` default**: Changed from `False` to `True` - document sources are now downloadable by default ### Fixed - **ProcessMedia Lambda non-media file crash**: EventBridge S3 events for non-media files no longer crash with `KeyError: 'document_id'` - Lambda now properly detects EventBridge events and returns early for non-media files - Previously logged "Skipping non-media file" but continued execution and crashed - **Ingestion retry for "running" job errors**: Fixed retry logic to also catch "running ingestion job" errors - Previously only retried on "ongoing" keyword, missing "There is at least one running ingestion job" errors - Documents uploaded during active KB sync now properly retry instead of failing with `OCR_COMPLETE` status ## [2.3.0] - 2026-02-04 ### Added - **Comprehensive nested stack support**: StackPrefix parameter now applies to **all** 127+ AWS resources for complete parent/child stack isolation - Lambda functions, DynamoDB tables, Step Functions, log groups, SSM parameters, S3 Vectors index, and all other resources - Enables deploying RAGStack as a nested CloudFormation stack without uppercase naming conflicts - Fully backward compatible: empty StackPrefix uses stack name (existing deployments unaffected) ### Fixed - **Hardcoded stack name references in IAM permissions**: All ARN references in IAM policies now use StackPrefix conditional pattern - CodeBuild log group permissions - Lambda ARN permissions for Knowledge Base updates - Step Functions log groups and execution ARNs - SSM parameter ARNs - Budget name environment variables ### Documentation - **Nested stack deployment guide**: Updated to reflect StackPrefix applies to all resources, not just S3 buckets - Added resource naming examples for Lambda, DynamoDB, Step Functions, log groups - Clarified StackPrefix requirements and warnings apply to all resources - Enhanced troubleshooting section ## [2.2.1] - 2026-01-29 ### Fixed - **Configuration seeder Decimal handling**: Convert float defaults to Decimal when seeding config (fixes deployment failure for `multislice_filtered_boost`) ## [2.2.0] - 2026-01-29 ### Added - **Configurable filtered results boost**: New UI setting in Metadata Query section to adjust score multiplier for filtered results (1.0-2.0 range, default 1.25) - **Boosted scores in search results**: Frontend now displays boosted relevance scores, reflecting actual ranking ### Fixed - **Filter generation for name queries**: Strengthened LLM prompt to always generate `people_mentioned` filters when names are mentioned (e.g., "Pictures of Judy" → `{"people_mentioned": {"$eq": "judy"}}`) - **Filter examples use only allowed keys**: Validation ensures generated examples don't use keys outside the allowlist; prompt explicitly lists allowed keys - **AppSync config write permission**: Changed from `DynamoDBReadPolicy` to `DynamoDBCrudPolicy` so UI can save filter key settings - **DynamoDB Decimal handling**: Convert `float` to `Decimal` when writing config, and `Decimal` to `float` when reading boost values ### Changed - **SchemaVersion bumped to 6**: Existing stacks will re-run seeder on deploy to get new config defaults (`multislice_filtered_boost`, `metadata_filter_examples`, `metadata_filter_keys`) - **Reindex time estimate**: UI now says "several minutes to hours" instead of "several minutes" - **Filter examples help text**: Clarified that disabled examples are replaced when regenerating ## [2.1.0] - 2026-01-28 ### Added - **Filter keys allowlist**: Users can now select which metadata keys are used for filter generation via a new multiselect UI in the Filter Examples section - **Regenerate Examples button**: Manual control over when filter examples are regenerated (decoupled from metadata analysis) - **`regenerateFilterExamples` mutation**: New GraphQL mutation for on-demand filter example generation using only allowlisted keys ### Changed - **Separated key discovery from example generation**: `analyzeMetadata` now only updates key library statistics; filter examples require explicit regeneration - **Automatic filter key cleanup**: When a metadata key is deleted, it's automatically removed from the filter keys allowlist ### Fixed - **Bedrock client timeout consistency**: Filter example generation now uses same timeout config as other Bedrock calls (10s connect, 300s read) - **Safe dictionary access in filter generation**: Prevents KeyError when field analysis has missing keys - **FilterKeyInput error handling**: Added error state display and retry capability; filters out empty key names ## [2.0.2] - 2026-01-28 ### Added - **Delete metadata keys from UI**: New delete button in Metadata Key Statistics table with confirmation modal - **`$listContains` filter operator**: Support for filtering on array fields (e.g., `surnames`, `people_mentioned`) - **Manual keys in filter generation**: When extraction mode is "manual", filter generator only sees configured keys ### Fixed - **Reindex flow simplified**: Extract metadata first, create KB after, single baseline sync (removed redundant per-document API ingestion and finalize sync) - **Reindex URI parsing**: `_list_text_uris_for_reindex` now uses `document_id` directly instead of parsing from `output_s3_uri` (handles corrupted URIs) - **Metadata analyzer preserves counts**: No longer overwrites `occurrence_count` from ingestion with sample counts - **Manual keys in analyzer**: When extraction mode is "manual", only configured keys marked active - **Delete metadata key auth**: Added `@aws_api_key` directive and `DynamoDBCrudPolicy` for AppSync resolver - **Cache attribute typo**: Fixed `_cache_time` → `_active_keys_cache_time` in key library ### Changed - **Multislice merge prioritizes filtered results**: Filtered slice results appear first, then guaranteed minimum from other slices - **Reindex state machine**: Removed `WaitForFinalizeSync` loop, sync happens once after all metadata extraction ### Documentation - Promoted manual metadata keys workflow as recommended approach for better search results ## [2.0.1] - 2026-01-27 ### Fixed - OCR fallback paths writing to `output/` instead of `content/` when `output_s3_uri` not set - Reindex state machine baseline sync using in-Lambda polling (2-minute limit) instead of Step Functions Wait+Poll loop - Reindex finalize timeout from in-Lambda polling exceeding Lambda limits — moved to Step Functions polling - Config table reindex lock using wrong partition key (`config_key` instead of `Configuration`) - Ingestion retry not catching "can't exceed" concurrent request throttle errors - Migration script missing `content_type` backfill for v1 records - Migration script using generic `media` content type instead of specific `video`/`audio` types - `listImages` query failing when any image has null `created_at` (`AWSDateTime!` non-nullable) — added fallback to `updated_at` - Multislice retriever merging results purely by score, causing filtered metadata matches to be buried by visual similarity matches - KB retrieval returning too few results due to Bedrock's stricter relevance cutoff at low `numberOfResults` values ### Added - `scripts/copy_stack_data.py` for copying S3 and DynamoDB data between stacks - Baseline sync polling loop in reindex state machine (`WaitForSync` → `CheckSyncStatus` → `IsSyncComplete`) - Finalize sync polling loop in reindex state machine (`WaitForFinalizeSync` → `CheckFinalizeSyncStatus` → `IsFinalizeSyncComplete`) - `ServiceUnavailableException` retry handling in document ingestion - `handle_check_sync_status` and `handle_check_finalize_sync` actions in reindex Lambda - Guaranteed-minimum merge strategy for multislice retrieval — ensures each slice contributes at least 3 results before filling remaining slots by score - Baseline `.jpg.metadata.json` sidecars for images missing them (content_type, document_id, filename, surnames) ### Changed - Document table filename column now wraps long names across multiple lines (Cloudscape `wrapLines` + CSS override) - KB retrieval `numberOfResults` increased from 5 to 25 for both chat and search to improve recall ## [2.0.0] - 2026-01-10 ### Breaking Changes - **Single Data Source Architecture**: Consolidated from two S3 data sources to one - S3 paths changed: `output/` and `images/` merged into `content/` - All content types now use the same data source with `inclusionPrefixes: ["content/"]` - Removed `TextDataSourceId` and `ImageDataSourceId` CloudFormation outputs - Removed `TEXT_DATA_SOURCE_ID` and `IMAGE_DATA_SOURCE_ID` Lambda environment variables - **Unified Metadata Format**: All content now uses `.metadata.json` files - Images switched from inline attributes (`IN_LINE_ATTRIBUTE`) to S3 files (`S3_LOCATION`) - Image metadata now stored in `content/{imageId}/caption.txt.metadata.json` ### Migration - **Migration script provided**: `scripts/migrate_v1_to_v2.py` handles S3 file copying and tracking table updates - **Reindex handles re-ingestion**: After migration, use Settings UI to trigger reindex with fresh metadata - See [docs/MIGRATION.md](docs/MIGRATION.md) for complete migration guide ### Added - **`content_type` metadata field**: All content now includes a `content_type` field for filtering - Documents: `content_type: "document"` - Images: `content_type: "image"` - Web pages: `content_type: "web_page"` - **Simplified query handlers**: Single unified query instead of dual data source queries - **Optional content_type filtering**: Query by content type using metadata filters - **Universal reindex**: Reindex now processes all content types (documents, images, scraped pages) - **Migration script**: `scripts/migrate_v1_to_v2.py` for migrating v1.x deployments ### Removed - Dual data source architecture (`output/` and `images/` prefixes) - Multi-slice data source filtering by `x-amz-bedrock-kb-data-source-id` - `IN_LINE_ATTRIBUTE` metadata type for images ### Changed - Increased default retrieval results from 5 to 10 for unified queries - Simplified MultiSliceRetriever to not require data_source_id parameter ## [1.0.0] - Previous Initial release with dual data source architecture.

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/HatmanStack/RAGStack-Lambda'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

CHANGELOG.md•11.1 KiB