# iris-vector-rag 0.5.2: New Issues Discovered During Testing

**Date**: December 12, 2025
**Status**: 🔍 **NEW BUGS FOUND** during AWS integration testing

---

## TL;DR

While testing iris-vector-rag 0.5.2 improvements, we discovered **3 NEW critical bugs** that affect AWS deployments:

1. ❌ **Connection utility ignores ConfigurationManager** (uses legacy env vars only)
2. ❌ **SchemaManager dot/colon notation mismatch** (can't read config)
3. ❌ **Class-level caching breaks config reloading**

**Good news**: ConfigurationManager improvements ARE working! Tests 1-3 and 6 passed.

**Bad news**: ConnectionManager and SchemaManager have integration bugs.

---

## Test Results Summary

| Test | Status | Issue |
|------|--------|-------|
| 1. ConfigurationManager | ✅ PASS | Works perfectly! |
| 2. Environment Variables | ✅ PASS | RAG_* prefix working! |
| 3. ConnectionManager | ✅ PASS | (with workaround) |
| 4. IRISVectorStore | ❌ FAIL | SchemaManager bug |
| 5. SchemaManager | ❌ FAIL | SchemaManager bug |
| 6. Document Model | ✅ PASS | Correct API usage |

**Result**: 4/6 tests passed, 2 fail due to SchemaManager bug

---

## Issue #1: Connection Utility Ignores ConfigurationManager

### Problem

The `get_iris_dbapi_connection()` function in `common/iris_dbapi_connector.py` **does not accept any parameters** and **ignores ConfigurationManager settings**.

**Code Analysis** (lines 151-191):

```python
def get_iris_dbapi_connection():
    """
    Establishes a connection to InterSystems IRIS using DBAPI.

    Reads connection parameters from environment variables:
    - IRIS_HOST
    - IRIS_PORT
    - IRIS_NAMESPACE
    - IRIS_USER
    - IRIS_PASSWORD
    """
    # Get connection parameters from environment with auto-detection fallback
    host = os.environ.get("IRIS_HOST", "localhost")
    port_env = os.environ.get("IRIS_PORT")
    # ... only reads from IRIS_* env vars, NOT from ConfigurationManager!
```

**ConnectionManager calls this** (line 158):

```python
connection = get_iris_dbapi_connection()  # No parameters passed!
```

### Impact

- ConfigurationManager settings are **completely ignored** for connections
- Must use **legacy environment variables** (`IRIS_*`) instead of new `RAG_*` variables
- Cannot use YAML configuration for database connection
- Breaks the whole point of having ConfigurationManager!

### Workaround

```bash
# Must set legacy env vars manually
export IRIS_HOST="3.84.250.46"
export IRIS_PORT="1972"
export IRIS_NAMESPACE="%SYS"
export IRIS_USER="_SYSTEM"
export IRIS_PASSWORD="SYS"
```

### Recommendation

**Fix**: Modify `get_iris_dbapi_connection()` to accept ConfigurationManager parameters:

```python
def get_iris_dbapi_connection(config_manager=None):
    """Establishes connection using ConfigurationManager settings."""
    if config_manager:
        host = config_manager.get("database:iris:host", "localhost")
        port = config_manager.get("database:iris:port", 1972)
        # ... use config_manager instead of os.environ
    else:
        # Fallback to env vars for backward compatibility
        host = os.environ.get("IRIS_HOST", "localhost")
        # ...
```

---

## Issue #2: SchemaManager Dot/Colon Notation Mismatch

### Problem

**SchemaManager uses DOT notation** but **ConfigurationManager.get() uses COLON notation**.

**SchemaManager code** (`storage/schema_manager.py` lines 48-52):

```python
self.base_embedding_model = self.config_manager.get(
    "embedding_model.name", "sentence-transformers/all-MiniLM-L6-v2"
)
self.base_embedding_dimension = self.config_manager.get(
    "embedding_model.dimension", 384  # Always returns 384!
)
```

**ConfigurationManager.get()** (`config/manager.py` lines 150-171):

```python
def get(self, key_string: str, default: Optional[Any] = None) -> Any:
    """
    Retrieves a configuration setting.
    Keys can be nested using a colon delimiter (e.g., "database:iris:host").
    """
    keys = [k.lower() for k in key_string.split(":")]  # Splits on COLON!
    # ...
```

### What Happens

When SchemaManager calls `config.get("embedding_model.dimension", 384)`:

1. ConfigurationManager splits on `:` → `["embedding_model.dimension"]` (single element!)
2. Tries to access `config["embedding_model.dimension"]` (single key with dot)
3. Key doesn't exist → returns default `384`
4. **ALWAYS returns 384** regardless of config file or environment variables!

### Impact

- **Cannot configure vector dimensions** via SchemaManager
- **ALWAYS uses 384-dim vectors** (default)
- NVIDIA NIM 1024-dim vectors **impossible to use**
- Makes the "configurable dimensions" feature **completely broken** for SchemaManager

### Evidence

**Test Output**:

```
Test 1: ConfigurationManager
✅ Embedding Model Dimension: 1024 (SchemaManager uses this)

Test 4: IRISVectorStore
Vector Dimension: 384 ← Should be 1024!

Test 5: SchemaManager
✅ Vector dimension from config: 384 ← Should be 1024!
```

ConfigurationManager correctly loads 1024, but SchemaManager gets 384!

### Workaround

**None that works**. Environment variables don't help because:

- `RAG_EMBEDDING_MODEL__DIMENSION=1024` → `config['embedding_model']['dimension'] = 1024`
- SchemaManager calls `get("embedding_model.dimension")` → splits on `:` → looks for `config["embedding_model.dimension"]`
- Doesn't match!

### Recommendation

**Fix Option 1**: Use `get_nested()` method (already exists!):

```python
# In SchemaManager.__init__
self.base_embedding_dimension = self.config_manager.get_nested(
    "embedding_model.dimension", 384  # Use get_nested() instead of get()
)
```

**Fix Option 2**: Make `get()` handle both notations:

```python
def get(self, key_string: str, default: Optional[Any] = None) -> Any:
    # Try colon notation first
    if ":" in key_string:
        keys = [k.lower() for k in key_string.split(":")]
    # Try dot notation as fallback
    elif "." in key_string:
        keys = [k.lower() for k in key_string.split(".")]
    else:
        keys = [key_string.lower()]
    # ... rest of logic
```

---

## Issue #3: Class-Level Caching Breaks Config Reloading

### Problem

SchemaManager has **class-level caching** that prevents configuration reloading.
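
As a stand-alone illustration of the pattern (not the library's code), the sketch below uses a hypothetical `CachedLoader` class whose "loaded" flag and cached value live on the class. The class name and the `dimension` key are assumptions made for this example; it only shows why a second instance does not see a new configuration until the flag is reset.

```python
class CachedLoader:
    """Hypothetical stand-in for a class that caches config at class level."""

    _config_loaded = False     # class attribute: shared by every instance
    _cached_dimension = None   # class attribute: shared by every instance

    def __init__(self, config: dict):
        if not CachedLoader._config_loaded:
            # Only the first instance ever reads its config.
            CachedLoader._cached_dimension = config.get("dimension", 384)
            CachedLoader._config_loaded = True
        # Later instances silently reuse whatever the first one loaded.
        self.dimension = CachedLoader._cached_dimension


first = CachedLoader({"dimension": 384})
second = CachedLoader({"dimension": 1024})  # new config is ignored

CachedLoader._config_loaded = False         # manual reset, as a test would do
third = CachedLoader({"dimension": 1024})

print(first.dimension, second.dimension, third.dimension)
```

Moving the flag and the cached value onto `self` would make each instance load its own configuration, at the cost of repeating the load.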

**Code** (`storage/schema_manager.py` lines 29-32, 42-57):

```python
class SchemaManager:
    # CLASS-LEVEL CACHING (shared across all instances for performance)
    _schema_validation_cache = {}
    _config_loaded = False  # ← Shared across ALL instances!
    _tables_validated = set()

    def __init__(self, connection_manager, config_manager):
        # Load configuration only if not already loaded
        if not SchemaManager._config_loaded:
            self._load_and_validate_config()
            SchemaManager._config_loaded = True
        else:
            # Use cached config from previous instance
            self.base_embedding_dimension = self.config_manager.get(
                "embedding_model.dimension", 384
            )
```

### Impact

- First SchemaManager instance loads config
- All subsequent instances use **cached values**
- Changing ConfigurationManager has **no effect**
- Cannot test with different configurations
- Makes unit testing **very difficult**

### Workaround

```python
# Must manually reset class-level cache before each test
from iris_vector_rag.storage.schema_manager import SchemaManager

SchemaManager._config_loaded = False
SchemaManager._schema_validation_cache = {}
SchemaManager._tables_validated = set()
```

### Recommendation

**Fix**: Remove class-level caching or make it instance-level:

```python
class SchemaManager:
    def __init__(self, connection_manager, config_manager):
        self.connection_manager = connection_manager
        self.config_manager = config_manager

        # Instance-level cache (not class-level)
        self._dimension_cache = {}

        # ALWAYS load config from config_manager (no caching)
        self._load_and_validate_config()
```

---

## Positive Findings

### ✅ ConfigurationManager Works Great!

**What Works**:

- ✅ YAML configuration loading
- ✅ Environment variable overrides with `RAG_` prefix
- ✅ Nested key access with `__` delimiter
- ✅ Type casting (string → int/float/bool)
- ✅ Default values

**Test Evidence**:

```
Test 1: ConfigurationManager
✅ Config loaded successfully:
   Host: 3.84.250.46
   Port: 1972
   Namespace: %SYS
   Embedding Model Dimension: 1024

Test 2: Environment Variable Overrides
✅ Environment variable override working:
   Config file: 3.84.250.46
   Env var: test-override.example.com
   Actual: test-override.example.com
```

### ✅ Document Model API Clear

**What Works**:

- ✅ Correct parameter names: `page_content`, `id`, `metadata`
- ✅ Embeddings stored separately (not in Document)
- ✅ Clean API design

---

## Summary: Original Pain Points vs New Issues

### Original Pain Points (RESOLVED) ✅

1. ✅ **Hardcoded settings** → ConfigurationManager added
2. ✅ **Inflexible dimensions** → Configurable (but SchemaManager can't use it)
3. ✅ **No config manager** → ConfigurationManager working great

### New Issues (FOUND) ❌

1. ❌ **ConnectionManager ignores config** → Uses legacy env vars only
2. ❌ **SchemaManager dot/colon mismatch** → Can't read config
3. ❌ **Class-level caching** → Prevents config reloading

---

## Recommendations for iris-vector-rag Team

### Priority 1: Fix SchemaManager Configuration

**Impact**: HIGH - Makes "configurable dimensions" feature unusable

**Fix**:

```python
# In SchemaManager._load_and_validate_config()
self.base_embedding_dimension = self.config_manager.get_nested(
    "embedding_model.dimension", 384  # Use get_nested() instead of get()
)
```

### Priority 2: Fix ConnectionManager Integration

**Impact**: HIGH - ConfigurationManager is pointless without this

**Fix**:

```python
# In ConnectionManager.get_connection()
from iris_vector_rag.common.iris_dbapi_connector import get_iris_dbapi_connection

# Pass config_manager settings to connection utility
db_config = self.config_manager.get("database:iris", {})
connection = get_iris_dbapi_connection(
    host=db_config.get("host"),
    port=db_config.get("port"),
    namespace=db_config.get("namespace"),
    username=db_config.get("username"),
    password=db_config.get("password")
)
```

### Priority 3: Remove Class-Level Caching

**Impact**: MEDIUM - Makes testing difficult

**Fix**: Move caching to instance level or remove entirely.

---

## Test Script

See `scripts/aws/test-iris-vector-rag-aws.py` for complete test suite that demonstrates all issues.

**Run tests**:

```bash
python3 scripts/aws/test-iris-vector-rag-aws.py
```

**Expected Results** (with current bugs):

- 4/6 tests pass
- Tests 4-5 fail due to SchemaManager bug

---

## Conclusion

**Good News**:

- ConfigurationManager improvements are **excellent**!
- Environment variable support **works perfectly**
- Core concepts are **solid**

**Bad News**:

- Integration between components is **broken**
- ConnectionManager and SchemaManager **don't use ConfigurationManager**
- Makes the improvements **unusable in practice**

**Recommendation**:

- Fix the dot/colon notation mismatch (1 line change!)
- Integrate ConnectionManager with ConfigurationManager
- These are **easy fixes** that would make iris-vector-rag **production-ready**

---

**Status**: ✅ Issues documented and reproducible
**Next Steps**: Share with iris-vector-rag team for fixes
**Test Suite**: `scripts/aws/test-iris-vector-rag-aws.py`
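
The dot/colon mismatch from Issue #2 can also be reproduced without IRIS or the library installed. The sketch below is an assumption-laden stand-in: `get_by_colon()` is a hypothetical helper that mimics only the colon-splitting behavior quoted from `config/manager.py`, and the `config` dict is made up for the example. It is not the library's implementation.

```python
from typing import Any, Optional


def get_by_colon(config: dict, key_string: str, default: Optional[Any] = None) -> Any:
    """Hypothetical stand-in for a getter that splits keys on ':' only."""
    node: Any = config
    for key in (k.lower() for k in key_string.split(":")):
        # A dotted key is never split, so it is looked up as one literal key.
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Assumed config shape: a nested dict, as an env var override would produce.
config = {"embedding_model": {"dimension": 1024}}

dot_result = get_by_colon(config, "embedding_model.dimension", 384)
colon_result = get_by_colon(config, "embedding_model:dimension", 384)

print(dot_result)    # 384: dotted key falls through to the default
print(colon_result)  # 1024: colon key resolves the nested value
```

The same pair of lookups against the real ConfigurationManager would make a compact regression test for whichever fix option is adopted.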
