Pomera AI Commander


SMART_DIFF_VALIDATION_ANALYSIS.md•10.4 KiB

# Format Validation & Comment Handling Analysis

## Current Implementation Status

### 1. Format Pre-Validation

#### Current Behavior

**Location**: [`FormatParser.parse()`](file:///p:/Pomera-AI-Commander/core/semantic_diff.py#L95-L150)

**Validation Strategy**: **Parse-time validation** (fail-fast)

Each format parser attempts to parse the text and raises `ValueError` if parsing fails:

```python
if format == 'json':
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        raise ValueError(f"Invalid JSON: {str(e)}")
elif format == 'yaml':
    try:
        return yaml.safe_load(text) or {}
    except yaml.YAMLError as e:
        raise ValueError(f"Invalid YAML: {str(e)}")
elif format == 'toml':
    try:
        import tomli
        return tomli.loads(text)
    except ImportError:
        raise ValueError("TOML support requires 'tomli' package")
    except Exception as e:
        raise ValueError(f"Invalid TOML: {str(e)}")
```

**ENV format**: No strict validation. Malformed lines are skipped silently:

```python
elif format == 'env':
    result = {}
    for line in text.strip().split('\n'):
        line = line.strip()
        if line and not line.startswith('#'):
            if '=' in line:  # Only processes lines with '='
                key, value = line.split('=', 1)
                result[key.strip()] = value.strip()
    return result
```

#### Issues Identified

**❌ No Pre-Validation**:
- Validation only happens during the parse attempt
- No upfront format compatibility check
- Users get no early feedback if the format is wrong

**❌ ENV Format Too Permissive**:
- Silently skips malformed lines (no `=` sign)
- Can lead to unexpectedly missing data in the diff
- No warning when lines are ignored

**❌ No Format Auto-Detection Confidence**:
- `detect_format()` makes a best guess
- No confidence score is returned
- Can misidentify the format (e.g., most JSON documents also parse as YAML, so JSON input is ambiguous)

---

### 2. Comment Handling

#### Current Behavior by Format

**JSON**:
- ❌ **No comment support** - Standard JSON doesn't allow comments
- ❌ **JSON with comments fails** - `// comment` or `/* comment */` causes `JSONDecodeError`
- 💡 **Workaround** - The `json5` library could be used to parse JSON5

**YAML**:
- ✅ **Comments are IGNORED** - `yaml.safe_load()` discards comments automatically
- ✅ **Before**: `key: value # This is a comment`
- ✅ **After**: `key: value # Different comment`
- ✅ **Result**: No difference detected (comments ignored correctly)

**ENV**:
- ✅ **Comment lines IGNORED** - Lines starting with `#` are skipped
- ⚠️ **Inline comments NOT supported**:
  - `KEY=value # comment` → value becomes `"value # comment"` (comment included!)
  - There is no single formal `.env` specification and loaders differ here (python-dotenv, for example, strips ` # comment` from unquoted values), so this is a deliberate simplification rather than spec-mandated behavior

**TOML**:
- ✅ **Comments are IGNORED** - `tomli` discards comments automatically
- ✅ **Both full-line and inline comments** (`# comment`)

#### Comment Handling Matrix

| Format | Line Comments | Inline Comments | Diff Behavior |
|--------|---------------|-----------------|---------------|
| **JSON** | ❌ Not supported | ❌ Not supported | N/A - causes parse error |
| **YAML** | ✅ Ignored | ✅ Ignored | ✅ Comment changes ignored correctly |
| **ENV** | ✅ Ignored (line start) | ❌ No (becomes part of value) | ⚠️ Comment text compared as part of the value |
| **TOML** | ✅ Ignored | ✅ Ignored | ✅ Comment changes ignored correctly |

---

## Recommendations

### 1. Add Pre-Validation Function

**Purpose**: Validate format compliance BEFORE attempting the diff

```python
# Requires: from typing import Any, Dict
@staticmethod
def validate_format(text: str, format: str) -> Dict[str, Any]:
    """
    Validate that text conforms to the specified format.

    Returns:
        {
            'valid': bool,
            'error': str or None,
            'warnings': List[str],       # e.g., "line 5 skipped (no '=')"
            'detected_format': str,      # format actually used (after 'auto')
            'line_number': int or None   # where the error occurred
        }
    """
    if format == 'auto':
        format = FormatParser.detect_format(text)

    try:
        # Attempt parse
        FormatParser.parse(text, format)
        return {
            'valid': True,
            'error': None,
            'warnings': [],
            'detected_format': format,
            'line_number': None
        }
    except ValueError as e:
        return {
            'valid': False,
            'error': str(e),
            'warnings': [],
            'detected_format': format,
            # From the error message; _extract_line_number is not defined yet
            'line_number': _extract_line_number(e)
        }
```

**Usage**:

```python
# Before diff
validation = FormatParser.validate_format(before_text, 'json')
if not validation['valid']:
    return SmartDiffResult(
        success=False,
        format='json',
        # parse() already prefixes the message with "Invalid JSON: "
        error=validation['error']
    )
```

### 2. Improve ENV Format Parsing

**Add warning collection**:

```python
elif format == 'env':
    result = {}
    warnings = []
    # splitlines() on the unstripped text keeps reported line numbers accurate
    for line_num, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # Skip empty and comment lines
        if '=' not in line:
            warnings.append(f"Line {line_num} skipped: missing '=' delimiter")
            continue
        key, value = line.split('=', 1)
        result[key.strip()] = value.strip()
    # Store warnings for user feedback
    return result, warnings
```

Returning a `(result, warnings)` tuple changes the return type of `parse()` for ENV, so its callers would need to be updated accordingly.

### 3. Add Format Detection Confidence

**Enhance auto-detection**:

```python
# Requires: import json; from typing import Tuple
@staticmethod
def detect_format_with_confidence(text: str) -> Tuple[str, float]:
    """
    Detect format and return a confidence score.

    Returns:
        (format_name, confidence)  # confidence: 0.0 to 1.0
    """
    scores = {'json': 0.0, 'yaml': 0.0, 'env': 0.0, 'toml': 0.0}

    # JSON indicators
    if text.strip().startswith(('{', '[')):
        scores['json'] += 0.5
    try:
        json.loads(text)
        scores['json'] += 0.5  # Perfect parse
    except json.JSONDecodeError:
        # Looks like JSON but fails to parse: cap at 0.3
        # (text that doesn't look like JSON stays at 0.0)
        scores['json'] = min(0.3, scores['json'])

    # ENV indicators
    if '=' in text and not text.strip().startswith(('{', '[')):
        scores['env'] += 0.3
        if all(line.strip().startswith('#') or '=' in line or not line.strip()
               for line in text.splitlines()):
            scores['env'] += 0.4

    # YAML and TOML indicators are not scored in this sketch

    # Return format with highest score
    best_format = max(scores, key=scores.get)
    return best_format, scores[best_format]
```

### 4. Support JSON5 for Comments (Optional)

**Add JSON5 support**:

```python
elif format == 'json5':
    try:
        import json5  # pip install json5
        return json5.loads(text)
    except ImportError:
        raise ValueError("JSON5 support requires 'json5' package")
    except Exception as e:
        raise ValueError(f"Invalid JSON5: {str(e)}")
```

This would allow:

```json5
{
  "name": "value",  // This is a comment
  /* Multi-line
     comment */
  "other": "data"
}
```

### 5. Add Comment Preservation Option (Advanced)

**For formats that support comments** (YAML, TOML):

```python
def compare_2way(..., options):
    preserve_comments = options.get('preserve_comments', False)
    if preserve_comments and format in ['yaml', 'toml']:
        # Use ruamel.yaml instead of PyYAML
        # Use tomlkit instead of tomli
        # These libraries preserve comments and formatting
        ...
```

---

## Implementation Priority

### High Priority (Recommended)

1. ✅ **Pre-validation function** - Prevents cryptic parse errors
2. ✅ **ENV format warnings** - Alerts users to skipped lines
3. ✅ **Better error messages** - Include line numbers

### Medium Priority

4. ⚠️ **Format detection confidence** - Helps users choose the correct format
5. ⚠️ **JSON5 support** - If users need JSON with comments

### Low Priority (Nice to Have)

6. ℹ️ **Comment preservation** - Advanced feature for specific use cases

---

## Current Comment Handling: Summary

**✅ Working Correctly**:
- YAML comments ignored (semantic diff behavior)
- TOML comments ignored (semantic diff behavior)
- ENV comment lines ignored

**⚠️ Known Limitations**:
- JSON doesn't support comments (standard JSON spec)
- ENV inline comments become part of the value (a simplification; some dotenv loaders strip them)
- No pre-validation before the parse attempt

**❌ Needs Improvement**:
- ENV format silently skips malformed lines (should warn)
- No validation feedback before attempting the diff
- Format auto-detection has no confidence score

---

## Testing Recommendations

### Test Cases to Add

**Format Validation Tests**:

```python
def test_invalid_json():
    result = engine.compare_2way(
        '{"invalid": }',  # Syntax error
        '{"valid": "json"}',
        'json'
    )
    assert not result.success
    assert "Invalid JSON" in result.error

def test_env_malformed_warning():
    # Should warn about the line without '='
    result = engine.compare_2way(
        'VALID=value\nINVALID LINE\nALSO_VALID=val',
        'VALID=value',
        'env'
    )
    # Should include a warning about the skipped line (line 2)
```

**Comment Handling Tests**:

```python
def test_yaml_comments_ignored():
    before = "key: value # Comment A"
    after = "key: value # Comment B"
    result = engine.compare_2way(before, after, 'yaml')
    assert len(result.changes) == 0  # Comments ignored

def test_env_inline_comment_is_value():
    # Current ENV parser: inline comments are part of the value
    before = "KEY=value # comment"
    after = "KEY=value"
    result = engine.compare_2way(before, after, 'env')
    assert len(result.changes) == 1  # Value changed
```

---

## Example: Enhanced Validation

```python
# User code
result = engine.compare_2way(
    '{"broken": }',  # Invalid JSON
    '{"valid": "json"}',
    'json',
    {'mode': 'semantic'}
)

# Current behavior:
# - Crashes with ValueError during parse
# - User gets: "Invalid JSON: Expecting value: line 1 column 12 (char 11)"

# Recommended behavior:
# - Pre-validate both inputs
# - Return a structured error:
{
    'success': False,
    'format': 'json',
    'error': 'Invalid JSON in "before" content',
    'details': {
        'line': 1,
        'column': 12,
        'message': 'Expecting value after ":"'
    }
}
```

---

## Conclusion

**Current State**:
- ✅ Comment handling works correctly for supported formats
- ⚠️ No pre-validation (parse-time errors only)
- ⚠️ ENV format too lenient (silent failures)
- ❌ No JSON comment support (by spec)

**Recommended Improvements** (in order):
1. Add a `validate_format()` function for pre-validation
2. Enhance the ENV parser to collect warnings
3. Improve error messages with line numbers
4. Consider JSON5 support for JSON with comments
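The structured error proposed under "Example: Enhanced Validation" can be built directly from `json.JSONDecodeError`, which exposes the 1-based `lineno` and `colno` and the bare `msg`. The minimal sketch below covers JSON only. `prevalidate_json_pair` and `_json_error_details` are hypothetical names, not existing project functions. Note that `msg` is the standard library's own wording (`"Expecting value"`), not the friendlier `'Expecting value after ":"'` shown in the example.

```python
import json
from typing import Any, Dict, Optional


def _json_error_details(text: str) -> Optional[Dict[str, Any]]:
    """Return line/column/message for invalid JSON, or None if it parses (hypothetical)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        # lineno/colno are 1-based; msg is the message without the position suffix
        return {"line": e.lineno, "column": e.colno, "message": e.msg}
    return None


def prevalidate_json_pair(before: str, after: str) -> Dict[str, Any]:
    """Validate both inputs before diffing and build a structured result (hypothetical)."""
    for side, text in (("before", before), ("after", after)):
        details = _json_error_details(text)
        if details is not None:
            return {
                "success": False,
                "format": "json",
                "error": f'Invalid JSON in "{side}" content',
                "details": details,
            }
    return {"success": True, "format": "json", "error": None, "details": None}
```

For the example's input, `prevalidate_json_pair('{"broken": }', '{"valid": "json"}')` reports the "before" side with line 1, column 12, matching the position in the current error message.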
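If adding the `json5` dependency is undesirable, a lighter option for the "JSON with comments" case is to strip `//` and `/* */` comments outside string literals and pass the rest to the standard `json` module. This is a different technique from JSON5: it accepts only comments, not trailing commas, single-quoted strings, or other JSON5 extensions. The function names are hypothetical. Newlines inside block comments are kept so `JSONDecodeError` line numbers still refer to the original input.

```python
import json
from typing import Any


def strip_json_comments(text: str) -> str:
    """Remove // and /* */ comments that appear outside JSON string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # keep the escaped char, e.g. \" or \\
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Drop up to (not including) the newline so line numbers stay stable
            end = text.find("\n", i)
            i = n if end == -1 else end
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("Unterminated /* comment")
            # Keep the comment's newlines; otherwise leave a space so tokens don't merge
            out.append("\n" * text.count("\n", i, end) or " ")
            i = end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def loads_jsonc(text: str) -> Any:
    """Parse JSON that may contain comments (comments only, not full JSON5)."""
    return json.loads(strip_json_comments(text))
```

Because string contents are copied verbatim, values such as `"http://example.com"` are not mistaken for comments.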
