JakartaMigration

Overview Schema Related Servers Score Discussions

SOURCE_CODE_SCANNING_RESEARCH.md•10.5 KiB

# Source Code Scanning Research - Implementation Options ## Executive Summary After researching existing solutions, we have **three viable approaches** for implementing source code scanning: 1. **Use OpenRewrite JavaParser** (Recommended) - Already in dependencies, fast, AST-based 2. **Integrate Code-Index-MCP** - External MCP server for code indexing 3. **Hybrid Approach** - Fast regex pre-filter + OpenRewrite for detailed analysis **Recommendation**: Use **OpenRewrite JavaParser** - it's already in our dependencies, provides AST-based parsing, and is designed for code analysis. --- ## Option 1: OpenRewrite JavaParser ⭐ RECOMMENDED ### Current Status - ✅ **Already in dependencies**: `org.openrewrite:rewrite-java` - ✅ **Already used for refactoring**: Project uses OpenRewrite for Jakarta migration - ✅ **AST-based parsing**: Accurate, handles edge cases - ✅ **Well-maintained**: Active project, widely used ### Performance Characteristics - **Speed**: Fast for typical codebases (parses ~1000 files/second) - **Memory**: Moderate (~50-100MB for large projects) - **I/O**: Can be made non-blocking with parallel streams - **Scalability**: Handles large codebases well ### Implementation Approach ```java import org.openrewrite.java.JavaParser; import org.openrewrite.java.tree.J; import org.openrewrite.java.tree.J.CompilationUnit; public class SourceCodeScanner { private final JavaParser javaParser; public SourceCodeScanner() { this.javaParser = JavaParser.fromJavaVersion() .build(); } public SourceCodeAnalysisResult scanProject(Path projectPath) { List<FileUsage> usages = new ArrayList<>(); // Parallel file discovery List<Path> javaFiles = discoverJavaFiles(projectPath) .parallelStream() .collect(Collectors.toList()); // Parallel parsing (non-blocking) javaFiles.parallelStream() .forEach(file -> { try { FileUsage usage = scanFile(file); if (usage.hasJavaxUsage()) { synchronized(usages) { usages.add(usage); } } } catch (Exception e) { log.warn("Failed to scan file: {}", file, e); } }); return new SourceCodeAnalysisResult(usages); } private FileUsage scanFile(Path file) { try { String content = Files.readString(file); CompilationUnit cu = javaParser.parse(content).get(0); List<ImportStatement> imports = extractJavaxImports(cu); return new FileUsage(file, imports, countLines(content)); } catch (Exception e) { log.error("Error scanning file: {}", file, e); return new FileUsage(file, List.of(), 0); } } private List<ImportStatement> extractJavaxImports(CompilationUnit cu) { List<ImportStatement> imports = new ArrayList<>(); cu.getImports().forEach(imp -> { String importName = imp.getQualid().toString(); if (importName.startsWith("javax.")) { String jakartaEquivalent = importName.replace("javax.", "jakarta."); imports.add(new ImportStatement( importName, jakartaEquivalent, imp.getCoordinates().getLineNumber() )); } }); return imports; } } ``` ### Pros - ✅ Already in dependencies (no new dependencies) - ✅ AST-based (accurate, handles edge cases) - ✅ Fast parsing - ✅ Can be parallelized - ✅ Consistent with existing refactoring code ### Cons - ⚠️ Moderate memory usage (acceptable for our use case) - ⚠️ Requires parsing entire file (but fast) ### Performance Optimization Tips 1. **Parallel Processing**: Use `parallelStream()` for file discovery and parsing 2. **Early Filtering**: Skip files in `target/`, `build/`, `.git/` directories 3. **Caching**: Cache parsed ASTs if scanning multiple times 4. **Batch Processing**: Process files in batches to control memory --- ## Option 2: Code-Index-MCP Integration ### Overview - **External MCP Server**: Separate service for code indexing - **SCIP-based**: Uses Source Code Indexing Protocol - **Multi-language**: Supports Java, Python, JavaScript, etc. - **Tree-Sitter**: Uses Tree-Sitter for parsing ### Integration Approach 1. Install Code-Index-MCP as separate MCP server 2. Configure in MCP client (Cursor) 3. Use MCP tools from Code-Index-MCP for indexing 4. Our Jakarta Migration MCP queries Code-Index-MCP for source info ### Pros - ✅ Specialized for code indexing - ✅ Multi-language support - ✅ Can be shared across multiple MCP servers - ✅ Optimized for large codebases ### Cons - ❌ Requires separate MCP server installation - ❌ Additional dependency/configuration - ❌ Network overhead (MCP-to-MCP communication) - ❌ Less control over scanning logic - ❌ May not be necessary for our use case ### When to Use - If we need multi-language support - If we want to share indexing across multiple tools - If codebase is extremely large (>100k files) ### Recommendation **Not recommended** for initial implementation - adds complexity without clear benefit for Java-only scanning. --- ## Option 3: Hybrid Approach (Fast Pre-filter + OpenRewrite) ### Strategy 1. **Fast Regex Pre-filter**: Quickly identify files with `javax.*` imports 2. **OpenRewrite Deep Analysis**: Only parse files that have javax usage ### Implementation ```java public class SourceCodeScanner { private static final Pattern JAVAX_IMPORT_PATTERN = Pattern.compile("^import\\s+(javax\\.\\w+.*);", Pattern.MULTILINE); public SourceCodeAnalysisResult scanProject(Path projectPath) { // Step 1: Fast regex pre-filter List<Path> candidateFiles = discoverJavaFiles(projectPath) .parallelStream() .filter(file -> hasJavaxImports(file)) // Quick regex check .collect(Collectors.toList()); // Step 2: Deep AST analysis only on candidates List<FileUsage> usages = candidateFiles.parallelStream() .map(this::scanFileWithAST) .filter(FileUsage::hasJavaxUsage) .collect(Collectors.toList()); return new SourceCodeAnalysisResult(usages); } private boolean hasJavaxImports(Path file) { try { String content = Files.readString(file); return JAVAX_IMPORT_PATTERN.matcher(content).find(); } catch (IOException e) { return false; } } private FileUsage scanFileWithAST(Path file) { // Use OpenRewrite for detailed analysis // ... } } ``` ### Pros - ✅ Fastest overall (skips parsing files without javax) - ✅ Lower memory usage (only parses relevant files) - ✅ Best of both worlds ### Cons - ⚠️ More complex implementation - ⚠️ Regex pre-filter may miss some cases (but OpenRewrite catches them) ### Recommendation **Good optimization** for very large codebases, but may be overkill initially. --- ## Option 4: Simple Regex (Quick Start) ### Overview - Simple regex-based scanning - No AST parsing - Fastest to implement ### Implementation ```java public class SourceCodeScanner { private static final Pattern JAVAX_IMPORT_PATTERN = Pattern.compile("^import\\s+(javax\\.\\w+.*);", Pattern.MULTILINE); public SourceCodeAnalysisResult scanProject(Path projectPath) { List<FileUsage> usages = discoverJavaFiles(projectPath) .parallelStream() .map(this::scanFile) .filter(FileUsage::hasJavaxUsage) .collect(Collectors.toList()); return new SourceCodeAnalysisResult(usages); } private FileUsage scanFile(Path file) { try { String content = Files.readString(file); List<ImportStatement> imports = extractJavaxImports(content); return new FileUsage(file, imports, countLines(content)); } catch (IOException e) { return new FileUsage(file, List.of(), 0); } } } ``` ### Pros - ✅ Simplest implementation - ✅ Very fast - ✅ Low memory ### Cons - ❌ Less accurate (may miss edge cases) - ❌ Doesn't handle comments, string literals well - ❌ Not consistent with OpenRewrite approach ### Recommendation **Not recommended** - we already have OpenRewrite, should use it for consistency. --- ## Performance Comparison | Approach | Speed | Memory | Accuracy | Complexity | |----------|-------|--------|----------|------------| | OpenRewrite | Fast | Moderate | High | Low | | Code-Index-MCP | Fast | Low | High | High | | Hybrid | Very Fast | Low | High | Medium | | Simple Regex | Very Fast | Low | Low | Low | --- ## Final Recommendation ### Primary Approach: OpenRewrite JavaParser **Why**: 1. ✅ Already in dependencies 2. ✅ AST-based (accurate) 3. ✅ Consistent with existing code 4. ✅ Fast enough for our needs 5. ✅ Can be parallelized **Implementation Plan**: 1. Create `SourceCodeScanner` using OpenRewrite JavaParser 2. Use parallel streams for file discovery and parsing 3. Filter out build directories (`target/`, `build/`, `.git/`) 4. Add caching if needed for repeated scans **Optimization (if needed)**: - Add regex pre-filter for very large codebases - Implement incremental scanning (only scan changed files) ### Future Enhancement: Hybrid Approach If performance becomes an issue with very large codebases: - Add regex pre-filter - Only parse files with javax imports using OpenRewrite --- ## Implementation Steps 1. **Create Domain Classes** (1 hour) - `SourceCodeAnalysisResult` - `FileUsage` - `ImportStatement` 2. **Implement SourceCodeScanner** (4-6 hours) - Use OpenRewrite JavaParser - Parallel file discovery - Parallel parsing - Filter build directories 3. **Add MCP Tool** (1 hour) - `scanSourceCode` tool - JSON response formatting 4. **Add Tests** (2-3 hours) - Unit tests for scanner - Integration tests with sample projects 5. **Performance Testing** (2 hours) - Test with large codebase - Optimize if needed **Total Estimated Time**: 10-14 hours (1.5-2 days) --- ## References - [OpenRewrite Documentation](https://docs.openrewrite.org/) - [Code-Index-MCP GitHub](https://github.com/ViperJuice/Code-Index-MCP) - [SCIP Protocol](https://github.com/sourcegraph/scip) - [Java NIO Performance](https://docs.oracle.com/javase/tutorial/essential/io/fileio.html)

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/adrianmikula/JakartaMigration'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

SOURCE_CODE_SCANNING_RESEARCH.md•10.5 KiB