Crawl4AI+SearXNG MCP Server

# Multi-Language Repository Parsing

The Crawl4AI MCP server supports multi-language repository parsing, so a codebase written in several programming languages can be analyzed in a single pass. The feature provides code structure extraction, cross-language code search, and AI hallucination detection across mixed technology stacks.

## Overview

The multi-language parsing system extends beyond Python to support polyglot development environments. It analyzes repositories containing Python, JavaScript, TypeScript, and Go code (further languages are planned, see Future Language Support) and stores the results in a unified knowledge graph that enables cross-language analysis and validation.

### Key Benefits

- **Polyglot Analysis**: Parse repositories containing multiple programming languages
- **Unified Knowledge Graph**: Store code structure from all languages in a single Neo4j graph
- **Cross-Language Code Search**: Find similar patterns across different languages
- **Enhanced AI Validation**: Detect hallucinations in multi-language AI-generated code
- **Repository Size Safety**: Built-in validation to prevent resource exhaustion
- **Performance Optimization**: Batched processing for large repositories

## Supported Languages

### Currently Supported

| Language | File Extensions | Features |
|----------|----------------|----------|
| **Python** | `.py` | Classes, functions, methods, imports, docstrings |
| **JavaScript** | `.js`, `.jsx`, `.mjs`, `.cjs` | Classes, functions, ES6+ features, imports/exports |
| **TypeScript** | `.ts`, `.tsx` | Interfaces, types, enums, generics |
| **Go** | `.go` | Structs, interfaces, functions, methods, packages |

### Language-Specific Features

#### Python

- Class definitions with inheritance
- Function and method signatures
- Import statements and dependencies
- Docstring extraction
- Decorator support

#### JavaScript/TypeScript

- ES6 classes and methods
- Arrow functions and generators
- Import/export statements (ES6 and CommonJS)
- TypeScript interfaces and type definitions
- React component detection
- JSDoc comment extraction

#### Go

- Struct definitions and fields
- Interface specifications
- Methods with receivers
- Package management
- Exported symbol detection

## Tools and Commands

### Repository Parsing Tools

#### `parse_github_repository`

Parse a remote GitHub repository with multi-language support.

```json
{
  "tool": "parse_github_repository",
  "arguments": {
    "repo_url": "https://github.com/username/multi-lang-project"
  }
}
```

**Features:**

- Automatic language detection based on file extensions
- Repository size validation (default 500MB limit)
- Batch processing for large repositories
- Git metadata extraction (branches, tags, commits)

#### `parse_local_repository`

Parse a local Git repository directly from the filesystem.

```json
{
  "tool": "parse_local_repository", 
  "arguments": {
    "local_path": "/home/user/projects/my-repo"
  }
}
```

**Security Features:**

- Path validation restricted to safe directories
- Git repository verification
- Sandboxed execution

#### `parse_repository_branch`

Parse a specific branch of a repository for version-specific analysis.

```json
{
  "tool": "parse_repository_branch",
  "arguments": {
    "repo_url": "https://github.com/username/project",
    "branch": "feature/new-api"
  }
}
```

### Analysis and Search Tools

#### `analyze_code_cross_language`

Perform semantic search across multiple programming languages.

```json
{
  "tool": "analyze_code_cross_language",
  "arguments": {
    "query": "authentication middleware",
    "languages": ["python", "javascript", "go"],
    "match_count": 10
  }
}
```

**Use Cases:**

- Find similar patterns across languages
- Compare implementation approaches
- Discover code reuse opportunities
- Understand architectural patterns

#### `query_knowledge_graph`

Explore the multi-language knowledge graph with built-in commands or custom Cypher queries.

```json
{
  "tool": "query_knowledge_graph",
  "arguments": {
    "command": "classes python-api"
  }
}
```

**Available Commands:**

- `repos` - List all parsed repositories
- `explore <repo_name>` - Summarize the structure of a parsed repository (see Example 3)
- `classes <repo_name>` - List classes in a repository
- `method <method_name>` - Search for methods across languages
- `query <cypher>` - Execute custom Cypher queries
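The `command` argument is a single string: a command name, optionally followed by one argument. As a rough illustration of that shape, the sketch below splits such a string into its two parts. The helper `split_command` and the `KNOWN_COMMANDS` table are hypothetical and are not the server's actual parser.

```python
# Hypothetical helper: split a query_knowledge_graph command string into
# (name, argument). The command names come from the list above; whether each
# one requires an argument is inferred from that list.
KNOWN_COMMANDS = {
    "repos": False,    # takes no argument
    "classes": True,   # <repo_name>
    "method": True,    # <method_name>
    "query": True,     # <cypher>
}


def split_command(command: str) -> tuple[str, str]:
    # Everything after the first space is treated as the argument, so a
    # Cypher query containing spaces stays intact.
    name, _, argument = command.strip().partition(" ")
    if name not in KNOWN_COMMANDS:
        raise ValueError(f"Unknown command: {name!r}")
    argument = argument.strip()
    if KNOWN_COMMANDS[name] and not argument:
        raise ValueError(f"Command {name!r} requires an argument")
    return name, argument


print(split_command("classes python-api"))
print(split_command("query MATCH (r:Repository) RETURN r.name"))
```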

## Configuration Options

### Repository Size Limits

Control resource usage with configurable size limits:

```bash
# Maximum repository size in MB (default: 500)
export REPO_MAX_SIZE_MB=1000

# Maximum number of files (default: 10000)
export REPO_MAX_FILE_COUNT=15000

# Minimum free disk space in GB (default: 1.0)
export REPO_MIN_FREE_SPACE_GB=2.0
```
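For illustration, the limits above could be read from the environment as follows. This is a minimal sketch, not the server's configuration code: the `RepoLimits` dataclass and `load_repo_limits` function are hypothetical names, and only the variable names and defaults are taken from the block above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoLimits:
    """Hypothetical container for the three documented limits."""
    max_size_mb: float
    max_file_count: int
    min_free_space_gb: float


def load_repo_limits(env=None) -> RepoLimits:
    # Fall back to the documented defaults when a variable is unset.
    env = os.environ if env is None else env
    return RepoLimits(
        max_size_mb=float(env.get("REPO_MAX_SIZE_MB", "500")),
        max_file_count=int(env.get("REPO_MAX_FILE_COUNT", "10000")),
        min_free_space_gb=float(env.get("REPO_MIN_FREE_SPACE_GB", "1.0")),
    )


# Passing an empty mapping shows the defaults without touching os.environ.
print(load_repo_limits({}))
```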

### Neo4j Batch Processing

Optimize performance for large repositories:

```bash
# Batch size for Neo4j operations (default: 50)
export NEO4J_BATCH_SIZE=100

# Batch timeout in seconds (default: 120)
export NEO4J_BATCH_TIMEOUT=180
```
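To make the effect of the batch size concrete, here is a sketch of how a list of work items might be split into fixed-size batches before each batch is written to Neo4j. The `iter_batches` helper is an assumption for illustration; the server's actual batching logic may differ.

```python
from typing import Iterator, Sequence


def iter_batches(items: Sequence, batch_size: int = 50) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 247 items with the default batch size of 50 give four full batches
# and a final batch of 47.
files = [f"file_{i}.py" for i in range(247)]
batches = list(iter_batches(files, batch_size=50))
print(len(batches), len(batches[-1]))
```

A larger batch size means fewer round trips to the database, while each batch takes longer to complete, which is why the batch size and the batch timeout are usually raised together.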

### Language Detection

The system automatically detects languages based on file extensions:

```python
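# Language mapping (internal configuration)
# Covers every extension listed in the Supported Languages table.
LANGUAGE_MAP = {
    ".py": "Python",
    ".js": "JavaScript",
    ".jsx": "JavaScript",
    ".mjs": "JavaScript",
    ".cjs": "JavaScript",
    ".ts": "TypeScript",
    ".tsx": "TypeScript",
    ".go": "Go",
}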
```
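A lookup against such a mapping might look like the sketch below. The `detect_language` function and the `EXTENSION_TO_LANGUAGE` name are hypothetical; the extensions are the ones listed in the Supported Languages table, and lower-casing the suffix is an assumption about how mixed-case file names should be handled.

```python
from pathlib import Path
from typing import Optional

# Extensions from the Supported Languages table.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python",
    ".js": "JavaScript",
    ".jsx": "JavaScript",
    ".mjs": "JavaScript",
    ".cjs": "JavaScript",
    ".ts": "TypeScript",
    ".tsx": "TypeScript",
    ".go": "Go",
}


def detect_language(file_path: str) -> Optional[str]:
    """Return the language for a path, or None if the extension is unsupported."""
    suffix = Path(file_path).suffix.lower()
    return EXTENSION_TO_LANGUAGE.get(suffix)


print(detect_language("auth/middleware.go"))
print(detect_language("README.md"))
```

Files whose extension is not in the mapping return `None` and would simply be skipped, which corresponds to the "Language Not Detected" case in the troubleshooting guide.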

## Usage Examples

### Example 1: Analyzing a Full-Stack Repository

Parse a repository containing frontend JavaScript, backend Python, and microservices in Go:

```json
{
  "tool": "parse_github_repository",
  "arguments": {
    "repo_url": "https://github.com/company/full-stack-app"
  }
}
```

Expected output structure:

```json
{
  "success": true,
  "repository_name": "full-stack-app",
  "languages_detected": ["Python", "JavaScript", "Go"],
  "statistics": {
    "total_files": 247,
    "python_files": 89,
    "javascript_files": 134,
    "go_files": 24,
    "classes_created": 45,
    "methods_created": 312,
    "functions_created": 128
  },
  "processing_summary": {
    "batch_count": 5,
    "processing_time_seconds": 42,
    "memory_usage_mb": 156
  }
}
```

### Example 2: Cross-Language Code Search

Find authentication patterns across your entire stack:

```json
{
  "tool": "analyze_code_cross_language",
  "arguments": {
    "query": "JWT token validation middleware",
    "languages": ["python", "javascript", "go"],
    "match_count": 5,
    "include_file_context": true
  }
}
```

Expected response:

```json
{
  "success": true,
  "query": "JWT token validation middleware",
  "results_by_language": {
    "python": [
      {
        "content": "def verify_jwt_token(token):\n    try:\n        payload = jwt.decode(token, SECRET_KEY)\n        return payload\n    except jwt.ExpiredSignatureError:\n        raise AuthenticationError('Token expired')",
        "similarity_score": 0.89,
        "source": "auth-service",
        "file_context": {
          "url": "neo4j://repository/auth-service/function/verify_jwt_token",
          "metadata": {"language": "Python", "file_path": "auth/validators.py"},
          "language": "python"
        }
      }
    ],
    "javascript": [
      {
        "content": "const validateJWT = (token) => {\n  try {\n    const decoded = jwt.verify(token, process.env.JWT_SECRET);\n    return decoded;\n  } catch (error) {\n    throw new Error('Invalid token');\n  }\n}",
        "similarity_score": 0.85,
        "source": "frontend-app",
        "file_context": {
          "url": "neo4j://repository/frontend-app/function/validateJWT",
          "metadata": {"language": "JavaScript", "file_path": "middleware/auth.js"},
          "language": "javascript"
        }
      }
    ],
    "go": [
      {
        "content": "func ValidateJWT(tokenString string) (*Claims, error) {\n    token, err := jwt.ParseWithClaims(tokenString, &Claims{}, func(token *jwt.Token) (interface{}, error) {\n        return []byte(secret), nil\n    })\n    if err != nil {\n        return nil, err\n    }\n    return token.Claims.(*Claims), nil\n}",
        "similarity_score": 0.82,
        "source": "api-gateway",
        "file_context": {
          "url": "neo4j://repository/api-gateway/function/ValidateJWT", 
          "metadata": {"language": "Go", "file_path": "auth/middleware.go"},
          "language": "go"
        }
      }
    ]
  },
  "summary": {
    "total_results": 3,
    "languages_found": ["python", "javascript", "go"],
    "average_similarity": 0.853
  }
}
```
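The `summary` object can be derived from `results_by_language`. The sketch below shows one way a client could recompute it, for example after filtering results locally. The `summarize` function is hypothetical, and rounding to three decimals is an assumption based on the sample value above.

```python
def summarize(results_by_language: dict) -> dict:
    """Recompute the summary fields from per-language result lists."""
    scores = [
        hit["similarity_score"]
        for hits in results_by_language.values()
        for hit in hits
    ]
    return {
        "total_results": len(scores),
        # Only languages that returned at least one hit are reported.
        "languages_found": [lang for lang, hits in results_by_language.items() if hits],
        "average_similarity": round(sum(scores) / len(scores), 3) if scores else 0.0,
    }


# Scores taken from the sample response above.
sample = {
    "python": [{"similarity_score": 0.89}],
    "javascript": [{"similarity_score": 0.85}],
    "go": [{"similarity_score": 0.82}],
}
summary = summarize(sample)
print(summary)
```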

### Example 3: Repository Exploration

Explore the structure of a parsed multi-language repository:

```json
{
  "tool": "query_knowledge_graph",
  "arguments": {
    "command": "explore my-project"
  }
}
```

Response includes:

- File count by language
- Class and function distribution
- Import/dependency analysis
- Code complexity metrics

## Performance Considerations

### Repository Size Management

Large repositories are automatically validated before processing:

1. **Size Check**: Repository size estimated before cloning
2. **File Count**: Prevents processing repositories with excessive files
3. **Disk Space**: Ensures sufficient free space (2x repository size)
4. **Memory Usage**: Batch processing prevents memory exhaustion
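
The first three checks can be sketched as a single pre-flight function. This is an illustration under stated assumptions, not the server's validator: `validate_repository` is a hypothetical name, the "Too many files" message is invented for the example, and requiring the larger of the configured minimum and twice the repository size is one possible reading of the disk space rule.

```python
def validate_repository(
    size_mb: float,
    file_count: int,
    free_space_gb: float,
    *,
    max_size_mb: float = 500,
    max_file_count: int = 10000,
    min_free_space_gb: float = 1.0,
) -> list[str]:
    """Return a list of validation errors; an empty list means the repo may be processed."""
    errors = []
    if size_mb > max_size_mb:
        errors.append(
            f"Repository too large: {size_mb:.1f}MB exceeds limit of {max_size_mb}MB"
        )
    if file_count > max_file_count:
        # Hypothetical wording; the real message is not documented.
        errors.append(f"Too many files: {file_count} exceeds limit of {max_file_count}")
    # Assumption: require at least the configured minimum and 2x the repo size.
    required_gb = max(min_free_space_gb, 2 * size_mb / 1024)
    if free_space_gb < required_gb:
        errors.append(
            f"Insufficient disk space: {free_space_gb:.1f}GB available, "
            f"{required_gb:.1f}GB required"
        )
    return errors


print(validate_repository(750.2, 1200, 50.0))
```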

### Processing Optimization

Multi-language parsing is optimized for performance:

```bash
# Recommended settings for large repositories
export NEO4J_BATCH_SIZE=100
export NEO4J_BATCH_TIMEOUT=300
export REPO_MAX_SIZE_MB=1000
```

### Concurrent Processing

The system uses concurrent processing where possible:

- File analysis runs in parallel
- Database operations are batched
- Network requests are throttled
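
As an illustration of the first point, parallel file analysis could be structured around a thread pool, as in the sketch below. The `analyze_file` function here is a stand-in that only records each file's extension; the real analyzers and the server's concurrency model are not shown in this document, so treat the structure as an assumption.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def analyze_file(path: str) -> dict:
    # Stand-in for a real language analyzer: it only records the extension.
    _, extension = os.path.splitext(path)
    return {"path": path, "extension": extension}


def analyze_files(paths: list[str], max_workers: int = 4) -> list[dict]:
    """Analyze files in parallel; results are returned in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze_file, paths))


results = analyze_files(["auth/validators.py", "middleware/auth.js", "auth/middleware.go"])
print(results)
```

`Executor.map` preserves input order, so the results can be batched for the database in a deterministic sequence even though the analysis itself runs concurrently.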

## Troubleshooting Guide

### Common Issues

#### 1. Repository Too Large

**Error**: `Repository too large: 750.2MB exceeds limit of 500MB`

**Solution**:

```bash
export REPO_MAX_SIZE_MB=1000
# Restart the MCP server
```

#### 2. Insufficient Disk Space

**Error**: `Insufficient disk space: 0.8GB available, 2.0GB required`

**Solution**: Free up disk space or increase available storage

#### 3. Language Not Detected

**Issue**: Files not being analyzed

**Solution**: Check if file extensions are supported:

- Verify file extensions in repository
- Check analyzer factory configuration
- Submit feature request for new languages

#### 4. Neo4j Connection Issues

**Error**: `Repository extractor not available`

**Solution**:

```bash
# Ensure Neo4j is enabled
export USE_KNOWLEDGE_GRAPH=true
# Verify Neo4j connection settings
export NEO4J_URI=bolt://localhost:7687
```

#### 5. Batch Processing Timeouts

**Error**: `Batch processing timeout after 120 seconds`

**Solution**:

```bash
# Increase timeout for large repositories
export NEO4J_BATCH_TIMEOUT=300
```

### Performance Tuning

For optimal performance with multi-language repositories:

1. **Batch Size**: Increase for faster processing of large repos
2. **Concurrent Sessions**: Adjust based on system resources
3. **Memory Limits**: Monitor and adjust Docker memory limits
4. **Disk I/O**: Use SSD storage for better performance

### Debugging Tools

Use these tools to debug parsing issues:

```bash
# Check repository statistics
curl -X POST http://localhost:3000/tools/query_knowledge_graph \
  -H 'Content-Type: application/json' \
  -d '{"command": "repos"}'

# Analyze specific repository
curl -X POST http://localhost:3000/tools/get_repository_info \
  -H 'Content-Type: application/json' \
  -d '{"repo_name": "my-project"}'

# Test cross-language search
curl -X POST http://localhost:3000/tools/analyze_code_cross_language \
  -H 'Content-Type: application/json' \
  -d '{"query": "test query", "languages": ["python", "javascript"]}'
```

## Advanced Features

### Custom Cypher Queries

Execute advanced queries across the multi-language knowledge graph:

```cypher
// Find all methods that call external APIs across languages
MATCH (m:Method)-[:CALLS]->(api:ExternalAPI)
RETURN m.name, m.language, api.endpoint
ORDER BY m.language, m.name
```

### Code Pattern Analysis

Identify common patterns across languages:

```cypher
// Find classes that share a name across languages
// (the < comparison returns each pair once instead of twice)
MATCH (c1:Class), (c2:Class)
WHERE c1.language < c2.language
  AND c1.name = c2.name
RETURN c1.name, c1.language, c2.language
```

### Dependency Mapping

Analyze dependencies across the entire stack:

```cypher
// Map dependencies across languages
MATCH (r:Repository)-[:CONTAINS]->(f:File)-[:IMPORTS]->(dep:Dependency)
RETURN r.name, f.language, dep.name
ORDER BY r.name, f.language
```

## Future Language Support

The system is designed for extensibility. Future language support includes:

- **Java** - Classes, interfaces, annotations
- **C#** - Classes, properties, LINQ
- **Rust** - Structs, traits, macros
- **C++** - Classes, templates, namespaces
- **PHP** - Classes, traits, namespaces
- **Ruby** - Classes, modules, gems
- **Swift** - Classes, protocols, extensions
- **Kotlin** - Classes, data classes, coroutines

To request support for additional languages, please submit an issue with:

- Language name and common file extensions
- Key language constructs to analyze
- Sample repository for testing
- Use case description

## Best Practices

### Repository Selection

Choose repositories that benefit from multi-language analysis:

- Full-stack applications
- Microservice architectures  
- Monorepos with multiple languages
- API implementations across languages

### Query Optimization

Structure cross-language queries effectively:

- Use specific language filters when possible
- Limit match count for faster responses
- Include file context for better understanding
- Use semantic queries rather than exact matches

### Integration Workflow

Integrate multi-language parsing into your development workflow:

1. **Parse repositories** after major releases
2. **Update local repos** when switching branches
3. **Cross-language search** during code reviews
4. **Validate AI code** before committing
5. **Explore patterns** during architecture decisions

## Conclusion

Multi-language repository parsing enables comprehensive analysis of modern polyglot codebases. By combining structural analysis with semantic search, developers can better understand, validate, and improve their multi-language applications.

The system scales from small projects to enterprise repositories while maintaining performance and accuracy through intelligent batching, size validation, and concurrent processing.
