by siltecon

CodeWalker

Walk your codebase before writing new code.

CodeWalker is an MCP server that gives Claude Code real-time access to your Python codebase structure, enabling AI-assisted development that reuses existing code instead of duplicating it.


The Problem: AI Code Duplication

What Happens Without CodeWalker

When Claude Code writes code, it can't see what already exists in your codebase. This causes a cascade of problems:

Day 1: You ask Claude to add CSV loading functionality

# Claude creates: src/data_loader.py
import pandas as pd

def load_csv_file(path):
    return pd.read_csv(path)

Day 5: Different feature needs CSV loading

# Claude creates: src/importer.py (Claude has no memory of data_loader.py)
import pandas as pd

def load_csv_data(filepath):
    df = pd.read_csv(filepath)
    return df

Day 10: Another feature, another duplicate

# Claude creates: src/utils.py (Claude still doesn't know about the others)
import pandas as pd

def read_csv(file_path):
    return pd.read_csv(file_path, low_memory=False)  # Now with different behavior!

Result after 2 weeks:

  • 🔴 7 different CSV loading functions across your codebase

  • 🔴 Inconsistent behavior (one uses low_memory=False, others don't)

  • 🔴 Impossible to maintain (bug fixes need to be applied 7 times)

  • 🔴 Unpredictable behavior (which implementation gets called depends on imports)

  • 🔴 Code review nightmare (reviewing duplicate implementations wastes time)

The Cost of Code Duplication

This isn't just messy - it's expensive:

| Impact | Cost |
| --- | --- |
| Development Time | 30-40% wasted rewriting existing code |
| Bug Fixes | Same bug appears in multiple places, fixed multiple times |
| Code Reviews | Reviewers waste time on duplicate implementations |
| Onboarding | New developers confused by inconsistent patterns |
| Technical Debt | Duplicates diverge over time, creating maintenance burden |
| Testing | Same logic tested multiple times (or worse, inconsistently) |

Real Example: A codebase with 800 functions had a 52.7% duplication rate: 422 of its functions were duplicates. That's thousands of wasted lines of code.


How CodeWalker Solves This

CodeWalker indexes your codebase and lets Claude search before writing:

With CodeWalker

Day 1: You ask Claude to add CSV loading

Claude (internal): Let me check if CSV loading already exists...
> search_functions("load csv")

Found: load_csv_file() in src/data_loader.py

Claude: "I found an existing CSV loader. Let me use it instead of creating a new one."

Result:

# Claude imports existing function
from src.data_loader import load_csv_file

data = load_csv_file(path)

Day 5, 10, 15...: Same pattern - Claude finds and reuses existing code

Result after 2 weeks:

  • 1 canonical CSV loading function (not 7)

  • Consistent behavior across entire codebase

  • Easy to maintain (fix bugs once, fixed everywhere)

  • Predictable behavior (one implementation = one behavior)

  • Fast code reviews (reviewers see reuse, not duplication)


Why This Problem Exists

LLMs Lack Architectural Awareness

Claude Code (and all LLMs) have a fundamental limitation:

  • Can't see your codebase structure

  • Can't search across files

  • Can't remember what exists

  • Can't detect duplicates

The technical reason: When Claude writes code, it only sees:

  1. The current file you're editing

  2. Recent conversation context

  3. Maybe a few related files you showed it

What Claude DOESN'T see:

  • That load_csv_file() already exists in src/data_loader.py

  • That 3 other files have similar functions

  • That your team has a canonical implementation

  • Your codebase architecture and patterns

Result: Claude invents new implementations instead of reusing existing ones.

The "10 Developers, 0 Communication" Problem

Working with AI without CodeWalker is like having 10 developers who never talk to each other:

Developer 1 (Monday):    Creates load_csv_file()
Developer 2 (Tuesday):   Doesn't know about it, creates load_csv_data()
Developer 3 (Wednesday): Doesn't know about either, creates read_csv()
Developer 4 (Thursday):  Creates import_csv()
... and so on

Each "developer" (AI session) works in isolation, creating duplicates because they can't see what others did.

CodeWalker fixes this by giving AI a "shared memory" of your entire codebase.


Real-World Impact

Case Study: Elisity Project

Before CodeWalker:

  • 800 total functions

  • 422 duplicates (52.7% duplication rate)

  • 33 direct pd.read_csv() calls (should use centralized loader)

  • 11 duplicate print_summary() implementations

  • 3 duplicate load_flow_data() functions with diverging behavior

With CodeWalker:

  • Claude finds existing implementations before writing new code

  • Duplication rate drops to near-zero for new code

  • Codebase becomes more maintainable over time

Time Saved:

  • Development: 30-40% less time rewriting existing code

  • Code Review: Reviewers focus on new logic, not duplicate detection

  • Bug Fixes: Fix once instead of hunting down 3-7 duplicates


How It Works

Architecture

┌─────────────────────┐
│   Your Codebase     │
│  (Python files)     │
└──────────┬──────────┘
           │
           │ AST Parser extracts
           │ function metadata
           ▼
┌─────────────────────┐
│   SQLite Index      │
│  (functions.db)     │
│                     │
│  • Function names   │
│  • Parameters       │
│  • Locations        │
│  • Docstrings       │
└──────────┬──────────┘
           │
           │ Claude queries via
           │ MCP protocol
           ▼
┌─────────────────────┐
│   Claude Code       │
│                     │
│  "Does load_csv     │
│   already exist?"   │
│                     │
│  → Yes! Use it      │
└─────────────────────┘

What Gets Indexed

For each function in your codebase:

  • Name - load_csv_file

  • Location - src/data_loader.py:42

  • Parameters - (path, encoding='utf-8')

  • Docstring - First line for quick understanding

  • Type - Regular function, async function, or class method

  • Decorators - @staticmethod, @cached, etc.

What's NOT stored: Function bodies, comments, string literals (only structural metadata).
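The extraction step can be sketched with Python's standard ast module. This is a minimal illustration under stated assumptions, not CodeWalker's actual parser: the function name extract_functions and the record field names are hypothetical, and only plain positional parameters are collected (no defaults, *args, or keyword-only parameters).

```python
import ast


def extract_functions(source: str, file_path: str) -> list[dict]:
    """Collect structural metadata for every function in one source file.

    Hypothetical helper: the record layout is an assumption for illustration.
    """
    tree = ast.parse(source)
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)

    # Functions that sit directly inside a class body are treated as methods.
    method_ids = {
        id(item)
        for node in ast.walk(tree)
        if isinstance(node, ast.ClassDef)
        for item in node.body
        if isinstance(item, func_types)
    }

    records = []
    for node in ast.walk(tree):
        if not isinstance(node, func_types):
            continue
        if id(node) in method_ids:
            kind = "method"
        elif isinstance(node, ast.AsyncFunctionDef):
            kind = "async"
        else:
            kind = "function"
        doc = ast.get_docstring(node) or ""
        records.append({
            "name": node.name,
            "location": f"{file_path}:{node.lineno}",
            # Positional parameter names only; the body is never stored.
            "params": [arg.arg for arg in node.args.args],
            "docstring": doc.splitlines()[0] if doc else "",
            "kind": kind,
            "decorators": [ast.unparse(d) for d in node.decorator_list],
        })
    return records
```

Because only names, positions, and the first docstring line are kept, the records stay small regardless of how long each function body is.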

Search Performance

  • Parsing: ~100-200 files/second

  • Indexing: ~1000 functions/second

  • Search: Sub-millisecond SQLite queries

  • Database size: ~1 KB per function

Example: 800 functions = ~800 KB database, indexed in < 5 seconds, searched in < 1ms.


Features

🔍 Search Before Writing

Tool: search_functions(query, exact=False)

Find existing functions before Claude writes new code:

> search_functions("load csv")

Found 3 functions:

• load_csv_file(path, encoding='utf-8')
  Location: src/data_loader.py:42
  Docs: Load CSV file with proper encoding handling

• FlowDataLoader.load_flows(flow_path, site_label)
  Location: modules/flow_loader.py:98
  Docs: Load flow data from CSV with site labeling

• read_raw_csv(filepath)
  Location: legacy/importer.py:156
  Docs: Legacy CSV reader (deprecated)

Claude sees these results and chooses to import the canonical implementation instead of creating a new one.
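A name search over a SQLite index might look like the sketch below. The table layout, the build_index helper, and the matching rule (every whitespace-separated term must appear in the function name) are assumptions for illustration; the real tool may also match on docstrings or rank results differently.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory index from (name, file, line, params, doc) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE functions "
        "(name TEXT, file TEXT, line INTEGER, params TEXT, doc TEXT)"
    )
    conn.execute("CREATE INDEX idx_functions_name ON functions(name)")
    conn.executemany("INSERT INTO functions VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def search_functions(conn, query, exact=False):
    """Return (name, file, line) rows whose name matches the query."""
    if exact:
        where, args = "name = ?", [query]
    else:
        terms = query.lower().split()
        if not terms:
            return []
        # instr() avoids LIKE wildcards, so "_" in a query is matched literally.
        where = " AND ".join("instr(lower(name), ?) > 0" for _ in terms)
        args = terms
    sql = f"SELECT name, file, line FROM functions WHERE {where} ORDER BY file, line"
    return conn.execute(sql, args).fetchall()
```

With this rule, search_functions(conn, "load csv") matches load_csv_file but not read_raw_csv, while exact=True only returns functions whose name equals the query.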


🔁 Detect Duplicates

Tool: find_duplicates()

Find functions with the same name in multiple files:

> find_duplicates()

⚠️  Found 3 function names with multiple implementations:

**load_flow_data** (3 implementations):
  - cohesion_analyzer.py:253
  - legacy/community_detector.py:440
  - policy_group_clustering.py:497

**format_bytes** (2 implementations):
  - utils.py:88
  - helpers.py:124

💡 Recommendation: Consolidate into single canonical implementations.

Use this to audit your codebase and identify consolidation opportunities.
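Since duplicates here are defined as the same name appearing in more than one file, the check reduces to a GROUP BY query. The following is a sketch under an assumed three-column schema; make_index and the table name are hypothetical.

```python
import sqlite3


def make_index(rows):
    """Build an in-memory index from (name, file, line) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE functions (name TEXT, file TEXT, line INTEGER)")
    conn.executemany("INSERT INTO functions VALUES (?, ?, ?)", rows)
    return conn


def find_duplicates(conn):
    """Return {name: [locations]} for names defined in more than one file."""
    names = conn.execute(
        "SELECT name FROM functions "
        "GROUP BY name HAVING COUNT(DISTINCT file) > 1 "
        "ORDER BY name"
    ).fetchall()
    duplicates = {}
    for (name,) in names:
        rows = conn.execute(
            "SELECT file, line FROM functions WHERE name = ? ORDER BY file, line",
            (name,),
        ).fetchall()
        duplicates[name] = [f"{file}:{line}" for file, line in rows]
    return duplicates
```

Counting distinct files rather than rows means overloaded or redefined names within a single file are not reported; whether that matches the real tool's behavior is not stated in this README.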


🎯 Similar Signatures

Tool: find_similar_signatures(min_params=2)

Find functions with the same parameters (might be doing the same thing):

> find_similar_signatures(min_params=2)

Found 2 signature groups:

**Signature: (data, output_path)** - 4 functions:
  • save_to_csv in exporter.py:67
  • write_csv_file in writer.py:134
  • export_data in utils.py:203
  • save_results in analyzer.py:445

💡 These functions likely do the same thing with different names.

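Helps catch likely semantic duplicates: functions that may do the same thing under different names. Matching parameter lists are a hint, not proof, so review each group before consolidating.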
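Grouping by parameter list can be sketched in a few lines. This assumes each indexed function is available as a (name, location, params) tuple and that self and cls are ignored when comparing; both are illustrative choices rather than documented behavior.

```python
from collections import defaultdict


def find_similar_signatures(functions, min_params=2):
    """Group functions that share an identical ordered parameter list.

    functions: iterable of (name, location, params) tuples (assumed layout).
    """
    groups = defaultdict(list)
    for name, location, params in functions:
        # Ignore the implicit receiver so methods and functions compare equal.
        signature = tuple(p for p in params if p not in ("self", "cls"))
        if len(signature) >= min_params:
            groups[signature].append(f"{name} in {location}")
    # A group is only interesting when at least two functions share it.
    return {sig: names for sig, names in groups.items() if len(names) > 1}
```

The min_params threshold matters: single-parameter signatures such as (path) are shared by many unrelated functions, so raising the minimum trades recall for fewer false positives.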


📂 Multi-Project Support

Work on multiple projects without reconfiguring:

# One-time setup
> register_project("project-a", "/Users/jose/Projects/project-a")
> register_project("project-b", "/Users/jose/Projects/project-b")

# Daily use - auto-detects from your current directory
cd ~/Projects/project-a
> search_functions("auth")
[Auto-detected: project-a]
Found 5 functions...

cd ~/Projects/project-b
> search_functions("auth")
[Auto-detected: project-b]
Found 3 functions...

Features:

  • ✅ Register unlimited projects

  • ✅ Auto-detection from working directory

  • ✅ Isolated indexes (no cross-contamination)

  • ✅ Zero configuration switching
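Auto-detection from the working directory can be approximated by checking which registered root contains the current path. The sketch below is an assumption about how this could work, not the tool's actual logic; detect_project is a hypothetical name, and PurePosixPath is used so the example behaves the same on any OS (a real implementation would more likely resolve paths with pathlib.Path).

```python
from pathlib import PurePosixPath


def detect_project(cwd, projects):
    """Return the name of the registered project containing cwd, or None.

    projects: dict mapping project name -> absolute root path.
    When roots are nested, the deepest matching root wins.
    """
    cwd_path = PurePosixPath(cwd)
    best_name, best_depth = None, -1
    for name, root in projects.items():
        root_path = PurePosixPath(root)
        inside = cwd_path == root_path or root_path in cwd_path.parents
        if inside and len(root_path.parts) > best_depth:
            best_name, best_depth = name, len(root_path.parts)
    return best_name
```

Comparing path components instead of string prefixes avoids treating /Users/jose/Projects/project-ab as part of project-a.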


📊 Codebase Statistics

Tool: get_index_stats()

Understand your codebase at a glance:

> get_index_stats()

📊 CodeWalker Statistics:

Total Functions: 800
Total Files: 60
Unique Names: 765
Methods: 423
Async Functions: 67
Avg Parameters: 2.3

Duplication Rate: 4.4% (35 duplicates)
Last Indexed: 2026-03-18 10:35:00

Track duplication rate over time to measure improvement.
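The numbers in the sample output are consistent with a simple formula: duplicates = total functions minus unique names, divided by total functions. The sketch below assumes that definition; the README does not state how the tool actually computes the rate.

```python
def duplication_rate(total_functions, unique_names):
    """Return (duplicate_count, rate_percent) under the assumed definition."""
    if total_functions == 0:
        return 0, 0.0
    duplicates = total_functions - unique_names
    return duplicates, 100.0 * duplicates / total_functions


duplicates, rate = duplication_rate(800, 765)
print(f"Duplication Rate: {rate:.1f}% ({duplicates} duplicates)")
# Duplication Rate: 4.4% (35 duplicates)
```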


Quick Start

1. Install

git clone https://github.com/[username]/codewalker.git
cd codewalker
pip install -r requirements.txt

2. Configure Claude Code

Add to ~/.config/claude-code/mcp.json:

{
  "mcpServers": {
    "codewalker": {
      "command": "python3",
      "args": ["/absolute/path/to/codewalker/src/server.py"]
    }
  }
}

3. Register Your Projects

Restart Claude Code, then:

> register_project("my-project", "/absolute/path/to/your/project")

🔄 Registering project: my-project
📁 Path: /absolute/path/to/your/project

⏳ Indexing project...
Found 800 functions

✅ Indexing complete!

Total Functions: 800
Total Files: 60
Unique Names: 765

4. Start Using

CodeWalker now automatically prevents duplicate code:

You: "Add functionality to load CSV files"

Claude (internal):
  > search_functions("load csv")
  Found: load_csv_file() in src/data_loader.py

Claude: "I found an existing CSV loader at src/data_loader.py:42.
Let me use that instead of creating a new one:"

from src.data_loader import load_csv_file
data = load_csv_file(path)

Available Tools

Project Management

  • register_project(name, path) - Add a project to CodeWalker

  • list_projects() - View all registered projects

  • unregister_project(name) - Remove a project

  • get_current_project() - Show which project is detected

  • search_functions(query, exact) - Find functions by name

  • find_duplicates() - Detect duplicate function names

  • find_similar_signatures(min_params) - Find functions with similar parameters

  • get_file_functions(file_path) - List all functions in a file

  • get_index_stats() - View codebase statistics

  • reindex_repository() - Rebuild index after major changes


Use Cases

1. Prevent Duplication During Development

Before every implementation:

You: "Add user authentication"

Claude: Let me check if auth code already exists...
> search_functions("auth")
Found: authenticate_user() in src/auth.py

Claude: "I found existing auth code. Let me use it..."

2. Onboard to New Codebases

Explore unfamiliar code:

> search_functions("export")
Found 12 functions with "export" in the name

> get_file_functions("src/exporter.py")
Lists all 8 functions in the file with signatures and docs

Quickly understand what exists before writing new code.


3. Refactoring and Cleanup

Find consolidation opportunities:

> find_duplicates()
Found 15 duplicate function names

> find_similar_signatures()
Found 8 signature groups (functions with same params)

Systematically eliminate duplication.


4. Code Review

Reviewers can verify reuse:

Reviewer: "Why didn't you use the existing loader?"

Developer: "Let me check..."
> search_functions("load")
Found 3 loaders I didn't know about!

Catch missed reuse opportunities during review.


Comparison: With vs Without CodeWalker

| Scenario | Without CodeWalker | With CodeWalker |
| --- | --- | --- |
| Add CSV loading | Creates 7th duplicate load_csv() | Finds and reuses existing load_csv_file() |
| Authentication needed | Creates new auth from scratch | Imports existing authenticate_user() |
| Format bytes | Creates 3rd format_bytes() | Uses canonical implementation |
| Code review | "Why is this duplicated?" | "Good reuse of existing code" |
| Bug in duplicates | Fix bug in 7 different places | Fix once, fixed everywhere |
| Onboarding | "Which loader should I use?" | Clear: one canonical implementation |
| Duplication rate | 40-60% (typical for AI projects) | < 5% (with CodeWalker) |


Graph Theory Connection

CodeWalker treats your codebase as a graph:

  • Vertices - Functions, classes, modules

  • Edges - Imports, function calls, dependencies

  • Walking - Traversing the graph to discover existing code

Graph concepts:

  • Graph walk - Sequence of vertices (functions) and edges (calls)

  • Traversal - Systematic exploration of the graph structure

  • Random walks - Discovery algorithms (like PageRank)

  • Tree walks - AST traversal (what the parser does)

This isn't just a metaphor - CodeWalker literally walks your Abstract Syntax Tree (AST) to build the function graph.


Roadmap & Future Development

CodeWalker v2.0.0 solves the core AI code duplication problem for Python projects. Future versions will add deeper analysis, broader language support, and smarter automation.

🔥 High Priority

Why these matter: These features provide immediate value for existing users and are most frequently requested.

  • Incremental indexing - Currently, reindexing rebuilds the entire database. Incremental indexing would only update changed files, making reindexing 10-100x faster for large codebases. Impact: Seconds instead of minutes for 10k+ function codebases.

  • Near-duplicate detection - Functions like load_csv, load_csv_data, and read_csv_file are likely duplicates under slightly different names. Levenshtein distance matching would catch these "near-duplicates" that current exact/partial matching misses. Impact: Catch 20-30% more duplicates.

  • Cross-project search - Search across all registered projects simultaneously. Useful for teams with shared utilities across multiple repos or monorepo users who want to find reusable code anywhere. Impact: Prevent reinventing wheels across project boundaries.

  • Call graph analysis - Track what calls what to enable "blast radius" analysis ("what breaks if I change this function?") and identify unused code. Impact: Safer refactoring, dead code detection.
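To make the near-duplicate detection item above concrete, here is a minimal sketch of Levenshtein distance applied to function names. This feature is planned, not implemented, so everything here is illustrative: near_duplicate_names and the max_distance threshold are hypothetical, and a real version would need tuning to avoid flagging unrelated short names.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def near_duplicate_names(names, max_distance=3):
    """Return sorted pairs of distinct names within max_distance edits."""
    unique = sorted(set(names))
    pairs = []
    for i, first in enumerate(unique):
        for second in unique[i + 1:]:
            if levenshtein(first, second) <= max_distance:
                pairs.append((first, second))
    return pairs
```

The pairwise comparison is quadratic in the number of names, which is fine for hundreds of functions but would need bucketing (for example by shared tokens) on much larger indexes.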


🎯 Medium Priority

Why these matter: These features enhance CodeWalker's intelligence and reduce manual effort.

  • Semantic similarity (ML-based) - Detect functions that do the same thing with completely different names and signatures using embedding-based similarity. Example: save_to_csv(data, path) and export_results(df, filename) might be doing the same thing. Impact: Catch duplicates current signature matching misses.

  • Auto-reindexing on file changes - Watch filesystem and automatically reindex when Python files change. No more manual reindex_repository() calls. Impact: Zero-maintenance index that's always current.

  • Multi-language support - Extend beyond Python to JavaScript, TypeScript, Go, Rust, Java. Same duplication prevention for polyglot codebases. Impact: Unified duplication prevention across entire stack.

  • Blast radius visualization - Show dependency trees and impact analysis when considering changes. "If I modify function X, these 15 functions are affected." Impact: Confident refactoring.
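The call graph and blast radius ideas on the roadmap can be sketched with the ast module and a breadth-first search over reversed edges. This is an illustration of the concept only: it resolves calls by bare name within a single source string, ignores method calls such as obj.method(), and attributes calls inside nested functions to the enclosing function as well.

```python
import ast
from collections import defaultdict, deque


def build_call_graph(source):
    """Map each function name to the set of bare names it calls."""
    tree = ast.parse(source)
    graph = {}
    for func in ast.walk(tree):
        if isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            calls = graph.setdefault(func.name, set())
            for node in ast.walk(func):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    calls.add(node.func.id)
    return graph


def blast_radius(graph, target):
    """Return every function that calls target directly or transitively."""
    callers = defaultdict(set)
    for caller, callees in graph.items():
        for callee in callees:
            callers[callee].add(caller)

    affected, queue = set(), deque([target])
    while queue:
        current = queue.popleft()
        for caller in callers[current]:
            if caller != target and caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected
```

Functions that never appear in any blast radius and are not entry points are candidates for the dead code detection mentioned above.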


💡 Lower Priority

Why these matter: Nice-to-have features that improve developer experience but aren't critical to core functionality.

  • Web UI - Visual interface for browsing functions, viewing call graphs, and exploring codebase structure in a browser. Alternative to CLI-only workflow. Impact: Better onboarding experience, visual learners benefit.

  • VS Code extension - Native VS Code integration with inline suggestions ("⚠️ Similar function exists: use load_csv_file() instead"). Impact: Proactive duplicate prevention during typing.

  • Import suggestions - When Claude is about to write new code, automatically suggest existing imports. "You're about to write X, but Y already exists - import it?" Impact: Even less manual searching.

  • GitHub Action - CI/CD integration that fails PRs introducing duplicates above a threshold. Enforce duplication standards via automation. Impact: Prevent duplicates from ever being merged.


📊 Current Capabilities

What works today:

Language Support:

  • ✅ Python (full support for functions, methods, async functions, decorators)

  • 🚧 JavaScript, TypeScript, Go, Rust (on roadmap)

Analysis:

  • ✅ Function names, signatures, locations, docstrings

  • ✅ Parameter matching and signature comparison

  • ✅ Duplicate detection (exact name matches)

  • 🚧 Call graph analysis (planned)

  • 🚧 Semantic similarity (planned)

  • 🚧 Near-duplicate detection via Levenshtein distance (planned)

Indexing:

  • ✅ Full repository indexing (~5 seconds for 800 functions)

  • ✅ Manual reindexing on demand

  • 🚧 Incremental updates (only changed files - planned)

  • 🚧 Auto-reindexing on file changes (planned)
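One common way to implement the planned incremental updates is to store a content hash per file and reindex only files whose hash changed. The sketch below shows that idea under assumptions: content_hash and plan_incremental_update are hypothetical names, and the README does not say whether the project intends to use hashes, modification times, or something else.

```python
import hashlib


def content_hash(source: str) -> str:
    """Stable fingerprint of a file's text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def plan_incremental_update(stored, current):
    """Decide which files need work.

    stored:  dict path -> hash recorded at the last indexing run
    current: dict path -> hash of the file as it is now
    Returns (to_index, to_remove) as sorted lists of paths.
    """
    # New files and files whose content changed must be (re)parsed.
    to_index = sorted(p for p, h in current.items() if stored.get(p) != h)
    # Files that disappeared must have their functions dropped from the index.
    to_remove = sorted(p for p in stored if p not in current)
    return to_index, to_remove
```

Hashing still reads every file, but it skips the parse and database writes for unchanged ones, which is where a full rebuild spends most of its time.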

Search:

  • ✅ Exact and partial name matching

  • ✅ Parameter signature matching

  • ✅ Multi-project support with auto-detection

  • 🚧 Semantic search by behavior (planned)

  • 🚧 Cross-project search (planned)


FAQ

Q: Does this work with other AI assistants?

Yes! CodeWalker uses the Model Context Protocol (MCP), which is an open standard. Any AI tool that supports MCP can use CodeWalker:

  • Claude Code (tested)

  • Claude Desktop (should work)

  • Other MCP-compatible tools

Q: How much overhead does indexing add?

Very little:

  • Initial indexing: ~5 seconds for 800 functions

  • Reindexing: ~5 seconds (full rebuild)

  • Search queries: < 1ms

  • Memory: ~10 MB for typical projects

You barely notice it's there.

Q: What if my codebase is huge?

CodeWalker scales well:

  • Tested on 800 functions / 60 files

  • Should handle 10,000+ functions easily (SQLite scales)

  • For massive codebases (100k+ functions), consider:

    • Incremental indexing (planned feature)

    • Multiple project registrations (already supported)

    • Excluding test files or generated code

Q: Can I use this on proprietary code?

Yes! Everything is local:

  • ✅ Index stored locally (~/.codewalker)

  • ✅ No data sent to external services

  • ✅ No network requests during search

  • ✅ Your code never leaves your machine

CodeWalker is 100% private.

Q: How is this different from IDE autocomplete?

Complementary, not competing:

IDE autocomplete:

  • Works in single file

  • Shows available imports

  • Type-aware suggestions

  • Real-time as you type

CodeWalker:

  • Works across entire codebase

  • Searches function names by keyword ("load csv")

  • Finds duplicates proactively

  • Used by AI during code generation

Use both - IDE for writing, CodeWalker for AI-assisted development.

Q: What about private/internal functions?

CodeWalker indexes everything:

  • Public functions: ✅ Indexed

  • Private functions (_private): ✅ Indexed

  • Internal functions (__internal): ✅ Indexed

Why? Because you might want to reuse private functions too. Claude respects Python conventions (won't use _private from other modules without good reason), but knowing they exist prevents duplication.


Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

Areas we need help:

  • Multi-language support (JavaScript, TypeScript, Go)

  • Incremental indexing

  • Semantic similarity detection

  • Performance optimization


License

MIT License - see LICENSE for details.

Free to use in personal and commercial projects.


Credits

Built to solve a real problem: Claude Code was creating duplicate implementations across a 60-file, 800-function codebase. CodeWalker eliminated the duplication.

Inspired by: Pharaoh (commercial tool for codebase intelligence)

Built with: Claude Sonnet 4.5 (dogfooding - using AI to build tools that improve AI)



Summary

Problem: AI assistants can't see your codebase, causing massive code duplication.

Solution: CodeWalker indexes your codebase and lets AI search before writing.

Result: 40-60% reduction in duplicate code, faster development, cleaner codebase.

Get Started:

pip install -r requirements.txt
# Configure MCP (see Quick Start above)
> register_project("my-project", "/path/to/project")
> search_functions("whatever you're about to write")

Stop duplicating code. Start walking your codebase. 🚀
