
AI coding assistants work better when they have access to real examples from your codebase. Kodit indexes your repositories, splits source files into searchable snippets, and serves them to any MCP-compatible assistant. When your assistant needs to write new code, it queries Kodit first and gets back relevant, up-to-date examples drawn from your own projects.

Kodit also handles documents. PDFs, Word files, PowerPoint decks, and spreadsheets are rasterized and indexed so you can search across both code and documentation in one place.

What you get:

  • Multiple search strategies including BM25 keyword search, semantic vector search, regex grep, and visual document search, each exposed as a separate MCP tool so your assistant picks the right approach for each query

  • MCP server that works with Claude Code, Cursor, Cline, Kilo Code, and any other MCP-compatible assistant

  • REST API for programmatic access to search, repositories, enrichments, and indexing status

  • AI enrichments (optional) including architecture docs, API docs, database schema detection, cookbook examples, and commit summaries, all generated by an LLM

  • Document intelligence with visual search across PDF pages, Office documents, and images using multimodal embeddings

  • No external dependencies required for basic operation, with a built-in embedding model and SQLite storage

Quickstart

docker run -p 8080:8080 registry.helix.ml/helix/kodit:latest

This starts Kodit with SQLite storage and a built-in embedding model. No API keys needed.

Pre-built binaries

Download a binary from the releases page, then:

chmod +x kodit
./kodit serve

Verify it works

Open the interactive API docs at http://localhost:8080/docs.

Or index a small repository and run a search:

# Index a repository
curl http://localhost:8080/api/v1/repositories \
  -X POST -H "Content-Type: application/json" \
  -d '{
    "data": {
      "type": "repository",
      "attributes": {
        "remote_uri": "https://gist.github.com/philwinder/7aa38185e20433c04c533f2b28f4e217.git"
      }
    }
  }'

# Check indexing progress
curl http://localhost:8080/api/v1/repositories/1/status

# Search (once indexing is complete)
curl http://localhost:8080/api/v1/search \
  -X POST -H "Content-Type: application/json" \
  -d '{
    "data": {
      "type": "search",
      "attributes": {
        "keywords": ["orders"],
        "text": "code to get all orders"
      }
    }
  }'
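
The same search request can be issued from Go using only the standard library. This is a minimal sketch that mirrors the curl payload above; `searchBody` and `search` are hypothetical helper names, and the base URL assumes the quickstart container is running locally.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// searchBody builds the same payload as the curl example:
// {"data": {"type": "search", "attributes": {"keywords": [...], "text": "..."}}}
func searchBody(keywords []string, text string) ([]byte, error) {
	payload := map[string]any{
		"data": map[string]any{
			"type": "search",
			"attributes": map[string]any{
				"keywords": keywords,
				"text":     text,
			},
		},
	}
	return json.Marshal(payload)
}

// search posts the payload to /api/v1/search and returns the raw response body.
func search(baseURL string, keywords []string, text string) (string, error) {
	body, err := searchBody(keywords, text)
	if err != nil {
		return "", err
	}
	resp, err := http.Post(baseURL+"/api/v1/search", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode >= 300 {
		return "", fmt.Errorf("search failed: %s: %s", resp.Status, raw)
	}
	return string(raw), nil
}

func main() {
	out, err := search("http://localhost:8080", []string{"orders"}, "code to get all orders")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

The response is printed as raw JSON because its schema is not described here; the interactive docs at /docs list the actual fields.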

Connecting to AI Assistants

Kodit exposes an MCP endpoint at /mcp. Connect your assistant to start using Kodit as a code search tool.

Claude Code

claude mcp add --transport http kodit http://localhost:8080/mcp

Cursor

Add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "kodit": {
      "url": "http://localhost:8080/mcp"
    }
  }
}

Cline

Add to the MCP Servers configuration (Remote Servers tab):

{
  "mcpServers": {
    "kodit": {
      "autoApprove": [],
      "disabled": false,
      "timeout": 60,
      "type": "streamableHttp",
      "url": "http://localhost:8080/mcp"
    }
  }
}

Kilo Code

Add to the MCP configuration (Edit Project/Global MCP):

{
  "mcpServers": {
    "kodit": {
      "type": "streamable-http",
      "url": "http://localhost:8080/mcp",
      "alwaysAllow": [],
      "disabled": false
    }
  }
}

Replace http://localhost:8080 with your server URL if running remotely.

Encouraging assistants to use Kodit

Some assistants may not call Kodit tools automatically. Add this to your project rules or system prompt to enforce usage:

For every request that involves writing or modifying code, the assistant's first
action must be to call the kodit search MCP tools. Only produce or edit code after
the tool call returns results.

In Cursor, save this as .cursor/rules/kodit.mdc with alwaysApply: true frontmatter.

MCP Tools

Kodit exposes these tools to connected AI assistants:

| Tool | Description |
| --- | --- |
| kodit_repositories | List all indexed repositories |
| kodit_semantic_search | Semantic similarity search across code |
| kodit_keyword_search | BM25 keyword search |
| kodit_visual_search | Search document page images |
| kodit_grep | Regex pattern matching |
| kodit_ls | List files by glob pattern |
| kodit_read_resource | Read file content by URI |
| kodit_architecture_docs | Architecture documentation for a repo |
| kodit_api_docs | Public API documentation |
| kodit_database_schema | Database schema documentation |
| kodit_cookbook | Usage examples and patterns |
| kodit_commit_description | Commit description |
| kodit_wiki | Wiki table of contents |
| kodit_wiki_page | Read a specific wiki page |
| kodit_version | Server version |

The enrichment tools (kodit_architecture_docs, kodit_api_docs, kodit_database_schema, kodit_cookbook, kodit_commit_description, kodit_wiki, and kodit_wiki_page) require an LLM provider to be configured. See Enrichment Providers under Configuration Reference.

Go Library

Kodit can be embedded directly as a Go library. This is how Helix integrates Kodit into its platform.

import "github.com/helixml/kodit"

client, err := kodit.New(
    kodit.WithSQLite(".kodit/data.db"),
)
if err != nil {
    log.Fatal(err)
}
defer client.Close()

// Index a repository
_, _, err = client.Repositories.Add(ctx, &service.RepositoryAddParams{
    URL: "https://github.com/kubernetes/kubernetes",
})

// Search
results, err := client.Search.Query(ctx, "create a deployment",
    service.WithLimit(10),
)

for _, result := range results.Enrichments() {
    fmt.Println(result.Subtype(), result.Content())
}

Library options

| Option | Description |
| --- | --- |
| WithSQLite(path) | Use SQLite for storage |
| WithPostgresVectorchord(dsn) | Use PostgreSQL with VectorChord |
| WithOpenAI(apiKey) | OpenAI for embeddings and text |
| WithAnthropic(apiKey) | Anthropic Claude for text (needs separate embedding provider) |
| WithTextProvider(p) | Custom text generation provider |
| WithEmbeddingProvider(p) | Custom embedding provider |
| WithRAGPipeline() | Skip LLM enrichments, index and search only |
| WithFullPipeline() | Require all enrichments (errors without a text provider) |
| WithDataDir(dir) | Data directory (default: ~/.kodit) |
| WithCloneDir(dir) | Repository clone directory |
| WithAPIKeys(keys...) | API keys for HTTP authentication |
| WithWorkerCount(n) | Number of background workers (default: 1) |
| WithPeriodicSyncConfig(cfg) | Automatic repository sync settings |

Search options

| Option | Description |
| --- | --- |
| WithSemanticWeight(w) | Weight for semantic vs keyword search (0.0 to 1.0) |
| WithLimit(n) | Maximum number of results |
| WithOffset(n) | Offset for pagination |
| WithLanguages(langs...) | Filter by programming languages |
| WithRepositories(ids...) | Filter by repository IDs |
| WithMinScore(score) | Minimum score threshold |
| WithEnrichmentTypes(types...) | Filter results to specific enrichment types |
| WithSnippets(include) | Include code snippets in results |
| WithDocuments(include) | Include enrichment documents in results |
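
To build intuition for WithSemanticWeight, one common way to combine two rankers is a linear blend of their normalized scores. The sketch below illustrates that idea only; it is an assumption, not a description of how Kodit actually fuses results, and `blend` is a hypothetical helper.

```go
package main

import "fmt"

// blend combines a semantic score and a keyword score, both assumed to be
// normalized to [0, 1], giving weight w to the semantic side and 1-w to the
// keyword side. This linear blend is an illustrative assumption.
func blend(semantic, keyword, w float64) float64 {
	if w < 0 {
		w = 0
	}
	if w > 1 {
		w = 1
	}
	return w*semantic + (1-w)*keyword
}

func main() {
	// w = 1 uses only the semantic score, w = 0 only the keyword score.
	for _, w := range []float64{0, 0.5, 1} {
		fmt.Printf("w=%.1f score=%.2f\n", w, blend(0.8, 0.2, w))
	}
}
```

Under this reading, a weight near 1.0 favors results that are close in meaning, while a weight near 0.0 favors exact term matches.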

Go HTTP client

A generated HTTP client is available for calling a remote Kodit server from Go:

go get github.com/helixml/kodit/clients/go

import koditclient "github.com/helixml/kodit/clients/go"

client, err := koditclient.NewClient("https://kodit.example.com")
if err != nil {
    log.Fatal(err)
}

// List repositories
repos, err := client.GetRepositories(ctx, nil)

// Search (uses its own variable name; repeating `resp, err :=` in one scope does not compile)
text := "create a deployment"
results, err := client.PostSearch(ctx, koditclient.PostSearchJSONRequestBody{
    Data: &koditclient.DtoSearchData{
        Attributes: &koditclient.DtoSearchAttributes{
            Text: &text,
        },
    },
})

Types are auto-generated from the OpenAPI spec. See the interactive API docs at /docs for the full endpoint list.

Production Deployment

For production use, deploy with PostgreSQL (VectorChord) for scalable vector search and a dedicated LLM provider for enrichments.

Docker Compose

Save this as docker-compose.yaml:

services:
  kodit:
    image: registry.helix.ml/helix/kodit:latest
    ports:
      - "8080:8080"
    command: ["serve"]
    restart: unless-stopped
    depends_on:
      - vectorchord
    environment:
      DATA_DIR: /data
      DB_URL: postgresql://postgres:mysecretpassword@vectorchord:5432/kodit

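      # Enrichment LLM (optional, enables AI-generated docs)
      # This file does not define an "ollama" service; point the URL at an Ollama instance reachable from the container
      ENRICHMENT_ENDPOINT_BASE_URL: http://ollama:11434
      ENRICHMENT_ENDPOINT_MODEL: ollama/qwen3:1.7b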

      # External embedding provider (optional, replaces built-in model)
      # EMBEDDING_ENDPOINT_API_KEY: sk-proj-xxxx
      # EMBEDDING_ENDPOINT_MODEL: openai/text-embedding-3-small

      LOG_LEVEL: INFO
      API_KEYS: ${KODIT_API_KEYS:-}
    volumes:
      - kodit-data:/data

  vectorchord:
    image: tensorchord/vchord-suite:pg17-20250601
    environment:
      POSTGRES_DB: kodit
      POSTGRES_PASSWORD: mysecretpassword
    volumes:
      - vectorchord-data:/var/lib/postgresql/data
    restart: unless-stopped

volumes:
  kodit-data:
  vectorchord-data:

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: vectorchord
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vectorchord
  template:
    metadata:
      labels:
        app: vectorchord
    spec:
      containers:
        - name: vectorchord
          image: tensorchord/vchord-suite:pg17-20250601
          env:
            - name: POSTGRES_DB
              value: kodit
            - name: POSTGRES_PASSWORD
              value: mysecretpassword
          ports:
            - containerPort: 5432
---
apiVersion: v1
kind: Service
metadata:
  name: vectorchord
spec:
  selector:
    app: vectorchord
  ports:
    - port: 5432
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kodit
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kodit
  template:
    metadata:
      labels:
        app: kodit
    spec:
      containers:
        - name: kodit
          image: registry.helix.ml/helix/kodit:latest # pin to a specific version
          args: ["serve"]
          env: [] # see Configuration Reference for environment variables
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: kodit
spec:
  type: LoadBalancer
  selector:
    app: kodit
  ports:
    - port: 8080

Authentication

Set the API_KEYS environment variable to a comma-separated list of keys. Write endpoints (creating repositories, triggering syncs) require a valid key in the Authorization: Bearer <key> header. Search endpoints are open by default.
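
As an illustration of the header format, the following Go sketch builds an authenticated request to the sync endpoint (POST /api/v1/repositories/{id}/sync). `newSyncRequest` is a hypothetical helper, the key is a placeholder, and the integer repository ID is an assumption based on the quickstart example.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newSyncRequest builds a POST to /api/v1/repositories/{id}/sync and attaches
// the API key as a bearer token when one is supplied.
func newSyncRequest(baseURL string, repoID int, apiKey string) (*http.Request, error) {
	url := fmt.Sprintf("%s/api/v1/repositories/%d/sync", strings.TrimRight(baseURL, "/"), repoID)
	req, err := http.NewRequest(http.MethodPost, url, nil)
	if err != nil {
		return nil, err
	}
	if apiKey != "" {
		req.Header.Set("Authorization", "Bearer "+apiKey)
	}
	return req, nil
}

func main() {
	req, err := newSyncRequest("http://localhost:8080", 1, "my-secret-key")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println("Authorization:", req.Header.Get("Authorization"))
}
```

Pass the request to `http.DefaultClient.Do` to send it. Leaving the key empty produces an unauthenticated request, which is sufficient when API_KEYS is unset.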

Configuration Reference

Configuration is done through environment variables. You can also use a .env file:

kodit serve --env-file .env

Server

| Variable | Default | Description |
| --- | --- | --- |
| HOST | 0.0.0.0 | Listen address |
| PORT | 8080 | Listen port |
| DATA_DIR | ~/.kodit | Data directory for models, clones, and database |
| DB_URL | (empty) | PostgreSQL connection string (uses SQLite if empty) |
| LOG_LEVEL | INFO | Logging verbosity: DEBUG, INFO, WARN, ERROR |
| LOG_FORMAT | pretty | Log format: pretty or json |
| API_KEYS | (empty) | Comma-separated API keys for write endpoints |
| WORKER_COUNT | 1 | Number of background workers |
| SEARCH_LIMIT | 10 | Default search result limit |
| DISABLE_TELEMETRY | false | Disable anonymous usage telemetry |
| HTTP_CACHE_DIR | (empty) | Directory for caching HTTP POST responses to disk; avoids repeated API calls during development |
| REPORTING_LOG_TIME_INTERVAL | 5 | Progress reporting interval in seconds |

Embedding Provider

These configure an external embedding model. If unset, Kodit uses its built-in model.

| Variable | Default | Description |
| --- | --- | --- |
| EMBEDDING_ENDPOINT_BASE_URL | (empty) | Base URL of embedding service |
| EMBEDDING_ENDPOINT_MODEL | (empty) | Model identifier |
| EMBEDDING_ENDPOINT_API_KEY | (empty) | API key |
| EMBEDDING_ENDPOINT_MAX_TOKENS | 0 | Max tokens per request (0 = provider default) |
| EMBEDDING_ENDPOINT_MAX_BATCH_CHARS | 16000 | Max total characters per embedding batch |
| EMBEDDING_ENDPOINT_MAX_BATCH_SIZE | 1 | Max items per batch |
| EMBEDDING_ENDPOINT_TIMEOUT | 60 | Request timeout in seconds |
| EMBEDDING_ENDPOINT_NUM_PARALLEL_TASKS | 1 | Concurrent embedding requests |
| EMBEDDING_ENDPOINT_EXTRA_PARAMS | (empty) | JSON-encoded extra parameters for the embedding provider |
| EMBEDDING_ENDPOINT_QUERY_INSTRUCTION | (empty) | Instruction prepended to queries for asymmetric retrieval |
| EMBEDDING_ENDPOINT_DOCUMENT_INSTRUCTION | (empty) | Instruction prepended to documents for asymmetric retrieval |
| EMBEDDING_ENDPOINT_SOCKET_PATH | (empty) | Unix socket path for local provider (alternative to BASE_URL) |
| EMBEDDING_ENDPOINT_MAX_RETRIES | 5 | Maximum retry attempts on request failure |
| EMBEDDING_ENDPOINT_INITIAL_DELAY | 2.0 | Initial retry delay in seconds |
| EMBEDDING_ENDPOINT_BACKOFF_FACTOR | 2.0 | Retry backoff multiplier |

Vision Embedding Provider

These configure a remote service for image and text vision embeddings. If unset, Kodit uses its built-in SigLIP2 model.

| Variable | Default | Description |
| --- | --- | --- |
| VISION_EMBEDDING_ENDPOINT_BASE_URL | (empty) | Base URL of vision embedding service |
| VISION_EMBEDDING_ENDPOINT_MODEL | (empty) | Model identifier |
| VISION_EMBEDDING_ENDPOINT_API_KEY | (empty) | API key |
| VISION_EMBEDDING_ENDPOINT_MAX_TOKENS | 0 | Max tokens per request (0 = provider default) |
| VISION_EMBEDDING_ENDPOINT_MAX_BATCH_CHARS | 16000 | Max total characters per embedding batch |
| VISION_EMBEDDING_ENDPOINT_MAX_BATCH_SIZE | 1 | Max items per batch |
| VISION_EMBEDDING_ENDPOINT_TIMEOUT | 60 | Request timeout in seconds |
| VISION_EMBEDDING_ENDPOINT_NUM_PARALLEL_TASKS | 1 | Concurrent vision embedding requests |
| VISION_EMBEDDING_ENDPOINT_EXTRA_PARAMS | (empty) | JSON-encoded extra parameters for the vision embedding provider |
| VISION_EMBEDDING_ENDPOINT_QUERY_INSTRUCTION | (empty) | Instruction prepended to queries for asymmetric retrieval |
| VISION_EMBEDDING_ENDPOINT_DOCUMENT_INSTRUCTION | (empty) | Instruction prepended to documents for asymmetric retrieval |
| VISION_EMBEDDING_ENDPOINT_SOCKET_PATH | (empty) | Unix socket path for local provider (alternative to BASE_URL) |
| VISION_EMBEDDING_ENDPOINT_MAX_RETRIES | 5 | Maximum retry attempts on request failure |
| VISION_EMBEDDING_ENDPOINT_INITIAL_DELAY | 2.0 | Initial retry delay in seconds |
| VISION_EMBEDDING_ENDPOINT_BACKOFF_FACTOR | 2.0 | Retry backoff multiplier |

Enrichment Providers

These configure an LLM for generating architecture docs, API docs, database schemas, cookbooks, commit summaries, and wiki pages. Without this, Kodit indexes and searches code but does not generate any AI documentation.

| Variable | Default | Description |
| --- | --- | --- |
| ENRICHMENT_ENDPOINT_BASE_URL | (empty) | Base URL of LLM service |
| ENRICHMENT_ENDPOINT_MODEL | (empty) | Model identifier |
| ENRICHMENT_ENDPOINT_API_KEY | (empty) | API key |
| ENRICHMENT_ENDPOINT_NUM_PARALLEL_TASKS | 1 | Concurrent enrichment requests |
| ENRICHMENT_ENDPOINT_TIMEOUT | 60 | Request timeout in seconds |
| ENRICHMENT_ENDPOINT_EXTRA_PARAMS | (empty) | JSON-encoded extra parameters for the LLM |
| ENRICHMENT_ENDPOINT_MAX_TOKENS | 0 | Max tokens per response (0 = provider default) |
| ENRICHMENT_ENDPOINT_SOCKET_PATH | (empty) | Unix socket path for local provider (alternative to BASE_URL) |
| ENRICHMENT_ENDPOINT_MAX_RETRIES | 5 | Maximum retry attempts on request failure |
| ENRICHMENT_ENDPOINT_INITIAL_DELAY | 2.0 | Initial retry delay in seconds |
| ENRICHMENT_ENDPOINT_BACKOFF_FACTOR | 2.0 | Retry backoff multiplier |
| ENRICHMENT_ENDPOINT_MAX_BATCH_CHARS | 16000 | Max total characters per batch |
| ENRICHMENT_ENDPOINT_MAX_BATCH_SIZE | 1 | Max items per batch |
| ENRICHMENT_ENDPOINT_QUERY_INSTRUCTION | (empty) | Instruction prepended to queries for asymmetric retrieval |
| ENRICHMENT_ENDPOINT_DOCUMENT_INSTRUCTION | (empty) | Instruction prepended to documents for asymmetric retrieval |

Enrichment is typically the slowest part of indexing because each enrichment requires a round-trip to the LLM provider. Increasing ENRICHMENT_ENDPOINT_NUM_PARALLEL_TASKS speeds things up, but stay within your provider's rate limits: start with a low value and raise it gradually.
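
The effect of a parallel-task limit can be pictured as a bounded worker pattern: at most N requests are in flight at any moment. The Go sketch below shows the general technique with a buffered channel as a semaphore. It is not Kodit's implementation, and `runBounded` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sync"
)

// runBounded calls work for every item but never runs more than
// parallel calls at once, using a buffered channel as a semaphore.
// Results are returned in the same order as the input items.
func runBounded(items []string, parallel int, work func(string) string) []string {
	if parallel < 1 {
		parallel = 1
	}
	results := make([]string, len(items))
	sem := make(chan struct{}, parallel)
	var wg sync.WaitGroup
	for i, item := range items {
		wg.Add(1)
		sem <- struct{}{} // blocks while `parallel` calls are already in flight
		go func(i int, item string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this call finishes
			results[i] = work(item)
		}(i, item)
	}
	wg.Wait()
	return results
}

func main() {
	// The work function stands in for one LLM round-trip.
	out := runBounded([]string{"a.go", "b.go", "c.go"}, 2, func(name string) string {
		return "summary of " + name
	})
	fmt.Println(out)
}
```

With a limit of 1 the calls run strictly one after another. Raising the limit shortens total wall-clock time only until the provider starts throttling or rejecting requests.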

Provider examples:

# OpenAI
ENRICHMENT_ENDPOINT_BASE_URL=https://api.openai.com/v1
ENRICHMENT_ENDPOINT_MODEL=gpt-4o-mini
ENRICHMENT_ENDPOINT_API_KEY=sk-proj-xxxx

# Ollama (local)
ENRICHMENT_ENDPOINT_BASE_URL=http://localhost:11434
ENRICHMENT_ENDPOINT_MODEL=ollama/qwen3:1.7b

# Helix (private cloud)
ENRICHMENT_ENDPOINT_BASE_URL=https://app.helix.ml/v1
ENRICHMENT_ENDPOINT_MODEL=Qwen/Qwen3-8B
ENRICHMENT_ENDPOINT_API_KEY=your-helix-key

Periodic Sync

| Variable | Default | Description |
| --- | --- | --- |
| PERIODIC_SYNC_ENABLED | true | Auto-sync repositories on an interval |
| PERIODIC_SYNC_INTERVAL_SECONDS | 1800 | Sync interval (default: 30 minutes) |
| PERIODIC_SYNC_RETRY_ATTEMPTS | 3 | Retry count on sync failure |

Chunking

| Variable | Default | Description |
| --- | --- | --- |
| CHUNK_SIZE | 1500 | Characters per chunk |
| CHUNK_OVERLAP | 200 | Overlap between adjacent chunks |
| CHUNK_MIN_SIZE | 50 | Minimum chunk size |
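
To show how the three chunking settings relate, here is a minimal sliding-window splitter in Go. It assumes chunks are measured in characters, that each chunk starts CHUNK_SIZE minus CHUNK_OVERLAP characters after the previous one, and that a chunk shorter than CHUNK_MIN_SIZE is discarded. Kodit's real splitter may differ, for example by respecting line boundaries. `chunk` is a hypothetical helper.

```go
package main

import "fmt"

// chunk splits text into windows of at most size runes, where each window
// starts size-overlap runes after the previous one. Windows shorter than
// minSize are dropped; only the final window can be short.
func chunk(text string, size, overlap, minSize int) []string {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil
	}
	runes := []rune(text)
	step := size - overlap
	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		if end-start >= minSize {
			chunks = append(chunks, string(runes[start:end]))
		}
		if end == len(runes) {
			break // the window reached the end of the text
		}
	}
	return chunks
}

func main() {
	// Tiny values keep the overlap visible; the defaults would be 1500, 200, 50.
	for i, c := range chunk("abcdefghij", 4, 1, 2) {
		fmt.Println(i, c)
	}
}
```

Under these assumptions the defaults turn a 4,000-character file into three chunks starting at offsets 0, 1300, and 2600, with each neighboring pair sharing 200 characters.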

REST API

The full API is documented interactively at /docs on a running Kodit instance. The OpenAPI 3.0 specification is available at /docs/openapi.json.

Key endpoints:

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/v1/repositories | Add a repository for indexing |
| GET | /api/v1/repositories | List indexed repositories |
| GET | /api/v1/repositories/{id}/status | Indexing progress |
| POST | /api/v1/repositories/{id}/sync | Trigger a sync |
| DELETE | /api/v1/repositories/{id} | Remove a repository |
| POST | /api/v1/search | Combined search (keyword + semantic) |
| GET | /api/v1/search/semantic | Semantic search only |
| GET | /api/v1/search/keyword | Keyword search only |
| GET | /api/v1/search/visual | Visual search on document pages |
| GET | /api/v1/search/grep | Regex pattern search |
| GET | /api/v1/search/ls | List files by glob |

All write endpoints require an Authorization: Bearer <key> header when API_KEYS is set.

How Indexing Works

When you add a repository, Kodit runs a pipeline:

  1. Clone the Git repository to local storage

  2. Scan commits, branches, and tags to extract metadata

  3. Extract snippets by splitting source files into overlapping text chunks

  4. Build search indexes with BM25 (keyword) and vector embeddings (semantic)

  5. Generate enrichments (if an LLM provider is configured): architecture docs, API docs, database schemas, cookbook examples, commit summaries, and wiki pages

Kodit tracks which files have changed between syncs and only reprocesses modified content. Repositories sync automatically on a configurable interval (default: every 30 minutes).
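
Incremental reprocessing of this kind can be sketched by comparing content digests between syncs. The example below is an illustration only: Kodit may rely on Git's own diff information instead, and `digest` and `changedFiles` are hypothetical helpers.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// digest returns the hex-encoded SHA-256 of a file's content.
func digest(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])
}

// changedFiles returns the sorted paths in current (path -> content) whose
// digest is missing from, or differs from, previous (path -> digest).
func changedFiles(previous, current map[string]string) []string {
	var changed []string
	for path, content := range current {
		if previous[path] != digest(content) {
			changed = append(changed, path)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	previous := map[string]string{
		"a.go": digest("package a"),
		"b.go": digest("package b"),
	}
	current := map[string]string{
		"a.go": "package a",           // unchanged
		"b.go": "package b // edited", // modified
		"c.go": "package c",           // new
	}
	fmt.Println(changedFiles(previous, current))
}
```

Only the paths returned by `changedFiles` would need to be chunked and embedded again; unchanged files keep their existing index entries.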

Supported sources

Kodit indexes any Git repository accessible via HTTPS, SSH, or the Git protocol. This includes GitHub, GitLab, Bitbucket, Azure DevOps, and self-hosted servers.

Private repositories

Private repositories are supported through personal access tokens or SSH keys:

# HTTPS with token
https://username:token@github.com/username/repo.git

# SSH (ensure your SSH key is configured)
git@github.com:username/repo.git
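
Tokens that contain characters such as @ or / must be percent-encoded before being placed in the URL. The Go sketch below builds the HTTPS form with the standard net/url package, which handles that escaping. `withToken` is a hypothetical helper and the credentials are placeholders.

```go
package main

import (
	"fmt"
	"net/url"
)

// withToken embeds a username and access token in an HTTPS clone URL,
// percent-encoding characters that are not safe in the userinfo part.
func withToken(repoURL, username, token string) (string, error) {
	u, err := url.Parse(repoURL)
	if err != nil {
		return "", err
	}
	if u.Scheme != "https" {
		return "", fmt.Errorf("expected an https URL, got scheme %q", u.Scheme)
	}
	u.User = url.UserPassword(username, token)
	return u.String(), nil
}

func main() {
	remote, err := withToken("https://github.com/username/repo.git", "username", "token")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(remote)
}
```

The resulting string can be used as the remote_uri when adding a repository. Because it contains a secret, avoid writing it to logs or committing it to configuration files.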

Privacy

Kodit respects .gitignore and .noindex files. Files matching these patterns are excluded from indexing.

Storage Backends

SQLite (default)

No configuration needed. Kodit creates a SQLite database in the data directory with FTS5 for keyword search and in-process vector storage. Good for single-user and small-team deployments.

PostgreSQL with VectorChord

For larger deployments, use PostgreSQL with the VectorChord extension. This provides scalable vector search and concurrent access. Set the DB_URL environment variable to your connection string.

The recommended Docker image is tensorchord/vchord-suite:pg17-20250601, which bundles PostgreSQL 17 with VectorChord, vchord_bm25, and pg_tokenizer.

Building from Source

git clone https://github.com/helixml/kodit.git
cd kodit
make tools          # Install development tools
make download-model # Download the built-in embedding model
make build          # Build the binary
./bin/kodit version
./bin/kodit serve

Run the tests:

make test                         # All tests
make test PKG=./internal/foo/...  # Specific package
make check                        # Format, vet, lint, and test

Troubleshooting

MCP connection error after restart: If you see No valid session ID provided after restarting the Kodit server, reload the MCP client in your assistant. MCP sessions do not survive server restarts.

No search results: Check that indexing has completed by calling GET /api/v1/repositories/{id}/status. If status shows errors, check the server logs with LOG_LEVEL=DEBUG.

Enrichments not generating: Enrichments require an LLM provider. Check that ENRICHMENT_ENDPOINT_BASE_URL and ENRICHMENT_ENDPOINT_MODEL are set. Without these, Kodit indexes and searches code but does not generate AI documentation.

Telemetry

Kodit collects limited anonymous telemetry (usage metadata only, no user data) to guide development. Disable it with:

DISABLE_TELEMETRY=true

Commercial Support

Helix provides a managed platform built on Kodit with additional features including a management UI, repository browsing, team collaboration, and hosted infrastructure. For commercial support or enterprise integration, contact founders@helix.ml.

Contributing

See CONTRIBUTING.md for guidelines.

License

Apache 2.0
