
Magector

Technology-aware MCP server for Magento 2 and Adobe Commerce with intelligent indexing and search.

Magector is a Model Context Protocol (MCP) server that deeply understands Magento 2 and Adobe Commerce. It builds a semantic vector index of your entire codebase — 18,000+ files across hundreds of modules — and exposes 47 tools that let AI assistants search, navigate, and understand the code with domain-specific intelligence. Instead of grepping for keywords, your AI asks "how are checkout totals calculated?" and gets ranked, relevant results in under 50ms, enriched with Magento pattern detection (plugins, observers, controllers, DI preferences, layout XML, and 20+ more).

Rust · Node.js · Magento · Adobe Commerce · License: MIT


Why Magector

Magento 2 and Adobe Commerce have 18,000+ PHP, XML, JS, PHTML, and GraphQL files spread across hundreds of modules. The codebase relies heavily on indirection — plugins intercept methods defined in other modules, observers react to events dispatched elsewhere, di.xml rewires interfaces to concrete classes, and layout XML stitches blocks and templates together. No single file tells the full story.

Generic search tools — grep, IDE search, or the keyword matching built into AI assistants — can't bridge this gap. They find literal strings but can't connect "how does checkout calculate totals?" to TotalsCollector.php when the word "totals" appears in hundreds of unrelated files.

Magector solves this with three layers of intelligence:

  1. Semantic vector index — every file is embedded into a 384-dimensional space (ONNX, all-MiniLM-L6-v2) where meaning matters more than keywords. A search for "payment capture" returns CaptureOperation.php because the embeddings are close, not because the file contains the word "capture".

  2. Magento technology awareness — 20+ pattern detectors identify plugins, observers, controllers, blocks, cron jobs, GraphQL resolvers, DI preferences, layout XML, and more. Every search result is enriched with what kind of Magento component it is, so the AI client understands the code's role in the system.

  3. Adaptive learning (SONA) — Magector tracks which results you actually use and adjusts future rankings with MicroLoRA feedback, getting smarter over time without any API calls.

The result: your AI assistant calls one MCP tool and gets ranked, pattern-enriched results in 10-45ms — instead of burning tokens grepping through dozens of wrong files. High relevance accuracy means the AI reads fewer, more targeted files, which optimizes context window usage, reduces API costs, and accelerates development cycles.
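To make the first layer concrete, here is a minimal sketch of ranking by cosine similarity. The three-dimensional vectors and the second file name are toy values chosen for illustration (real embeddings are 384-dimensional and come from the ONNX model), and the function names are hypothetical rather than Magector's own.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank documents by similarity to the query vector, best first.
function rank(queryVec, docs) {
  return docs
    .map((d) => ({ path: d.path, score: cosine(queryVec, d.vec) }))
    .sort((x, y) => y.score - x.score);
}

// Toy vectors (hypothetical values, not real embeddings).
const docs = [
  { path: "Topmenu.php", vec: [0.0, 0.2, 0.9] },
  { path: "CaptureOperation.php", vec: [0.9, 0.1, 0.0] },
];
const ranked = rank([1.0, 0.0, 0.1], docs);
console.log(ranked[0].path); // CaptureOperation.php
```

The file whose vector points in nearly the same direction as the query wins, regardless of which literal words it contains.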

| Approach | Semantic matches | Magento-aware | Speed (18K files) |
|---|---|---|---|
| grep / ripgrep | No | No | 100-500ms |
| IDE search | No | No | 200-1000ms |
| GitHub search | Partial | No | 500-2000ms |
| Magector | Yes | Yes | 10-45ms |


Features

  • Semantic search -- find code by meaning, not exact keywords

  • 99.2% accuracy -- validated with 101 E2E test queries across 16 tool categories, plus 557 Rust-level test cases

  • Hybrid search -- combines semantic vector similarity with keyword re-ranking for best-of-both-worlds results

  • Structured JSON output -- results include file path, class name, methods list, role badges, and content snippets for minimal round-trips

  • Persistent serve mode -- keeps ONNX model and HNSW index resident in memory, eliminating cold-start latency

  • Incremental re-indexing -- background file watcher detects changes and updates the index without restart (tombstone + compact strategy)

  • ONNX embeddings -- native 384-dim transformer embeddings via ONNX Runtime

  • ~36K vectors -- indexes the complete Magento 2 / Adobe Commerce codebase including framework internals

  • Magento-aware -- understands controllers, plugins, observers, blocks, resolvers, repositories, and 20+ Magento patterns

  • Adobe Commerce compatible -- works with both Magento Open Source and Adobe Commerce (B2B, Staging, and all Commerce-specific modules)

  • AST-powered -- tree-sitter parsing for PHP and JavaScript extracts classes, methods, namespaces, and inheritance

  • Cross-tool discovery -- tool descriptions include keywords and "See also" references so AI clients find the right tool on the first try

  • SONA feedback learning -- self-adjusting search that learns from MCP tool call patterns (e.g., search → find_plugin refines future rankings for similar queries)

  • SONA v2 with MicroLoRA + EWC++ -- rank-2 low-rank adapter (1536 params, ~6KB) adjusts query embeddings based on learned patterns; Elastic Weight Consolidation prevents catastrophic forgetting during online learning

  • Diff analysis -- risk scoring and change classification for git commits and staged changes

  • Complexity analysis -- cyclomatic complexity, function count, and hotspot detection across modules

  • Fast -- 10-45ms queries via persistent serve process, batched ONNX embedding with adaptive thread scaling

  • LLM description enrichment -- generate natural-language descriptions of di.xml files using Claude, stored in SQLite, and prepend them to embedding text so descriptions influence vector search ranking (not just post-retrieval display)

  • MCP server -- 47 tools integrating with Claude Code, Cursor, and any MCP-compatible AI tool

  • Clean architecture -- Rust core handles all indexing/search, Node.js MCP server delegates to it
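The parameter count quoted for the SONA v2 adapter in the list above is consistent with a standard rank-2 low-rank adapter over 384-dimensional embeddings. The sketch below checks that arithmetic and shows how such an adapter could adjust a query embedding. The matrix names, the float32 storage, and the zero initialization are assumptions for illustration, not details confirmed by Magector.

```javascript
const DIM = 384; // embedding size stated above
const RANK = 2;  // adapter rank stated above

// A low-rank adapter holds two small matrices: down (RANK x DIM) and up (DIM x RANK).
const paramCount = RANK * DIM + DIM * RANK;
const bytesAsFloat32 = paramCount * 4; // assumption: 32-bit floats

// Apply the adapter: out = x + up * (down * x).
function applyAdapter(x, down, up) {
  const hidden = down.map((row) => row.reduce((s, w, i) => s + w * x[i], 0));
  return x.map((xi, i) => xi + up[i].reduce((s, w, r) => s + w * hidden[r], 0));
}

// With zero-initialized "up" weights the adapter leaves the embedding unchanged.
const x = Array.from({ length: DIM }, (_, i) => i / DIM);
const down = Array.from({ length: RANK }, () => new Array(DIM).fill(0.01));
const up = Array.from({ length: DIM }, () => new Array(RANK).fill(0));
const adjusted = applyAdapter(x, down, up);

console.log(paramCount, bytesAsFloat32); // 1536 6144
```

Under these assumptions, 1536 parameters at 4 bytes each come to 6144 bytes, which matches the ~6KB figure.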


Architecture

flowchart LR
  subgraph node ["Node.js Layer"]
    direction TB
    G["CLI<br/>init · index · search · describe"]
    E["MCP Server<br/>47 tools · LRU cache"]
    F["Persistent Serve Process"]
    G --> F
    E --> F
  end

  F -->|"stdin/stdout JSON"| rust

  subgraph rust ["Rust Core"]
    direction TB
    A["AST Parser<br/>PHP · JS · XML"]
    B["Pattern Detection<br/>20+ Magento patterns"]
    B2["Description Enrichment<br/>LLM-powered di.xml summaries"]
    C["ONNX Embedder<br/>all-MiniLM-L6-v2 · 384d"]
    D["HNSW Vector Search<br/>hybrid reranking · SONA"]
    A --> B --> B2 --> C --> D
  end

  style rust fill:#f4a460,color:#000
  style node fill:#68b684,color:#000

Indexing Pipeline

flowchart LR
  A["Source File"] --> B["AST Parser"]
  B --> C["Pattern Detection"]
  C --> D["Text Enrichment"]
  D --> D2{"Descriptions DB?"}
  D2 -->|Yes| D3["Prepend LLM Description"]
  D2 -->|No| E["ONNX Embedding"]
  D3 --> E
  E --> F[("HNSW Index")]
  A --> G["Metadata"] --> F

Search Pipeline

flowchart LR
  Q["Query"] --> E1["Synonym Enrichment"]
  E1 --> E2["ONNX Embedding"]
  E2 --> H["HNSW Search"]
  H --> R["Hybrid Reranking"]
  R --> SA["SONA Adjustment"]
  SA --> J["Structured JSON"]
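
The hybrid reranking stage of the search pipeline above can be sketched as a weighted blend of the vector score and a keyword score. This is only an illustration: the `alpha` weight, the overlap measure, and the candidate data are hypothetical, and the actual scoring in the Rust core may differ.

```javascript
// Fraction of query terms that appear in the candidate text (case-insensitive).
function keywordOverlap(query, text) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return 0;
  const haystack = text.toLowerCase();
  const hits = terms.filter((t) => haystack.includes(t)).length;
  return hits / terms.length;
}

// Blend the vector score with the keyword score; alpha is a hypothetical weight.
function hybridRerank(query, candidates, alpha = 0.7) {
  return candidates
    .map((c) => ({
      ...c,
      hybrid: alpha * c.vectorScore + (1 - alpha) * keywordOverlap(query, c.text),
    }))
    .sort((a, b) => b.hybrid - a.hybrid);
}

// Toy candidates: the first has a slightly higher vector score but no keyword hits.
const candidates = [
  { path: "Data.php", vectorScore: 0.8, text: "generic helper" },
  { path: "TotalsCollector.php", vectorScore: 0.78, text: "class TotalsCollector collects quote totals" },
];
const reranked = hybridRerank("totals collector", candidates);
console.log(reranked.map((r) => r.path));
```

In this toy run the keyword boost lifts TotalsCollector.php above the candidate that had the marginally better vector score.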

Components

| Component | Technology | Purpose |
|---|---|---|
| Embeddings | ort (ONNX Runtime) | all-MiniLM-L6-v2, 384 dimensions |
| Vector search | hnsw_rs + hybrid reranking | Approximate nearest neighbor + keyword boosting |
| PHP parsing | tree-sitter-php | Class, method, namespace extraction |
| JS parsing | tree-sitter-javascript | AMD/ES6 module detection |
| Pattern detection | Custom Rust | 20+ Magento-specific patterns |
| CLI | clap | Command-line interface (index, search, serve, validate) |
| Unified metadata | rusqlite (bundled SQLite) | LLM descriptions, method-chain enrichment, process state, cache; all in .magector/data.db |
| SONA | Custom Rust | Feedback learning with MicroLoRA + EWC++ |
| MCP server | @modelcontextprotocol/sdk | AI tool integration with structured JSON output |
| Config data | JSON exports in .magector/config-data/ | One-time core_config_data exports per environment for config tracing |


Security

Magector operates on source code indexed from potentially-untrusted vendor/ dependencies and is driven by an LLM that may be manipulated via prompt injection in indexed comments, docblocks, or markdown. The following hardening applies as of v2.15.1:

Path traversal protection

All tools that accept a path argument (magento_read, magento_grep, magento_ast_search, magento_find_dataobject_issues) route the input through safePath() / safeRelPath() helpers in src/mcp-server.js. These:

  1. Resolve the argument against MAGENTO_ROOT with path.resolve(), which normalizes .. segments (symlinks are not followed during validation).

  2. Reject any resolved path that does not lie inside MAGENTO_ROOT.

This prevents a hostile vendor/ comment from instructing the LLM to e.g. magento_read ../../home/user/.ssh/id_rsa. Both the standalone case handlers and their magento_batch counterparts share the same chokepoint.
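
The two validation steps can be sketched as follows. This is a hypothetical re-implementation for illustration (named `safePathSketch` to keep it distinct from the real helper in src/mcp-server.js, whose code is not shown here). Like the description above, it does not follow symlinks.

```javascript
const path = require("node:path");

// Resolve `userPath` against `root` and refuse anything that escapes it.
// Hypothetical sketch; the real safePath() helper may differ.
function safePathSketch(root, userPath) {
  const base = path.resolve(root);
  const resolved = path.resolve(base, userPath);
  // A relative path that starts with ".." (or is absolute, e.g. another
  // drive on Windows) means the target lies outside the root.
  const rel = path.relative(base, resolved);
  const escapes =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`Path escapes MAGENTO_ROOT: ${userPath}`);
  }
  return resolved;
}

console.log(safePathSketch("/srv/magento", "app/etc/di.xml"));
```

Comparing via path.relative() rather than a plain string prefix check avoids accepting a sibling directory such as /srv/magento-backup when the root is /srv/magento.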

Shell injection hardening in auto-update

src/update.js fetches the latest field from the npm registry and re-execs itself with the new version string. Previously this was interpolated into a shell command; a tampered registry response could inject shell metacharacters. As of v2.15.1:

  • The re-exec passes argv as an array to a no-shell spawner (no intermediate shell).

  • A semver-strict isSafeVersion() validator rejects any version string containing metacharacters or that does not match X.Y.Z / X.Y.Z-prerelease form.

  • Fails closed: if validation fails, the auto-update is silently skipped rather than run with a malformed version string.
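
A minimal sketch of this hardening, under stated assumptions: the regular expression, the function names, and the `npx magector@<version>` invocation are hypothetical stand-ins, since the actual code in src/update.js is not reproduced here.

```javascript
const { spawnSync } = require("node:child_process");

// Accept only X.Y.Z or X.Y.Z-prerelease, where the prerelease part is made of
// dot-separated alphanumerics/hyphens. Hypothetical pattern for illustration.
const VERSION_RE = /^\d+\.\d+\.\d+(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?$/;

function isSafeVersionSketch(version) {
  return typeof version === "string" && VERSION_RE.test(version);
}

// Re-exec with argv passed as an array, so no shell ever parses the version.
// Returns null (fail closed) when the version string is rejected.
function reexecSketch(version, args) {
  if (!isSafeVersionSketch(version)) return null;
  return spawnSync("npx", [`magector@${version}`, ...args], {
    shell: false,
    stdio: "inherit",
  });
}

console.log(isSafeVersionSketch("2.15.1"));           // true
console.log(isSafeVersionSketch("2.15.1; rm -rf ~")); // false
```

Validation and the no-shell spawn are independent defenses: either one alone would stop the injection described above, and together they cover each other's mistakes.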

Unix socket permissions

The serve-proxy Unix socket at .magector/serve.sock is created with chmod 0600 immediately after listen(). On multi-user systems, another local account can no longer connect and query the vector index (which would leak indexed source snippets). The chmod is best-effort on platforms that don't support it (logged to .magector/magector.log).

Reporting vulnerabilities

If you find a security issue, please open an issue on the GitHub repo and mark it as security-related. Do not post reproducers that leak actual source contents from private codebases.


Quick Start

Prerequisites

1. Initialize in Your Project

cd /path/to/your/magento2  # or Adobe Commerce project
npx magector init

This single command handles the entire setup:

flowchart LR
  A["npx magector init"] --> B["Verify<br/>Project"]
  B --> C["Download<br/>ONNX Model"]
  C --> D["Index<br/>Codebase"]
  D --> E["Detect IDE<br/>Cursor · Claude Code"]
  E --> E2["API Key<br/>(optional)"]
  E2 --> F["Write MCP<br/>Config"]
  F --> G["Update<br/>.gitignore"]

2. Search

npx magector search "product price calculation"
npx magector search "checkout totals collector" -l 20

3. Re-index After Changes

npx magector index

4. IDE Setup Only (Skip Indexing)

npx magector setup

CLI Reference

Rust Core CLI

magector-core <COMMAND>

Commands:
  index       Index a Magento codebase
  search      Search the index semantically
  serve       Start persistent server mode (stdin/stdout JSON protocol)
  describe    Generate LLM descriptions for di.xml files (requires ANTHROPIC_API_KEY)
  validate    Run validation suite (downloads Magento if needed)
  download    Download Magento 2 Open Source
  stats       Show index statistics
  embed       Generate embedding for text

index

magector-core index [OPTIONS]

Options:
  -m, --magento-root <PATH>          Path to Magento root directory
  -d, --database <PATH>              Index database path [default: ./.magector/index.db]
  -c, --model-cache <PATH>           Model cache directory [default: ./models]
      --descriptions-db <PATH>       Path to descriptions SQLite DB (descriptions are prepended to embeddings)
  -v, --verbose                      Enable verbose output

When --descriptions-db is provided (or auto-detected as data.db next to the index), descriptions are prepended to the embedding text as "Description: {text}\n\n" before the raw file content. This places semantic terms within the 256-token ONNX window, significantly improving retrieval of di.xml files for natural-language queries.
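
The prepending step can be sketched in a few lines. The prefix format is taken from the paragraph above; the function name, parameter names, and sample XML are hypothetical.

```javascript
// Build the text that gets embedded for one file.
// Prefix format "Description: {text}\n\n" comes from the paragraph above;
// the function and parameter names are hypothetical.
function buildEmbeddingText(fileContent, description) {
  if (!description) return fileContent; // no description: embed raw content
  return `Description: ${description}\n\n${fileContent}`;
}

const text = buildEmbeddingText(
  '<config><preference for="A" type="B"/></config>',
  "Maps interface A to implementation B."
);
console.log(text.split("\n")[0]);
```

Because the description comes first, its natural-language terms are the least likely part of the text to be cut off by the token window.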

search

magector-core search <QUERY> [OPTIONS]

Options:
  -d, --database <PATH>   Index database path [default: ./.magector/index.db]
  -l, --limit <N>         Number of results [default: 10]
  -f, --format <FORMAT>   Output format: text, json [default: text]

describe

magector-core describe [OPTIONS]

Options:
  -m, --magento-root <PATH>   Path to Magento root directory
  -o, --output <PATH>         Output SQLite database [default: ./.magector/data.db]
      --force                 Re-describe all files (ignore cache)

Generates natural-language descriptions of di.xml files using the Anthropic API (Claude Sonnet). Requires ANTHROPIC_API_KEY environment variable. Descriptions are stored in a SQLite database and used during indexing to enrich embeddings. Only files with changed content hashes are re-described (incremental by default).

serve

magector-core serve [OPTIONS]

Options:
  -d, --database <PATH>            Index database path [default: ./.magector/index.db]
  -c, --model-cache <PATH>         Model cache directory [default: ./models]
  -m, --magento-root <PATH>        Magento root (enables file watcher)
      --descriptions-db <PATH>     Path to descriptions SQLite DB
      --watch-interval <SECS>      File watcher poll interval [default: 60]

Starts a persistent process that reads JSON queries from stdin and writes JSON responses to stdout. Keeps the ONNX model and HNSW index resident in memory for fast repeated queries.

When --magento-root is provided, a background file watcher polls for changed files every --watch-interval seconds and incrementally re-indexes them without restart. Modified and deleted files are soft-deleted (tombstoned) in the HNSW index; new vectors are appended. When tombstoned entries exceed 20% of total vectors, the index is automatically compacted by rebuilding the HNSW graph.
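
The tombstone-and-compact strategy can be illustrated with a toy in-memory index. This is a simplified sketch, not the Rust implementation: a Map stands in for the HNSW graph, the class name is hypothetical, and it assumes the 20% threshold is measured against all stored vectors including tombstoned ones.

```javascript
// Toy index: ids map to vectors; deleted ids are tombstoned, not removed.
class TombstoneIndexSketch {
  constructor(compactRatio = 0.2) {
    this.vectors = new Map();    // id -> vector
    this.tombstones = new Set(); // ids that are soft-deleted
    this.compactRatio = compactRatio;
    this.compactions = 0;
  }

  add(id, vector) {
    this.vectors.set(id, vector);
  }

  // Soft-delete, then compact if tombstones exceed the ratio of total vectors.
  remove(id) {
    if (!this.vectors.has(id)) return;
    this.tombstones.add(id);
    if (this.tombstones.size / this.vectors.size > this.compactRatio) {
      this.compact();
    }
  }

  // Stand-in for rebuilding the HNSW graph: drop tombstoned entries for real.
  compact() {
    for (const id of this.tombstones) this.vectors.delete(id);
    this.tombstones.clear();
    this.compactions += 1;
  }

  liveCount() {
    return this.vectors.size - this.tombstones.size;
  }
}

const index = new TombstoneIndexSketch();
for (let i = 0; i < 10; i++) index.add(i, [i]);
index.remove(0); // 1/10 tombstoned: no compaction yet
index.remove(1); // 2/10 is exactly 20%, not above the threshold
index.remove(2); // 3/10 exceeds 20%: compaction runs
console.log(index.liveCount(), index.compactions); // 7 1
```

Soft deletes keep individual updates cheap, and the occasional rebuild bounds how much dead weight the index carries.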

Protocol (one JSON object per line):

// Request:
{"command":"search","query":"product price","limit":10}

// Response:
{"ok":true,"data":[{"id":123,"score":0.85,"metadata":{...}}]}

// Stats request:
{"command":"stats"}

// Watcher status:
{"command":"watcher_status"}
// Response:
{"ok":true,"data":{"running":true,"tracked_files":18234,"last_scan_changes":3,"interval_secs":60}}

// Descriptions (all LLM descriptions from SQLite DB):
{"command":"descriptions"}
// Response:
{"ok":true,"data":{"app/code/Magento/Catalog/etc/di.xml":{"hash":"...","description":"...","model":"claude-sonnet-4-5-20250929","timestamp":1769875137},...}}

// Describe (generate descriptions + auto-reindex affected files):
{"command":"describe"}
// Response:
{"ok":true,"data":{"files_found":371,"described":5,"skipped":366,"errors":0,"described_paths":["app/code/..."]}}

// SONA feedback:
{"command":"feedback","signals":[{"type":"refinement_to_plugin","query":"checkout totals","timestamp":1700000000000}]}
// Response:
{"ok":true,"data":{"learned":1}}

// SONA status:
{"command":"sona_status"}
// Response:
{"ok":true,"data":{"learned_patterns":5,"total_observations":12}}
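
A client for this line-delimited protocol needs little more than an encoder and a parser. The helpers below are a sketch with hypothetical names. The shape of error responses is not documented above, so anything other than "ok": true is treated as a failure here.

```javascript
// Encode one request as a single line of JSON (the protocol is line-delimited).
function encodeRequest(command, params = {}) {
  return JSON.stringify({ command, ...params }) + "\n";
}

// Parse one response line; throw if the server did not report success.
// Assumption: any response without "ok": true counts as an error.
function parseResponseLine(line) {
  const msg = JSON.parse(line);
  if (msg.ok !== true) {
    throw new Error(`serve request failed: ${line.trim()}`);
  }
  return msg.data;
}

const wire = encodeRequest("search", { query: "product price", limit: 10 });
const data = parseResponseLine(
  '{"ok":true,"data":{"learned_patterns":5,"total_observations":12}}'
);
console.log(wire.trim(), data.learned_patterns);
```

In practice the encoded line would be written to the stdin of a spawned magector-core serve process, and each line read back from its stdout would go through the parser.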

Node.js CLI

npx magector init [path]        # Full setup: index + IDE config
npx magector index [path]       # Index (or re-index) Magento codebase
npx magector search <query>     # Search indexed code
npx magector describe [path]    # Generate LLM descriptions for di.xml files
npx magector stats              # Show indexer statistics
npx magector setup [path]       # IDE setup only (no indexing)
npx magector mcp                # Start MCP server
npx magector help               # Show help

The describe command and magento_describe MCP tool require an Anthropic API key. During npx magector init, you are prompted to paste your key (optional). If provided, it is stored in the MCP config file as the ANTHROPIC_API_KEY environment variable so the MCP server can use it automatically. You can also set it manually later by adding "ANTHROPIC_API_KEY": "sk-..." to the env section in .mcp.json or ~/.cursor/mcp.json.

Environment Variables

| Variable | Description | Default |
|---|---|---|
| MAGENTO_ROOT | Path to Magento installation | Current directory |
| MAGECTOR_DB | Path to index database | ./.magector/index.db |
| MAGECTOR_BIN | Path to magector-core binary | Auto-detected |
| MAGECTOR_MODELS | Path to ONNX model directory | ~/.magector/models/ |
| MAGECTOR_INDEX_TIMEOUT | Indexing wall-clock timeout in milliseconds. Override for very large codebases or CPU-constrained environments. | 14400000 (4 h) |
| MAGECTOR_THREADS | Max ONNX intra-op + rayon parsing threads. Equivalent to the --threads CLI flag. | Half of CPU cores |
| OMP_NUM_THREADS | Fallback thread limit if MAGECTOR_THREADS is not set (de facto standard for ONNX/OpenMP). | |
| MAGECTOR_BATCH_SIZE | Embedding batch size (higher = faster, more RAM). Equivalent to --batch-size. | 256 |
| ANTHROPIC_API_KEY | API key for description generation (describe command) | |

Constraining CPU usage during indexing

Indexing a large enterprise codebase (~80K files) can saturate CPU during PHASE 2 (ONNX embedding generation). To keep a developer machine responsive while indexing, lower the thread count:

npx magector index --threads 2                  # use only 2 cores for both parsing and embedding
MAGECTOR_THREADS=2 npx magector index           # equivalent via env var
OMP_NUM_THREADS=2 npx magector index            # also honored as a fallback

The --threads flag and MAGECTOR_THREADS / OMP_NUM_THREADS env vars constrain both the rayon thread pool used by PHASE 1 (parallel AST parsing) and the ONNX intra-op thread pool used by PHASE 2 (embedding inference). The active thread source is logged at startup so you can verify it took effect:

INFO Rayon global pool: 2 threads (available: 16)
INFO ONNX intra_threads: 2 (available: 16, source: --threads flag)
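
The way these sources combine can be sketched as a simple precedence chain. The text above only states that OMP_NUM_THREADS is a fallback for MAGECTOR_THREADS. Ranking the --threads flag above both env vars is an assumption, as are the function names and the source labels other than "--threads flag".

```javascript
const os = require("node:os");

// Parse a positive integer, or return null for missing/invalid values.
function parsePositiveInt(value) {
  const n = Number.parseInt(value, 10);
  return Number.isInteger(n) && n > 0 ? n : null;
}

// Resolve the thread count. Precedence assumed here: --threads flag,
// then MAGECTOR_THREADS, then OMP_NUM_THREADS, then half of the CPU cores.
function resolveThreads({ flag, env = process.env, cpuCount = os.cpus().length }) {
  const fromFlag = parsePositiveInt(flag);
  if (fromFlag !== null) return { threads: fromFlag, source: "--threads flag" };
  const fromMagector = parsePositiveInt(env.MAGECTOR_THREADS);
  if (fromMagector !== null) return { threads: fromMagector, source: "MAGECTOR_THREADS" };
  const fromOmp = parsePositiveInt(env.OMP_NUM_THREADS);
  if (fromOmp !== null) return { threads: fromOmp, source: "OMP_NUM_THREADS" };
  return { threads: Math.max(1, Math.floor(cpuCount / 2)), source: "default" };
}

console.log(resolveThreads({ env: { OMP_NUM_THREADS: "2" }, cpuCount: 16 }));
```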

For very large or CPU-constrained runs, you may also need to extend the wall-clock timeout (default 4 hours):

MAGECTOR_INDEX_TIMEOUT=28800000 npx magector index --threads 2   # 8 h timeout, 2 threads

Resume after timeout or interrupt

Indexing writes a crash-safe checkpoint to disk every 50 batches (~12,800 files). If the process is killed or times out mid-run, just re-run npx magector index — it auto-resumes from the last checkpoint:

npx magector index
# ♻️  Resuming from previous run: 38400 vectors across 12200 files already indexed
# ✓ Found 79771 total files; 12200 already indexed, 67571 remaining to process

The indexer collects already-embedded file paths from the existing DB, filters them out of file discovery, preserves the existing HNSW state, and only parses/embeds the files that aren't in the DB yet. Partial resume also picks up new files added to the tree since the previous run.
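
The resume filter described above amounts to a set difference. The sketch below uses hypothetical names and toy file lists. It also checks the checkpoint arithmetic: 50 batches at the default batch size of 256 gives the ~12,800 files mentioned earlier.

```javascript
// Return the files that still need parsing/embedding on resume.
function remainingFiles(discovered, alreadyIndexed) {
  const done = new Set(alreadyIndexed);
  return discovered.filter((p) => !done.has(p));
}

// Checkpoint cadence from the text: every 50 batches at the default batch size of 256.
const filesPerCheckpoint = 50 * 256;

// Toy file lists for illustration.
const discovered = ["a.php", "b.php", "c.xml", "d.php"];
const indexed = ["a.php", "c.xml"];
console.log(remainingFiles(discovered, indexed), filesPerCheckpoint);
```

Files that appear in discovery but not in the indexed set are processed whether they were skipped by the interrupted run or added afterwards, which is why a partial resume also picks up new files.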

To force a full rebuild (e.g. after a schema change or if you want to discard stale vectors), pass --force:

npx magector index --force

MCP Server Tools

The MCP server exposes 47 tools for AI-assisted Magento 2 and Adobe Commerce development. All search tools return structured JSON with file paths, class names, methods, role badges, and content snippets -- enabling AI clients to parse results programmatically and minimize file-read round-trips.

Output Format

All search tools return structured JSON:

{
  "results": [
    {
      "rank": 1,
      "score": 0.892,
      "path": "vendor/magento/module-catalog/Model/ProductRepository.php",
      "module": "Magento_Catalog",
      "className": "ProductRepository",
      "namespace": "Magento\\Catalog\\Model",
      "methods": ["save", "getById", "getList", "delete", "deleteById"],
      "magentoType": "repository",
      "fileType": "php",
      "badges": ["repository"],
      "snippet": "class ProductRepository implements ProductRepositoryInterface..."
    }
  ],
  "count": 1
}

Key fields:

  • methods -- list of method names in the class (avoids needing to read the file)

  • badges -- role indicators: plugin, controller, observer, repository, graphql-resolver, model, block

  • snippet -- first 300 characters of indexed content for quick assessment
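
Because the output is structured, a client can filter and summarize results without opening any files. A small sketch, using a trimmed copy of the sample response above and a hypothetical helper name:

```javascript
// Keep only results that carry a given role badge and meet a minimum score.
function filterResults(response, badge, minScore = 0) {
  return response.results.filter(
    (r) => r.badges.includes(badge) && r.score >= minScore
  );
}

// Sample response trimmed from the output format shown above.
const response = JSON.parse(`{
  "results": [
    {
      "rank": 1,
      "score": 0.892,
      "path": "vendor/magento/module-catalog/Model/ProductRepository.php",
      "className": "ProductRepository",
      "methods": ["save", "getById", "getList", "delete", "deleteById"],
      "badges": ["repository"]
    }
  ],
  "count": 1
}`);

const repos = filterResults(response, "repository", 0.5);
console.log(repos.map((r) => `${r.className}: ${r.methods.join(", ")}`));
```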

Search Tools

| Tool | Description |
|---|---|
| magento_search | Semantic search -- find any PHP class, method, XML config, template, or GraphQL schema by natural language |
| magento_find_class | Find PHP class, interface, abstract class, or trait by name |
| magento_find_method | Find method implementations across the codebase |

Magento-Specific Finders

| Tool | Description |
|---|---|
| magento_find_config | Find XML configuration (di.xml, events.xml, routes.xml, system.xml, webapi.xml, module.xml, layout) |
| magento_find_template | Find PHTML template files for frontend or admin rendering |
| magento_find_plugin | Find interceptor plugins (before/after/around methods) and di.xml declarations. Resolves plugin PHP files and extracts interceptor method signatures (v2.5) |
| magento_find_fieldset | Find fieldset.xml definitions controlling data copy between entities (order→quote, quote→order). Shows fields per aspect (to_order, to_edit) (v2.5) |
| magento_find_observer | Find event observers and events.xml declarations |
| magento_find_preference | Find DI preference overrides -- which class implements an interface |
| magento_find_controller | Find MVC controllers by frontend or admin route path |
| magento_find_block | Find Block classes for view rendering |
| magento_find_graphql | Find GraphQL schema definitions, resolvers, types, queries, and mutations |
| magento_find_api | Find REST/SOAP API endpoints in webapi.xml |
| magento_find_cron | Find cron job definitions in crontab.xml |
| magento_find_db_schema | Find database table definitions in db_schema.xml (declarative schema) |

Flow & Dependency Tracing

| Tool | Description |
|---|---|
| magento_trace_flow | Trace execution flow from an entry point (route, API, GraphQL, event, cron) -- maps controller → plugins → observers → templates with code snippets (v2.5) |
| magento_trace_shipping_chain | Trace the complete shipping rate chain: carriers → collectRates plugins → rate modifiers → totals collectors → fieldset mappings (v2.5) |
| magento_trace_dependency | Trace DI graph for a class/interface -- preferences, plugins, virtualTypes, argument overrides (parses all di.xml, no index needed) |
| magento_find_event_flow | Trace complete event chain: dispatchers → observers → handler PHP classes (parses events.xml + vector search) |
| magento_find_event_dispatchers | Find all PHP locations where a specific event is dispatched -- exact grep matching with method context and surrounding code (v2.3) |
| magento_find_layout | Find layout XML files by handle or content -- lists blocks, containers, and referenceBlock declarations |
| magento_trace_data_flow | Trace how a data attribute flows: find all setters (magic setter, setData, addData) and getters (magic getter, getData) across PHP and XML (v2.3) |
| magento_trace_call_chain | Trace internal method call chain: follows $this->method(), $this->dep->method(), and dispatch() calls to build an execution tree (v2.2) |

magento_trace_flow auto-detects the entry type from the pattern of the entry point (/V1/... → API, snake_case → event, camelCase → GraphQL, path/segments → route); pass entryType to override it. Use depth: "shallow" (entry + config + plugins) or depth: "deep" (adds observers, layout, templates, DI preferences).
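
The auto-detection rules can be sketched as a chain of pattern checks. The regular expressions and the returned strings "api", "route", and "event" are assumptions for illustration (only "graphql" appears as an entryType value in the query examples), and the function name is hypothetical.

```javascript
// Guess the entry type from the shape of the entry point string.
// Rules follow the paragraph above; the exact regexes are assumptions.
function detectEntryType(entryPoint) {
  if (/^\/V\d+\//.test(entryPoint)) return "api";                     // /V1/...
  if (entryPoint.includes("/")) return "route";                       // path/segments
  if (/^[a-z0-9]+(_[a-z0-9]+)+$/.test(entryPoint)) return "event";    // snake_case
  if (/^[a-z]+([A-Z][a-z0-9]*)+$/.test(entryPoint)) return "graphql"; // camelCase
  return null; // ambiguous: let the caller pass entryType explicitly
}

for (const entry of ["/V1/products", "checkout/cart/add", "sales_order_place_after", "placeOrder"]) {
  console.log(entry, "->", detectEntryType(entry));
}
```

The API check runs before the route check because a REST path also contains slashes. A single lowercase word matches none of the rules, which is the kind of case where an explicit entryType is needed.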

Impact & Testing

| Tool | Description |
|---|---|
| magento_impact_analysis | Analyze impact of changing a class -- finds use statements, DI references, direct instantiations, and type hints across the codebase |
| magento_find_test | Find PHPUnit tests for a given class/method -- searches Test/ directories for coverage, mocks, and assertions |
| magento_find_implementors | Find all classes implementing a given PHP interface -- scans implements keywords and di.xml <preference> declarations (v2.2) |
| magento_find_callers | Find all call sites of a method across PHP and XML files -- ->method() and ::method() calls (v2.2) |
| magento_find_di_wiring | Complete DI picture for a class: preferences, plugins, constructor args, virtual types, and argument overrides from di.xml (v2.2) |

Diagnostics

| Tool | Description |
|---|---|
| magento_error_parser | Parse Magento error messages and map to root cause, affected files, and fix suggestions (10 known patterns) |
| magento_performance_profile | Profile a Magento subsystem (checkout_totals, order_place, product_save, etc.) for performance bottlenecks -- plugins, observers, and complexity hotspots |

Analysis Tools

| Tool | Description |
|---|---|
| magento_analyze_diff | Analyze git diffs for risk scoring and change classification |
| magento_complexity | Analyze cyclomatic complexity, function count, and line count |

Utility Tools

| Tool | Description |
|---|---|
| magento_module_structure | Show complete module structure -- controllers, models, blocks, plugins, observers, configs |
| magento_index | Trigger re-indexing of the codebase (also kicks off background enrichment) |
| magento_describe | Generate LLM descriptions for di.xml files (requires ANTHROPIC_API_KEY), stored in .magector/data.db, auto-reindexes affected files |
| magento_stats | View index statistics |
| magento_batch | Execute multiple tool queries in parallel in one MCP roundtrip. Supports all search, find, grep, read, and null-risk tools. Use to avoid N×3-5s roundtrip overhead. |
| magento_grep | Exact text/regex search across PHP/XML/PHTML files (grep -rn -E internally). Supports filesOnly mode (like grep -l), context lines, ignoreCase, include patterns. (v2.9) |
| magento_read | Read a specific file with optional methodName extraction (~10× fewer tokens than reading the whole file) and startLine/endLine range. (v2.10) |
| magento_trace_api | Trace REST/GraphQL API endpoint from URL to implementation: webapi.xml → service interface → DI preference → method body. One call replaces 4-5 grep+read steps. (v2.11) |
| magento_trace_config | Trace a config path end-to-end: system.xml admin definition → PHP classes that consume the value → actual DB values from config-data exports. Accepts exact path or keyword search. (v2.17) |
| magento_find_trigger | Find database triggers across the codebase |
| magento_find_table_usage | Find all PHP code referencing a specific database table |

Null-Safety Analysis (v2.12–v2.15)

| Tool | Description |
|---|---|
| magento_ast_search | Structural PHP code search using tree-sitter. Named patterns: dataobject-set-null (detect setX(null) anti-pattern), unchecked-method-chain (detect $this->dep->method() chains). Pattern arg is an enum, not free-text. Executed in the Rust serve process, with no external dependency. (v2.16) |
| magento_enrich | Build the method-chain enrichment index. Scans all vendor/ PHP files for ->firstMethod()->secondMethod() chains and detects null guards in surrounding code. Stores results in .magector/data.db (SQLite, via Rust serve). Runs automatically after magento_index. (v2.13, moved to Rust v2.16) |
| magento_find_null_risks | Query the enrichment index for method chains without null guards. O(1) SQLite query instead of file scanning. Pass firstMethod to filter (e.g., "getPayment" → all ->getPayment()->anything() without null guard). Requires magento_enrich. (v2.13) |
| magento_find_dataobject_issues | Detect setX(null) anti-pattern on Magento DataObject subclasses. setX(null) stores ['x' => null] in _data. hasX() (via array_key_exists) returns true even when the value is null, creating false-positive guard conditions. Use during field-lifecycle audits or when debugging "value persists but shouldn't" bugs. Uses tree-sitter. (v2.15, tree-sitter v2.16) |

Search Enhancements (v2.1)

  • Hybrid BM25+vector search -- combines text frequency scoring with semantic vector similarity for better exact class name matches

  • Query expansion -- automatically expands queries with Magento domain synonyms (plugin → interceptor, checkout → cart/quote/totals, etc.)

  • Module filtering -- moduleFilter parameter on magento_search to limit results by vendor/module pattern. Accepts a single string or array of strings. Supports wildcards, e.g., "Vendor_*" or ["Acme_PaymentGateway", "Acme_FreeShipping"]

  • Non-blocking reindex -- old index stays usable during background rebuild; new index is built to a temp path and swapped in atomically on completion
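
The moduleFilter behavior described in the list above (a single string or an array, with * wildcards) can be sketched as follows. The helper names are hypothetical, and matching against module names such as Vendor_Module is an assumption about what the filter is compared to.

```javascript
// Convert a wildcard pattern such as "Vendor_*" into an anchored RegExp.
function wildcardToRegExp(pattern) {
  // Escape regex metacharacters except "*", which becomes ".*" below.
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Accept a single string or an array of strings, as moduleFilter does.
function matchesModuleFilter(moduleName, moduleFilter) {
  const patterns = Array.isArray(moduleFilter) ? moduleFilter : [moduleFilter];
  return patterns.some((p) => wildcardToRegExp(p).test(moduleName));
}

console.log(matchesModuleFilter("Vendor_Checkout", "Vendor_*"));
console.log(matchesModuleFilter("Acme_FreeShipping", ["Acme_PaymentGateway", "Acme_FreeShipping"]));
```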

Deep Code Analysis (v2.2)

  • magento_find_implementors -- find all classes implementing a PHP interface (PHP implements + di.xml <preference>)

  • magento_find_callers -- find all call sites of a method across PHP and XML files

  • magento_find_di_wiring -- complete DI picture: preferences, plugins, constructor args, virtual types, argument overrides

  • magento_trace_call_chain -- trace internal method execution chain: $this->method(), $this->dep->method(), and dispatch() calls with event→observer resolution

Data Flow & Event Tracing (v2.3)

  • magento_trace_data_flow -- trace all setters and getters for a data attribute (magic methods, setData/getData, addData, constants, XML references). Answers "who writes/reads custom_discounted_price_incl_tax on Quote\Address?"

  • magento_find_event_dispatchers -- grep-based exact search for all PHP locations dispatching a specific event, with method context and surrounding code. Complements magento_find_event_flow with higher precision.

  • magento_find_plugin area context -- enriched output shows DI area (frontend/adminhtml/global/graphql) and explicit di.xml plugin registrations when targetClass is provided

Tool Cross-References

Each tool description includes "See also" hints to help AI clients chain tools effectively:

graph LR
  cls["find_class"] --> plg["find_plugin"]
  cls --> prf["find_preference"]
  cls --> mtd["find_method"]
  cfg["find_config"] --> obs["find_observer"]
  cfg --> prf
  cfg --> api["find_api"]
  plg --> cls
  plg --> mtd
  tpl["find_template"] --> blk["find_block"]
  blk --> tpl
  blk --> cfg
  dbs["find_db_schema"] --> cls
  gql["find_graphql"] --> cls
  gql --> mtd
  ctl["find_controller"] --> cfg
  trc["trace_flow"] -.-> ctl
  trc -.-> plg
  trc -.-> obs
  trc -.-> tpl
  trc -.-> api
  trc -.-> gql
  dep["trace_dependency"] --> prf
  dep --> plg
  evf["find_event_flow"] --> obs
  imp["impact_analysis"] --> dep
  imp --> cls
  tst["find_test"] --> cls
  err["error_parser"] --> dep
  lay["find_layout"] --> blk

  style cls fill:#4a90d9,color:#fff
  style mtd fill:#4a90d9,color:#fff
  style cfg fill:#e8a838,color:#000
  style plg fill:#d94a4a,color:#fff
  style obs fill:#d94a4a,color:#fff
  style prf fill:#e8a838,color:#000
  style api fill:#e8a838,color:#000
  style tpl fill:#68b684,color:#000
  style blk fill:#68b684,color:#000
  style dbs fill:#9b59b6,color:#fff
  style gql fill:#9b59b6,color:#fff
  style ctl fill:#4a90d9,color:#fff
  style trc fill:#2ecc71,color:#000

Query Examples

magento_search("how are checkout totals calculated")
magento_search("product price with tier pricing and catalog rules")
magento_find_class("ProductRepositoryInterface")
magento_find_method("getById")
magento_find_config("di.xml plugin for ProductRepository")
magento_find_plugin({ targetClass: "Topmenu" })
magento_find_observer("sales_order_place_after")
magento_find_preference("StoreManagerInterface")
magento_find_api("/V1/orders")
magento_find_controller("catalog/product/view")
magento_find_graphql("placeOrder")
magento_find_db_schema("sales_order")
magento_find_cron("indexer")
magento_find_block("cart totals")
magento_find_template("minicart")
magento_analyze_diff({ commitHash: "abc123" })
magento_complexity({ module: "Magento_Catalog", threshold: 10 })
magento_describe()
magento_trace_flow({ entryPoint: "checkout/cart/add", depth: "deep" })
magento_trace_flow({ entryPoint: "/V1/products" })
magento_trace_flow({ entryPoint: "placeOrder", entryType: "graphql" })
magento_trace_flow({ entryPoint: "sales_order_place_after" })
magento_trace_data_flow({ attributeKey: "custom_discounted_price_incl_tax", modelClass: "Quote\\Address" })
magento_find_event_dispatchers({ eventName: "custom_discount_rule_validation_before" })
magento_find_implementors({ interfaceName: "ProductRepositoryInterface" })
magento_find_callers({ methodName: "collectTotals", className: "TotalsCollector" })
magento_find_di_wiring({ className: "CartManagementInterface" })
magento_trace_call_chain({ className: "Magento\\Quote\\Model\\QuoteManagement", methodName: "submit" })
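
The examples above are shorthand. Over the wire, an MCP client sends each call as a JSON-RPC 2.0 "tools/call" request. The sketch below builds such requests. The argument names for magento_trace_flow come from the examples, while "query" as the argument name for magento_search is an assumption, since that example is written positionally.

```javascript
// Build an MCP "tools/call" JSON-RPC 2.0 request for a named tool.
function toolCallRequest(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// "query" is an assumed argument name; entryPoint and depth appear in the examples above.
const requests = [
  toolCallRequest(1, "magento_search", { query: "how are checkout totals calculated" }),
  toolCallRequest(2, "magento_trace_flow", { entryPoint: "checkout/cart/add", depth: "deep" }),
];
console.log(JSON.stringify(requests[1]));
```

Most MCP clients build these requests for you; the exact argument names for each tool are published in the tool schemas the server returns.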

Supported Platforms

Pre-built binaries are provided for the following platforms:

| Platform | Architecture | Package |
|---|---|---|
| macOS | ARM64 (Apple Silicon) | @magector/cli-darwin-arm64 |
| Linux | x86_64 | @magector/cli-linux-x64 |
| Linux | ARM64 | @magector/cli-linux-arm64 |
| Windows | x86_64 | @magector/cli-win32-x64 |

Note: macOS Intel (x86_64) is not supported as a pre-built binary. Intel Mac users can build from source.


Validation

Magector is validated at two levels:

  1. E2E MCP accuracy tests -- 101 queries across 16 tool categories via stdio JSON-RPC

  2. Rust-level validation -- 557 test cases across 50+ categories against Magento 2.4.7

E2E Accuracy (MCP Tools)

---
config:
  themeVariables:
    pie1: "#4caf50"
    pie2: "#f44336"
---
pie title Test Pass Rate (101 queries)
  "Passed (101)" : 101
  "Failed (0)" : 0

| Metric | Value |
|---|---|
| Grade | A+ (99.2/100) |
| Pass rate | 101/101 (100%) |
| Precision | 98.7% |
| MRR | 99.3% |
| NDCG@10 | 98.7% |
| Index size | 35,795 vectors |
| Query time | 10-45ms |

Integration Tests

66 integration tests covering MCP protocol compliance, tool schemas, tool calls (including magento_describe), analysis tools, and stdout JSON integrity.

Running Tests

# E2E accuracy tests (101 queries, requires indexed codebase)
npm run test:accuracy
npm run test:accuracy:verbose

# Integration tests (66 tests)
npm test

# SONA/MicroLoRA benefit evaluation (180 queries, baseline vs post-training)
npm run test:sona-eval
npm run test:sona-eval:verbose

# Rust validation (557 test cases)
cd rust-core && cargo run --release -- validate -m ./magento2 --skip-index

Project Structure

magector/
├── src/                          # Node.js source
│   ├── cli.js                    # CLI entry point (npx magector <command>)
│   ├── mcp-server.js             # MCP server (47 tools, structured JSON output)
│   ├── binary.js                 # Platform binary resolver
│   ├── model.js                  # ONNX model resolver/downloader
│   ├── init.js                   # Full init command (index + IDE config)
│   ├── magento-patterns.js       # Magento pattern detection (JS)
│   ├── templates/                # IDE rules templates
│   │   ├── cursorrules.js        # .cursorrules content
│   │   └── claude-md.js          # CLAUDE.md content
│   └── validation/               # JS validation suite
│       ├── validator.js
│       ├── benchmark.js
│       ├── test-queries.js
│       ├── test-data-generator.js
│       └── accuracy-calculator.js
├── tests/                        # Automated tests
│   ├── mcp-server.test.js        # Integration tests (66 tests)
│   ├── mcp-accuracy.test.js      # E2E accuracy tests (101 queries)
│   ├── mcp-sona.test.js          # SONA feedback integration tests (8 tests)
│   ├── mcp-sona-eval.test.js     # SONA/MicroLoRA benefit evaluation (180 queries)
│   ├── describe-benefit-eval.test.js  # Description enrichment benefit evaluation
│   └── results/                  # Test result artifacts
│       ├── accuracy-report.json
│       └── sona-eval-report.json
├── platforms/                    # Platform-specific binary packages
│   ├── darwin-arm64/             # macOS ARM (Apple Silicon)
│   ├── linux-x64/                # Linux x64
│   ├── linux-arm64/              # Linux ARM64
│   └── win32-x64/                # Windows x64
├── rust-core/                    # Rust high-performance core
│   ├── Cargo.toml
│   ├── src/
│   │   ├── main.rs               # Rust CLI (index, search, serve, validate)
│   │   ├── lib.rs                # Library exports
│   │   ├── indexer.rs             # Core indexing with progress output
│   │   ├── embedder.rs            # ONNX embedding (MiniLM-L6-v2)
│   │   ├── vectordb.rs            # HNSW vector database + hybrid search + tombstones
│   │   ├── watcher.rs             # File watcher for incremental re-indexing
│   │   ├── ast.rs                 # Tree-sitter AST (PHP + JS)
│   │   ├── magento.rs             # Magento pattern detection (Rust)
│   │   ├── describe.rs            # LLM description generation + SQLite storage
│   │   ├── sona.rs                # SONA feedback learning + MicroLoRA + EWC++
│   │   └── validation.rs          # 557 test cases, validation framework
│   └── models/                   # ONNX model files (auto-downloaded)
│       ├── all-MiniLM-L6-v2.onnx
│       └── tokenizer.json
├── .github/
│   └── workflows/
│       └── release.yml           # Cross-compile + publish CI
├── scripts/
│   └── setup.sh                  # Claude Code MCP setup script
├── config/
│   └── mcp-config.json           # MCP server configuration template
├── package.json
├── .gitignore
├── LICENSE
└── README.md

How It Works

1. Indexing

Magector scans every .php, .js, .xml, .phtml, and .graphqls file in a Magento 2 or Adobe Commerce codebase:

  1. AST parsing -- Tree-sitter extracts class names, namespaces, methods, inheritance, and interface implementations from PHP and JavaScript files

  2. Pattern detection -- Identifies Magento-specific patterns: controllers, models, repositories, plugins, observers, blocks, GraphQL resolvers, admin grids, cron jobs, and more

  3. Search text enrichment -- Combines AST metadata with Magento pattern keywords to create semantically rich text representations

  4. Description enrichment -- If a descriptions SQLite DB is present, LLM-generated natural-language descriptions are prepended to the embedding text as "Description: {text}\n\n", placing semantic DI concepts (preferences, plugins, virtual types, subsystem names) within the 256-token ONNX window

  5. Embedding -- ONNX Runtime generates 384-dimensional vectors using all-MiniLM-L6-v2

  6. Indexing -- Vectors are stored in an HNSW index for sub-millisecond approximate nearest neighbor search
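The description enrichment in step 4 can be sketched in a few lines. This is a minimal illustration and not Magector's Rust implementation: `buildEmbeddingText` and its parameter names are hypothetical, and only the "Description: {text}\n\n" prefix is taken from the text above.

```javascript
// Hypothetical helper: compose the text that is handed to the embedder.
function buildEmbeddingText(searchText, description) {
  // Without a stored description, the enriched search text is embedded as-is.
  if (!description || description.trim() === "") {
    return searchText;
  }
  // Prepending keeps the description inside the model's limited token window.
  return `Description: ${description.trim()}\n\n${searchText}`;
}

const text = buildEmbeddingText(
  "di.xml preference plugin Magento_Catalog",
  "Wires catalog repository interfaces to their concrete models."
);
console.log(text.split("\n")[0]);
```

Placing the description first matters because text beyond the token window is truncated before embedding, so anything appended at the end of a long file would likely be lost.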

2. Searching

  1. Query text is enriched with pattern synonyms (e.g., "controller" adds "action execute http request dispatch")

  2. The enriched query is embedded into the same 384-dimensional vector space

  3. HNSW finds the nearest neighbors by cosine similarity

  4. Hybrid reranking boosts results with keyword matches in path and search text

  5. SONA adjustment -- MicroLoRA adapts the query embedding based on learned patterns; EWC++ prevents forgetting earlier learning

  6. Results are returned as structured JSON with file path, class name, methods, role badges, and content snippet
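Steps 3 and 4 can be illustrated with a small sketch. It assumes a simple additive keyword bonus; the `rerank` function, the candidate shape, and the 0.05 weight are hypothetical and are not taken from Magector's source.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Hypothetical rerank: semantic score plus a small bonus per query term
// found in the path or search text. The 0.05 weight is an assumption.
function rerank(queryTerms, queryVec, candidates, keywordWeight = 0.05) {
  return candidates
    .map((c) => {
      const haystack = `${c.path} ${c.searchText}`.toLowerCase();
      const hits = queryTerms.filter((t) => haystack.includes(t.toLowerCase())).length;
      return { ...c, score: cosine(queryVec, c.vector) + keywordWeight * hits };
    })
    .sort((x, y) => y.score - x.score);
}

// Two candidates with identical vectors: the keyword match breaks the tie.
const ranked = rerank(["totals"], [1, 0], [
  { path: "Block/Cart.php", searchText: "cart block", vector: [1, 0] },
  { path: "Model/Quote/TotalsCollector.php", searchText: "collect totals", vector: [1, 0] },
]);
console.log(ranked[0].path);
```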

3. Persistent Serve Mode

The MCP server spawns a persistent Rust process (magector-core serve) that keeps the ONNX model and HNSW index loaded in memory. Queries are sent as JSON over stdin and responses are returned via stdout, which eliminates the ~2.6s cold-start overhead of loading the model per query. If the serve process is unavailable, the server falls back to a single-shot execFileSync call.

flowchart LR
  subgraph startup ["Startup (once)"]
    S1["Load Model"] --> S2["Load Index"] --> S3["Ready Signal"]
  end
  startup --> query
  subgraph query ["Per Query (10-45ms)"]
    Q1["stdin JSON"] --> Q2["Embed"] --> Q3["HNSW Search"] --> Q4["Rerank"] --> Q5["stdout JSON"]
  end
  subgraph fallback ["Fallback"]
    F1["execFileSync ~2.6s"]
  end

  style startup fill:#e8f4e8,color:#000
  style query fill:#e8e8f4,color:#000
  style fallback fill:#f4e8e8,color:#000
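
The stdin/stdout exchange can be sketched as below. The sketch assumes one JSON document per line (newline-delimited JSON); the text above only says "JSON over stdin", so the framing, the `LineJsonDecoder` class, and the `command` and `query` field names are all assumptions.

```javascript
// Assumption: each request/response is one JSON document per line (NDJSON).
class LineJsonDecoder {
  constructor() {
    this.buffer = "";
  }
  // Feed a raw stdout chunk; returns every complete message it finished.
  push(chunk) {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop(); // keep the trailing partial line for later
    return lines.filter((l) => l.trim() !== "").map((l) => JSON.parse(l));
  }
}

// Serialize a request the same way before writing it to the child's stdin.
function encodeRequest(request) {
  return JSON.stringify(request) + "\n";
}

// A response split across two chunks is only emitted once it is complete.
const decoder = new LineJsonDecoder();
const first = decoder.push('{"ok":true,"res');
const second = decoder.push('ults":[]}\n{"ok":false}\n');
console.log(first.length, second.length);
```

Buffering partial lines is needed because a pipe delivers data in arbitrary chunks, not one message per read.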

4. File Watcher (Incremental Re-indexing)

When the serve process is started with --magento-root, a background thread polls the filesystem for changes every 60 seconds (configurable via --watch-interval). Changed files are incrementally re-indexed without restarting the server.

Since hnsw_rs does not support point deletion, Magector uses a tombstone strategy: old vectors for modified/deleted files are marked as tombstoned and filtered out of search results. New vectors are appended. When tombstoned entries exceed 20% of total vectors, the HNSW graph is automatically rebuilt (compacted) to reclaim memory and restore search performance.

flowchart LR
  W1["Sleep 60s"] --> W2["Scan Filesystem"] --> W3{"Changes?"}
  W3 -->|No| W1
  W3 -->|Yes| W4["Tombstone Old Vectors"] --> W5["Parse + Embed New Files"] --> W6["Append to HNSW"] --> W7{"Tombstone > 20%?"}
  W7 -->|Yes| W8["Compact / Rebuild HNSW"] --> W9["Save to Disk"]
  W7 -->|No| W9
  W9 --> W1

  style W4 fill:#f4e8e8,color:#000
  style W5 fill:#e8f4e8,color:#000
  style W8 fill:#e8e8f4,color:#000
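
The tombstone and compact cycle can be modelled in memory as follows. This is a simplified sketch under stated assumptions: `TombstoneIndex` is a hypothetical name, a plain array stands in for the HNSW graph, and only the 20% threshold comes from the description above.

```javascript
// Minimal in-memory model of the tombstone + compact strategy.
class TombstoneIndex {
  constructor(compactThreshold = 0.2) {
    this.entries = [];           // { id, path, vector }
    this.tombstones = new Set(); // ids hidden from search
    this.compactThreshold = compactThreshold;
    this.nextId = 0;
  }
  // Mark every vector that belongs to a path as deleted (no physical removal).
  markDeleted(path) {
    for (const e of this.entries) {
      if (e.path === path) this.tombstones.add(e.id);
    }
  }
  // A modified file tombstones its old vectors and appends a new one.
  upsert(path, vector) {
    this.markDeleted(path);
    this.entries.push({ id: this.nextId++, path, vector });
    return this.maybeCompact();
  }
  // A deleted file only leaves tombstones behind.
  remove(path) {
    this.markDeleted(path);
    return this.maybeCompact();
  }
  // Search results must skip tombstoned entries.
  live() {
    return this.entries.filter((e) => !this.tombstones.has(e.id));
  }
  // Rebuild once tombstones exceed the threshold share of all vectors.
  maybeCompact() {
    if (this.entries.length === 0) return false;
    if (this.tombstones.size / this.entries.length <= this.compactThreshold) return false;
    this.entries = this.live();
    this.tombstones.clear();
    return true;
  }
}
```

With five indexed files, re-indexing one of them leaves 1 of 6 vectors tombstoned (below 20%), while re-indexing a second pushes the ratio to 2 of 7 and triggers the rebuild.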

5. MCP Integration

The MCP server delegates all search/index operations to the Rust core binary. Analysis tools (diff, complexity) use ruvector JS modules directly.

sequenceDiagram
  participant Dev
  participant AI
  participant MCP
  participant Rust
  participant HNSW

  Dev->>AI: "checkout totals?"
  AI->>MCP: magento_search(...)
  MCP->>Rust: JSON query
  Rust->>HNSW: embed + search
  HNSW-->>Rust: candidates
  Rust-->>MCP: JSON results
  MCP-->>AI: paths, methods, badges
  AI-->>Dev: TotalsCollector.php

6. SONA Feedback Learning

The MCP server tracks sequences of tool calls and sends feedback signals to the Rust process. Over time, this adjusts search result rankings based on observed usage patterns.

How it works: The Node.js SessionTracker watches for follow-up tool calls after magento_search. If a user searches and then immediately calls magento_find_plugin, SONA learns that similar queries should boost plugin results. The learned weights are persisted to a .sona file alongside the index.

| MCP Call Sequence | Signal | Effect on Future Searches |
|---|---|---|
| magento_search → magento_find_plugin (within 30s) | refinement_to_plugin | Boosts plugin results |
| magento_search → magento_find_class (within 30s) | refinement_to_class | Boosts class matches |
| magento_search → magento_find_config (within 30s) | refinement_to_config | Boosts config/XML results |
| magento_search → magento_find_observer (within 30s) | refinement_to_observer | Boosts observer results |
| magento_search → magento_find_controller (within 30s) | refinement_to_controller | Boosts controller results |
| magento_search → magento_find_block (within 30s) | refinement_to_block | Boosts block results |
| magento_search → magento_trace_flow (within 30s) | trace_after_search | Boosts controller results |
| magento_search(Q1) → magento_search(Q2) (within 60s) | query_refinement | Tracked for analysis |
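
The mapping from call sequences to signals can be sketched as a small state machine. The signal names and time windows come from the table; the `SessionTracker` shape shown here, its `record` method, and the returned object are assumptions for illustration and may differ from the real Node.js class.

```javascript
// Follow-up tool -> signal, taken from the table above.
const FOLLOW_UP_SIGNALS = new Map([
  ["magento_find_plugin", "refinement_to_plugin"],
  ["magento_find_class", "refinement_to_class"],
  ["magento_find_config", "refinement_to_config"],
  ["magento_find_observer", "refinement_to_observer"],
  ["magento_find_controller", "refinement_to_controller"],
  ["magento_find_block", "refinement_to_block"],
  ["magento_trace_flow", "trace_after_search"],
]);

// Hypothetical tracker: remembers the last search and classifies the next call.
class SessionTracker {
  constructor() {
    this.lastSearch = null; // { query, at }
  }
  // Returns a signal object or null; `at` is a timestamp in milliseconds.
  record(tool, query, at) {
    const prev = this.lastSearch;
    let signal = null;
    if (prev) {
      const elapsed = at - prev.at;
      if (tool === "magento_search" && query !== prev.query && elapsed <= 60000) {
        signal = { type: "query_refinement", query: prev.query };
      } else if (FOLLOW_UP_SIGNALS.has(tool) && elapsed <= 30000) {
        signal = { type: FOLLOW_UP_SIGNALS.get(tool), query: prev.query };
      }
    }
    if (tool === "magento_search") this.lastSearch = { query, at };
    return signal;
  }
}
```

The signal carries the original search query, since that is the query whose future rankings the feedback is meant to influence.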

Characteristics:

  • Score adjustments are capped at ±0.15 to avoid overwhelming semantic similarity

  • Learning rate decays with repeated observations (diminishing returns)

  • Learned weights are keyed by normalized, order-independent query term hashes

  • Always active -- no feature flags or build-time opt-in required

  • Persisted via bincode to <db_path>.sona (e.g., .magector/index.db.sona)
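
Three of these characteristics (the ±0.15 cap, the decaying learning rate, and order-independent keys) can be shown together. This is a sketch under assumptions: `queryKey` returns a normalized string where a real implementation would hash it, and the `updateWeight` rule with its 0.05 base rate is hypothetical.

```javascript
// Order-independent key: lowercase, split on non-alphanumerics, dedupe, sort.
function queryKey(query) {
  const terms = query.toLowerCase().split(/[^a-z0-9_]+/).filter(Boolean);
  return [...new Set(terms)].sort().join(" ");
}

const MAX_ADJUSTMENT = 0.15; // cap from the list above

// Hypothetical update rule: the step shrinks as observations accumulate.
function updateWeight(state, reward, baseRate = 0.05) {
  const rate = baseRate / (1 + state.observations);
  const raw = state.weight + rate * reward;
  return {
    weight: Math.max(-MAX_ADJUSTMENT, Math.min(MAX_ADJUSTMENT, raw)),
    observations: state.observations + 1,
  };
}

// Word order and letter case do not change the key.
console.log(queryKey("Checkout totals") === queryKey("totals checkout"));
```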

SONA v2: MicroLoRA + EWC++

SONA v2 adds embedding-level adaptation via a MicroLoRA adapter and Elastic Weight Consolidation:

| Component | Parameters | Purpose |
|---|---|---|
| MicroLoRA | 1536 (rank-2, 2×384×2) | Adjusts query embeddings before HNSW search |
| EWC++ | Fisher matrix (384 values) | Prevents catastrophic forgetting during online learning |

  • adjust_query_embedding() applies the LoRA transform + L2 normalization before vector search; cosine similarity guard (≥0.90) skips destructive adjustments

  • learn_with_embeddings() updates LoRA weights from feedback signals with EWC regularization (λ=2000) and decaying learning rate

  • 3-tier scoring with negative learning: positive signals boost the followed feature type, mild negative learning (0.1×) demotes unrelated types

  • V1→V2 persistence format is backward-compatible (auto-upgrades on load)
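
A rank-2 adapter with the similarity guard can be sketched as follows. The sketch assumes the standard LoRA form applied on top of an identity mapping (adjusted = normalize(e + up · (down · e))); the function and matrix names are hypothetical, and training with EWC regularization is not shown.

```javascript
const DIM = 384;
const RANK = 2;

function l2normalize(v) {
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

function dot(a, b) {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

// down: RANK rows of DIM weights, up: DIM rows of RANK weights.
// Together they hold 2 * 384 * 2 = 1536 parameters.
function adjustQueryEmbedding(embedding, down, up, minCosine = 0.9) {
  const base = l2normalize(embedding);
  const hidden = down.map((row) => dot(row, base)); // DIM -> RANK
  const delta = up.map((row) => dot(row, hidden));  // RANK -> DIM
  const adjusted = l2normalize(base.map((x, i) => x + delta[i]));
  // Guard: reject adjustments that move the query too far from the original.
  return dot(adjusted, base) >= minCosine ? adjusted : base;
}

// An untrained (all-zero) adapter leaves the normalized query unchanged.
const zeroDown = Array.from({ length: RANK }, () => new Array(DIM).fill(0));
const zeroUp = Array.from({ length: DIM }, () => new Array(RANK).fill(0));
const query = new Array(DIM).fill(0);
query[0] = 3;
query[1] = 4;
console.log(adjustQueryEmbedding(query, zeroDown, zeroUp).slice(0, 2));
```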

cd rust-core && cargo build --release

7. LLM Description Enrichment

Magector can generate natural-language descriptions of di.xml files using the Anthropic API and embed them directly into the vector index. This significantly improves search ranking for semantic queries about dependency injection.

Workflow:

# 1. Generate descriptions (one-time, incremental — only re-describes changed files)
ANTHROPIC_API_KEY=sk-... npx magector describe /path/to/magento

# 2. Re-index with descriptions embedded into vectors
npx magector index /path/to/magento

Or via the MCP tool: magento_describe() generates descriptions and auto-reindexes affected files in one step.

How it works: Each di.xml file is sent to Claude Sonnet with a prompt optimized for semantic search retrieval. The resulting description (~70 words) is stored in a SQLite database (.magector/data.db). During indexing, descriptions are prepended to the embedding text as "Description: {text}\n\n" before the raw file content, placing semantic terms (preferences, plugins, virtual types, subsystem names) within the ONNX model's 256-token window.

Measured impact (A/B experiment, 25 queries, Magento 2.4.7, 17,891 vectors, 371 described files):

| Metric | Without Descriptions | With Descriptions | Delta |
|---|---|---|---|
| Precision@K | 1.6% | 20.3% | +18.7% |
| MRR | 0.031 | 0.330 | +0.30 |
| NDCG@10 | 0.037 | 0.369 | +0.33 |
| di.xml results/query | 0.2 | 3.0 | +2.8 |
| Query win rate | | 76% | |
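
For readers unfamiliar with the ranking metrics in this table, the standard textbook definitions of MRR and NDCG@10 are sketched below. The project's own accuracy calculator may compute them differently; in particular, this sketch derives the ideal ordering from the returned results only, which is a simplification.

```javascript
// ranks: for each query, the 1-based rank of the first relevant result (or null).
function meanReciprocalRank(ranks) {
  const sum = ranks.reduce((s, r) => s + (r ? 1 / r : 0), 0);
  return ranks.length === 0 ? 0 : sum / ranks.length;
}

// relevances: graded relevance of each returned result, in ranked order.
function ndcgAtK(relevances, k = 10) {
  const dcg = (rels) =>
    rels.slice(0, k).reduce((s, rel, i) => s + rel / Math.log2(i + 2), 0);
  const ideal = dcg([...relevances].sort((a, b) => b - a));
  return ideal === 0 ? 0 : dcg(relevances) / ideal;
}

// First relevant hit at ranks 1, 2, none, and 4 across four queries.
console.log(meanReciprocalRank([1, 2, null, 4]));
```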


Magento Patterns Detected

mindmap
  root((Patterns))
    PHP
      Controller
      Model
      Repository
      Block
      Helper
      ViewModel
    Interception
      Plugin
      Observer
      Preference
    XML
      di.xml
      events.xml
      webapi.xml
      routes.xml
      crontab.xml
      db_schema.xml
    Frontend
      Template
      JavaScript
      GraphQL

Magector understands these Magento 2 architectural patterns:

| Pattern | Detection Method | Example |
|---|---|---|
| Controller | Path + execute() method | Controller/Adminhtml/Order/View.php |
| Model | Path + extends AbstractModel | Model/Product.php |
| Repository | Path + implements RepositoryInterface | Model/ProductRepository.php |
| Block | Path + extends AbstractBlock | Block/Product/View.php |
| Plugin | Path + before/after/around methods | Plugin/Product/SavePlugin.php |
| Observer | Path + implements ObserverInterface | Observer/ProductSaveObserver.php |
| GraphQL Resolver | Path + implements ResolverInterface | Model/Resolver/Products.php |
| Helper | Path under Helper/ | Helper/Data.php |
| Cron | Path under Cron/ | Cron/CleanExpiredQuotes.php |
| Console Command | Path + extends Command | Console/Command/IndexerReindex.php |
| Data Provider | Path + DataProvider | Ui/DataProvider/Product/Listing.php |
| ViewModel | Path + implements ArgumentInterface | ViewModel/Product/Breadcrumbs.php |
| Setup Patch | Path + Patch/Data or Patch/Schema | Setup/Patch/Data/AddAttribute.php |
| di.xml | Path matching | etc/di.xml, etc/frontend/di.xml |
| events.xml | Path matching | etc/events.xml |
| webapi.xml | Path matching | etc/webapi.xml |
| layout XML | Path under layout/ | view/frontend/layout/catalog_product_view.xml |
| Template | .phtml extension | view/frontend/templates/product/view.phtml |
| JavaScript | .js with AMD/ES6 detection | view/frontend/web/js/view/minicart.js |
| GraphQL Schema | .graphqls extension | etc/schema.graphqls |
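
The purely path-based rows of this table can be approximated with a few checks. This is an illustrative sketch only: `detectByPath` and its return labels are hypothetical, and the class-level checks (extends, implements, method names) that the real detectors combine with the path are left out.

```javascript
// Path/extension rules from the table; class-level checks are omitted.
function detectByPath(relPath) {
  const p = relPath.replace(/\\/g, "/");
  if (p.endsWith(".graphqls")) return "graphql_schema";
  if (p.endsWith(".phtml")) return "template";
  if (p.endsWith(".js")) return "javascript";
  // di.xml and events.xml may sit in an area directory such as etc/frontend/.
  if (/(^|\/)etc\/([^/]+\/)?di\.xml$/.test(p)) return "di.xml";
  if (/(^|\/)etc\/([^/]+\/)?events\.xml$/.test(p)) return "events.xml";
  if (/(^|\/)etc\/webapi\.xml$/.test(p)) return "webapi.xml";
  if (/(^|\/)layout\/[^/]+\.xml$/.test(p)) return "layout_xml";
  if (/(^|\/)Helper\//.test(p)) return "helper";
  if (/(^|\/)Cron\//.test(p)) return "cron";
  return "unknown";
}

console.log(detectByPath("etc/frontend/di.xml"));
```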


Configuration

Cursor IDE Rules

Copy .cursorrules to your Magento project root for optimized AI-assisted development. The rules instruct the AI to:

  1. Use Magector MCP tools before reading files manually

  2. Write effective semantic queries

  3. Follow Magento development patterns

  4. Interpret search results correctly

Excluding Directories (.magectorignore)

Magector automatically skips common non-project directories during indexing:

  • vendor/ — Composer dependencies (100K-500K files)

  • node_modules/ — npm packages

  • generated/ — DI-compiled files

  • var/ — cache, logs, sessions

  • pub/static/ — deployed static assets

  • dev/tests/, dev/tools/ — Magento development tools

  • Test/, Tests/, test/, tests/ — test directories

  • .git/ — version control

For project-specific exclusions, create a .magectorignore file in your Magento project root:

# .magectorignore — additional directories to exclude from Magector indexing
# One pattern per line, gitignore-like syntax

# Custom exclusions
pub/media
setup
update
phpserver
bin
lib/internal

Pattern rules:

  • Lines starting with # are comments

  • Empty lines are ignored

  • Trailing slashes are stripped (vendor/ → vendor)

  • Patterns without / match directory names anywhere in the tree

  • Patterns with / match relative paths from the project root
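
The pattern rules above can be sketched as a parser plus a matcher. This is a minimal illustration, assuming plain names and paths without glob characters; `parseIgnore` and `isIgnored` are hypothetical names, and Magector's actual matcher may support more of the gitignore syntax.

```javascript
// Parse .magectorignore content into normalized patterns.
function parseIgnore(content) {
  return content
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#"))
    .map((line) => line.replace(/\/+$/, "")); // vendor/ -> vendor
}

// relPath: directory path relative to the project root, using "/" separators.
function isIgnored(relPath, patterns) {
  const segments = relPath.split("/");
  return patterns.some((p) =>
    p.includes("/")
      ? relPath === p || relPath.startsWith(p + "/") // anchored at the root
      : segments.includes(p)                         // name anywhere in the tree
  );
}

const patterns = parseIgnore("# custom\n\npub/media/\nsetup\n");
console.log(patterns, isIgnored("pub/media/catalog", patterns));
```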

Config Data (core_config_data exports)

The magento_trace_config tool can show actual database config values alongside code analysis. Export your core_config_data table as JSON and place files in .magector/config-data/:

# MySQL 8.0+ with --json flag
mysql -u user -p magento_db -e "SELECT scope, scope_id, path, value FROM core_config_data" --json > .magector/config-data/CZ-production.json

# Older MySQL (no --json): pipe through python3
mysql -u user -p magento_db -B -e "SELECT scope, scope_id, path, value FROM core_config_data" | \
  python3 -c "import sys,json; lines=sys.stdin.read().strip().split('\n'); h=lines[0].split('\t'); \
  rows=[dict(zip(h,l.split('\t'))) for l in lines[1:]]; [r.update({'scope_id':int(r['scope_id'])}) for r in rows]; \
  json.dump(rows,sys.stdout,indent=2)" > .magector/config-data/CZ-production.json

# Or from n8n/API/any tool that produces:
# [{scope, scope_id, path, value}, ...]

File naming: Use {country}-{environment}.json, e.g.:

  • CZ-production.json

  • SK-staging.json

  • IT-production.json

When magento_trace_config traces a config path, it automatically looks up values from all available exports and shows them per environment.
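
The per-environment lookup can be pictured with the sketch below. It works on already-parsed export data and is not the tool's actual code: `parseExportName`, `lookupConfig`, and the shape of the returned object are assumptions, while the row format and the file naming rule come from this section.

```javascript
// Split "CZ-production.json" into { country, environment } or return null.
function parseExportName(fileName) {
  const match = /^([^-]+)-(.+)\.json$/.exec(fileName);
  return match ? { country: match[1], environment: match[2] } : null;
}

// exports: { fileName: rows[] } where rows follow [{scope, scope_id, path, value}].
function lookupConfig(exports, configPath) {
  const result = {};
  for (const [fileName, rows] of Object.entries(exports)) {
    if (!parseExportName(fileName)) continue; // skip files that break the naming rule
    const matches = rows.filter((r) => r.path === configPath);
    if (matches.length > 0) result[fileName.replace(/\.json$/, "")] = matches;
  }
  return result;
}

// Example data is made up for illustration.
const found = lookupConfig(
  {
    "CZ-production.json": [
      { scope: "default", scope_id: 0, path: "web/secure/base_url", value: "https://example.cz/" },
    ],
    "SK-staging.json": [
      { scope: "default", scope_id: 0, path: "web/unsecure/base_url", value: "http://example.sk/" },
    ],
  },
  "web/secure/base_url"
);
console.log(Object.keys(found));
```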

Model Configuration

The ONNX model (all-MiniLM-L6-v2) is automatically downloaded on first run to ~/.magector/models/. To use a different location:

magector-core index -m /path/to/magento -c /custom/model/path

Development

Building from Source

git clone https://github.com/krejcif/magector.git
cd magector

# Install Node.js dependencies
npm install

# Build the Rust core
cd rust-core
cargo build --release
cd ..

# The CLI will automatically find the dev binary at rust-core/target/release/magector-core
node src/cli.js help

Building

# Rust core
cd rust-core
cargo build --release

# Run unit tests
cargo test

# Run validation
cargo run --release -- validate

Testing

# Integration tests (66 tests, requires indexed codebase)
npm test

# E2E accuracy tests (101 queries)
npm run test:accuracy
npm run test:accuracy:verbose

# Run without index (unit + schema tests only)
npm run test:no-index

# Rust unit tests (37 tests including SONA + descriptions)
cd rust-core && cargo test

# SONA integration tests (8 tests)
node tests/mcp-sona.test.js

# SONA/MicroLoRA benefit evaluation (180 queries)
npm run test:sona-eval

# Rust validation (557 test cases)
cd rust-core && cargo run --release -- validate -m ./magento2 --skip-index

Adding New Magento Patterns

  1. Add pattern detection in rust-core/src/magento.rs

  2. Add search text enrichment in rust-core/src/indexer.rs

  3. Add validation test cases in rust-core/src/validation.rs

  4. Add E2E accuracy test cases in tests/mcp-accuracy.test.js

  5. Rebuild and run validation to verify:

cargo build --release
./target/release/magector-core validate -m ./magento2 --skip-index
npm run test:accuracy

Adding MCP Tools

  1. Define the tool schema in src/mcp-server.js (ListToolsRequestSchema handler)

  2. Include keyword-rich descriptions and cross-tool "See also" references

  3. Implement the handler in the CallToolRequestSchema handler

  4. Return structured JSON via formatSearchResults()

  5. Add E2E test cases in tests/mcp-accuracy.test.js

  6. Test with Claude Code or the MCP inspector


Technical Details

Embedding Model

  • Model: all-MiniLM-L6-v2

  • Dimensions: 384

  • Pooling: Mean pooling with attention mask

  • Normalization: L2 normalized

  • Runtime: ONNX Runtime (via ort crate)
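
Mean pooling with an attention mask followed by L2 normalization can be sketched on plain arrays. This illustrates the general technique only; the real pipeline runs on ONNX Runtime tensors in Rust, and the function names here are hypothetical.

```javascript
// tokenEmbeddings: one vector per token; attentionMask: 1 for real tokens, 0 for padding.
function meanPool(tokenEmbeddings, attentionMask) {
  const dim = tokenEmbeddings[0].length;
  const sum = new Array(dim).fill(0);
  let count = 0;
  tokenEmbeddings.forEach((vec, t) => {
    if (attentionMask[t] === 0) return; // padding does not contribute
    count += 1;
    for (let i = 0; i < dim; i++) sum[i] += vec[i];
  });
  return sum.map((x) => x / Math.max(count, 1));
}

// Scale a vector to unit length so cosine similarity reduces to a dot product.
function unitLength(v) {
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// The third token is padding and is ignored by the pooling step.
const pooled = meanPool([[3, 0], [0, 8], [100, 100]], [1, 1, 0]);
console.log(pooled, unitLength(pooled));
```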

Vector Index

  • Algorithm: HNSW (Hierarchical Navigable Small World)

  • Library: hnsw_rs

  • Parameters: M=32, max_layers=16, ef_construction=200

  • Distance metric: Cosine similarity

  • Hybrid search: Semantic nearest-neighbor + keyword reranking in path and search text + SONA/MicroLoRA feedback adjustments

  • Incremental updates: Tombstone soft-delete + periodic HNSW rebuild (compact)

  • Persistence: Bincode V2 binary serialization (backward-compatible with V1)

Index Structure

Each indexed file produces a vector entry with metadata:

struct IndexMetadata {
    path: String,
    file_type: String,          // php, xml, js, template, graphql
    magento_type: String,       // controller, model, block, plugin, ...
    class_name: Option<String>,
    namespace: Option<String>,
    methods: Vec<String>,       // extracted method names
    search_text: String,        // enriched searchable text
    is_controller: bool,
    is_plugin: bool,
    is_observer: bool,
    is_model: bool,
    is_block: bool,
    is_repository: bool,
    is_resolver: bool,
    // ... 20+ pattern flags
}

Performance Characteristics

| Operation | Time | Notes |
|---|---|---|
| Full index (36K vectors) | ~1 min | Parallel parsing + batched ONNX embedding |
| Single query (warm) | 10-45ms | Persistent serve process, HNSW + rerank |
| Single query (cold) | ~2.6s | Includes ONNX model + index load |
| Embedding generation | ~2ms | ONNX Runtime with CoreML/CUDA |
| Batch embedding (32) | ~30ms | Batched ONNX inference |
| Model load | ~500ms | One-time at startup |
| Index save/load | <1s | Bincode binary serialization |

Performance Optimizations

  • Persistent serve mode -- Rust process keeps ONNX model + HNSW index in memory via stdin/stdout JSON protocol

  • Query cache -- LRU cache (200 entries) avoids re-embedding identical queries

  • Hybrid reranking -- combines semantic similarity with keyword matching for better precision

  • Batched ONNX embedding -- 32 texts per inference call (vs. 1-at-a-time), 3-5x faster embedding

  • Dynamic thread scaling -- ONNX intra-op threads scale to CPU core count

  • Thread-local AST parsers -- each rayon thread gets its own tree-sitter parser (no mutex contention)

  • Bincode persistence -- binary serialization replaces JSON (3-5x faster save/load, ~5x smaller files)

  • Adaptive HNSW capacity -- pre-sized to actual vector count

  • Parallel HNSW insert -- batch insert uses hnsw_rs parallel insertion on load and index

  • Tuned ef_search -- optimized search parameters for 36K vector index (ef_search=50 for search, 64 for hybrid)

  • SONA feedback learning -- learns from MCP tool call patterns to adjust search rankings; MicroLoRA adapts query embeddings, EWC++ prevents forgetting
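
The query cache idea can be shown with a compact LRU built on a JavaScript Map. This is a generic sketch of the technique, not the project's cache: the class name is hypothetical and only the 200-entry capacity is taken from the list above.

```javascript
// Map preserves insertion order, so the first key is always the least recently used.
class LruCache {
  constructor(capacity = 200) {
    this.capacity = capacity;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      this.map.delete(this.map.keys().next().value); // evict the oldest entry
    }
  }
}

// With capacity 2, touching "a" before adding "c" evicts "b".
const cache = new LruCache(2);
cache.set("a", 1);
cache.set("b", 2);
cache.get("a");
cache.set("c", 3);
console.log([...cache.map.keys()]);
```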


Roadmap

gantt
  title Roadmap
  dateFormat YYYY-MM
  axisFormat %b
  section Done
    Hybrid search       :done, 2025-01, 30d
    Serve mode          :done, 2025-02, 30d
    JSON output         :done, 2025-03, 15d
    Cross-tool hints    :done, 2025-03, 15d
    E2E tests           :done, 2025-03, 15d
    Adobe Commerce      :done, 2025-03, 15d
  section Next
    SONA feedback       :done, 2025-04, 30d
    Incremental index   :done, 2025-04, 30d
    SONA v2 MicroLoRA   :done, 2025-05, 15d
    LLM descriptions    :done, 2025-06, 30d
    Method chunking     :active, 2025-07, 30d
    Intent detection    :2025-08, 30d
    Type filtering      :2025-09, 30d
  section Future
    VSCode extension    :2025-10, 60d
    Web UI              :2025-12, 60d

  • Hybrid search (semantic + keyword re-ranking)

  • Persistent serve mode (eliminates cold-start latency)

  • Structured JSON output (methods, badges, snippets)

  • Cross-tool discovery hints for AI clients

  • E2E accuracy test suite (101 queries)

  • Adobe Commerce support (B2B, Staging, and all Commerce-specific modules)

  • SONA feedback learning (search rankings adapt to MCP tool call patterns)

  • SONA v2 with MicroLoRA + EWC++ (embedding-level adaptation, prevents catastrophic forgetting)

  • LLM description enrichment (generate di.xml descriptions via Claude, store in SQLite, embed into vectors for improved search ranking)

  • Method-level chunking (per-method vectors for direct method search)

  • Query intent classification (auto-detect "give me XML" vs "give me PHP")

  • Filtered search by file type at the vector level

  • Incremental indexing (background file watcher with tombstone + compact strategy)

  • VSCode extension

  • Web UI for browsing results


Troubleshooting

All MCP server activity is logged to .magector/magector.log in the Magento project root. The log persists across MCP restarts and uses the format:

[2026-04-12T18:30:00.000Z] [LEVEL] message

Log Levels

| Level | Meaning |
|---|---|
| INFO | Normal operations: startup config, tool completion, search fallbacks, enrichment progress |
| WARN | Recoverable issues: slow grep queries (>5s), missing data.db, file read errors, serve process disconnects |
| ERR | Failures: AST query errors, transaction rollbacks, serve process errors, tool execution errors |
| REQ | Every tool call with full input parameters (JSON) |
| RES | Tool completion with elapsed time in milliseconds |
| QUERY | Rust serve process queries (search, feedback) |
| CACHE | Search cache hits |
| INDEX | Background reindex progress |
| SERVE | Rust serve process stderr (watcher events, model loading) |
| FATAL | Server startup failures |
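
For scripted analysis, the log line format shown above can be parsed with a small helper. This sketch relies only on the "[timestamp] [LEVEL] message" layout; the function names are hypothetical, and the message body is treated as opaque text because its internal format is not specified here.

```javascript
// Matches "[2026-04-12T18:30:00.000Z] [LEVEL] message".
const LOG_LINE = /^\[([^\]]+)\] \[([A-Z]+)\] (.*)$/;

function parseLogLine(line) {
  const m = LOG_LINE.exec(line);
  return m ? { time: new Date(m[1]), level: m[2], message: m[3] } : null;
}

// Count lines per level, e.g. to spot a burst of ERR or WARN entries.
function countByLevel(lines) {
  const counts = {};
  for (const line of lines) {
    const parsed = parseLogLine(line);
    if (parsed) counts[parsed.level] = (counts[parsed.level] || 0) + 1;
  }
  return counts;
}

// Sample lines are made up for illustration.
console.log(
  countByLevel([
    "[2026-04-12T18:30:00.000Z] [INFO] server starting",
    "[2026-04-12T18:30:01.000Z] [ERR] tool failed",
    "not a log line",
  ])
);
```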

Common Diagnostic Commands

# Recent errors
grep '\[ERR\]\|\[FATAL\]' .magector/magector.log | tail -20

# Tool timing (find slow tools)
grep '\[RES\]' .magector/magector.log | tail -20

# Enrichment/null-risk analysis
grep 'enrich:\|null_risks:' .magector/magector.log | tail -20

# AST search (tree-sitter) issues
grep 'ast_search:' .magector/magector.log | tail -20

# Batch query breakdown (per-tool timing)
grep 'batch\[' .magector/magector.log | tail -20

# Slow grep queries
grep 'grep: slow\|grep: timed' .magector/magector.log | tail -20

# Full startup sequence
grep 'server starting\|Config:\|primary\|Serve process' .magector/magector.log | tail -30

What Gets Logged (v2.14+)

Every tool call logs [REQ] with input parameters and [RES] with elapsed time. Additionally:

  • magento_ast_search — tree-sitter pattern, target path, execution time, result count, query errors

  • magento_enrich — file count, progress every 10k files, read errors, transaction failures, final summary

  • magento_find_null_risks — query parameters, result count, query timing, missing DB warnings

  • magento_batch — query list on entry, per-sub-tool timing and errors

  • magento_grep — slow query warnings (>5s), timeout detection

  • magento_read — file-not-found with error codes, failed method extractions


License

MIT License. See LICENSE for details.


Contributing

Contributions are welcome. Please:

  1. Fork the repository

  2. Create a feature branch (git checkout -b feature/improvement)

  3. Add tests for new functionality

  4. Run validation to ensure accuracy doesn't regress: npm run test:accuracy

  5. Submit a pull request


Built with Rust and Node.js for the Magento and Adobe Commerce community.
