chuk-mcp-lazarus
Mechanistic interpretability MCP server wrapping chuk-lazarus.
Load any model, extract activations, train probes, steer generation, and ablate components -- all via MCP tools that Claude (or any MCP client) can call autonomously.
Quick Start
# Clone and install
git clone https://github.com/chuk-ai/chuk-mcp-lazarus.git
cd chuk-mcp-lazarus
uv sync
# Run the smoke test (53 tests on SmolLM2-135M, ~3 seconds)
uv run python examples/smoke_test.py
# Run the full 15-step language transition demo
uv run python examples/language_transition_demo.py
Claude Desktop
Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "lazarus": {
      "command": "uv",
      "args": ["run", "chuk-mcp-lazarus", "stdio"],
      "cwd": "/path/to/chuk-mcp-lazarus"
    }
  }
}
Tools (64)
Group | Tool | Purpose |
Model | load_model | Load any HuggingFace model into memory |
Model | get_model_info | Return architecture metadata |
Generation | generate_text | Generate text from the loaded model |
Generation | predict_next_token | Top-k next-token predictions with probabilities |
Generation | tokenize | Show how text is tokenized |
Generation | logit_lens | Layer-by-layer prediction evolution (calibrated logit lens) |
Generation | track_token | Track a specific token's probability across layers |
Generation | track_race | Race N candidate tokens across layers with crossing detection |
Generation | embedding_neighbors | Find nearest tokens in embedding space (cosine similarity) |
Activations | extract_activations | Hidden states at specific layers and positions |
Activations | compare_activations | Cosine similarity + PCA across prompts |
Attention | attention_pattern | Per-head attention weights at specified layers |
Attention | attention_heads | Per-head entropy and focus analysis |
Probing | train_probe | Train a classifier on activations |
Probing | evaluate_probe | Evaluate on held-out data |
Probing | scan_probe_across_layers | Find the crossover layer |
Probing | probe_at_inference | Run a trained probe during autoregressive generation |
Probing | list_probes | List all trained probes |
Steering | compute_steering_vector | Contrastive activation addition |
Steering | steer_and_generate | Generate with steering applied |
Steering | list_steering_vectors | List all computed vectors |
Ablation | ablate_layers | Zero out layers, measure disruption |
Ablation | patch_activations | Swap activations between prompts |
Causal | trace_token | Which layers are causally necessary for a prediction |
Causal | full_causal_trace | Position × layer causal heatmap (Meng et al. style) |
Residual | residual_decomposition | Attention vs MLP contribution per layer |
Residual | layer_clustering | Representation similarity and cluster separation across layers |
Residual | logit_attribution | Direct logit attribution: per-layer component contributions to predicted token |
Residual | head_attribution | Per-head logit attribution: which attention heads push toward the target token |
Residual | top_neurons | Per-neuron MLP identification: which neurons push toward the target token |
Attribution | attribution_sweep | Batch logit attribution across prompts with per-prompt summary |
Intervention | component_intervention | Zero/scale attention, FFN, or individual heads at a layer |
Neuron | discover_neurons | Auto-find neurons that discriminate between prompt groups |
Neuron | analyze_neuron | Profile specific neurons: activation stats across prompts |
Neuron | neuron_trace | Trace a neuron's influence through downstream layers |
Direction | extract_direction | Find directions via mean-diff, LDA, PCA, or probe weights |
Experiment | create_experiment | Create a named experiment for result persistence |
Experiment | add_experiment_result | Add a step result to an experiment |
Experiment | get_experiment | Retrieve an experiment and its results |
Experiment | list_experiments | List all saved experiments |
Comparison | load_comparison_model | Load a second model for side-by-side analysis |
Comparison | compare_weights | Frobenius norm + cosine sim per layer per component |
Comparison | compare_representations | Per-layer activation divergence across prompts |
Comparison | compare_attention | Per-head JS divergence in attention patterns |
Comparison | compare_generations | Side-by-side text output from both models |
Comparison | unload_comparison_model | Free VRAM from comparison model |
Geometry | | Angles between token unembed vectors and residual stream at a layer |
Geometry | | Pairwise angles between any directions (tokens, neurons, heads, residual, FFN, attention, steering vectors) |
Geometry | | Decompose a vector into basis direction components + orthogonal residual |
Geometry | | Track residual rotation through layers by angles to reference tokens |
Geometry | | PCA spectrum + classification-by-dimension for a feature |
Geometry | | Decode residual stream into vocabulary space: raw vs normalised rankings, gap analysis, mean direction |
Geometry | | Complete prediction flow: geometry, attribution, logit lens race, top heads/neurons in one call |
Geometry | | Inject donor residual into recipient at a layer and continue generation (Markov property test) |
Geometry | | Find candidate prompts with most similar residual streams to a target at a layer |
Geometry | | PCA subspace from model activations across varied prompts; stores basis in SubspaceRegistry |
Geometry | | List all named PCA subspaces stored in the SubspaceRegistry |
Geometry | | Map residual stream via PCA on diverse prompts: variance spectrum, vocab-decoded principal components |
Geometry | | Map supply side: head/neuron push directions through unembedding, effective supply rank |
Geometry | | Compact per-layer variance spectrum across the full model (no vocab projection) |
Geometry | | Non-collapsing superposition: inject donor residual into multiple templates, evolve independently, collapse to highest confidence |
Geometry | | All-position subspace replacement: swap entity subspace at every position while preserving orthogonal complement (donor/coordinates/lookup modes) |
Geometry | | Precompute dark coordinate lookup table: project reference prompts onto a subspace for zero-pass injection |
Geometry | | List all dark tables in the DarkTableRegistry |
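As an illustration of the "decompose a vector into basis direction components + orthogonal residual" idea in the Geometry rows, here is a minimal pure-Python sketch. It is not the server's implementation: the function names (orthonormalize, decompose) and the toy vectors are hypothetical, and the use of Gram-Schmidt is an assumption about one reasonable way to do it.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthonormalize(basis):
    """Gram-Schmidt: turn arbitrary directions into an orthonormal basis."""
    ortho = []
    for v in basis:
        w = list(v)
        for u in ortho:
            c = dot(w, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:  # skip directions already spanned by earlier ones
            ortho.append([wi / norm for wi in w])
    return ortho

def decompose(vector, basis):
    """Return (coefficients, residual).

    Coefficients are relative to the orthonormalized basis, not the raw
    input directions. The residual is orthogonal to every basis direction.
    """
    ortho = orthonormalize(basis)
    coeffs = [dot(vector, u) for u in ortho]
    projection = [sum(c * u[i] for c, u in zip(coeffs, ortho))
                  for i in range(len(vector))]
    residual = [v - p for v, p in zip(vector, projection)]
    return coeffs, residual

# Toy example (made-up numbers): two directions spanning the x-y plane.
coeffs, residual = decompose([3.0, 4.0, 5.0], [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
print(coeffs, residual)
```

The residual captures whatever part of the vector the chosen directions cannot explain, which is why a large residual norm suggests the basis is missing something.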
Resources (4)
URI | Description |
| Current model metadata |
| All trained probes and accuracy metrics |
| All computed steering vectors |
| Comparison model state |
Supported Models
Works with any model chuk-lazarus supports:
Gemma -- Gemma 3 (270M--27B), TranslateGemma 4B/12B
Llama -- Llama 2/3, Mistral, SmolLM2
Qwen -- Qwen 2/3
Granite -- IBM Granite 3.x/4.x (hybrid Mamba-2/Transformer)
Jamba -- AI21 Jamba (hybrid Mamba-Transformer MoE)
Mamba -- Pure SSM models
StarCoder2 -- Code generation
GPT-2 -- GPT-2 and compatible
Default demo target: TranslateGemma 4B (34 layers, fits on Apple Silicon). Smoke tests use SmolLM2-135M for speed.
Demos
Script | Tools Covered | Default Model |
| 17 tools -- flagship 15-step workflow (probing, steering, causal tracing) | gemma-3-4b-it |
| 8 tools -- two-model comparison (Gemma 3 vs TranslateGemma) | gemma-3-4b-it |
| 8 tools -- full interpretability pipeline (logit attribution → heads → neurons) | SmolLM2-135M |
| 3 tools -- batch attribution with prompt summary tables | SmolLM2-135M |
| 1 tool -- multi-candidate logit trajectory with crossing detection | SmolLM2-135M |
| 1 tool -- surgical component intervention (zero/scale attention, FFN) | SmolLM2-135M |
| 4 tools -- experiment persistence (create, add results, retrieve, list) | SmolLM2-135M |
| 4 tools -- layer ablation and activation patching | SmolLM2-135M |
| 4 tools -- attention patterns and head entropy analysis | SmolLM2-135M |
| 4 tools -- residual decomposition and layer clustering | SmolLM2-135M |
| 3 tools -- direct logit attribution (knowledge localization) | SmolLM2-135M |
| 3 tools -- causal tracing (observation vs intervention) | SmolLM2-135M |
| 6 tools -- angles, trajectories, dimensionality in activation space | SmolLM2-135M |
| 12 tools -- PCA subspaces, residual injection, surgery, dark tables | SmolLM2-135M |
| 8 tools -- copy circuit hypothesis (DLA, head output, KV vectors) | SmolLM2-135M |
| 7 tools -- direction extraction, steering, probing | SmolLM2-135M |
| 4 tools -- neuron discovery, analysis, and downstream tracing | SmolLM2-135M |
| 53 tests -- validates all tools with error envelope coverage | SmolLM2-135M |
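The "multi-candidate logit trajectory with crossing detection" demo above can be pictured with a small sketch. This is an assumption about what crossing detection means (the layer at which the leading candidate changes), not the tool's actual code; find_crossings and the toy probabilities are hypothetical.

```python
def find_crossings(trajectories):
    """Detect layers where the leading candidate token changes.

    trajectories: dict mapping token -> list of per-layer probabilities
    (all lists the same length). Returns a list of
    (layer, previous_leader, new_leader) tuples.
    """
    tokens = list(trajectories)
    num_layers = len(trajectories[tokens[0]])
    crossings = []
    leader = None
    for layer in range(num_layers):
        current = max(tokens, key=lambda t: trajectories[t][layer])
        if leader is not None and current != leader:
            crossings.append((layer, leader, current))
        leader = current
    return crossings

# Toy, made-up trajectories over four layers.
toy = {
    "Paris": [0.1, 0.2, 0.6, 0.8],
    "London": [0.3, 0.3, 0.2, 0.1],
}
crossings = find_crossings(toy)
print(crossings)  # [(2, 'London', 'Paris')]
```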
The Demo: Language Transition Probing
The flagship experiment follows a 15-step workflow:
Load model -- load_model("google/gemma-3-4b-it")
Inspect architecture -- get_model_info() reveals 34 layers
Tokenize -- see how the prompt breaks into tokens
Generate text -- see baseline model output
Sanity-check activations -- verify activations are non-trivial
Compare at early layer -- language representations are distinct
Compare at late layer -- representations converge
Logit lens -- see how predictions evolve through layers
Track token -- watch a specific token's probability rise across layers
Scan probes across layers -- find where language identity becomes decodable
Evaluate best probe -- confirm on held-out data
Compute steering vector -- French-to-German direction
Steer generation -- redirect a French translation to German
Alpha sweep -- iterate with different steering strengths
Causal tracing -- identify which layers are causally necessary for the prediction
Run it: uv run python examples/language_transition_demo.py
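The steering steps in the workflow above (compute a vector, steer, sweep alpha) can be sketched in a few lines. This assumes the common mean-difference form of contrastive activation addition; whether the server computes its vectors exactly this way is not confirmed here. All function names and the two-dimensional toy activations are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def steering_vector(positive_acts, negative_acts):
    """Contrastive direction: mean(positive) - mean(negative)."""
    pos = mean_vector(positive_acts)
    neg = mean_vector(negative_acts)
    return [p - q for p, q in zip(pos, neg)]

def apply_steering(hidden, direction, alpha):
    """Add the scaled direction to a hidden state."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Toy activations (made up): "German" prompts vs "French" prompts at one layer.
german_acts = [[1.0, 0.0], [3.0, 2.0]]
french_acts = [[0.0, 1.0], [2.0, 1.0]]
direction = steering_vector(german_acts, french_acts)  # French-to-German

# Alpha sweep: larger alpha pushes the hidden state further along the direction.
hidden = [0.5, 0.5]
sweep = {alpha: apply_steering(hidden, direction, alpha) for alpha in (0.0, 1.0, 2.0)}
print(direction, sweep)
```

In practice the sweep is the useful part: too small an alpha leaves the output unchanged, while too large an alpha tends to degrade fluency, so several strengths are usually tried.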
The Demo: Model Comparison
Compare a base model against its fine-tuned variant. First see actual
output differences with compare_generations, then find where
fine-tuning changed weights, activations, and attention patterns.
Designed for Gemma 3 4B vs TranslateGemma 4B using low-resource
languages (Icelandic, Swahili, Estonian, Marathi) where TranslateGemma
shows a 25-30% improvement.
Run it: uv run python examples/comparison_demo.py
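The comparison metrics named in the tool list (Frobenius norm, cosine similarity, JS divergence) are standard and can be sketched without any ML library. The helpers below are illustrative stand-ins operating on nested Python lists, not the server's comparison kernels. They assume non-zero inputs and same-shaped matrices.

```python
import math

def frobenius_distance(a, b):
    """Frobenius norm of (a - b) for two same-shaped matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))

def cosine_similarity(a, b):
    """Cosine similarity between two matrices after flattening (non-zero inputs)."""
    flat_a = [x for row in a for x in row]
    flat_b = [x for row in b for x in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm_a = math.sqrt(sum(x * x for x in flat_a))
    norm_b = math.sqrt(sum(x * x for x in flat_b))
    return dot / (norm_a * norm_b)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (range 0 to 1) between two distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy weights for a base and a fine-tuned layer (made-up numbers).
base = [[1.0, 2.0], [3.0, 4.0]]
tuned = [[1.0, 2.0], [3.0, 6.0]]
print(frobenius_distance(base, tuned))            # 2.0
print(js_divergence([1.0, 0.0], [0.0, 1.0]))      # 1.0
```

Frobenius distance measures how far the weights moved, cosine similarity measures whether their direction changed, and JS divergence compares two attention distributions symmetrically.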
Architecture
See ARCHITECTURE.md for the 10 design principles.
Key points:
Async-native -- all tools are async def, CPU-bound work wrapped in asyncio.to_thread
Pydantic-native -- every data structure is a typed BaseModel
Model-agnostic -- works with 9+ model families
Error envelopes -- tools never raise; always return structured errors
JSON-safe boundary -- MLX arrays converted at the tool return
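The async-native and error-envelope points above can be combined in one small sketch. It uses only the standard library; the decorator name, the envelope field names, and the toy tool are hypothetical, and plain dicts stand in for the Pydantic models the project actually uses.

```python
import asyncio
import functools

def error_envelope(func):
    """Wrap an async tool so exceptions become structured dicts instead of propagating."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        try:
            return await func(*args, **kwargs)
        except Exception as exc:
            return {
                "error": True,
                "error_type": type(exc).__name__,
                "message": str(exc),
                "tool": func.__name__,
            }
    return wrapper

def _heavy_sum_of_squares(values):
    # Stand-in for CPU-bound model work (hypothetical helper).
    if not values:
        raise ValueError("values must be non-empty")
    return sum(v * v for v in values)

@error_envelope
async def sum_of_squares(values):
    # Run the blocking work in a thread so the event loop stays responsive.
    result = await asyncio.to_thread(_heavy_sum_of_squares, values)
    # Convert to a plain float so the return value is JSON-safe.
    return {"error": False, "result": float(result)}

ok = asyncio.run(sum_of_squares([1, 2, 3]))
failed = asyncio.run(sum_of_squares([]))
print(ok)
print(failed)
```

Returning the failure as data rather than raising lets an MCP client inspect the error type and decide how to recover without the call itself failing.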
Project Structure
src/chuk_mcp_lazarus/
├── server.py # ChukMCPServer instance
├── main.py # Entry point (stdio / http)
├── model_state.py # ModelState singleton
├── probe_store.py # ProbeRegistry singleton
├── steering_store.py # SteeringVectorRegistry singleton
├── comparison_state.py # ComparisonState singleton (2nd model)
├── experiment_store.py # ExperimentStore singleton
├── subspace_registry.py # SubspaceRegistry singleton
├── dark_table_registry.py # DarkTableRegistry singleton
├── resources.py # MCP resources (4 resources)
├── errors.py # Error types + envelope helper (17 error types)
├── _bootstrap.py # Optional dependency stubs
├── _serialize.py # MLX/NumPy -> JSON-safe
├── _generate.py # Shared text generation
├── _compare.py # Shared comparison kernels
├── _extraction.py # Shared activation extraction
├── _residual_helpers.py # Shared residual-stream helpers
└── tools/
    ├── model/         # load_model, get_model_info
    ├── generation/    # generate_text, predict_next_token, tokenize,
    │                  # logit_lens, track_token, track_race, embedding_neighbors
    ├── activation/    # extract_activations, compare_activations
    ├── attention/     # attention_pattern, attention_heads
    ├── residual/      # residual_decomposition, layer_clustering,
    │                  # logit_attribution, head_attribution, top_neurons
    ├── neuron/        # discover_neurons, analyze_neuron, neuron_trace
    ├── probe/         # train_probe, evaluate_probe, scan_probe_across_layers,
    │                  # probe_at_inference, list_probes
    ├── steering/      # compute_steering_vector, steer_and_generate,
    │                  # list_steering_vectors, extract_direction
    ├── causal/        # trace_token, full_causal_trace,
    │                  # ablate_layers, patch_activations
    ├── comparison/    # load_comparison_model, compare_weights,
    │                  # compare_representations, compare_attention,
    │                  # compare_generations, unload_comparison_model
    ├── attribution/   # attribution_sweep
    ├── intervention/  # component_intervention
    ├── experiment/    # create_experiment, add_experiment_result,
    │                  # get_experiment, list_experiments
    └── geometry/      # Geometry tools (per-tool subpackage, 18+ tools)
        ├── _helpers.py            # Shared enums, math, direction extraction
        ├── _injection_helpers.py  # Shared injection/generation helpers
        └── (one .py per tool)
Development
# Install with dev dependencies
uv sync --extra dev
# Run smoke tests
uv run python examples/smoke_test.py
# Run with a different model
uv run python examples/smoke_test.py --model TinyLlama/TinyLlama-1.1B-Chat-v1.0
# HTTP mode for development
uv run chuk-mcp-lazarus http --port 8765
Requirements
Python >= 3.11
Apple Silicon Mac (for MLX)
chuk-lazarus >= 0.4
chuk-mcp-server >= 0.25
License
Apache 2.0