chuk-mcp-lazarus

by chrishayuk

Mechanistic interpretability MCP server wrapping chuk-lazarus.

Load any model, extract activations, train probes, steer generation, and ablate components -- all via MCP tools that Claude (or any MCP client) can call autonomously.

Quick Start

# Clone and install
git clone https://github.com/chuk-ai/chuk-mcp-lazarus.git
cd chuk-mcp-lazarus
uv sync

# Run the smoke test (53 tests on SmolLM2-135M, ~3 seconds)
uv run python examples/smoke_test.py

# Run the full 15-step language transition demo
uv run python examples/language_transition_demo.py

Claude Desktop

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "lazarus": {
      "command": "uv",
      "args": ["run", "chuk-mcp-lazarus", "stdio"],
      "cwd": "/path/to/chuk-mcp-lazarus"
    }
  }
}
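
If you prefer to script this change rather than edit the file by hand, the following is a minimal sketch that merges the entry above into an existing config while leaving other servers untouched. The helper names `merge_lazarus_entry` and `write_config` are hypothetical and not part of chuk-mcp-lazarus; the entry itself is copied from the snippet above.

```python
import json
from pathlib import Path


def merge_lazarus_entry(config: dict, repo_dir: str) -> dict:
    """Return a copy of `config` with the "lazarus" server entry added.

    Hypothetical helper: other entries under "mcpServers" are preserved.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["lazarus"] = {
        "command": "uv",
        "args": ["run", "chuk-mcp-lazarus", "stdio"],
        "cwd": repo_dir,
    }
    merged["mcpServers"] = servers
    return merged


def write_config(config_path: Path, repo_dir: str) -> None:
    """Read the config file (if present), merge the entry, and write it back."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    merged = merge_lazarus_entry(config, repo_dir)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")


# Example (not executed here, since it would modify your real config):
# write_config(
#     Path.home() / "Library/Application Support/Claude/claude_desktop_config.json",
#     "/path/to/chuk-mcp-lazarus",
# )
```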

Tools (64)

Group	Tool	Purpose
Model	`load_model`	Load any HuggingFace model into memory
Model	`get_model_info`	Return architecture metadata
Generation	`generate_text`	Generate text from the loaded model
Generation	`predict_next_token`	Top-k next-token predictions with probabilities
Generation	`tokenize`	Show how text is tokenized
Generation	`logit_lens`	Layer-by-layer prediction evolution (calibrated logit lens)
Generation	`track_token`	Track a specific token's probability across layers
Generation	`track_race`	Race N candidate tokens across layers with crossing detection
Generation	`embedding_neighbors`	Find nearest tokens in embedding space (cosine similarity)
Activations	`extract_activations`	Hidden states at specific layers and positions
Activations	`compare_activations`	Cosine similarity + PCA across prompts
Attention	`attention_pattern`	Per-head attention weights at specified layers
Attention	`attention_heads`	Per-head entropy and focus analysis
Probing	`train_probe`	Train a classifier on activations
Probing	`evaluate_probe`	Evaluate on held-out data
Probing	`scan_probe_across_layers`	Find the crossover layer
Probing	`probe_at_inference`	Run a trained probe during autoregressive generation
Probing	`list_probes`	List all trained probes
Steering	`compute_steering_vector`	Contrastive activation addition
Steering	`steer_and_generate`	Generate with steering applied
Steering	`list_steering_vectors`	List all computed vectors
Ablation	`ablate_layers`	Zero out layers, measure disruption
Ablation	`patch_activations`	Swap activations between prompts
Causal	`trace_token`	Which layers are causally necessary for a prediction
Causal	`full_causal_trace`	Position × layer causal heatmap (Meng et al. style)
Residual	`residual_decomposition`	Attention vs MLP contribution per layer
Residual	`layer_clustering`	Representation similarity and cluster separation across layers
Residual	`logit_attribution`	Direct logit attribution: per-layer component contributions to predicted token
Residual	`head_attribution`	Per-head logit attribution: which attention heads push toward the target token
Residual	`top_neurons`	Per-neuron MLP identification: which neurons push toward the target token
Attribution	`attribution_sweep`	Batch logit attribution across prompts with per-prompt summary
Intervention	`component_intervention`	Zero/scale attention, FFN, or individual heads at a layer
Neuron	`discover_neurons`	Auto-find neurons that discriminate between prompt groups
Neuron	`analyze_neuron`	Profile specific neurons: activation stats across prompts
Neuron	`neuron_trace`	Trace a neuron's influence through downstream layers
Direction	`extract_direction`	Find directions via mean-diff, LDA, PCA, or probe weights
Experiment	`create_experiment`	Create a named experiment for result persistence
Experiment	`add_experiment_result`	Add a step result to an experiment
Experiment	`get_experiment`	Retrieve an experiment and its results
Experiment	`list_experiments`	List all saved experiments
Comparison	`load_comparison_model`	Load a second model for side-by-side analysis
Comparison	`compare_weights`	Frobenius norm + cosine sim per layer per component
Comparison	`compare_representations`	Per-layer activation divergence across prompts
Comparison	`compare_attention`	Per-head JS divergence in attention patterns
Comparison	`compare_generations`	Side-by-side text output from both models
Comparison	`unload_comparison_model`	Free VRAM from comparison model
Geometry	`token_space`	Angles between token unembed vectors and residual stream at a layer
Geometry	`direction_angles`	Pairwise angles between any directions (tokens, neurons, heads, residual, FFN, attention, steering vectors)
Geometry	`subspace_decomposition`	Decompose a vector into basis direction components + orthogonal residual
Geometry	`residual_trajectory`	Track residual rotation through layers by angles to reference tokens
Geometry	`feature_dimensionality`	PCA spectrum + classification-by-dimension for a feature
Geometry	`decode_residual`	Decode residual stream into vocabulary space: raw vs normalised rankings, gap analysis, mean direction
Geometry	`computation_map`	Complete prediction flow: geometry, attribution, logit lens race, top heads/neurons in one call
Geometry	`inject_residual`	Inject donor residual into recipient at a layer and continue generation (Markov property test). `donor_layer` captures from a different layer than injection point
Geometry	`residual_match`	Find candidate prompts with most similar residual streams to a target at a layer
Geometry	`compute_subspace`	PCA subspace from model activations across varied prompts — stores basis in SubspaceRegistry
Geometry	`list_subspaces`	List all named PCA subspaces stored in the SubspaceRegistry
Geometry	`residual_atlas`	Map residual stream via PCA on diverse prompts: variance spectrum, vocab-decoded principal components
Geometry	`weight_geometry`	Map supply side: head/neuron push directions through unembedding, effective supply rank
Geometry	`residual_map`	Compact per-layer variance spectrum across the full model (no vocab projection)
Geometry	`branch_and_collapse`	Non-collapsing superposition: inject donor residual into multiple templates, evolve independently, collapse to highest confidence
Geometry	`subspace_surgery`	All-position subspace replacement: swap entity subspace at every position while preserving orthogonal complement (donor/coordinates/lookup modes)
Geometry	`build_dark_table`	Precompute dark coordinate lookup table: project reference prompts onto a subspace for zero-pass injection
Geometry	`list_dark_tables`	List all dark tables in the DarkTableRegistry
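
Every tool in the table is invoked through the standard MCP `tools/call` method. As a rough illustration of what a client sends on the wire, the sketch below builds those JSON-RPC 2.0 messages by hand. The tool names come from the table, but the argument names (`model_id`, `prompt`, `max_new_tokens`) are illustrative guesses, not the server's actual schema; check each tool's schema before relying on them.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC expects each request to carry one.
_request_ids = count(1)


def tool_call_request(tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 "tools/call" message for an MCP server."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Argument names below are assumptions made for this sketch.
print(tool_call_request("load_model", {"model_id": "google/gemma-3-4b-it"}))
print(tool_call_request("generate_text", {"prompt": "Hello", "max_new_tokens": 20}))
```

In practice an MCP client library or Claude Desktop builds these messages for you; the sketch only shows the shape of the request.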

Resources (4)

URI	Description
`model://info`	Current model metadata
`probes://registry`	All trained probes and accuracy metrics
`vectors://registry`	All computed steering vectors
`comparisons://state`	Comparison model state
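
Resources are read with the MCP `resources/read` method rather than `tools/call`. A small sketch of such a request, restricted to the four URIs listed above, might look like this. The function name `resource_read_request` is hypothetical, and the shape of the server's response is not shown because the source does not document it.

```python
import json

# The four resource URIs listed in the table above.
KNOWN_RESOURCES = {
    "model://info",
    "probes://registry",
    "vectors://registry",
    "comparisons://state",
}


def resource_read_request(uri: str, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 "resources/read" message for a known URI."""
    if uri not in KNOWN_RESOURCES:
        raise ValueError(f"unknown resource URI: {uri}")
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }
    return json.dumps(message)


print(resource_read_request("probes://registry"))
```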

Supported Models

Works with any model chuk-lazarus supports:

Gemma -- Gemma 3 (270M--27B), TranslateGemma 4B/12B
Llama -- Llama 2/3, Mistral, SmolLM2
Qwen -- Qwen 2/3
Granite -- IBM Granite 3.x/4.x (hybrid Mamba-2/Transformer)
Jamba -- AI21 Jamba (hybrid Mamba-Transformer MoE)
Mamba -- Pure SSM models
StarCoder2 -- Code generation
GPT-2 -- GPT-2 and compatible

Default demo target: TranslateGemma 4B (34 layers, fits on Apple Silicon). Smoke tests use SmolLM2-135M for speed.

Demos

Script	Tools Covered	Default Model
`language_transition_demo.py`	17 tools -- flagship 15-step workflow (probing, steering, causal tracing)	gemma-3-4b-it
`comparison_demo.py`	8 tools -- two-model comparison (Gemma 3 vs TranslateGemma)	gemma-3-4b-it
`deep_dive_demo.py`	8 tools -- full interpretability pipeline (logit attribution → heads → neurons)	SmolLM2-135M
`attribution_sweep_demo.py`	3 tools -- batch attribution with prompt summary tables	SmolLM2-135M
`track_race_demo.py`	1 tool -- multi-candidate logit trajectory with crossing detection	SmolLM2-135M
`intervention_demo.py`	1 tool -- surgical component intervention (zero/scale attention, FFN)	SmolLM2-135M
`experiment_demo.py`	4 tools -- experiment persistence (create, add results, retrieve, list)	SmolLM2-135M
`ablation_demo.py`	4 tools -- layer ablation and activation patching	SmolLM2-135M
`attention_demo.py`	4 tools -- attention patterns and head entropy analysis	SmolLM2-135M
`residual_stream_demo.py`	4 tools -- residual decomposition and layer clustering	SmolLM2-135M
`logit_attribution_demo.py`	3 tools -- direct logit attribution (knowledge localization)	SmolLM2-135M
`causal_tracing_demo.py`	3 tools -- causal tracing (observation vs intervention)	SmolLM2-135M
`geometry_demo.py`	6 tools -- angles, trajectories, dimensionality in activation space	SmolLM2-135M
`subspace_demo.py`	12 tools -- PCA subspaces, residual injection, surgery, dark tables	SmolLM2-135M
`copy_circuit_demo.py`	8 tools -- copy circuit hypothesis (DLA, head output, KV vectors)	SmolLM2-135M
`direction_demo.py`	7 tools -- direction extraction, steering, probing	SmolLM2-135M
`neuron_demo.py`	4 tools -- neuron discovery, analysis, and downstream tracing	SmolLM2-135M
`smoke_test.py`	53 tests -- validates all tools with error envelope coverage	SmolLM2-135M

The Demo: Language Transition Probing

The flagship experiment follows a 15-step workflow:

1. Load model -- load_model("google/gemma-3-4b-it")
2. Inspect architecture -- get_model_info() reveals 34 layers
3. Tokenize -- see how the prompt breaks into tokens
4. Generate text -- see baseline model output
5. Sanity-check activations -- verify activations are non-trivial
6. Compare at early layer -- language representations are distinct
7. Compare at late layer -- representations converge
8. Logit lens -- see how predictions evolve through layers
9. Track token -- watch a specific token's probability rise across layers
10. Scan probes across layers -- find where language identity becomes decodable
11. Evaluate best probe -- confirm on held-out data
12. Compute steering vector -- French-to-German direction
13. Steer generation -- redirect a French translation to German
14. Alpha sweep -- iterate with different steering strengths
15. Causal tracing -- prove which layers are necessary for the prediction
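
To make the steering steps (compute steering vector, steer generation, alpha sweep) concrete, here is a toy sketch of the common mean-difference formulation of contrastive activation addition. It uses hand-written 3-dimensional vectors instead of real model activations, and whether the server computes its vectors exactly this way is an assumption. The function names are invented for this example.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of equally sized activation vectors."""
    return [fmean(column) for column in zip(*rows)]


def steering_vector(positive, negative):
    """Mean-difference direction: mean(positive) - mean(negative)."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def steer(hidden, vector, alpha):
    """Add alpha * vector to a hidden state."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy "activations"; real ones would come from a chosen model layer.
german = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
french = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]
direction = steering_vector(german, french)  # points from French toward German

# Alpha sweep: larger alpha pushes the hidden state further along the direction.
hidden = [0.0, 2.0, 2.0]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, steer(hidden, direction, alpha))
```

With alpha at 0.0 the hidden state is unchanged, which is a useful baseline when sweeping steering strengths.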

Run it: uv run python examples/language_transition_demo.py

The Demo: Model Comparison

Compare a base model against its fine-tuned variant. First inspect the actual output differences with compare_generations, then locate where fine-tuning changed weights, activations, and attention patterns. The demo is designed for Gemma 3 4B vs TranslateGemma 4B on low-resource languages (Icelandic, Swahili, Estonian, Marathi), where TranslateGemma shows a 25-30% improvement.

Run it: uv run python examples/comparison_demo.py
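
For intuition about the two weight metrics that compare_weights reports (Frobenius norm and cosine similarity), the sketch below computes them on tiny hand-written matrices. It assumes the Frobenius norm is taken over the difference between the two weight matrices, which the source does not state. The component names (`layer0.attn`, `layer0.mlp`) are made up.

```python
import math


def flatten(matrix):
    """Flatten a list-of-lists matrix into a single list of values."""
    return [value for row in matrix for value in row]


def frobenius_distance(a, b):
    """Frobenius norm of the element-wise difference (a - b)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(flatten(a), flatten(b))))


def cosine_similarity(a, b):
    """Cosine similarity between two matrices treated as flat vectors."""
    fa, fb = flatten(a), flatten(b)
    dot = sum(x * y for x, y in zip(fa, fb))
    norm_a = math.sqrt(sum(x * x for x in fa))
    norm_b = math.sqrt(sum(y * y for y in fb))
    return dot / (norm_a * norm_b)


# Hypothetical per-component weights for a base and a fine-tuned model.
base = {
    "layer0.attn": [[1.0, 0.0], [0.0, 1.0]],
    "layer0.mlp": [[3.0, 4.0], [0.0, 0.0]],
}
tuned = {
    "layer0.attn": [[1.0, 0.0], [0.0, 1.0]],
    "layer0.mlp": [[3.0, 0.0], [0.0, 4.0]],
}

for name in base:
    distance = frobenius_distance(base[name], tuned[name])
    similarity = cosine_similarity(base[name], tuned[name])
    print(f"{name}: frobenius={distance:.3f} cosine={similarity:.3f}")
```

Here the attention weights are identical (distance 0, cosine 1), while the MLP weights have the same overall norm but point in a different direction, which is the kind of change cosine similarity surfaces.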

Architecture

See ARCHITECTURE.md for the 10 design principles.

Key points:

Async-native -- all tools are async def, CPU-bound work wrapped in asyncio.to_thread
Pydantic-native -- every data structure is a typed BaseModel
Model-agnostic -- works with 9+ model families
Error envelopes -- tools never raise; always return structured errors
JSON-safe boundary -- MLX arrays converted at the tool return
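
The async-native and error-envelope points can be illustrated together. The following is a minimal sketch, not the project's implementation: the decorator name `error_envelope` and the envelope fields (`error`, `error_type`, `message`) are assumptions, and a plain dict stands in for the Pydantic models the project uses.

```python
import asyncio
import functools


def error_envelope(tool):
    """Wrap an async tool so exceptions become structured error dicts."""

    @functools.wraps(tool)
    async def wrapper(*args, **kwargs):
        try:
            return await tool(*args, **kwargs)
        except Exception as exc:
            # The tool never raises; the caller always gets a JSON-safe dict.
            return {
                "error": True,
                "error_type": type(exc).__name__,
                "message": str(exc),
            }

    return wrapper


def _slow_sum(values):
    """Stand-in for CPU-bound model work."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values)


@error_envelope
async def sum_tool(values):
    # Run the blocking work in a worker thread so the event loop stays free.
    total = await asyncio.to_thread(_slow_sum, values)
    return {"error": False, "total": total}


print(asyncio.run(sum_tool([1, 2, 3])))
print(asyncio.run(sum_tool([])))
```

The second call returns an error dict instead of raising, so an MCP client can inspect the failure and decide what to do next.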

Project Structure

src/chuk_mcp_lazarus/
├── server.py            # ChukMCPServer instance
├── main.py              # Entry point (stdio / http)
├── model_state.py       # ModelState singleton
├── probe_store.py       # ProbeRegistry singleton
├── steering_store.py    # SteeringVectorRegistry singleton
├── comparison_state.py  # ComparisonState singleton (2nd model)
├── experiment_store.py  # ExperimentStore singleton
├── subspace_registry.py # SubspaceRegistry singleton
├── dark_table_registry.py # DarkTableRegistry singleton
├── resources.py         # MCP resources (4 resources)
├── errors.py            # Error types + envelope helper (17 error types)
├── _bootstrap.py        # Optional dependency stubs
├── _serialize.py        # MLX/NumPy -> JSON-safe
├── _generate.py         # Shared text generation
├── _compare.py          # Shared comparison kernels
├── _extraction.py       # Shared activation extraction
├── _residual_helpers.py # Shared residual-stream helpers
└── tools/
    ├── model/               # load_model, get_model_info
    ├── generation/          # generate_text, predict_next_token, tokenize,
    │                        #   logit_lens, track_token, track_race, embedding_neighbors
    ├── activation/          # extract_activations, compare_activations
    ├── attention/           # attention_pattern, attention_heads
    ├── residual/            # residual_decomposition, layer_clustering,
    │                        #   logit_attribution, head_attribution, top_neurons
    ├── neuron/              # discover_neurons, analyze_neuron, neuron_trace
    ├── probe/               # train_probe, evaluate_probe, scan_probe_across_layers,
    │                        #   probe_at_inference, list_probes
    ├── steering/            # compute_steering_vector, steer_and_generate,
    │                        #   list_steering_vectors, extract_direction
    ├── causal/              # trace_token, full_causal_trace,
    │                        #   ablate_layers, patch_activations
    ├── comparison/          # load_comparison_model, compare_weights,
    │                        #   compare_representations, compare_attention,
    │                        #   compare_generations, unload_comparison_model
    ├── attribution/         # attribution_sweep
    ├── intervention/        # component_intervention
    ├── experiment/          # create_experiment, add_experiment_result,
    │                        #   get_experiment, list_experiments
    └── geometry/            # Geometry tools (per-tool subpackage, 18+ tools)
        ├── _helpers.py          # Shared enums, math, direction extraction
        ├── _injection_helpers.py # Shared injection/generation helpers
        └── (one .py per tool)

Development

# Install with dev dependencies
uv sync --extra dev

# Run smoke tests
uv run python examples/smoke_test.py

# Run with a different model
uv run python examples/smoke_test.py --model TinyLlama/TinyLlama-1.1B-Chat-v1.0

# HTTP mode for development
uv run chuk-mcp-lazarus http --port 8765

Requirements

Python >= 3.11
Apple Silicon Mac (for MLX)
chuk-lazarus >= 0.4
chuk-mcp-server >= 0.25

License

Apache 2.0
