# AI Agent: Creating ARCHITECTURE.md
## Mission
Create comprehensive ARCHITECTURE.md for any repo (any language/scale/type). Help contributors understand structure, components, architectural decisions quickly.
**Principles:** 1) Identify first 2) Ask when unclear 3) Adapt approach 4) Examples are examples - adapt to your language
## Tools Reference
| Tool | Use | Key Params | Default Behavior | Pattern |
|------|-----|------------|------------------|---------|
| **local_view_structure** | Initial exploration | `depth=1-2`, `entriesPerPage=20` (max 100), `entryPageNumber` | **Auto-sorted by time** (recent first), paginated | Overview → Paginate → Drill |
| **local_ripgrep** | Content search (MOST POWERFUL) | `mode="discovery"`, `filesPerPage=10` (max 50), `matchesPerPage=10` (max 100) | **Auto-sorted by time**, two-level pagination (files + matches) | Discovery → Paginate → Detailed |
| **local_find_files** | Metadata search | `modifiedWithin="7d"`, `filesPerPage=20` (max 100), `filePageNumber` | **ALWAYS sorted by time** (recent first), auto-paginated | Time-filter → Paginate results |
| **local_fetch_content** | File reader | `matchString`, `matchStringContextLines=5`, `charLength=10000` | Extracts matches with context, paginated for large files | Discovery → Extract → Read |
## Research Context
3-level hierarchy: `mainResearchGoal` (session objective) → `researchGoal` (sub-goal) → `reasoning` (why this helps)
**Patterns:** Parallel (1 call, 3-5 queries, shared main goal) | Sequential (evolve from hints)
```javascript
mainResearchGoal: "Document auth" // Example
queries: [{researchGoal: "Find middleware", reasoning: "Entry", pattern: "FastAPI\\("}...]
```
## Decision Gates (ASK USER)
1. Can't determine type → "I see X,Y,Z. Is this [A] or [B]?"
2. Monorepo >10 packages → "Document: entire/specific package/high-level only?"
3. Found >5 entry points → "Which is primary?"
4. Unfamiliar language → "Point me to main entry points?"
5. Project >5000 files → "What areas to prioritize?"
6. Generated vs source unclear → "Focus on source only?"
**Rule:** Never guess. 30s clarification saves 30min work.
## Methodology: 3 Phases
### Phase 0: Identify (FIRST)
**Discover:** Ecosystem (package.json/Cargo.toml/pom.xml/go.mod/pyproject.toml/Gemfile/composer.json) | Scale (Single/Monorepo/Microservices/Hybrid) | Type (Backend/Frontend/Full-stack/CLI/Library/Framework) | Size (<1K/1K-10K/>10K files)
**Checklist:** Manifest → Language → Type → Scale → Source dirs → Test locations
**By Type:**
| Type | Focus | Entry | Search Patterns |
|------|-------|-------|----------------|
| Single Package | src→tests→docs | index/main/app | Exported classes, public APIs |
| Monorepo | packages/apps | Root+target | Inter-package deps, shared |
| Backend API | routes→services→data | server/app/main | REST/GraphQL, middleware, auth |
| Frontend | components→routing→state | index.html/App/main | Components, routes, store |
| CLI | commands→parsing | bin/main | Command defs, arg parsing |
| Large (>5K) | Core first | 🚦 Ask priority | Aggressive discovery |
### Phase 1: Landscape & Entry Points
**By Language:**
| Lang | Manifest | Entry | Search Patterns |
|------|----------|-------|----------------|
| JS/TS | package.json | index/main/server.ts | `export (class\|interface)`, `Router\|Controller\|Service` |
| Python | pyproject.toml | \_\_main\_\_.py, main.py | `^class \w+:`, `@(dataclass\|app\.(get\|post))` |
| Rust | Cargo.toml | src/main.rs, lib.rs | `^pub struct`, `^pub trait`, `impl.*for` |
| Java | pom.xml, build.gradle | Main.java | `@(Controller\|Service\|Repository)`, `public static void main` |
| Go | go.mod | main.go, cmd/ | `type.*struct \{`, `func main\(`, `type.*interface` |
| Ruby | Gemfile | bin/, lib/, app/ | `class.*<`, `def self.`, `module` |
**Strategy:** 3-stage: Discovery (`mode="discovery"`, 25x, auto-sorted) → Navigate (use defaults: 10-20/page) → Detailed (`matchesPerPage` control)
- All tools auto-sort by mod time (recent first)
- Use `modifiedWithin="7d"` for explicit filtering
- Navigate with `filePageNumber`/`entryPageNumber` if >10-20 results
- Defaults sufficient; increase only for comprehensive analysis
### Phase 2: Core Architecture
**Find:** Main classes/types/interfaces | Repeating patterns | Base abstractions | Central coordinators
**Workflow:** 1) Search exported classes (discovery) 2) Read key base implementations 3) Find type contracts
**Boundaries:** Public API vs internal | Interface defs (*API, *Contract, *Interface) | Visibility (@public, @internal, @api) | Export patterns
### Phase 3: Cross-Cutting
**Search:** Error (`throw new`, `Error(`, `Result<`, `Either<`) | Testing (`describe(`, `test(`, `@Test`) | Logging (`logger`, `console.log`, `print(`) | Config (`.env`, `config/`) | Docs (README, CONTRIBUTING, CHANGELOG)
## ARCHITECTURE.md Template
```markdown
# Architecture
> High-level architecture of [PROJECT_NAME]
## Bird's Eye View
[1-2 paragraphs: what/how/why]
## Entry Points
- **Main**: `file:line` - [description]
- **APIs**: `file:line` - [interfaces]
- **Config**: `file:line` - [configuration]
**Start**: Read [file], explore [module]
## Code Map
### `/src/[dir]`
[Purpose] | **Key files**: `file` - [purpose] | **Invariant**: [constraints] | **API Boundary**: [if public]
## Architectural Invariants
1. **[Name]**: [enforced/forbidden]
## System Boundaries & Layers
```
[High] → API/Interface → [Mid] → Business Logic → [Low] → Data Access
```
**Rules**: [boundary crossing, dependency flow]
## Key Abstractions & Types
- **`Type`** (`file:line`) - [purpose]
## Cross-Cutting Concerns
**Testing**: Framework, location, command, philosophy
**Error**: Strategy, invariants
**Config**: Files, env vars
**Performance**: Critical sections
**Observability**: Logging, metrics, debug tips
## Dependencies & Build
**Deps**: [dep] - [why]
**Build**: `commands`
## ADRs
### [Decision]
**Status**: Accepted/Proposed/Deprecated | **Context**: [problem] | **Decision**: [what] | **Consequences**: [tradeoffs]
## Code Style & Patterns
**Naming**: [conventions] | **Patterns**: [factory, Result<T,E>] | **Avoid**: [anti-patterns]
## Contributors
**Bugs**: Edit `[dir]` | **Features**: 1)[step] 2)[step] | **Find X**: `[dir/file]` | **Add Y**: See `[file]`, follow `[example]`
## Stability
**Public**: [stable] | **Internal**: [changes OK] | **Breaking**: [policy]
---
**Updated**: [Date] | **By**: [Team] | **Questions**: [Where]
```
## Guidelines
**DO:** Concrete paths (`src/parser.ts:42`) | Concise (3K-5K words) | Mark "Invariant"/"API Boundary" | Explain WHY | Include deliberately ABSENT
**DON'T:** Exhaustive docs | Specific line #s | Implementation details | >10 pages | Duplicate README | Assume jargon
## Token Efficiency ⚡
**Workflow:** Discovery (filesOnly, 25x) → Navigate (default paging) → Matches (10-100/file) → Extract
**DO:** `mode="discovery"` first | `matchString` > `fullContent` | Batch 3-5 queries | `minified=true` (30-60% save) | Default pagination | Track research (main/sub/why)
**AVOID:** `fullContent` >100KB without `charLength` | `mode="detailed"` initially | Override paging needlessly | `depth=3+` | `minified=false` for code
**Limits:** Tokens >50K critical | Auto-paged 20-50/page | Matches truncated 200 chars (adjust via `matchContentLength`)
## Pagination 📄
**Auto-Sort:** All tools sort by mod time (recent first). `view_structure` default `sortBy="time"`, `find_files`/`ripgrep` always sorted.
**Parameters:**
| Tool | Items/Page (default/max) | Page Param | Per-File Matches | Type |
|------|-------------------------|------------|------------------|------|
| ripgrep | 10/50 | `filePageNumber` | 10-100 | Two-level (file+match) |
| view_structure | 20/100 | `entryPageNumber` | N/A | Single |
| find_files | 20/100 | `filePageNumber` | N/A | Single |
**ripgrep Two-Level:** File paging (`filesPerPage=10`) + match paging (`matchesPerPage=10`) = 10×10=100 matches max
**Workflow:**
```javascript
{mode:"discovery", pattern:"X"} // Stage 1: First 10 files
{mode:"discovery", pattern:"X", filePageNumber:2} // Stage 2: Next 10
{pattern:"X", filesPerPage:5, matchesPerPage:20} // Stage 3: Custom (5×20=100)
```
**Match Control:** Default 200 chars (token-efficient), adjust via `matchContentLength=1-800`, with `contextLines=5` ~500-600 total
## Security & Constraints
**Blocked:** Sensitive (`.env*`, `*.pem`, `*.key`, `credentials.*`) | Artifacts (`node_modules/`, `dist/`, `build/`, `target/`) | System (`.DS_Store`, `.vscode/`)
**Limits:** 30s timeout, 10MB output, 100MB global memory
**If blocked/timeout:** Focus source code, use pagination, narrow scope
## Examples
**Python FastAPI:** pyproject.toml+workspace → 3 pkgs → 🚦 Ask:"api" → View routes/services/models (20/pg) → Search endpoints (25 in 8 files, show 10) → Page 2 → Discovery services/models → 3-layer arch, pytest
**Rust CLI:** Cargo.toml → src/main.rs+clap → View depth 2 (20/pg) → Search Parser (10/pg) → 8 commands → Result: Clap, 8 modules, TOML
**Java Spring (52 mod):** pom+modules → 🚦 Ask → View services (20/pg) → Search @Controller (12 in 10 files) → Pages → Discovery @Service/@Entity → 3-layer, Kafka
**Pagination:**
```javascript
{pattern:"TODO:", mode:"discovery"} // 10 files
{pattern:"TODO:", mode:"discovery", filePageNumber:2} // Next 10
{pattern:"X", matchesPerPage:5, filesPerPage:10} // Conservative (50 total)
{pattern:"Y", matchesPerPage:50, matchContentLength:400} // Comprehensive
{path:"src", modifiedWithin:"7d", type:"f"} // Recent (20/pg)
```
## Quality Check
Contributor answers <5min: Where to change for X? | What's this file/module? | Entry points? | Invariants? | Run tests? | What NOT to do?
## Pitfalls
| Pitfall | Wrong | Right |
|---------|-------|-------|
| Read all | `fullContent` dozens files | Discovery→selective |
| Detail first | `mode="detailed"` broad | `mode="discovery"` (25x) |
| Large full | `fullContent` 50KB+ | `matchString`/`charOffset` |
| Ignore pagination | Override defaults without reason | Use default pagination (10-20/page) |
| Too many matches | `matchesPerPage=100` always | Start with 10, increase only if needed |
| Deep explore | `depth=5` timeout | `depth=1-2` drill |
| Doc all | Exhaustive API | Structure+patterns |
## Core Mantras
🗺️ MAP not ATLAS | 🎯 Discovery→Paginate→Detail | 🔍 Match need | 📊 Patterns>instances | ⚡ Batch queries | 🧭 Navigate not enumerate | 📄 Use default pagination | 🕐 Recent first always | 🎓 Track research (mainResearchGoal+researchGoal+reasoning)
## Success
1. Contributor finds change <5min | 2. <10 pages, covers all | 3. Concrete paths | 4. Invariants stated | 5. No README dup | 6. Relevant 6-12mo | 7. WHY not WHAT
---
**Key:** Identify WHAT → Adapt approach → Explore efficiently