Skip to main content
Glama
directory_tool.md14.5 kB
# Directory Intelligence Tool A .gitignore-aware directory structure analyzer that generates XML hierarchies with intelligent summarization for large directories. ## Overview The Directory Intelligence Tool provides a comprehensive solution for analyzing directory structures and generating structured XML output. It respects `.gitignore` patterns, handles edge cases gracefully, and provides intelligent summarization for large directories. ## Key Features - **Git-aware**: Respects `.gitignore` patterns using the `pathspec` library - **XML Output**: Generates structured XML with `<dir>`, `<file>`, and `<summary>` elements - **Smart Summarization**: Automatically summarizes directories with more than 50 files - **Robust Error Handling**: Gracefully handles unreadable files, broken symlinks, malformed .gitignore - **Flexible Configuration**: Supports environment variables and config files - **FastMCP Integration**: Available as a FastMCP tool for remote access - **Symlink Loop Protection**: Detects and prevents infinite recursion from circular symlinks ## XML Structure The Directory Intelligence Tool generates a structured XML hierarchy with specific element types and attributes. ### Schema Overview ```xml <?xml version="1.0" ?> <project_structure> <dir name="directory_name"> <!-- Files are represented as text nodes --> <file>filename.py</file> <file>README.md</file> <!-- Subdirectories are recursively structured --> <dir name="subdirectory"> <file>module.py</file> </dir> <!-- Large directories get summarized --> <dir name="large_directory"> <summary count="127"/> </dir> </dir> <!-- Warnings are collected at the end --> <warnings> <warning>warning_message</warning> </warnings> </project_structure> ``` ### Element Reference **`<project_structure>`** (root) - Container for the entire directory analysis - Always present as the root element **`<dir name="...">`** (directory) - Represents a directory in the hierarchy - `name` attribute: Sanitized directory name (restricted to alphanumerics, dots, underscores, and hyphens) - Contains child elements: `<file>`, `<dir>`, or `<summary>` **`<file>`** (file) - Represents a file within a directory - Text content: Sanitized filename (see name restrictions above) - No attributes **`<summary count="..."/>`** (summarized directory) - Replaces file listing when directory exceeds max_file_count threshold - `count` attribute: Total number of files in the directory and its subdirectories - Self-closing element with no children **`<warnings>`** (warning container) - Optional element present only when warnings occur - Contains one or more `<warning>` elements - Positioned at the end of the XML structure **`<warning>`** (individual warning) - Contains warning message text - Follows the taxonomy format: `type: path - description` - May appear more than 100 times (truncation notice is appended instead of truncating the list) ### Summary Element When a directory contains more than the threshold number of files (default: 50), a `<summary>` element is generated instead of listing all files: ```xml <dir name="large_directory"> <summary count="127"/> </dir> ``` This prevents XML bloat while still providing information about the directory contents. **Summarization Behavior:** - `max_file_count` controls the threshold (default: 50, configurable via environment or config) - Only subdirectories (depth > 0) are summarized; the root directory is always fully expanded - When `expand_large=True`, summarization is disabled and all files are listed ## Ignore Rules The tool applies multiple layers of ignore rules in a deterministic order. Each layer is evaluated sequentially, and if a path matches any rule, it is ignored immediately. ### Ignore Precedence (Evaluation Order) 1. **Explicit ignore patterns (gitignore)** - Checked first 2. **Top-level dot-directory suppression** - Applied to directories starting with `.` at depth 1 3. **Normal scanning** - If no ignore rules match, the path is included This deterministic order ensures predictable behavior across different environments. ### 1. Default Patterns These patterns are always applied (included in every `.gitignore` evaluation): - `.git` - Git metadata - `.vscode` - VS Code configuration - `.idea` - IntelliJ configuration - `__pycache__` - Python cache directories - `*.pyc`, `*.pyo`, `*.pyd` - Compiled Python files - `.pytest_cache` - Pytest cache - `.coverage` - Coverage.py data - `.tox` - Tox environments - `.venv`, `venv`, `env`, `ENV` - Virtual environments - `node_modules` - Node.js dependencies - `.DS_Store` - macOS metadata - `Thumbs.db` - Windows thumbnail cache - `*.egg-info` - Python egg metadata - `dist`, `build` - Build directories - `*.so`, `*.dylib`, `*.dll` - Compiled libraries ### 2. .gitignore Patterns User-defined patterns from `.gitignore` files are read from the root directory and merged with default patterns. The tool uses gitwildmatch syntax, which is compatible with `.gitignore`. ### 3. Top-Level Dot Directories By default, directories starting with `.` at the top level (depth 1) are ignored. This can be controlled via the `IGNORE_TOPLEVEL_DOTDIRS` constant. This rule applies after gitignore patterns are checked. ## Warning System The tool generates warnings for various error conditions. All warnings follow the format: ``` <type>: <path> - <message> ``` ### Warning Taxonomy #### 1. **unreadable_file**: File cannot be opened or read **When triggered:** I/O errors or permission denied when attempting to read files **Primary use case:** .gitignore files that cannot be read due to permissions **Example:** ``` unreadable_file: /path/to/.gitignore - Permission denied or I/O error: [details] ``` **Behavior:** - Warning is added to the warnings list - The file is skipped - Processing continues with other items - A default ignore spec (empty) is used #### 2. **unreadable_directory**: Directory cannot be accessed **When triggered:** Permission denied or I/O errors when attempting to list directory contents **Example:** ``` unreadable_directory: /path/to/dir - Permission denied ``` **Behavior:** - Warning is added to the warnings list - The directory and its contents are skipped - Processing continues with other directories - Does not prevent analysis of accessible sibling directories #### 3. **malformed_gitignore**: .gitignore patterns are invalid **When triggered:** AFTER successfully reading the .gitignore file, pattern parsing fails **This is different from unreadable_file:** - **unreadable_file** = cannot open/read the file (I/O or permission error) - **malformed_gitignore** = file was read successfully but contains invalid patterns **Example:** ``` malformed_gitignore: /path/to/project - failed to parse .gitignore: [parsing error] ``` **Behavior:** - Warning is added to the warnings list - Default patterns are still applied - The invalid .gitignore patterns are discarded - Processing continues with default patterns only #### 4. **symlink_loop**: Circular symlink reference detected **When triggered:** A symlink resolves to a target that has already been visited in the current traversal path **Example:** ``` symlink_loop: /path/to/link -> /path/to/target ``` **Behavior:** - Warning is added to the warnings list - The symlink is skipped (not followed) - Prevents infinite recursion - Processing continues with other items #### 5. **broken_symlink**: Symlink points to non-existent target **When triggered:** A symlink cannot be resolved to a valid target **Example:** ``` broken_symlink: /path/to/link - [error details] ``` **Behavior:** - Warning is added to the warnings list - The symlink is skipped (not followed) - Processing continues with other items #### 6. **too_many_warnings**: Warning count exceeded threshold **When triggered:** More than 100 warnings have been generated **Example:** ``` too_many_warnings: truncated ``` **Behavior:** - All 100+ warnings are preserved in the XML (not truncated) - This final warning is appended to indicate truncation occurred - Alerts users that the complete warning list is very long ### Warning Output in XML Warnings are included in the XML output under a `<warnings>` element at the end of the structure: ```xml <project_structure> <dir name="..."> ... </dir> <warnings> <warning>unreadable_directory: /path/to/dir - Permission denied</warning> <warning>symlink_loop: /path/to/link -> /path/to/target</warning> </warnings> </project_structure> ``` ## Symlink Handling ### Symlink Loop Detection The tool tracks resolved symlink targets to detect and prevent infinite loops. When a symlink pointing to an already-visited target is encountered: 1. A `symlink_loop` warning is generated 2. The symlink is skipped 3. Processing continues normally ### Broken Symlinks When a symlink cannot be resolved: 1. A `broken_symlink` warning is generated 2. The symlink is skipped 3. Processing continues normally ## Configuration ### Environment Variables - `DIRECTORY_TOOL_MAX_FILES`: Override the default file count threshold (default: 50) - `DIRECTORY_TOOL_EXPAND_LARGE`: Set default expand_large behavior (true/false) ### Config File (Future) Configuration can also be loaded from `config/config.json`: ```json { "max_file_count": 50, "expand_large": false } ``` ## Usage ### As Python Module ```python from directory_tool import get_codebase_structure # Analyze current directory xml_output = get_codebase_structure(".") print(xml_output) # Expand large directories xml_output = get_codebase_structure(".", expand_large=True) print(xml_output) ``` ### As FastMCP Tool The tool is exposed via the FastMCP server as `get_codebase_structure`: ```python # Server must be running # Call via HTTP or FastMCP client ``` ### Command Line ```bash python src/directory_tool.py /path/to/analyze --expand-large ``` ## Error Handling ### Unreadable Directories When a directory cannot be read: - A warning is added to the warnings list - The directory is skipped - Processing continues with other items - No exception is raised ### Unreadable Files When a file cannot be read: - A warning is added to the warnings list - The file is skipped - Processing continues with other items ### Malformed .gitignore When a .gitignore file has invalid syntax: - A warning is generated - Default patterns are still applied - Processing continues normally ## Performance Considerations - **Directory Scanning**: Uses `os.listdir()` for efficient traversal - **File Counting**: Recursively counts files for summarization decisions - **Pattern Matching**: Uses compiled PathSpec for fast gitignore matching - **Memory Usage**: Processes directories sequentially to minimize memory footprint ## Known Limitations 1. **Nested .gitignore**: Only reads `.gitignore` from the root directory 2. **Symbolic Links**: May miss symlink loops in complex scenarios 3. **Very Large Directories**: Directory traversal may be slow for directories with thousands of entries 4. **Case Sensitivity**: Pattern matching respects filesystem case sensitivity ## Examples ### Example 1: Simple Project Structure ```xml <?xml version="1.0" ?> <project_structure> <dir name="my_project"> <file>README.md</file> <file>main.py</file> <dir name="src"> <file>app.py</file> <file>utils.py</file> </dir> <dir name="tests"> <file>test_app.py</file> </dir> </dir> </project_structure> ``` ### Example 2: Large Directory with Summary ```xml <?xml version="1.0" ?> <project_structure> <dir name="my_project"> <dir name="logs"> <summary count="1047"/> </dir> <file>main.py</file> </dir> <warnings> <warning>unreadable_directory: /path/to/restricted - Permission denied</warning> </warnings> </project_structure> ``` ### Example 3: Ignoring Common Patterns When analyzing a Python project: ```xml <project_structure> <dir name="my_app"> <!-- __pycache__ is automatically ignored --> <!-- .git is automatically ignored --> <!-- node_modules is automatically ignored --> <file>app.py</file> <dir name="tests"> <file>test_app.py</file> </dir> </dir> </project_structure> ``` ## Troubleshooting ### Directory Not Analyzed **Problem**: Directory doesn't appear in output **Solutions**: - Check if directory matches ignore patterns - Verify read permissions - Check warning messages for errors ### Too Many Warnings **Problem**: More than 100 warnings generated **Solution**: Fix underlying issues (permissions, broken symlinks, etc.) The tool will truncate warnings at 100 and add a truncation notice. ### Unexpected Files Included **Problem**: Files you expected to be ignored are included **Solutions**: - Check .gitignore syntax - Verify pattern matching with gitwildmatch rules - Check default ignore patterns ### Performance Issues **Problem**: Analysis is slow for large directories **Solutions**: - Use `expand_large=False` to enable summarization - Exclude unnecessary directories with .gitignore - Consider using ignore patterns for known-large directories ## API Reference ### Functions #### `get_codebase_structure(root_path: str = ".", expand_large: bool = False) -> str` Analyze directory structure and generate XML representation. **Parameters:** - `root_path`: Root directory to analyze (default: current directory) - `expand_large`: Always expand directories regardless of file count (default: False) **Returns:** XML string representing the directory structure ### Classes #### `DirectoryIntelligenceTool` Main class for directory analysis. **Methods:** - `__init__(root_path: str = ".")`: Initialize tool - `generate_xml_structure(expand_large: bool = None) -> str`: Generate complete XML structure - `load_gitignore_patterns(directory: Path) -> pathspec.PathSpec`: Load and parse .gitignore patterns - `scan_directory(directory: Path, ignore_spec: pathspec.PathSpec, expand_large: bool = False, depth: int = 0) -> ET.Element`: Scan directory recursively - `count_directory_files(directory: Path, ignore_spec: pathspec.PathSpec, depth: int = 0) -> int`: Count files in directory - `_should_ignore(path: Path, ignore_spec: pathspec.PathSpec, depth: int) -> bool`: Check if path should be ignored ## Requirements - Python 3.11+ - pathspec>=0.11.0 - fastmcp>=0.4.0 (for MCP server integration) ## License This tool is part of the Directory Intelligence Tool project.

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/itstanner5216/EliteMCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server