
load_document

Load documents into DocNav-MCP for intelligent navigation and analysis, generating a document ID for structured access.

Instructions

Load a document for navigation and analysis.

Args:
    file_path: Path to the document file

Returns:
    Success message with auto-generated document ID
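
Because the tool returns a plain-text message rather than structured data, a caller that needs the document ID has to pull it out of that message. The following is a minimal sketch, assuming the message keeps the "Document ID: ..." line that the handler emits; `extract_doc_id` and the sample ID are hypothetical and not part of DocNav-MCP.

```python
import re
from typing import Optional


def extract_doc_id(message: str) -> Optional[str]:
    """Return the document ID from a load_document success message, or None.

    Hypothetical helper: it relies on the "Document ID: <id>" line
    appearing on its own line in the returned text.
    """
    match = re.search(r"^Document ID: (\S+)$", message, flags=re.MULTILINE)
    return match.group(1) if match else None


# Sample message in the shape the handler builds; the ID is made up.
sample = (
    "Document loaded successfully!\n"
    "File: report.md\n"
    "Document ID: 1234abcd\n"
    "Format: markdown\n"
    "Use get_outline('1234abcd') to see document structure."
)

doc_id = extract_doc_id(sample)
```

Error responses ("Error: File not found: ..." or "Error loading document: ...") contain no such line, so the helper returns None for them.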

Input Schema

Name | Required | Description | Default
file_path | Yes | |
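
The handler resolves `file_path` with `Path.resolve()` before checking that it exists, so a relative path is interpreted against the server process's current working directory, not the client's. A minimal sketch of that check follows; `check_file_path` is a hypothetical name, and unlike the handler (which returns an error string) this version raises.

```python
from pathlib import Path


def check_file_path(file_path: str) -> Path:
    """Resolve file_path to an absolute path and confirm that it exists.

    Hypothetical helper mirroring the handler's first two steps.
    """
    # Relative paths are resolved against the current working directory.
    path = Path(file_path).resolve()
    if not path.exists():
        raise FileNotFoundError(f"File not found: {file_path}")
    return path
```

Note that `exists()` is also true for directories, so passing an absolute path to a regular file is the safest input.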

Implementation Reference

  • server.py:24-52 (handler)
    The primary handler for the 'load_document' MCP tool. Resolves the file path, validates existence, loads the document using DocumentNavigator.load_document_from_file_sync, retrieves metadata, and returns a formatted success message with the document ID.
    @mcp.tool()
    def load_document(file_path: str) -> str:
        """Load a document for navigation and analysis.
    
        Args:
            file_path: Path to the document file
    
        Returns:
            Success message with auto-generated document ID
        """
        try:
            path = Path(file_path).resolve()
            if not path.exists():
                return f"Error: File not found: {file_path}"
    
            # Use the synchronous version to avoid event loop conflicts
            doc_id, document = navigator.load_document_from_file_sync(path)
    
            metadata = navigator.get_document_metadata(doc_id)
            return (
                f"Document loaded successfully!\n"
                f"File: {path.name}\n"
                f"Document ID: {doc_id}\n"
                f"Format: {metadata['format'] if metadata else 'unknown'}\n"
                f"Use get_outline('{doc_id}') to see document structure."
            )
        except Exception as e:
            return f"Error loading document: {str(e)}"
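  • Helper method invoked by the tool handler to load the document synchronously from a file. If an event loop is already running, it delegates to the synchronous fallback loader; otherwise it finds the appropriate processor, runs it with asyncio.run, generates a doc_id, and stores the document and its metadata. On any error it retries through the fallback loader.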
    def load_document_from_file_sync(self, file_path: Path) -> Tuple[str, Document]:
        """Load document from file (synchronous version).
    
        Args:
            file_path: Path to the document file
    
        Returns:
            Tuple of (doc_id, Document) where doc_id is auto-generated UUID
        """
        if not file_path.exists():
            raise FileNotFoundError(f"File not found: {file_path}")
    
        # Normalize path to prevent injection issues
        normalized_path = self._normalize_file_path(file_path)
    
        try:
            # Check if we're in an async context (like MCP server)
            import asyncio
    
            try:
                # Try to get the running event loop
                asyncio.get_running_loop()
                # If we get here, we're in an async context
                # Fall back to sync processing immediately
                return self._load_file_fallback_sync(file_path)
            except RuntimeError:
                # No running event loop, we can use asyncio.run
                processor = self._find_processor(file_path)
                document = asyncio.run(processor.process(file_path))
    
                doc_id = self._generate_doc_id()
                self.loaded_documents[doc_id] = document
    
                # Store metadata with normalized path
                self.document_metadata[doc_id] = {
                    "title": file_path.name,
                    "format": document.source_format,
                    "source_type": "file",
                    "file_path": normalized_path,
                    "created_at": str(uuid.uuid1().time),
                }
    
                return doc_id, document
    
        except Exception as e:
            # For any error, fall back to sync processing
            try:
                return self._load_file_fallback_sync(file_path)
            except Exception as fallback_error:
                raise ValueError(
                    f"Error loading document: {str(e)}. Fallback also failed: {str(fallback_error)}"
                )
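  • Fallback synchronous loader for files. PDFs are converted to Markdown with pymupdf4llm and parsed with the Markdown processor; all other files are read as UTF-8 text and passed to the sync text loader, with any suffix other than .md, .markdown, or .xml treated as markdown. In both cases the document and its metadata are stored.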
    def _load_file_fallback_sync(self, file_path: Path) -> Tuple[str, Document]:
        """Fallback sync file loading for when async processors can't be used."""
        normalized_path = self._normalize_file_path(file_path)
    
        # Handle PDF files directly with pymupdf4llm (which is actually sync)
        if file_path.suffix.lower() == ".pdf":
            try:
                import pymupdf4llm
    
                # Convert PDF to markdown using pymupdf4llm (this is actually synchronous)
                markdown_content = pymupdf4llm.to_markdown(str(file_path))
    
                # Create Document object
                from .models import Document
    
                document = Document(
                    file_path=file_path,
                    title=file_path.stem,
                    source_text=markdown_content,
                    source_format="pdf",
                )
    
                # Use markdown processor to parse the converted content
                # Create temporary file for processing
                import tempfile
    
                with tempfile.NamedTemporaryFile(
                    mode="w", suffix=".md", delete=False, encoding="utf-8"
                ) as f:
                    f.write(markdown_content)
                    temp_path = Path(f.name)
    
                try:
                    # Use the markdown processor synchronously by creating a simple parser
                    from .processors.markdown import MarkdownProcessor
    
                    md_processor = MarkdownProcessor()
    
                    # Parse using the internal parsing method directly
                    root = md_processor._parse_markdown_to_tree(markdown_content)
                    document.root = root
                    document.rebuild_index()
    
                finally:
                    temp_path.unlink()  # Clean up
    
                # Generate doc ID and store
                doc_id = self._generate_doc_id()
                self.loaded_documents[doc_id] = document
    
                # Store metadata
                self.document_metadata[doc_id] = {
                    "title": file_path.name,
                    "format": "pdf",
                    "source_type": "file",
                    "file_path": normalized_path,
                    "created_at": str(uuid.uuid1().time),
                }
    
                return doc_id, document
    
            except ImportError:
                raise ValueError(
                    "pymupdf4llm is required for PDF processing but not available"
                )
            except Exception as e:
                raise ValueError(f"Error processing PDF file: {str(e)}")
    
        # For markdown and other text files
        content = file_path.read_text(encoding="utf-8")
        format_map = {
            ".md": "markdown",
            ".markdown": "markdown",
            ".xml": "xml",
        }
        file_format = format_map.get(file_path.suffix.lower(), "markdown")
    
        # Use the sync text loading method
        doc_id, document = self.load_document_from_text_sync(
            content, file_format, file_path.stem
        )
    
        # Update metadata to reflect file source
        self.document_metadata[doc_id].update(
            {
                "source_type": "file",
                "file_path": normalized_path,
            }
        )
    
        return doc_id, document
  • server.py:24-24 (registration)
    The @mcp.tool() decorator registers the load_document function as an MCP tool.
    @mcp.tool()
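
The helper above decides between `asyncio.run` and the synchronous fallback by probing for a running event loop. That probe can be sketched in isolation as follows; `in_async_context` is a hypothetical name, and the only library behavior relied on is that `asyncio.get_running_loop()` raises `RuntimeError` when no loop is running in the current thread.

```python
import asyncio


def in_async_context() -> bool:
    """Return True if called from code running inside an event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No running loop in this thread: asyncio.run() is safe to call.
        return False
    return True


async def probe_from_coroutine() -> bool:
    # Inside a coroutine a loop is always running.
    return in_async_context()
```

Keeping only the `get_running_loop()` call inside the `try` avoids mistaking a `RuntimeError` raised by later work for "no event loop".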

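The fallback loader picks a format from the file suffix: PDFs are special-cased, and every suffix missing from its map defaults to markdown. A minimal sketch of that selection, assuming the same three map entries shown in the reference code; `detect_format` is a hypothetical name.

```python
from pathlib import Path

# Same entries as the format_map in the fallback loader.
FORMAT_MAP = {
    ".md": "markdown",
    ".markdown": "markdown",
    ".xml": "xml",
}


def detect_format(file_path: Path) -> str:
    """Return the format label the fallback loader would use for this path."""
    suffix = file_path.suffix.lower()  # matching is case-insensitive
    if suffix == ".pdf":
        return "pdf"
    # Unknown suffixes (including none at all) are treated as markdown.
    return FORMAT_MAP.get(suffix, "markdown")
```

One consequence of the default is that a plain `.txt` file, or a file with no extension, is parsed as markdown.
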
MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/shenyimings/DocNav-MCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.