Glama

list_files

View all documents stored in your local vector database, showing file paths and chunk counts for each ingested file.

Instructions

List all ingested files in the vector database. Returns file paths and chunk counts for each document.

Input Schema


No arguments
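
For illustration, since the schema is an empty object, a client calls this tool with no arguments. A sketch of the raw JSON-RPC `tools/call` message an MCP client would send (the `id` value is arbitrary):

```typescript
// Minimal sketch of the JSON-RPC request for a tool with an empty input
// schema: "arguments" is simply an empty object.
const request = {
  jsonrpc: '2.0',
  id: 1,
  method: 'tools/call',
  params: { name: 'list_files', arguments: {} },
}

console.log(JSON.stringify(request, null, 2))
```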

Implementation Reference

  • The MCP tool handler for 'list_files'. Calls vectorStore.listFiles(), enriches raw-data files with source information, and returns JSON-formatted list.
    /**
     * list_files tool handler
     * Enriches raw-data files with original source information
     */
    async handleListFiles(): Promise<{ content: [{ type: 'text'; text: string }] }> {
      try {
        const files = await this.vectorStore.listFiles()
    
        // Enrich raw-data files with source information
        const enrichedFiles = files.map((file) => {
          if (isRawDataPath(file.filePath)) {
            const source = extractSourceFromPath(file.filePath)
            if (source) {
              return { ...file, source }
            }
          }
          return file
        })
    
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify(enrichedFiles, null, 2),
            },
          ],
        }
      } catch (error) {
        console.error('Failed to list files:', error)
        throw error
      }
    }
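
The helpers `isRawDataPath` and `extractSourceFromPath` are referenced but not shown on this page. A plausible sketch, assuming raw-data files are stored under a `raw-data/` directory with the source name as the next path segment (the real implementation may differ):

```typescript
// Hypothetical sketch: treat any path under "raw-data/" as a raw-data file.
function isRawDataPath(filePath: string): boolean {
  return filePath.startsWith('raw-data/')
}

// Hypothetical sketch: take the path segment after "raw-data/" as the source,
// e.g. "raw-data/github/readme.md" -> "github".
function extractSourceFromPath(filePath: string): string | undefined {
  const parts = filePath.split('/')
  return parts.length >= 2 ? parts[1] : undefined
}

console.log(isRawDataPath('raw-data/github/readme.md')) // true
console.log(extractSourceFromPath('raw-data/github/readme.md')) // "github"
```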
  • Registers the 'list_files' tool in the MCP ListToolsRequestSchema handler, including name, description, and input schema.
    {
      name: 'list_files',
      description:
        'List all ingested files in the vector database. Returns file paths and chunk counts for each document.',
      inputSchema: { type: 'object', properties: {} },
    },
  • Input schema for 'list_files' tool: empty object (no parameters required).
      inputSchema: { type: 'object', properties: {} }
  • Core implementation of listFiles() in VectorStore: queries all chunks from LanceDB, groups by filePath, counts chunks per file, selects latest timestamp per file.
    /**
     * Get list of ingested files
     *
     * @returns Array of file information
     */
    async listFiles(): Promise<{ filePath: string; chunkCount: number; timestamp: string }[]> {
      if (!this.table) {
        return [] // Return empty array if table doesn't exist
      }
    
      try {
        // Retrieve all records
        const allRecords = await this.table.query().toArray()
    
        // Group by file path
        const fileMap = new Map<string, { chunkCount: number; timestamp: string }>()
    
        for (const record of allRecords) {
          const filePath = record.filePath as string
          const timestamp = record.timestamp as string
    
          if (fileMap.has(filePath)) {
            const fileInfo = fileMap.get(filePath)
            if (fileInfo) {
              fileInfo.chunkCount += 1
              // Keep most recent timestamp
              if (timestamp > fileInfo.timestamp) {
                fileInfo.timestamp = timestamp
              }
            }
          } else {
            fileMap.set(filePath, { chunkCount: 1, timestamp })
          }
        }
    
        // Convert Map to array of objects
        return Array.from(fileMap.entries()).map(([filePath, info]) => ({
          filePath,
          chunkCount: info.chunkCount,
          timestamp: info.timestamp,
        }))
      } catch (error) {
        throw new DatabaseError('Failed to list files', error as Error)
      }
    }
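
The grouping step above can be pulled out as a pure function to make the behavior concrete: chunks are grouped by `filePath`, counted, and the latest ISO-8601 timestamp per file is kept (lexicographic comparison is valid for ISO-8601 strings). The record shape below is assumed from the snippet above:

```typescript
interface ChunkRecord {
  filePath: string
  timestamp: string // ISO-8601
}

interface FileInfo {
  filePath: string
  chunkCount: number
  timestamp: string
}

// Group chunk records by file path, counting chunks and keeping the most
// recent timestamp for each file.
function groupByFile(records: ChunkRecord[]): FileInfo[] {
  const fileMap = new Map<string, { chunkCount: number; timestamp: string }>()
  for (const { filePath, timestamp } of records) {
    const info = fileMap.get(filePath)
    if (info) {
      info.chunkCount += 1
      // ISO-8601 timestamps compare correctly as strings
      if (timestamp > info.timestamp) info.timestamp = timestamp
    } else {
      fileMap.set(filePath, { chunkCount: 1, timestamp })
    }
  }
  return Array.from(fileMap.entries()).map(([filePath, info]) => ({
    filePath,
    ...info,
  }))
}
```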
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses that the tool returns file paths and chunk counts, which is useful behavioral context. However, it does not mention potential limitations such as pagination, rate limits, or error conditions, leaving gaps in transparency for a read operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences that are front-loaded with the core purpose and efficiently add return details. Every sentence earns its place by providing essential information without redundancy or fluff, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (0 parameters, no output schema, no annotations), the description is complete enough for a basic list operation. It explains what the tool does and what it returns, covering the key aspects. It could also mention constraints such as result ordering or filtering, but this is a minor gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

There are 0 parameters, and schema description coverage is 100%, so no parameter documentation is needed. The description appropriately does not discuss parameters, focusing instead on the tool's function and output. A baseline of 4 is applied as it effectively handles the lack of parameters without unnecessary details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('List all ingested files') and resource ('in the vector database'), distinguishing it from siblings like delete_file, ingest_file, and query_documents. It explicitly mentions what is returned ('file paths and chunk counts for each document'), making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by specifying what it does (listing files) and what it returns, which suggests it should be used to retrieve file metadata. However, it does not explicitly state when to use this tool versus alternatives like query_documents (which likely searches content) or status (which might check system state), leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/shinpr/mcp-local-rag'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.