
readwise_list_documents

Retrieve and filter documents from Readwise Reader to manage saved content by ID, date, location, category, or tags for efficient organization.

Instructions

List documents from Readwise Reader with optional filtering

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| id | No | Filter by specific document ID | |
| updatedAfter | No | Filter documents updated after this date (ISO 8601) | |
| addedAfter | No | Filter documents added after this date (ISO 8601). Note: This will fetch all documents first and then filter client-side. | |
| location | No | Filter by document location | |
| category | No | Filter by document category | |
| tag | No | Filter by tag name | |
| pageCursor | No | Page cursor for pagination | |
| withHtmlContent | No | ⚠️ PERFORMANCE WARNING: Include HTML content in the response. This significantly slows down the API. Only use when explicitly requested by the user or when raw HTML is specifically needed for the task. | |
| withFullContent | No | ⚠️ PERFORMANCE WARNING: Include full converted text content in the response. This significantly slows down the API as it fetches and processes each document's content. Only use when explicitly requested by the user or when document content is specifically needed for analysis/reading. Default: false for performance. | false |
| contentMaxLength | No | Maximum length of content to include per document (in characters). Default: 50000. Use with withFullContent=true to prevent token limit issues. | 50000 |
| contentStartOffset | No | Character offset to start content extraction from. Use with contentMaxLength for pagination through large documents. Default: 0. | 0 |
| contentFilterKeywords | No | Filter content to include only sections containing these keywords (case-insensitive). Useful for extracting specific topics from large documents. | |
| limit | No | Maximum number of documents to return. Use this to prevent token limit issues when requesting multiple documents with content. | |
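
As a rough illustration of how the parameters listed above map to a call, they can be written as a TypeScript type. The name `ListDocumentsArgs` and the sample values are assumptions made for illustration; only the field names, enum values, and defaults come from the schema on this page.

```typescript
// Hypothetical type mirroring the input schema; not taken from the server's source.
type DocLocation = 'new' | 'later' | 'shortlist' | 'archive' | 'feed';
type DocCategory = 'article' | 'book' | 'tweet' | 'pdf' | 'email' | 'youtube' | 'podcast';

interface ListDocumentsArgs {
  id?: string;
  updatedAfter?: string;            // ISO 8601
  addedAfter?: string;              // ISO 8601, filtered client-side
  location?: DocLocation;
  category?: DocCategory;
  tag?: string;
  pageCursor?: string;
  withHtmlContent?: boolean;        // slow
  withFullContent?: boolean;        // slow, default false
  contentMaxLength?: number;        // default 50000
  contentStartOffset?: number;      // default 0
  contentFilterKeywords?: string[]; // case-insensitive
  limit?: number;
}

// Metadata-only listing: no content flags, so the slow paths are avoided.
const recentArticles: ListDocumentsArgs = {
  category: 'article',
  location: 'later',
  updatedAfter: '2024-01-01T00:00:00Z',
};

// Content request kept small: cap both the document count and the characters per document.
const focusedRead: ListDocumentsArgs = {
  tag: 'typescript',
  withFullContent: true,
  contentMaxLength: 5000,
  contentFilterKeywords: ['generics'],
  limit: 3,
};
```

The second object shows the combination the schema recommends: `limit` and `contentMaxLength` together bound the response size whenever `withFullContent` is set.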

Implementation Reference

  • Core handler function that executes the readwise_list_documents tool: fetches documents via Readwise API with support for pagination, filtering (including client-side for 'addedAfter'), optional full content extraction (HTML/URL to text conversion), content processing (truncation, offsets, keyword filtering), and formats a user-friendly text response.
    export async function handleListDocuments(args: any) {
      const client = initializeClient();
      const params = args as ListDocumentsParams;
      
      // If withFullContent is true, we also need HTML content
      if (params.withFullContent === true) {
        params.withHtmlContent = true;
      }
      
      let response;
      let clientSideFiltered = false;
      
      // If addedAfter is specified, we need to fetch all documents and filter client-side
      if (params.addedAfter) {
        clientSideFiltered = true;
        const addedAfterDate = new Date(params.addedAfter);
        
        // Create params without addedAfter for the API call
        const apiParams = { ...params };
        delete apiParams.addedAfter;
        
        // Fetch all documents if no other pagination is specified
        if (!apiParams.pageCursor && !apiParams.limit) {
          const allDocuments: any[] = [];
          let nextPageCursor: string | undefined;
          
          do {
            const fetchParams = { ...apiParams };
            if (nextPageCursor) {
              fetchParams.pageCursor = nextPageCursor;
            }
            
            const pageResponse = await client.listDocuments(fetchParams);
            allDocuments.push(...pageResponse.data.results);
            nextPageCursor = pageResponse.data.nextPageCursor;
          } while (nextPageCursor);
          
          // Filter documents by addedAfter date
          const filteredDocuments = allDocuments.filter(doc => {
            if (!doc.saved_at) return false;
            const savedDate = new Date(doc.saved_at);
            return savedDate > addedAfterDate;
          });
          
          response = {
            data: {
              count: filteredDocuments.length,
              nextPageCursor: undefined,
              results: filteredDocuments
            },
            messages: []
          };
        } else {
          // If pagination is specified, just do a regular API call and filter the current page
          response = await client.listDocuments(apiParams);
          const filteredDocuments = response.data.results.filter(doc => {
            if (!doc.saved_at) return false;
            const savedDate = new Date(doc.saved_at);
            return savedDate > addedAfterDate;
          });
          
          response.data.results = filteredDocuments;
          response.data.count = filteredDocuments.length;
        }
      } else {
        response = await client.listDocuments(params);
      }
    
      // Convert content to LLM-friendly text for documents only if withFullContent is explicitly true
      const shouldIncludeContent = params.withFullContent === true; // Default to false for performance
      
      // Process documents with content if needed
      const documentsWithText = await Promise.all(
        response.data.results.map(async (doc) => {
          let content = '';
          let contentMetadata: any = {};
          
          if (shouldIncludeContent) {
            let rawContent = '';
            
            // Try to use HTML content first (from Readwise), fallback to URL fetching
            if (doc.html_content) {
              // Articles, PDFs, and uncategorized documents are converted from their URL;
              // every other category uses the HTML content supplied by Readwise
              const shouldUseJina = !doc.category || doc.category === 'article' || doc.category === 'pdf';
              if (shouldUseJina) {
                const urlToConvert = doc.source_url || doc.url;
                if (urlToConvert) {
                  rawContent = await convertUrlToText(urlToConvert, doc.category);
                }
              } else {
                rawContent = await extractTextFromHtml(doc.html_content);
              }
            } else {
              // Fallback to URL fetching if no HTML content available
              const urlToConvert = doc.source_url || doc.url;
              if (urlToConvert) {
                rawContent = await convertUrlToText(urlToConvert, doc.category);
              }
            }
            
            // Process content with pagination and filtering options
            if (rawContent) {
              const processResult = processContentWithOptions(rawContent, {
                maxLength: params.contentMaxLength,
                startOffset: params.contentStartOffset,
                filterKeywords: params.contentFilterKeywords
              });
              
              content = processResult.content;
              contentMetadata = {
                contentTruncated: processResult.truncated,
                contentTotalLength: processResult.totalLength,
                contentExtractedSections: processResult.extractedSections?.length || 0,
                // Include debug info in response
                ...(processResult.debug && { contentDebug: processResult.debug })
              };
              
              // Add helpful metadata for users
              if (processResult.truncated) {
                contentMetadata.contentNote = `Content truncated. Original length: ${processResult.totalLength} chars. ` +
                  `To get more content, use contentStartOffset=${(params.contentStartOffset || 0) + (params.contentMaxLength || 50000)}`;
              }
              
              if (params.contentFilterKeywords && params.contentFilterKeywords.length > 0) {
                contentMetadata.contentKeywordsUsed = params.contentFilterKeywords;
                
                // If no content found after filtering, add a helpful note
                if (!processResult.extractedSections || processResult.extractedSections.length === 0) {
                  contentMetadata.contentFilterNote = `No content sections found containing the keywords: ${params.contentFilterKeywords.join(', ')}`;
                }
              }
            }
          }
          
          const result: any = {
            id: doc.id,
            url: doc.url,
            title: doc.title,
            author: doc.author,
            source: doc.source,
            category: doc.category,
            location: doc.location,
            tags: doc.tags,
            site_name: doc.site_name,
            word_count: doc.word_count,
            created_at: doc.created_at,
            updated_at: doc.updated_at,
            published_date: doc.published_date,
            summary: doc.summary,
            image_url: doc.image_url,
            source_url: doc.source_url,
            notes: doc.notes,
            parent_id: doc.parent_id,
            reading_progress: doc.reading_progress,
            first_opened_at: doc.first_opened_at,
            last_opened_at: doc.last_opened_at,
            saved_at: doc.saved_at,
            last_moved_at: doc.last_moved_at,
            ...contentMetadata, // Add content processing metadata
          };
          
          if (shouldIncludeContent) {
            result.content = content; // LLM-friendly text content instead of raw HTML
          }
          
          if (params.withHtmlContent && doc.html_content) {
            result.html_content = doc.html_content;
          }
          
          return result;
        })
      );
    
      // Create a summary response to avoid token limits
      let responseText = '';
      
      if (shouldIncludeContent && (params.contentMaxLength || params.contentFilterKeywords)) {
        // When content processing is involved, provide a more compact response
        responseText = `Found ${response.data.count} document(s).\n\n`;
        
        documentsWithText.forEach((doc, index) => {
          responseText += `Document ${index + 1}:\n`;
          responseText += `Title: ${doc.title || 'Untitled'}\n`;
          responseText += `Author: ${doc.author || 'Unknown'}\n`;
          responseText += `Category: ${doc.category || 'Unknown'}\n`;
          responseText += `URL: ${doc.url}\n`;
          
          if (doc.content) {
            responseText += `\nContent (${doc.content.length} characters):\n${doc.content}\n`;
          }
          
          if (doc.contentTruncated) {
            responseText += `\n[Content was truncated. Original length: ${doc.contentTotalLength} chars]\n`;
          }
          
          if (doc.contentFilterNote) {
            responseText += `\n[${doc.contentFilterNote}]\n`;
          }
          
          responseText += '\n' + '='.repeat(50) + '\n\n';
        });
        
        if (response.data.nextPageCursor) {
          responseText += `Next page cursor: ${response.data.nextPageCursor}\n`;
        }
      } else {
        // For non-content requests, return compact JSON
        responseText = JSON.stringify({
          count: response.data.count,
          nextPageCursor: response.data.nextPageCursor,
          documents: documentsWithText.map(doc => ({
            id: doc.id,
            title: doc.title,
            author: doc.author,
            category: doc.category,
            url: doc.url,
            summary: doc.summary,
            reading_progress: doc.reading_progress
          }))
        }, null, 2);
      }
      
      // Copy the messages so the API response object is not mutated below
      const allMessages = [...(response.messages || [])];
      
      // Add message about client-side filtering if it was performed
      if (clientSideFiltered) {
        allMessages.push({
          type: 'info',
          content: 'Documents were filtered client-side based on the addedAfter date. All documents were fetched from the API first, then filtered by their saved_at date.'
        });
      }
      
      // Add information about parameter usage
      if (params.limit && params.limit > 0) {
        allMessages.push({
          type: 'info',
          content: `Document limit of ${params.limit} was applied client-side after API response.`
        });
      }
      
      if (params.contentFilterKeywords && params.contentFilterKeywords.length > 0) {
        allMessages.push({
          type: 'info',
          content: `Content was filtered for keywords: ${params.contentFilterKeywords.join(', ')}. Check individual documents for contentFilterNote if no matches found.`
        });
      }
      
      if (allMessages.length > 0) {
        responseText += '\n\nMessages:\n' + allMessages.map(msg => `${msg.type.toUpperCase()}: ${msg.content}`).join('\n');
      }
    
      return {
        content: [
          {
            type: 'text',
            text: responseText,
          },
        ],
      };
    }
  • Tool definition including name, description, and detailed inputSchema for parameter validation (filters, pagination, content processing options).
    {
      name: 'readwise_list_documents',
      description: 'List documents from Readwise Reader with optional filtering',
      inputSchema: {
        type: 'object',
        properties: {
          id: {
            type: 'string',
            description: 'Filter by specific document ID',
          },
          updatedAfter: {
            type: 'string',
            description: 'Filter documents updated after this date (ISO 8601)',
          },
          addedAfter: {
            type: 'string',
            description: 'Filter documents added after this date (ISO 8601). Note: This will fetch all documents first and then filter client-side.',
          },
          location: {
            type: 'string',
            enum: ['new', 'later', 'shortlist', 'archive', 'feed'],
            description: 'Filter by document location',
          },
          category: {
            type: 'string',
            enum: ['article', 'book', 'tweet', 'pdf', 'email', 'youtube', 'podcast'],
            description: 'Filter by document category',
          },
          tag: {
            type: 'string',
            description: 'Filter by tag name',
          },
          pageCursor: {
            type: 'string',
            description: 'Page cursor for pagination',
          },
          withHtmlContent: {
            type: 'boolean',
            description: '⚠️ PERFORMANCE WARNING: Include HTML content in the response. This significantly slows down the API. Only use when explicitly requested by the user or when raw HTML is specifically needed for the task.',
          },
          withFullContent: {
            type: 'boolean',
            description: '⚠️ PERFORMANCE WARNING: Include full converted text content in the response. This significantly slows down the API as it fetches and processes each document\'s content. Only use when explicitly requested by the user or when document content is specifically needed for analysis/reading. Default: false for performance.',
          },
          contentMaxLength: {
            type: 'number',
            description: 'Maximum length of content to include per document (in characters). Default: 50000. Use with withFullContent=true to prevent token limit issues.',
          },
          contentStartOffset: {
            type: 'number',
            description: 'Character offset to start content extraction from. Use with contentMaxLength for pagination through large documents. Default: 0.',
          },
          contentFilterKeywords: {
            type: 'array',
            items: { type: 'string' },
            description: 'Filter content to include only sections containing these keywords (case-insensitive). Useful for extracting specific topics from large documents.',
          },
          limit: {
            type: 'number',
            description: 'Maximum number of documents to return. Use this to prevent token limit issues when requesting multiple documents with content.',
          },
        },
        additionalProperties: false,
      },
    },
  • Switch case in central handler dispatcher that routes 'readwise_list_documents' tool calls to the specific handleListDocuments implementation.
    case 'readwise_list_documents':
      return handleListDocuments(args);
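
The handler delegates truncation, offsets, and keyword filtering to `processContentWithOptions`, whose body is not shown on this page. The sketch below is one plausible way such a step could work, not the server's implementation. The function name `windowContent`, the blank-line section split, and the choice to filter before slicing are all assumptions.

```typescript
// Hypothetical stand-in for the content-processing step; names and behavior are assumed.
interface WindowOptions {
  maxLength?: number;
  startOffset?: number;
  filterKeywords?: string[];
}

interface WindowResult {
  content: string;
  truncated: boolean;
  totalLength: number; // length of the text the window was cut from (after filtering)
  extractedSections: string[];
}

function windowContent(raw: string, options: WindowOptions = {}): WindowResult {
  const maxLength = options.maxLength || 50000; // default stated in the schema
  const startOffset = options.startOffset || 0;
  const keywords = (options.filterKeywords || []).map((k) => k.toLowerCase());

  let text = raw;
  let extractedSections: string[] = [];
  if (keywords.length > 0) {
    // Assumption: a "section" is a block of text separated by blank lines.
    extractedSections = raw.split(/\n\s*\n/).filter((section) => {
      const lower = section.toLowerCase();
      return keywords.some((k) => lower.includes(k));
    });
    text = extractedSections.join('\n\n');
  }

  return {
    content: text.slice(startOffset, startOffset + maxLength),
    truncated: startOffset + maxLength < text.length,
    totalLength: text.length,
    extractedSections,
  };
}

const page = windowContent('abcdefghij', { maxLength: 4, startOffset: 2 });
console.log(page.content, page.truncated); // prints "cdef true"
```

Filtering before slicing means `contentStartOffset` counts characters in the filtered text rather than in the original document. The real helper may make the opposite choice.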
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden but only states basic functionality without disclosing behavioral traits. It doesn't mention pagination behavior (implied by pageCursor parameter), rate limits, authentication requirements, error conditions, or what the response structure looks like. The description is minimal and doesn't compensate for the lack of annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
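
One way to close part of this gap is to attach MCP tool annotations and extend the description. The sketch below assumes the annotation fields defined by the Model Context Protocol specification (`title`, `readOnlyHint`, `destructiveHint`, `idempotentHint`, `openWorldHint`). The hint values and the extra description sentences are inferred from the handler shown above and are not the server author's wording; the `inputSchema` is omitted for brevity.

```typescript
// Hypothetical annotated variant of the tool definition; values are illustrative.
interface ToolAnnotations {
  title?: string;
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

interface AnnotatedTool {
  name: string;
  description: string;
  annotations: ToolAnnotations;
}

const listDocumentsTool: AnnotatedTool = {
  name: 'readwise_list_documents',
  description: [
    'List documents from Readwise Reader with optional filtering.',
    'Results are paginated: pass the returned nextPageCursor as pageCursor to get the next page.',
    'addedAfter is applied client-side after documents are fetched.',
    'withHtmlContent and withFullContent are slow; leave them off unless content is needed.',
  ].join(' '),
  annotations: {
    title: 'List Readwise Reader documents',
    readOnlyHint: true,     // assumption: the handler shown only reads data
    destructiveHint: false,
    idempotentHint: true,
    openWorldHint: true,    // calls the Readwise API and, for full content, document URLs
  },
};
```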

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose. Every word earns its place with no redundancy or unnecessary elaboration. It's appropriately sized for a list operation with detailed schema support.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 13 parameters, no annotations, and no output schema, the description is incomplete. It doesn't address the complexity of filtering options, performance implications highlighted in the schema, or what the tool returns. For a tool with rich filtering capabilities and performance considerations, the description should provide more contextual guidance about usage patterns and expected outputs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
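
Although no output schema is published, the handler's non-content branch suggests the response shape: pretty-printed JSON with `count`, `nextPageCursor`, and `documents`, optionally followed by a `Messages:` trailer. A caller could parse it roughly as follows. The type names, the field types, and `parseCompactResponse` are assumptions for illustration, and requests that include content return plain text instead, which this sketch does not cover.

```typescript
// Hypothetical types inferred from the compact JSON branch of the handler.
interface DocumentSummary {
  id: string;
  title?: string | null;
  author?: string | null;
  category?: string | null;
  url?: string | null;
  summary?: string | null;
  reading_progress?: number | null;
}

interface ListDocumentsResult {
  count: number;
  nextPageCursor?: string | null;
  documents: DocumentSummary[];
}

// The handler appends messages after this exact marker.
const MESSAGES_MARKER = '\n\nMessages:\n';

function parseCompactResponse(text: string): { result: ListDocumentsResult; messages: string[] } {
  const markerIndex = text.indexOf(MESSAGES_MARKER);
  const jsonPart = markerIndex === -1 ? text : text.slice(0, markerIndex);
  const messages =
    markerIndex === -1
      ? []
      : text
          .slice(markerIndex + MESSAGES_MARKER.length)
          .split('\n')
          .filter((line) => line.length > 0);
  return { result: JSON.parse(jsonPart) as ListDocumentsResult, messages };
}
```

Splitting on the marker is safe for this branch because `JSON.stringify` escapes newlines inside strings, so the pretty-printed JSON never contains two consecutive newline characters.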

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, so the schema already documents all 13 parameters thoroughly. The description adds no additional parameter semantics beyond 'with optional filtering,' which is already implied by the schema. Baseline 3 is appropriate when schema does the heavy lifting, though the description could have explained parameter relationships or filtering strategies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
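
One such relationship is how `contentStartOffset` and `contentMaxLength` combine to page through a long document. The handler's truncation note suggests the next offset is the current offset plus the window size, with 50000 as the fallback size. The sketch below encodes that rule; `nextWindow` and `ContentWindowArgs` are hypothetical names, not part of the server.

```typescript
// Hypothetical helper that derives the arguments for the next content window.
interface ContentWindowArgs {
  id?: string;
  withFullContent: true;
  contentMaxLength?: number;
  contentStartOffset?: number;
}

const DEFAULT_MAX_LENGTH = 50000; // default stated in the schema

// Returns arguments for the next window, or null when the last response was not truncated.
function nextWindow(args: ContentWindowArgs, wasTruncated: boolean): ContentWindowArgs | null {
  if (!wasTruncated) return null;
  const step = args.contentMaxLength || DEFAULT_MAX_LENGTH;
  return { ...args, contentStartOffset: (args.contentStartOffset || 0) + step };
}

const first: ContentWindowArgs = { withFullContent: true, contentMaxLength: 10000 };
const second = nextWindow(first, true);
console.log(second ? second.contentStartOffset : null); // prints 10000
```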

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('List') and resource ('documents from Readwise Reader') with optional filtering. It distinguishes from siblings like readwise_list_books or readwise_list_tags by specifying documents rather than other resource types. However, it doesn't explicitly contrast with readwise_search_highlights or readwise_topic_search, which might have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage through 'with optional filtering' but doesn't provide explicit guidance on when to use this tool versus alternatives like readwise_list_books or readwise_search_highlights. The input schema includes performance warnings for content parameters, which offers some implicit guidance, but the description itself lacks explicit when/when-not instructions or named alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/arnaldo-delisio/readwise-mcp-enhanced'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.