# Mineru MCP Server
A Model Context Protocol (MCP) server that integrates with the Mineru API to parse documents.
## Features
- **Single File Parsing**: Create a parsing task for a single file by URL
- **Batch File Parsing**: Upload and parse multiple files in one batch, from local files or remote URLs
- **Task Status Monitoring**: Query parsing progress and results
- **Multi-format Support**: Supports PDF, DOC, DOCX, PPT, PPTX, PNG, JPG, and JPEG files
- **OCR Functionality**: Optional OCR text recognition
- **Formula Recognition**: Recognizes mathematical formulas
- **Table Recognition**: Recognizes table structure
- **Multi-language Support**: Supports Chinese, English, and other languages
## Installation
```bash
npm install
```
## Configuration
Before use, configure your Mineru API key (a Bearer token) and the API base URL:
```typescript
const config = {
  mineruApiKey: "your-mineru-api-bearer-token", // Mineru API Bearer token
  mineruBaseUrl: "https://mineru.net/api/v4",   // Mineru API base URL
};
```
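Hard-coding the token works for a quick test, but reading it from the environment keeps it out of source control. The sketch below assumes two hypothetical variable names, `MINERU_API_KEY` and `MINERU_BASE_URL`; this server does not necessarily read them, so adapt the names to however your deployment supplies configuration.

```typescript
// Hypothetical environment variable names; this README does not define them.
interface MineruConfig {
  mineruApiKey: string;
  mineruBaseUrl: string;
}

const DEFAULT_BASE_URL = "https://mineru.net/api/v4";

// Build the config from an environment-like record (e.g. process.env),
// failing fast when the API key is missing or blank.
function loadConfig(env: Record<string, string | undefined>): MineruConfig {
  const apiKey = env["MINERU_API_KEY"]?.trim();
  if (!apiKey) {
    throw new Error("MINERU_API_KEY is not set");
  }
  // Strip trailing slashes so paths can be joined as `${base}/...`.
  const baseUrl = (env["MINERU_BASE_URL"] ?? DEFAULT_BASE_URL).replace(/\/+$/, "");
  return { mineruApiKey: apiKey, mineruBaseUrl: baseUrl };
}

// The key is sent as a Bearer token, as the config comment above indicates.
function authHeaders(config: MineruConfig): Record<string, string> {
  return { Authorization: `Bearer ${config.mineruApiKey}` };
}
```

In Node you could call `loadConfig(process.env)`; taking a plain record keeps the function testable without touching the real environment.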
## Available Tools
### 1. create_parsing_task
Create a parsing task for a single file specified by URL.
**Parameters:**
- `url` (required): URL of the file to parse
- `is_ocr` (optional): Enable OCR; default `false`
- `enable_formula` (optional): Enable formula recognition; default `true`
- `enable_table` (optional): Enable table recognition; default `true`
- `language` (optional): Document language; default `"ch"` (Chinese)
- `page_ranges` (optional): Pages to parse, e.g. `"1-10,15-20"`
- `model_version` (optional): Model version, `"v1"` or `"v2"`
- `extra_formats` (optional): Additional export formats; any of `"docx"`, `"html"`, `"latex"`
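The documented defaults can be made explicit on the client side before calling the tool. This is an illustrative client-side helper, not part of this server: `ParsingTaskParams`, `withDefaults`, and `isValidPageRanges` are hypothetical names, and the page-range check only accepts the comma-separated `start-end` form shown above, which may be stricter than what the API actually allows.

```typescript
// Parameter shape mirroring the list above; field names come from this README.
type ExtraFormat = "docx" | "html" | "latex";

interface ParsingTaskParams {
  url: string;
  is_ocr?: boolean;
  enable_formula?: boolean;
  enable_table?: boolean;
  language?: string;
  page_ranges?: string;
  model_version?: "v1" | "v2";
  extra_formats?: ExtraFormat[];
}

// Accepts pages or ranges such as "3" or "1-10,15-20" (whitespace ignored).
// Assumption: the server's real grammar may accept more than this.
function isValidPageRanges(s: string): boolean {
  const compact = s.replace(/\s+/g, "");
  if (!/^\d+(-\d+)?(,\d+(-\d+)?)*$/.test(compact)) return false;
  // Each "start-end" range must not run backwards.
  return compact.split(",").every((part) => {
    const bounds = part.split("-").map(Number);
    const start = bounds[0] ?? 0;
    const end = bounds[1] ?? start;
    return start <= end;
  });
}

// Fill in the defaults documented above and reject malformed page ranges.
function withDefaults(p: ParsingTaskParams): ParsingTaskParams {
  if (p.page_ranges !== undefined && !isValidPageRanges(p.page_ranges)) {
    throw new Error(`Invalid page_ranges: ${p.page_ranges}`);
  }
  return {
    ...p,
    is_ocr: p.is_ocr ?? false,
    enable_formula: p.enable_formula ?? true,
    enable_table: p.enable_table ?? true,
    language: p.language ?? "ch",
  };
}
```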
### 2. get_task_status
Query the status of a parsing task.
**Parameters:**
- `task_id` (required): Task ID returned by `create_parsing_task`
### 3. create_batch_parsing_task
Create a batch parsing task for local files that you upload.
**Parameters:**
- `files` (required): Array of file objects; each contains `name` and optional per-file properties such as `is_ocr` and `page_ranges`
- `enable_formula` (optional): Enable formula recognition
- `enable_table` (optional): Enable table recognition
- `language` (optional): Document language
- `model_version` (optional): Model version
- `extra_formats` (optional): Additional export formats
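Because the API rejects files whose type it cannot detect (error `-60002`), a client might filter the `files` array before submitting. This hypothetical helper checks each `name` against the formats listed in this README; it looks only at the extension, so it cannot catch a mislabeled or corrupted file.

```typescript
// Supported extensions as listed in this README (see error -60002).
const SUPPORTED_EXTENSIONS = new Set(["pdf", "doc", "docx", "ppt", "pptx", "png", "jpg", "jpeg"]);

// Hypothetical per-file shape for create_batch_parsing_task.
interface BatchFile {
  name: string;
  is_ocr?: boolean;
  page_ranges?: string;
}

// Lower-cased extension after the last dot, or "" when there is none.
function extensionOf(name: string): string {
  const dot = name.lastIndexOf(".");
  return dot > 0 ? name.slice(dot + 1).toLowerCase() : "";
}

// Split files into those safe to submit and those the API would likely reject.
function partitionBySupport(files: BatchFile[]): { ok: BatchFile[]; rejected: string[] } {
  const ok: BatchFile[] = [];
  const rejected: string[] = [];
  for (const f of files) {
    if (SUPPORTED_EXTENSIONS.has(extensionOf(f.name))) ok.push(f);
    else rejected.push(f.name);
  }
  return { ok, rejected };
}
```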
### 4. create_batch_url_parsing_task
Create a batch parsing task for remote files specified by URL.
**Parameters:**
- `files` (required): Array of file objects; each contains `url` and optional per-file properties such as `is_ocr` and `page_ranges`
- `enable_formula` (optional): Enable formula recognition
- `enable_table` (optional): Enable table recognition
- `language` (optional): Document language
- `model_version` (optional): Model version
- `extra_formats` (optional): Additional export formats
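For URL batches, a similar sketch can build the `files` array from plain URLs. The helper name `toBatchUrlFiles` and its per-URL `ocr` callback are assumptions for illustration. The extension is read from the URL path so that query strings (such as signed-URL parameters) are ignored.

```typescript
// Supported extensions as listed in this README.
const SUPPORTED = ["pdf", "doc", "docx", "ppt", "pptx", "png", "jpg", "jpeg"];

// Hypothetical per-file shape for create_batch_url_parsing_task.
interface BatchUrlFile {
  url: string;
  is_ocr?: boolean;
  page_ranges?: string;
}

// Validate each URL and decide per URL whether to enable OCR (default: off).
function toBatchUrlFiles(urls: string[], ocr: (url: string) => boolean = () => false): BatchUrlFile[] {
  return urls.map((raw) => {
    const parsed = new URL(raw); // throws TypeError on malformed input
    if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
      throw new Error(`Unsupported protocol in ${raw}`);
    }
    // Check the path only, so "?sig=..." style query strings are ignored.
    const ext = parsed.pathname.split(".").pop()?.toLowerCase() ?? "";
    if (!SUPPORTED.includes(ext)) {
      throw new Error(`Unsupported or missing extension in ${raw}`);
    }
    return { url: raw, is_ocr: ocr(raw) };
  });
}
```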
### 5. get_batch_task_results
Query the results of a batch parsing task. Works for both URL batches and local-upload batches.
**Parameters:**
- `batch_id` (required): Batch task ID returned by `create_batch_url_parsing_task` or `create_batch_parsing_task`
## Usage Examples
### Single File Parsing
```typescript
// Create parsing task
const taskResult = await create_parsing_task({
  url: "https://example.com/document.pdf",
  is_ocr: true,
  enable_formula: true,
  language: "en",
});
// Query task status
const status = await get_task_status({
  task_id: taskResult.task_id,
});
```
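Parsing runs as a task, so a single `get_task_status` call may return before the task has finished. A minimal polling sketch follows. It is generic because this README does not document the status values: the caller supplies both the status fetcher and the "finished" predicate. `pollUntil` and its default interval and timeout are illustrative choices, not values from Mineru.

```typescript
// Generic polling sketch. `fetchStatus` stands in for a call such as
// get_task_status; `isFinished` is a caller-defined terminal-state check.
async function pollUntil<T>(
  fetchStatus: () => Promise<T>,
  isFinished: (status: T) => boolean,
  { intervalMs = 5000, timeoutMs = 10 * 60 * 1000 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await fetchStatus();
    if (isFinished(status)) return status;
    // Give up if the next wait would overshoot the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Task did not finish within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

For example, pass `() => get_task_status({ task_id: taskResult.task_id })` as the fetcher and a predicate that inspects whatever state field your response carries.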
### Batch File Upload Parsing
```typescript
// Create batch upload task
const batchResult = await create_batch_parsing_task({
  files: [
    { name: "document1.pdf", is_ocr: true },
    { name: "document2.docx" },
  ],
  enable_formula: true,
  language: "ch",
});
// Query batch task results (applicable to both batch parsing methods)
const batchStatus = await get_batch_task_results({
  batch_id: batchResult.batch_id,
});
```
### Batch URL Parsing
```typescript
// Create batch URL parsing task
const batchUrlResult = await create_batch_url_parsing_task({
  files: [
    { url: "https://example.com/doc1.pdf", is_ocr: true },
    { url: "https://example.com/doc2.docx" },
  ],
  enable_formula: true,
  language: "en",
});
// Query batch task results (applicable to both batch parsing methods)
const batchUrlStatus = await get_batch_task_results({
  batch_id: batchUrlResult.batch_id,
});
```
## Development
```bash
npm run dev
```
## Important Notes
1. A single file must not exceed 200 MB or 600 pages.
2. Each account receives 2,000 pages of highest-priority parsing per day.
3. Because of network restrictions, URLs hosted abroad (for example, on GitHub or AWS) may time out.
4. Upload links for batch uploads are valid for 24 hours.
5. You do not need to set a `Content-Type` header when uploading files.
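The size, page, quota, and link-lifetime limits above can be checked before submitting. This is a hypothetical client-side preflight: it assumes you already know each file's byte size and page count, and it treats 200 MB as 200 × 1024 × 1024 bytes, which is an assumption since the exact unit is not stated. Going past the 2,000-page figure is reported as a warning, because the note describes it as a priority quota.

```typescript
// Limits from the notes above (byte interpretation of "200 MB" is assumed).
const MAX_FILE_BYTES = 200 * 1024 * 1024;
const MAX_PAGES = 600;
const DAILY_PRIORITY_PAGES = 2000;
const UPLOAD_LINK_TTL_MS = 24 * 60 * 60 * 1000;

// Hypothetical description of a file you plan to submit.
interface LocalDoc {
  name: string;
  bytes: number;
  pages: number;
}

// Return human-readable warnings; an empty array means nothing was flagged.
function preflightWarnings(docs: LocalDoc[], pagesUsedToday: number): string[] {
  const warnings: string[] = [];
  let total = pagesUsedToday;
  for (const d of docs) {
    if (d.bytes > MAX_FILE_BYTES) warnings.push(`${d.name}: exceeds 200 MB`);
    if (d.pages > MAX_PAGES) warnings.push(`${d.name}: exceeds ${MAX_PAGES} pages`);
    total += d.pages;
  }
  if (total > DAILY_PRIORITY_PAGES) {
    warnings.push(`today's total would reach ${total} pages, beyond the ${DAILY_PRIORITY_PAGES}-page priority quota`);
  }
  return warnings;
}

// True once a batch upload link is 24 hours old or older.
function uploadLinkExpired(issuedAt: Date, now: Date = new Date()): boolean {
  return now.getTime() - issuedAt.getTime() >= UPLOAD_LINK_TTL_MS;
}
```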
## Common Error Codes
| Error Code | Description | Solution |
|------------|-------------|----------|
| A0202 | Invalid token | Check that the token is correct, or replace it with a new one |
| A0211 | Token expired | Replace it with a new token |
| -500 | Parameter error | Ensure parameter types and the Content-Type header are correct |
| -10001 | Service exception | Try again later |
| -10002 | Request parameter error | Check the request parameter format |
| -60001 | Failed to generate upload URL | Try again later |
| -60002 | Failed to match file format | File type detection failed. Ensure the requested filename and link have the correct extension and that the file is one of pdf, doc, docx, ppt, pptx, png, jpg, or jpeg |
| -60003 | File read failed | Check whether the file is corrupted and re-upload it |
| -60004 | Empty file | Upload a valid file |
| -60005 | File size exceeds limit | Check the file size; the maximum is 200 MB |
| -60006 | Page count exceeds limit | Split the file and try again |
| -60007 | Model service temporarily unavailable | Try again later or contact technical support |
| -60008 | File read timeout | Check that the URL is accessible |
| -60009 | Task submission queue is full | Try again later |
| -60010 | Parsing failed | Try again later |
| -60011 | Failed to get a valid file | Ensure the file has been uploaded |
| -60012 | Task not found | Ensure the task_id is valid and the task has not been deleted |
| -60013 | No permission to access this task | You can only access tasks you submitted |
| -60014 | Cannot delete a running task | Running tasks cannot be deleted |
| -60015 | File conversion failed | Convert the file to PDF manually and upload it again |
| -60016 | Export format conversion failed | Conversion to the requested export format failed; try another export format or retry |
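Several codes in the table suggest trying again later (-10001, -60001, -60007, -60009, -60010). A client could retry only those, with exponential backoff, and surface everything else immediately. `MineruError`, `isRetryable`, and `withRetry` are hypothetical names, and how your client extracts the numeric or string code from a response is an assumption left to you.

```typescript
// Codes whose suggested fix in the table above is "try again later".
const RETRYABLE_CODES = new Set<number | string>([-10001, -60001, -60007, -60009, -60010]);

// Hypothetical error type carrying the API's error code.
class MineruError extends Error {
  readonly code: number | string;
  constructor(code: number | string, message: string) {
    super(message);
    this.name = "MineruError";
    this.code = code;
  }
}

function isRetryable(err: unknown): boolean {
  return err instanceof MineruError && RETRYABLE_CODES.has(err.code);
}

// Retry with exponential backoff (base, 2*base, 4*base, ...) on retryable codes only.
async function withRetry<T>(op: () => Promise<T>, maxAttempts = 4, baseDelayMs = 1000): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Non-retryable errors and the final attempt propagate unchanged.
      if (!isRetryable(err) || attempt >= maxAttempts) throw err;
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Token errors such as `A0202` are deliberately not retried, since the table's fix for them is to change the token rather than wait.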
## License
ISC