# Conversation History
Below is the entire conversation history so far, formatted as a numbered list of queries and their corresponding responses. Each item includes the user's query followed by the assistant's response.
1. **Query:**
You are an expert among experts at digging into documentation to the deepest level possible, capturing all context, and summarizing the content.
I am building a custom Perplexity MCP using their API. I have roughly decided on the functionality and the different tools my MCP should expose (based on the parameters the API offers and the language it should be built in), but my current understanding of the API is basic. I am in the exploration stage right now.
The outcomes I expect from this conversation are to thoroughly:
1. understand all the APIs they provide, in detail.
2. understand all the parameters each API offers, in detail.
3. understand the differences between the APIs (explained with a simple example use case).
Their API is called Sonar API and here is their home link
- https://docs.perplexity.ai/home
We get the most context about the API inside the api-reference section; they provide around 4 APIs, as shown below.
- https://docs.perplexity.ai/api-reference/chat-completions-post
- https://docs.perplexity.ai/api-reference/async-chat-completions-post
- https://docs.perplexity.ai/api-reference/async-chat-completions-get
- https://docs.perplexity.ai/api-reference/async-chat-completions-request_id-get
**Response:**
# Perplexity Sonar API: Complete Guide for MCP Development
## Overview of Perplexity Sonar API
Perplexity's Sonar API is a powerful **real-time web search and AI-powered question answering system** that combines advanced language models with live internet search capabilities. The API provides developers with access to grounded, factual responses that include proper citations and real-time information retrieval.
## 1. Complete API Reference
### Available APIs
Perplexity offers **4 distinct API endpoints** that serve different use cases:
#### 1. Chat Completions (Synchronous)
- **Endpoint**: `POST /chat/completions`
- **Purpose**: Real-time, synchronous chat completions with immediate response
- **Use Case**: Interactive applications requiring instant responses
#### 2. Create Async Chat Completion
- **Endpoint**: `POST /async/chat/completions`
- **Purpose**: Submit asynchronous chat completion requests
- **Use Case**: Long-running queries or when you need to handle multiple requests efficiently
#### 3. List Async Chat Completions
- **Endpoint**: `GET /async/chat/completions`
- **Purpose**: Retrieve a list of all async chat completion requests
- **Use Case**: Monitoring and managing multiple async requests
#### 4. Get Async Chat Completion Response
- **Endpoint**: `GET /async/chat/completions/{request_id}`
- **Purpose**: Retrieve results from a specific async chat completion request
- **Use Case**: Fetching completed async results
## 2. Detailed Parameter Analysis
### Synchronous Chat Completions Parameters
**Required Parameters:**
- `model`: Specifies which Sonar model to use (e.g., "sonar", "sonar-pro")
- `messages`: Array of message objects with role and content
**Optional Parameters:**
- `temperature`: Controls randomness (0.0-1.0)
- `max_tokens`: Maximum tokens in response
- `top_p`: Nucleus sampling parameter
- `stream`: Enable streaming responses
- `presence_penalty`: Penalizes new tokens based on presence
- `frequency_penalty`: Penalizes repetitive tokens
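Assembled into a request body, these tuning parameters might look like the following sketch (the values chosen here are illustrative, not recommendations from the docs):

```python
def build_chat_payload(query, model="sonar", **options):
    """Assemble a /chat/completions request body.

    `options` may carry any of the optional tuning parameters listed
    above (temperature, max_tokens, top_p, stream, presence_penalty,
    frequency_penalty); values are illustrative.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": query}],
    }
    payload.update(options)
    return payload

body = build_chat_payload(
    "What changed in the EU AI Act this week?",
    temperature=0.2,   # lower temperature = more deterministic output
    max_tokens=512,    # cap the completion length
    stream=False,
)
```

The dict would then be sent as the JSON body of the POST request, as shown in the practical examples later in this guide.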
### Async Chat Completions Parameters
**Enhanced Parameters for Async Requests:**
- `search_mode`: Controls search behavior ("web" is common)
- `reasoning_effort`: Intensity of reasoning ("low", "medium", "high")
- `search_domain_filter`: Array of specific domains to search
- `return_images`: Boolean to include image results
- `return_related_questions`: Boolean to include related questions
- `search_recency_filter`: Time-based filtering for search results
- `search_after_date_filter`: Search results after specific date
- `search_before_date_filter`: Search results before specific date
- `last_updated_after_filter`: Content updated after specific date
- `last_updated_before_filter`: Content updated before specific date
- `top_k`: Number of top search results to consider
- `web_search_options`: Object with `search_context_size` ("high", "medium", "low")
## 3. Available Models and Capabilities
### Current Model Lineup
| Model | Features | Rate Limits | Best For |
|-------|----------|-------------|----------|
| `sonar-deep-research` | Related questions, structured outputs | 5-100 RPM | Deep research tasks |
| `sonar-reasoning-pro` | Images, related questions, search filters, structured outputs | 50-2000 RPM | Complex reasoning with search |
| `sonar-reasoning` | Images, related questions, search filters, structured outputs | 50-2000 RPM | General reasoning tasks |
| `sonar-pro` | Images, related questions, search filters, structured outputs | 50-2000 RPM | Enhanced search capabilities |
| `sonar` | Images, related questions, search filters, structured outputs | 50-2000 RPM | Standard search tasks |
| `r1-1776` | Structured outputs only | 50-2000 RPM | Offline reasoning (no search) |
### Model Comparison
**Sonar vs Sonar Pro:**
- **Sonar**: Lightweight, fast, cost-effective for basic search tasks
- **Sonar Pro**: Enhanced capabilities, 2x more citations, larger context window, handles complex multi-step queries
## 4. API Response Structure
### Standard Response Format
```json
{
  "id": "string",
  "model": "string",
  "created": 123,
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 123,
    "total_tokens": 123,
    "search_context_size": "string",
    "citation_tokens": 123,
    "num_search_queries": 123,
    "reasoning_tokens": 123
  },
  "object": "chat.completion",
  "choices": [{
    "index": 123,
    "finish_reason": "stop",
    "message": {
      "content": "string",
      "role": "system"
    }
  }],
  "citations": ["string"],
  "search_results": [{
    "title": "string",
    "url": "string",
    "date": "2023-12-25"
  }]
}
```
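A small helper for pulling the answer text and sources out of this structure might look like the following sketch (field names follow the schema above; the sample values are made up):

```python
def extract_answer(response):
    """Extract the answer text, citations, and source URLs from a
    chat-completion response shaped like the schema above."""
    content = response["choices"][0]["message"]["content"]
    citations = response.get("citations") or []
    sources = [r["url"] for r in response.get("search_results") or []]
    return content, citations, sources

# Hypothetical response, mirroring the schema documented above
sample = {
    "choices": [{
        "index": 0,
        "finish_reason": "stop",
        "message": {"role": "assistant", "content": "42 km"},
    }],
    "citations": ["https://example.com/a"],
    "search_results": [
        {"title": "A", "url": "https://example.com/a", "date": "2023-12-25"},
    ],
}
text, cites, urls = extract_answer(sample)
```

The `or []` guards matter because `citations` and `search_results` are documented as nullable.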
### Async Response Format
Async responses include additional fields:
- `created_at`: Request creation timestamp
- `started_at`: Processing start timestamp
- `completed_at`: Completion timestamp
- `failed_at`: Failure timestamp (if applicable)
- `error_message`: Error details (if applicable)
- `status`: Current status ("CREATED", "RUNNING", "COMPLETED", "FAILED")
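Because the status moves from "CREATED"/"RUNNING" to a terminal "COMPLETED" or "FAILED", a client typically polls until it sees a terminal state. A minimal, transport-agnostic sketch (the `fetch` callable stands in for the GET request and is an assumption of this example):

```python
import time

TERMINAL_STATUSES = {"COMPLETED", "FAILED"}

def poll_until_done(fetch, interval=2.0, max_attempts=30):
    """Call `fetch()` (which should return the async-response JSON as a
    dict) until `status` is terminal, sleeping `interval` seconds
    between attempts. Raises TimeoutError if it never settles."""
    for _ in range(max_attempts):
        result = fetch()
        if result.get("status") in TERMINAL_STATUSES:
            return result
        time.sleep(interval)
    raise TimeoutError("async request did not reach a terminal status")
```

In practice, `fetch` would wrap a `requests.get` call to `/async/chat/completions/{request_id}` with your API key, as in the examples below.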
## 5. Rate Limits and Usage Tiers
### Usage Tier System
| Tier | Credit Purchase (Lifetime) | Benefits |
|------|---------------------------|----------|
| Tier 0 | - | Basic limits |
| Tier 1 | $50 | Increased limits |
| Tier 2 | $250 | Higher limits |
| Tier 3 | $500 | Enhanced limits |
| Tier 4 | $1000 | Premium limits |
| Tier 5 | $5000 | Maximum limits |
### Rate Limits by Tier
**Tier 0 (Starting):**
- Most models: 50 requests per minute
- Deep research: 5 requests per minute
- Async endpoints: 5 POST, 3000 GET (list), 6000 GET (specific)
**Tier 5 (Maximum):**
- Most models: 2000 requests per minute
- Deep research: 100 requests per minute
- Async endpoints: 100 POST, 3000 GET (list), 6000 GET (specific)
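Since these per-minute limits vary by tier, clients should expect occasional HTTP 429 responses and back off before retrying. A generic exponential-backoff-with-jitter helper (the delay values are illustrative, not documented by Perplexity):

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Exponential backoff with full jitter: attempt 0 waits up to
    `base` seconds, attempt 1 up to 2*base, doubling each retry and
    capped at `cap` seconds."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

A retry loop would sleep for `backoff_delay(attempt)` whenever it receives a 429, giving the per-minute window time to reset.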
## 6. Practical Use Case Examples
### Example 1: Simple Real-time Search (Sync API)
**Use Case**: Building a chatbot that needs immediate responses to user queries about current events.
```python
import requests

def sync_search_query(query):
    response = requests.post(
        "https://api.perplexity.ai/chat/completions",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "model": "sonar",
            "messages": [
                {"role": "user", "content": query}
            ]
        }
    )
    return response.json()
```
### Example 2: Deep Research Task (Async API)
**Use Case**: Conducting thorough research on a complex topic that requires multiple searches and comprehensive analysis.
```python
import requests

def async_research_query(research_topic):
    response = requests.post(
        "https://api.perplexity.ai/async/chat/completions",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "request": {
                "model": "sonar-pro",
                "messages": [
                    {"role": "user", "content": f"Provide a comprehensive analysis of {research_topic}"}
                ],
                "search_mode": "web",
                "return_related_questions": True,
                "web_search_options": {
                    "search_context_size": "high"
                }
            }
        }
    )
    return response.json()["id"]  # Returns request ID for later retrieval
```
### Example 3: Monitoring Async Requests
**Use Case**: Managing multiple research tasks and checking their completion status.
```python
import requests

def check_async_status(request_id):
    response = requests.get(
        f"https://api.perplexity.ai/async/chat/completions/{request_id}",
        headers={"Authorization": "Bearer YOUR_API_KEY"}
    )
    return response.json()

def list_all_async_requests():
    response = requests.get(
        "https://api.perplexity.ai/async/chat/completions",
        headers={"Authorization": "Bearer YOUR_API_KEY"}
    )
    return response.json()
```
## 7. Key Differences Between APIs
### Synchronous vs Asynchronous
**Synchronous API (`/chat/completions`):**
- **Pros**: Immediate response, simple integration, real-time interaction
- **Cons**: Limited to simpler queries, may timeout on complex requests
- **Best For**: Interactive applications, chatbots, quick Q&A
**Asynchronous API (`/async/chat/completions`):**
- **Pros**: Handles complex queries, better resource management, scalable
- **Cons**: Requires polling or webhooks, more complex integration
- **Best For**: Research tasks, batch processing, complex analysis
### Management APIs
**List Async Requests (`GET /async/chat/completions`):**
- **Purpose**: Overview of all async requests
- **Parameters**: `limit` (max results), `next_token` (pagination)
- **Use Case**: Dashboard views, request monitoring
**Get Specific Async Request (`GET /async/chat/completions/{request_id}`):**
- **Purpose**: Detailed status and results of specific request
- **Parameters**: `request_id` (path parameter)
- **Use Case**: Retrieving completed results, error checking
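The `limit`/`next_token` pair implies cursor-style pagination, so walking all requests means following the token page by page. A sketch with an injectable `fetch_page` callable (the response field names used here, `requests` and `next_token`, are assumptions based on the parameter names, not confirmed by the docs excerpt):

```python
def iter_async_requests(fetch_page):
    """Yield every async request across all pages.

    `fetch_page(next_token)` should return the parsed JSON of
    GET /async/chat/completions for that cursor; field names assumed.
    """
    token = None
    while True:
        page = fetch_page(token)
        for item in page.get("requests", []):
            yield item
        token = page.get("next_token")
        if not token:  # no cursor means this was the last page
            break
```

In a real client, `fetch_page` would wrap a `requests.get` call that passes `limit` and `next_token` as query parameters.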
## 8. MCP Development Recommendations
### For Python MCP Development
1. **Use OpenAI-compatible client**: Perplexity API is compatible with OpenAI clients
2. **Implement both sync and async endpoints**: Provide flexibility for different use cases
3. **Add proper error handling**: Include rate limiting and retry logic
4. **Implement caching**: Store frequently accessed results to reduce API calls
5. **Use appropriate models**: Match model capabilities to task requirements
### Sample MCP Tool Structure
```python
class PerplexityMCP:
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.perplexity.ai"

    def quick_search(self, query, model="sonar"):
        """For immediate responses"""
        # Implement sync API call

    def deep_research(self, query, model="sonar-pro"):
        """For comprehensive research"""
        # Implement async API call

    def check_research_status(self, request_id):
        """Monitor async requests"""
        # Implement status checking

    def list_pending_research(self):
        """Manage multiple requests"""
        # Implement request listing
```
This comprehensive guide provides you with all the necessary information to build a robust Perplexity MCP. The API's flexibility allows you to create tools that range from simple search queries to complex research assistants, all backed by real-time web information and proper citations.
2. **Query:**
- Okay, for my use case I don't need the asynchronous endpoints.
- Let us shift our focus entirely to just the chat-completions-post endpoint.
- When I manually checked the list of parameters this endpoint offers, it has many more parameters than what you gave in the above message. Re-validate this and provide a complete list.
- Also focus on the output we get from the API; list all the values/parameters we get in the output as well.
- One other thing I am concerned about is the timeout issue you are talking about when using the synchronous chat-completions-post. Let's say I configured it to access deep research with a high reasoning effort: what will happen if that request takes too much time? What's the average timeout for complex parameters?
**Response:**
# Perplexity Chat Completions API: Complete Parameter Reference and Timeout Analysis
## Complete Parameter Reference for `/chat/completions`
You were absolutely correct - the synchronous chat completions endpoint offers significantly more parameters than I initially documented. Here's the **complete list** of all available parameters:
### **Required Parameters**
| Parameter | Type | Description |
|-----------|------|-------------|
| `model` | string | The name of the model (e.g., "sonar", "sonar-pro", "sonar-deep-research") |
| `messages` | Message[] | List of message objects with role and content |
### **Optional Parameters**
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `search_mode` | enum | "web" | Controls search mode ("academic" or "web") |
| `reasoning_effort` | enum | "medium" | **Only for sonar-deep-research**: "low", "medium", or "high" |
| `max_tokens` | integer | - | Maximum completion tokens returned |
| `temperature` | number | 0.2 | Randomness (0-2), lower = more deterministic |
| `top_p` | number | 0.9 | Nucleus sampling threshold (0-1) |
| `search_domain_filter` | any[] | - | List of domains to limit/exclude (max 10) |
| `return_images` | boolean | false | Include images in search results |
| `return_related_questions` | boolean | false | Include related questions |
| `search_recency_filter` | string | - | Time-based filter ("week", "day") |
| `search_after_date_filter` | string | - | Content published after date (%m/%d/%Y) |
| `search_before_date_filter` | string | - | Content published before date (%m/%d/%Y) |
| `last_updated_after_filter` | string | - | Content updated after date (%m/%d/%Y) |
| `last_updated_before_filter` | string | - | Content updated before date (%m/%d/%Y) |
| `top_k` | number | 0 | Top-k filtering for token selection |
| `stream` | boolean | false | Enable streaming responses |
| `presence_penalty` | number | 0 | Penalty for discussing new topics (0-2.0) |
| `frequency_penalty` | number | 0 | Penalty for repetitive content (0-2.0) |
| `response_format` | object | - | Structured JSON output formatting |
| `web_search_options` | object | - | Web search configuration with `search_context_size` |
### **Advanced Search Parameters**
The **search and filtering capabilities** are extensive:
- **Domain Control**: Allowlist/denylist up to 10 domains (prefix with `-` for exclusion)
- **Temporal Filtering**: Multiple date-based filters for publication and update dates
- **Content Type**: Control image inclusion and related question generation
- **Search Context**: Configure search depth with `web_search_options.search_context_size` ("high", "medium", "low")
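Combining these controls into a single request body might look like the following sketch (the domain names and dates are made up for illustration; the `-` prefix marks an excluded domain, as noted above):

```python
def build_filtered_payload(query):
    """Illustrative request body exercising the search-filter
    parameters; domain and date values are hypothetical."""
    return {
        "model": "sonar-pro",
        "messages": [{"role": "user", "content": query}],
        "search_domain_filter": ["nature.com", "-pinterest.com"],  # allow / exclude
        "search_after_date_filter": "01/01/2024",  # %m/%d/%Y format
        "return_related_questions": True,
        "web_search_options": {"search_context_size": "medium"},
    }

body = build_filtered_payload("recent CRISPR clinical trial results")
```

Keeping payload construction in one helper like this makes it easy for an MCP tool to expose the filters as tool arguments later.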
## Complete Response Schema
The API returns a comprehensive response structure:
### **Main Response Fields**
| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique identifier for the completion |
| `model` | string | Model that generated the response |
| `created` | integer | Unix timestamp of creation |
| `object` | string | Always "chat.completion" |
| `choices` | ChatCompletionsChoice[] | Array of completion choices |
| `citations` | string[] \| null | Citation sources for the response |
| `search_results` | ApiPublicSearchResult[] \| null | Related search results |
| `usage` | object | Detailed usage statistics |
### **Usage Statistics Object**
| Field | Type | Description |
|-------|------|-------------|
| `prompt_tokens` | integer | Tokens in the input prompt |
| `completion_tokens` | integer | Tokens in the generated response |
| `total_tokens` | integer | Total tokens used |
| `search_context_size` | string | Size of search context used |
| `citation_tokens` | integer | Tokens from citation processing |
| `num_search_queries` | integer | Number of search queries executed |
| `reasoning_tokens` | integer | Tokens used for reasoning (deep research) |
### **Choice Object Structure**
| Field | Type | Description |
|-------|------|-------------|
| `index` | integer | Choice index |
| `finish_reason` | string | Completion reason ("stop", etc.) |
| `message` | object | Message with content and role |
### **Search Results Object**
| Field | Type | Description |
|-------|------|-------------|
| `title` | string | Search result title |
| `url` | string | Source URL |
| `date` | string | Publication date |
## Timeout Analysis and Recommendations
### **Current Timeout Challenges**
The timeout issue you're concerned about is **very real** and well-documented:
1. **Deep Research Model**: Can take **10-30 minutes** to complete complex queries
2. **Community Reports**: **80% timeout rate** in some implementations
3. **Platform Limitations**: Many platforms enforce 5-minute timeouts
4. **Synchronous API Risk**: No built-in timeout handling for long-running queries
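On the client side you can at least bound your own wait and degrade gracefully. A sketch with an injectable `send` callable, so the pattern is shown without tying it to a specific HTTP library (this wrapper and its names are an assumption of this example, not part of the API):

```python
def query_with_deadline(send, fallback=None):
    """Run `send()` and return its result, or `fallback` if it times
    out. `send` should raise TimeoutError on expiry; with requests,
    pass `timeout=` to the call and translate
    requests.exceptions.Timeout into TimeoutError."""
    try:
        return send()
    except TimeoutError:
        return fallback
```

For example, `send` could be a lambda wrapping the `sync_search_query` call from earlier with `timeout=(5, 300)` (connect and read timeouts in seconds).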
### **Timeout Duration Expectations**
Based on community feedback and real-world usage:
| Query Complexity | Expected Duration | Timeout Risk |
|------------------|-------------------|--------------|
| Simple queries (sonar) | 2-10 seconds | Very Low |
| Complex queries (sonar-pro) | 10-30 seconds | Low |
| Deep research (low effort) |