parquet_mcp_server

Name: Parquet MCP Server
Author: DeepSpringAI

A powerful MCP (Model Context Protocol) server that provides tools for web search and for finding similar content. The server is designed to work with Claude Desktop and offers two main functions:

- Web search: performs a web search and scrapes the results
- Similarity search: extracts relevant information from previous searches

This server is particularly useful for:

- Applications that need web search capabilities
- Projects that need to find similar content based on search queries

Installation

Installing via Smithery

To install Parquet MCP Server for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install @DeepSpringAI/parquet_mcp_server --client claude

Clone this repository

git clone ...
cd parquet_mcp_server

Create and activate a virtual environment

uv venv
.venv\Scripts\activate  # On Windows
source .venv/bin/activate  # On macOS/Linux

Install the package

uv pip install -e .

Environment

Create a .env file containing the following variables:

EMBEDDING_URL=http://sample-url.com/api/embed  # URL for the embedding service
OLLAMA_URL=http://sample-url.com/  # URL for Ollama server
EMBEDDING_MODEL=sample-model  # Model to use for generating embeddings
SEARCHAPI_API_KEY=your_searchapi_api_key
FIRECRAWL_API_KEY=your_firecrawl_api_key
VOYAGE_API_KEY=your_voyage_api_key
AZURE_OPENAI_ENDPOINT=http://sample-url.com/azure_openai
AZURE_OPENAI_API_KEY=your_azure_openai_api_key
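A missing or empty variable is an easy mistake to make here, so a quick check before starting the server can help. The sketch below is only an illustration: it assumes the variables have already been loaded into the process environment (for example by a dotenv loader), and treating all eight names as required is an assumption, since the README does not say which ones are optional.

```python
import os

# Names copied from the .env example; whether every one of them is
# strictly required by the server is an assumption of this sketch.
EXPECTED_VARS = [
    "EMBEDDING_URL",
    "OLLAMA_URL",
    "EMBEDDING_MODEL",
    "SEARCHAPI_API_KEY",
    "FIRECRAWL_API_KEY",
    "VOYAGE_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
]


def missing_env_vars(env=None, expected=EXPECTED_VARS):
    """Return the names in `expected` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in expected if not env.get(name, "").strip()]


# Report which variables still need to be set in the current environment.
missing = missing_env_vars()
if missing:
    print("Missing variables:", ", ".join(missing))
else:
    print("All expected variables are set.")
```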

Using with Claude Desktop

Add the following to your Claude Desktop configuration file (claude_desktop_config.json), adjusting the directory path to match where you cloned the repository:

{
  "mcpServers": {
    "parquet-mcp-server": {
      "command": "uv",
      "args": [
        "--directory",
        "/home/${USER}/workspace/parquet_mcp_server/src/parquet_mcp_server",
        "run",
        "main.py"
      ]
    }
  }
}
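JSON itself has no variable expansion, so whether ${USER} is substituted depends on the client reading the file; writing out the absolute path avoids the question. As a minimal sketch, the entry can be generated for a given checkout directory and merged into an existing configuration. The function names and the /home/alice path are hypothetical and used only for illustration.

```python
import json
from pathlib import PurePosixPath


def build_server_entry(repo_dir):
    """Build the "parquet-mcp-server" entry for a checkout at `repo_dir`."""
    server_dir = PurePosixPath(repo_dir) / "src" / "parquet_mcp_server"
    return {
        "command": "uv",
        "args": ["--directory", str(server_dir), "run", "main.py"],
    }


def merge_into_config(config, repo_dir):
    """Return a copy of `config` with the entry added under "mcpServers"."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["parquet-mcp-server"] = build_server_entry(repo_dir)
    merged["mcpServers"] = servers
    return merged


# Placeholder path; substitute the directory where the repository was cloned.
example = merge_into_config({}, "/home/alice/workspace/parquet_mcp_server")
print(json.dumps(example, indent=2))
```

Other servers already listed under "mcpServers" are preserved, because the function copies the existing mapping before adding the new entry.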

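Available tools

The server provides two main tools:

Web search: performs a web search and scrapes the results
- Required parameters:
  - queries : list of search queries
- Optional parameters:
  - page_number : page number of the search results (defaults to 1)
Extract info from search: extracts relevant information from previous searches
- Required parameters:
  - queries : list of search queries to merge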
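For readers wiring these tools into their own client, the parameters above map onto the arguments of an MCP tools/call request. The following is a rough sketch, not the server's own code: the tool identifiers "search-web" and "extract-info-from-search" are assumed names (the README only gives the display names), and build_tool_call is a hypothetical helper.

```python
import json


def build_tool_call(tool_name, queries, page_number=None, request_id=1):
    """Build a JSON-RPC "tools/call" request for one of the two tools.

    `queries` is required by both tools; `page_number` is the optional
    parameter of the web search tool.
    """
    if not isinstance(queries, list) or not queries:
        raise ValueError("queries must be a non-empty list")
    if not all(isinstance(q, str) and q for q in queries):
        raise ValueError("every query must be a non-empty string")

    arguments = {"queries": queries}
    if page_number is not None:
        if not isinstance(page_number, int) or page_number < 1:
            raise ValueError("page_number must be a positive integer")
        arguments["page_number"] = page_number

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Tool names below are assumptions; check the server's tool listing.
search = build_tool_call("search-web", ["macbook", "laptop"], page_number=1)
extract = build_tool_call("extract-info-from-search", ["macbook"], request_id=2)
print(json.dumps(search, indent=2))
print(json.dumps(extract, indent=2))
```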

Example prompts

Here are some example prompts you can use with the agent:

For web search:

"Please perform a web search for 'macbook' and 'laptop' and scrape the results from page 1"

To extract information from searches:

"Please extract relevant information from the previous searches for 'macbook'"

Testing the MCP server

The project includes a comprehensive test suite in the src/tests directory. You can run all tests with:

python src/tests/run_tests.py

Or run individual tests:

# Test Web Search
python src/tests/test_search_web.py

# Test Extract Info from Search
python src/tests/test_extract_info_from_search.py

You can also test the server by using the client directly:

from parquet_mcp_server.client import (
    perform_search_and_scrape,  # New web search function
    find_similar_chunks,  # New extract info function
)

# Perform a web search
perform_search_and_scrape(["macbook", "laptop"], page_number=1)

# Extract information from the search results
find_similar_chunks(["macbook"])

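Troubleshooting

If you get SSL verification errors, make sure the SSL settings in your .env file are correct.
If embeddings are not generated, check that:
- The Ollama server is running and accessible
- The specified model is available on the Ollama server
- The text column exists in the input Parquet file
If the DuckDB conversion fails, check that:
- The input Parquet file exists and is readable
- You have write permission for the output directory
- The Parquet file is not corrupted
If the PostgreSQL conversion fails, check that:
- The PostgreSQL connection settings in your .env file are correct
- The PostgreSQL server is running and accessible
- You have the permissions needed to create or modify tables
- The pgvector extension is installed in the database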
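Some of the file checks above can be scripted. The sketch below covers only the basics for an unencrypted Parquet file: that it exists, is readable, and starts and ends with the 4-byte magic marker PAR1 defined by the Parquet format. It does not inspect columns, so confirming that the text column exists still needs a Parquet reader. The function name check_parquet_file is hypothetical.

```python
import os

PARQUET_MAGIC = b"PAR1"  # Marker at both ends of an unencrypted Parquet file
MIN_SIZE = 12  # 4-byte header magic + 4-byte footer length + 4-byte footer magic


def check_parquet_file(path):
    """Return a list of problems found; an empty list means the basic checks passed."""
    if not os.path.isfile(path):
        return [f"{path} does not exist or is not a regular file"]
    if not os.access(path, os.R_OK):
        return [f"{path} is not readable by the current user"]
    if os.path.getsize(path) < MIN_SIZE:
        return [f"{path} is too small to be a Parquet file"]

    problems = []
    with open(path, "rb") as handle:
        head = handle.read(4)
        handle.seek(-4, os.SEEK_END)
        tail = handle.read(4)
    if head != PARQUET_MAGIC:
        problems.append("missing PAR1 marker at the start of the file")
    if tail != PARQUET_MAGIC:
        problems.append("missing PAR1 marker at the end (file may be truncated)")
    return problems
```

A file that passes these checks can still be corrupted internally; the check is meant to catch truncated downloads and wrong file types early.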

PostgreSQL function for vector similarity search

To run a vector similarity search in PostgreSQL, you can use the following function:

-- Create the function for vector similarity search
CREATE OR REPLACE FUNCTION match_web_search(
    query_embedding vector(1024),  -- Adjusted vector size
    match_threshold float,
    match_count int  -- User-defined limit for number of results
)
RETURNS TABLE (
    id bigint,
    metadata jsonb,
    text TEXT,  -- Added text column to the result
    date TIMESTAMP,  -- Using the date column instead of created_at
    similarity float
)
LANGUAGE plpgsql
AS $$
BEGIN
    RETURN QUERY
    SELECT
        web_search.id::bigint,  -- Cast so a SERIAL (integer) id matches the declared bigint
        web_search.metadata,
        web_search.text,  -- Returning the full text of the chunk
        web_search.date,  -- Returning the date timestamp
        1 - (web_search.embedding <=> query_embedding) AS similarity
    FROM web_search
    WHERE 1 - (web_search.embedding <=> query_embedding) > match_threshold
    ORDER BY
        web_search.date DESC,  -- Sort by date in descending order (newest first)
        web_search.embedding <=> query_embedding  -- Then sort by similarity
    LIMIT match_count;  -- Limit the results to the match_count specified by the user
END;
$$;

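This function runs a similarity search over the vector embeddings stored in the PostgreSQL database. It returns the rows whose similarity exceeds the given threshold and limits the number of results to the count supplied by the caller. Results are sorted by date first (newest first), then by similarity.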
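To make the filtering and ordering concrete, here is a small pure-Python sketch of the same logic over in-memory rows. It is an illustration only, not how the server queries the database: it assumes non-zero embedding vectors, rows given as dictionaries, and that the <=> operator is pgvector's cosine distance.

```python
import math
from datetime import datetime


def cosine_distance(a, b):
    """Cosine distance, i.e. 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def match_rows(rows, query_embedding, match_threshold, match_count):
    """Mimic match_web_search: filter by threshold, sort by date then distance, limit."""
    scored = []
    for row in rows:
        distance = cosine_distance(row["embedding"], query_embedding)
        similarity = 1.0 - distance
        if similarity > match_threshold:
            scored.append((row, distance, similarity))

    # Python's sort is stable, so sorting by the secondary key first and the
    # primary key second reproduces ORDER BY date DESC, distance ASC.
    scored.sort(key=lambda item: item[1])
    scored.sort(key=lambda item: item[0]["date"], reverse=True)

    return [
        {"id": row["id"], "date": row["date"], "similarity": similarity}
        for row, _distance, similarity in scored[:match_count]
    ]


rows = [
    {"id": 1, "embedding": [1.0, 0.0], "date": datetime(2024, 1, 1)},
    {"id": 2, "embedding": [0.8, 0.6], "date": datetime(2024, 2, 1)},
    {"id": 3, "embedding": [0.0, 1.0], "date": datetime(2024, 3, 1)},
]
for match in match_rows(rows, [1.0, 0.0], match_threshold=0.5, match_count=2):
    print(match["id"], match["date"].date(), round(match["similarity"], 3))
```

Because the date is the primary sort key, similarity only breaks ties between rows with the same timestamp. In the example, row 2 is listed before the more similar row 1 because it is newer, and row 3 is dropped by the threshold.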

Creating the Postgres table

CREATE TABLE web_search (
    id SERIAL PRIMARY KEY,
    text TEXT,
    metadata JSONB,
    embedding VECTOR(1024),
    -- Set automatically when a row is inserted
    date TIMESTAMP DEFAULT NOW()
);
