search_vector_only

Search academic papers using semantic vector similarity to find relevant literature when keyword matching fails. Returns results based on conceptual meaning rather than exact text matches.

Instructions

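Pure vector search

Uses vector similarity only. Suited to cases where the content is semantically related to the query but the keywords do not match.

Args:
    query: Search query string
    k: Number of results to return (default 10)

Returns: List of search results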

Input Schema

Name     Required    Description    Default
query    Yes
k        No
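
As an illustration of how these two inputs are supplied, the sketch below builds an MCP `tools/call` JSON-RPC request for this tool. The helper name `build_tool_call` and the example query are hypothetical; only the tool name and the `query` and `k` arguments come from the schema above.

```python
import json
from typing import Any


def build_tool_call(query: str, k: int = 10, request_id: int = 1) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request for search_vector_only.

    Hypothetical helper for illustration; an MCP client library would
    normally construct and send this message for you.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "search_vector_only",
            "arguments": {"query": query, "k": k},
        },
    }


if __name__ == "__main__":
    # Example query text is made up for demonstration.
    request = build_tool_call("graph neural networks for molecules", k=5)
    print(json.dumps(request, indent=2))
```

Omitting `k` falls back to the default of 10, matching the handler's signature.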

Output Schema

Name     Required    Description    Default

No arguments

Implementation Reference

  • The main handler function for the 'search_vector_only' MCP tool, decorated with @mcp.tool(). It generates an embedding for the query, runs a vector search against the database, and formats each result with a snippet, distance, and similarity.
    @mcp.tool()
    def search_vector_only(
        query: str,
        k: int = 10,
    ) -> dict[str, Any]:
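        """Pure vector search.
        
        Uses vector similarity only. Suited to cases where the content is
        semantically related to the query but the keywords do not match.
        
        Args:
            query: Search query string.
            k: Number of results to return (default 10).
            
        Returns:
            List of search results.
        """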
        try:
            # Note: This is synchronous, but for single use it's fine. 
            # Could be asyncified if needed, but hybrid is sufficient.
            query_embedding = get_embedding(query)
            results = search_vector(query_embedding, k)
            
            formatted_results = []
            for r in results:
                text = r["text"]
                snippet = text[:200] + "..." if len(text) > 200 else text
                formatted_results.append({
                    "chunk_id": r["chunk_id"],
                    "doc_id": r["doc_id"],
                    "page_start": r["page_start"],
                    "page_end": r["page_end"],
                    "snippet": snippet,
                    "distance": r["distance"],
                    "similarity": 1.0 - r["distance"],
                })
            
            return {
                "query": query,
                "k": k,
                "results": formatted_results,
            }
        except Exception as e:
            return {
                "error": str(e),
                "query": query,
                "k": k,
                "results": [],
            }
  • Core helper function that executes the vector similarity search query against the PostgreSQL database using pgvector for cosine distance.
    def search_vector(query_embedding: list[float], limit: int = 50) -> list[dict[str, Any]]:
        """Vector search.
        
        Args:
            query_embedding: Query vector.
            limit: Number of results to return.
            
        Returns:
            List of search results, each containing chunk_id, doc_id, page_start, page_end, text, distance.
        """
        # Convert the embedding to pgvector's text format
        embedding_str = "[" + ",".join(str(x) for x in query_embedding) + "]"
        
        sql = """
        SELECT 
            c.chunk_id,
            c.doc_id,
            c.page_start,
            c.page_end,
            c.text,
            ce.embedding <=> %s::vector as distance
        FROM chunks c
        JOIN chunk_embeddings ce ON c.chunk_id = ce.chunk_id
        ORDER BY distance ASC
        LIMIT %s
        """
        return query_all(sql, (embedding_str, limit))
  • Top-level registration call in the main MCP server file that invokes register_search_tools(mcp), thereby registering the 'search_vector_only' tool among others.
    register_search_tools(mcp)
  • Function that defines and registers the search tools, including 'search_vector_only', using @mcp.tool() decorators.
    def register_search_tools(mcp: FastMCP) -> None:
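
The ranking that the handler and SQL helper above perform can be sketched without a database. The example below is a minimal in-memory stand-in, assuming cosine distance (which is what pgvector's `<=>` operator computes) and made-up two-dimensional toy embeddings. The names `to_pgvector_literal`, `cosine_distance`, `make_snippet`, and `search_in_memory` are hypothetical and not part of the server.

```python
import math
from typing import Any


def to_pgvector_literal(embedding: list[float]) -> str:
    """Serialize a vector to the '[x,y,z]' text form that is cast with ::vector."""
    return "[" + ",".join(str(x) for x in embedding) + "]"


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity for two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def make_snippet(text: str, max_len: int = 200) -> str:
    """Truncate long text to max_len characters plus an ellipsis."""
    return text[:max_len] + "..." if len(text) > max_len else text


def search_in_memory(
    query_embedding: list[float],
    chunks: list[dict[str, Any]],
    k: int = 10,
) -> list[dict[str, Any]]:
    """Rank chunks by ascending cosine distance and keep the first k."""
    scored = [(cosine_distance(query_embedding, c["embedding"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0])
    return [
        {
            "chunk_id": c["chunk_id"],
            "snippet": make_snippet(c["text"]),
            "distance": d,
            "similarity": 1.0 - d,
        }
        for d, c in scored[:k]
    ]


if __name__ == "__main__":
    # Toy data, invented for this example.
    toy_chunks = [
        {"chunk_id": 1, "text": "Transformers for protein folding", "embedding": [1.0, 0.0]},
        {"chunk_id": 2, "text": "Medieval trade routes", "embedding": [0.0, 1.0]},
        {"chunk_id": 3, "text": "Attention models in biology", "embedding": [1.0, 1.0]},
    ]
    for hit in search_in_memory([1.0, 0.0], toy_chunks, k=2):
        print(hit)
```

Because cosine distance ranges from 0 to 2, the derived `similarity` value ranges from 1 down to -1, so a negative similarity is possible for vectors pointing in opposing directions.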

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. While it mentions the tool is for vector similarity search, it doesn't describe important behavioral aspects like whether this is a read-only operation, what permissions are required, rate limits, or how results are sorted/ranked. For a search tool with zero annotation coverage, this leaves significant gaps in understanding its behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with a clear purpose statement, usage context, and parameter explanations in just a few sentences. Every sentence earns its place, and information is front-loaded with the core purpose first. The bilingual presentation (Chinese with some English terms) doesn't detract from conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema (implied by 'Has output schema: true'), the description doesn't need to explain return values. However, for a search tool with no annotations and 2 parameters, the description should provide more behavioral context about how the search works, what data sources it searches, or performance characteristics. The current description is adequate but has clear gaps in completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates well by explaining both parameters: 'query: 搜索查询字符串' (search query string) and 'k: 返回结果数量,默认 10' (number of returned results, default 10). This adds meaningful semantics beyond the bare schema. The only minor gap is not explaining the range or constraints for the 'k' parameter.
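
One way to close the gap noted above would be to validate `k` before running the search. The sketch below is an assumption, not the server's behavior: the source shows no bounds on `k`, and the limits `MIN_K` and `MAX_K` as well as the `validate_k` helper are hypothetical.

```python
MIN_K = 1    # hypothetical lower bound; not taken from the server
MAX_K = 100  # hypothetical upper bound; not taken from the server


def validate_k(k: int) -> int:
    """Return k unchanged if it is an int within [MIN_K, MAX_K], else raise ValueError."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(k, bool) or not isinstance(k, int):
        raise ValueError(f"k must be an integer, got {type(k).__name__}")
    if not MIN_K <= k <= MAX_K:
        raise ValueError(f"k must be between {MIN_K} and {MAX_K}, got {k}")
    return k
```

Stating the accepted range in the tool description as well would let an agent pick a valid value on the first call.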

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs '纯向量搜索' (pure vector search) using vector similarity, which is a specific verb+resource combination. It distinguishes itself from keyword-based search by mentioning '适合语义相关但关键词不匹配的场景' (suitable for semantically related but keyword-mismatched scenarios). However, it doesn't explicitly differentiate from sibling tools like 'search_fts_only' or 'search_hybrid' by name.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: '适合语义相关但关键词不匹配的场景' (suitable for semantically related but keyword-mismatched scenarios). This gives guidance on its appropriate use case. However, it doesn't explicitly mention when NOT to use it or name alternative tools like 'search_fts_only' or 'search_hybrid' that appear in the sibling list.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
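
To illustrate the kind of explicit guidance these reviews ask for, here is a hypothetical revised docstring. The wording is a sketch only. The sibling names `search_fts_only` and `search_hybrid` come from the review text above, but what those tools do is inferred from their names. The ordering and the result fields follow the code in the implementation reference, and the function body is a placeholder.

```python
from typing import Any


def search_vector_only(query: str, k: int = 10) -> dict[str, Any]:
    """Search paper chunks by vector (semantic) similarity only.

    Use this when the relevant text is conceptually related to the query
    but unlikely to share its keywords. For exact-term matching consider
    search_fts_only; for a blend of both consider search_hybrid.

    Args:
        query: Natural-language search query.
        k: Number of results to return (default 10).

    Returns:
        A dict with "query", "k", and "results". Results are ordered by
        ascending cosine distance, and each carries a snippet, a distance,
        and a similarity (1 - distance). On failure the dict also has an
        "error" key and "results" is empty.
    """
    # Placeholder: only the docstring is being illustrated here.
    raise NotImplementedError("placeholder body")
```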

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/h-lu/paperlib-mcp'
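
The same request can be issued from Python using only the standard library. This is a minimal sketch: the helper names `build_request` and `fetch_server_info` are hypothetical, and decoding the body as JSON is an assumption, since the response format is not shown here.

```python
import json
import urllib.request
from typing import Any

SERVER_URL = "https://glama.ai/api/mcp/v1/servers/h-lu/paperlib-mcp"


def build_request(url: str = SERVER_URL) -> urllib.request.Request:
    """Create a GET request equivalent to the curl command."""
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Accept": "application/json"},
    )


def fetch_server_info(url: str = SERVER_URL, timeout: float = 10.0) -> Any:
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(fetch_server_info())
```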

If you have feedback or need assistance with the MCP directory API, please join our Discord server.