bandham-manikanta

MCP-Powered API Catalog Recommender

An agentic API discovery system that combines semantic vector search over an OpenAPI catalog with a LangGraph orchestrator and MCP (Model Context Protocol) tools. Given a natural-language intent (e.g. "create a chat completion" or "charge a customer $50"), the agent retrieves the best-matching endpoints and returns a grounded, technical recommendation.


Architecture

The system uses a two-phase, decoupled design: expensive embedding work happens offline, and runtime queries stay fast because each search makes a single NIM embedding call.

flowchart TB
    subgraph phase1 [Phase 1 - Offline Indexing]
        SPECS[OpenAPI specs in data/specs]
        BUILD[scripts/build_index.py]
        NIM_EMB[NVIDIA NIM nv-embedqa-e5-v5]
        FAISS[(faiss.index)]
        META[(metadata.json)]
        SPECS --> BUILD
        BUILD --> NIM_EMB
        NIM_EMB --> FAISS
        BUILD --> META
    end

    subgraph phase2 [Phase 2 - Runtime Serving]
        USER[User or Client]
        CLI[CLI src/mcp_agent.py]
        API[FastAPI src/main.py]
        AGENT[LangGraph MCPCatalogAgent]
        VERTEX[Vertex AI Qwen 2.5 7B primary]
        NIM_LLM[NVIDIA NIM Llama 3.1 8B fallback]
        MCP[FastMCP api_catalog_mcp.py]
        SEARCH[search_api_catalog]
        DETAILS[get_endpoint_details]

        USER --> CLI
        USER --> API
        CLI --> AGENT
        API --> AGENT
        AGENT --> VERTEX
        AGENT -.-> NIM_LLM
        AGENT --> MCP
        MCP --> AGENT
        MCP --> SEARCH
        MCP --> DETAILS
        SEARCH --> FAISS
        SEARCH --> META
        DETAILS --> META
    end

Request flow

  1. User sends a natural-language query via CLI or POST /query.

  2. LangGraph agent calls search_api_catalog → 1 NVIDIA NIM embedding call + local FAISS top-5 search.

  3. Agent calls get_endpoint_details for the best match(es) → 0 NIM calls (pure JSON lookup).

  4. Primary LLM (Vertex AI Qwen 2.5 7B) synthesizes a Markdown recommendation; on failure, falls back to NVIDIA NIM Llama 3.1 8B.



Design Choices

| Area | Choice | Rationale |
| --- | --- | --- |
| Retrieval | FAISS IndexFlatIP on L2-normalized vectors | Exact cosine similarity via inner product; fast enough on CPU for catalogs of roughly 20 to 10k endpoints |
| Embeddings | NVIDIA NIM nvidia/nv-embedqa-e5-v5 | Separate passage (index) vs query (search) input types for better retrieval quality |
| Protocol | FastMCP stdio server | Standard MCP tool interface; agent discovers tools at runtime via langchain-mcp-adapters |
| Orchestration | LangGraph state machine | Bounded tool-calling loop (max 6 iterations) with explicit agent → action → agent edges |
| Primary LLM | Vertex AI Qwen 2.5 7B (:rawPredict) | Enterprise-hosted inference; OpenAI-compatible client with URL rewrite hook |
| Fallback LLM | NVIDIA NIM meta/llama-3.1-8b-instruct | Resilience when Vertex endpoint is unavailable |
| Serving | FastAPI + Uvicorn | REST /query and /health for integration; Swagger at /docs |
| Index build | Offline batch job | Avoids re-embedding catalog on every server start; predictable startup latency |
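
The retrieval choice relies on the fact that the inner product of two L2-normalized vectors equals their cosine similarity. The sketch below illustrates that with plain Python and toy 2-D vectors; it stands in for FAISS IndexFlatIP and is not how the project builds or queries its index.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def inner_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k(query: list[float], index: list[list[float]], k: int) -> list[tuple[int, float]]:
    """Exhaustive search: score every stored vector, return (position, score) best first."""
    q = l2_normalize(query)
    scored = [(i, inner_product(q, v)) for i, v in enumerate(index)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "index": three vectors, normalized once at build time.
index = [l2_normalize(v) for v in ([1.0, 0.0], [1.0, 1.0], [0.0, 2.0])]

# The query is normalized at search time, so scores are cosine similarities.
hits = top_k([3.0, 0.5], index, k=2)
for position, score in hits:
    print(position, round(score, 4))
```

Because every vector is scored, the result is exact rather than approximate, which is the trade-off a flat index makes for small catalogs.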


Constraints & Limitations

  • Pre-built index required: data/faiss.index and data/metadata.json must exist before starting the MCP server or agent. Run the indexer first.

  • Catalog scope: Currently indexes OpenAPI specs under data/specs/ only (OpenAI + Stripe in the default dataset).

  • Top-K = 5: search_api_catalog returns at most 5 endpoints per query (TOP_K in src/api_catalog_mcp.py).

  • Loop guard: Agent terminates after 6 LLM iterations to prevent infinite tool loops (MAX_LOOP_ITERATIONS in src/mcp_agent.py).

  • Vertex AI auth: Primary LLM requires Google Application Default Credentials (gcloud auth application-default login).

  • Windows file locks: Rebuilding the FAISS index while the FastAPI server is running may fail with PermissionError; stop the server first.

  • NIM dependency at search time: Each semantic search makes exactly one embedding API call; detail lookups are free.
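
To make the loop guard concrete, here is a minimal sketch of a bounded tool-calling loop. Only the limit of 6 comes from this README; `run_agent`, the decision dictionaries, and the tool argument names are hypothetical, and the real agent expresses this as LangGraph edges rather than a plain `for` loop.

```python
MAX_LOOP_ITERATIONS = 6  # limit cited above; everything else is illustrative


def run_agent(llm_step, run_tool, query):
    """Alternate LLM and tool steps until the LLM answers or the guard trips."""
    history = [("user", query)]
    for iteration in range(1, MAX_LOOP_ITERATIONS + 1):
        decision = llm_step(history)
        if decision["type"] == "answer":
            return decision["content"], iteration
        # The LLM asked for a tool: run it and feed the result back.
        history.append(("tool", run_tool(decision["tool"], decision["args"])))
    return "Stopped: iteration limit reached.", MAX_LOOP_ITERATIONS


def fake_tool(name, args):
    # Hypothetical tool runner that just echoes what was requested.
    return f"{name} called with {args}"


def always_search(history):
    # A misbehaving LLM that never stops requesting tools.
    return {"type": "tool", "tool": "search_api_catalog", "args": {"query": "charge"}}


def scripted(history):
    # A well-behaved LLM: search, fetch details, then answer.
    tool_results = sum(1 for role, _ in history if role == "tool")
    if tool_results == 0:
        return {"type": "tool", "tool": "search_api_catalog", "args": {"query": "charge"}}
    if tool_results == 1:
        return {"type": "tool", "tool": "get_endpoint_details", "args": {"path": "/v1/charges"}}
    return {"type": "answer", "content": "Use POST /v1/charges."}


print(run_agent(always_search, fake_tool, "charge a customer"))
print(run_agent(scripted, fake_tool, "charge a customer"))
```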


Dataset

Source specs (data/specs/)

| File | API | Endpoints |
| --- | --- | --- |
| openai_openapi.json | OpenAI API | 10 |
| stripe_openapi.json | Stripe API | 10 |
| Total | 2 APIs | 20 endpoints |

Derived artifacts (data/)

| File | Description |
| --- | --- |
| faiss.index | Binary FAISS IndexFlatIP with one normalized vector per endpoint |
| metadata.json | Full endpoint records: api_name, path, method, summary, description, parameters, requestBody, responses |
| api_catalog.json | Supplementary sample catalog (Ford vehicle/EV APIs); reference data, not indexed by default |

Embedding input format

Each indexed endpoint is embedded as:

{api_name} {METHOD} {path}: {summary}

Example: Openai API POST /v1/chat/completions: Create a chat completion
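
As an illustration of that format, the sketch below walks the `paths` object of an OpenAPI document and produces one string per operation. The function name `embedding_texts` and the inline sample spec are assumptions made for this example; scripts/build_index.py may derive `api_name` and handle missing summaries differently.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def embedding_texts(api_name: str, spec: dict) -> list[str]:
    """Build '{api_name} {METHOD} {path}: {summary}' for each operation in a spec."""
    texts = []
    for path, operations in spec.get("paths", {}).items():
        for method, operation in operations.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            summary = operation.get("summary", "")
            texts.append(f"{api_name} {method.upper()} {path}: {summary}")
    return texts


# Tiny inline spec standing in for a file under data/specs/.
sample_spec = {
    "paths": {
        "/v1/chat/completions": {
            "parameters": [],
            "post": {"summary": "Create a chat completion"},
        }
    }
}

texts = embedding_texts("Openai API", sample_spec)
print(texts[0])
```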

Adding new APIs

  1. Drop an OpenAPI 3.0 JSON file into data/specs/ (e.g. twilio_openapi.json).

  2. Re-run the index builder (see Quick Start).

  3. Restart the MCP server / FastAPI service to load the new index.


Project Structure

mcp-catalog-agent/
├── src/
│   ├── api_catalog_mcp.py   # FastMCP server — search + detail tools
│   ├── mcp_agent.py         # LangGraph agent + CLI entry point
│   └── main.py              # FastAPI REST service
├── scripts/
│   ├── build_index.py       # Offline FAISS index builder
│   └── parse_output.ps1     # Saves base64 index output to data/ (Windows helper)
├── data/
│   ├── specs/               # OpenAPI 3.0 source specs
│   ├── faiss.index          # Generated vector index
│   └── metadata.json        # Generated endpoint metadata
├── run_test_sequence.py     # Spins up server, hits /health + /query, tears down
├── query_service.py         # HTTP smoke test against a running server
├── requirements.txt
├── TESTING.md               # Extended troubleshooting guide
└── .env.example

Quick Start

1. Clone and install

cd mcp-catalog-agent
python -m venv .venv
.venv\Scripts\Activate.ps1
pip install -r requirements.txt

2. Configure environment

Copy-Item .env.example .env
# Edit .env with your NVIDIA_API_KEY and VERTEX_ENDPOINT_URL

| Variable | Required | Purpose |
| --- | --- | --- |
| NVIDIA_API_KEY | Yes | Embeddings + LLM fallback |
| VERTEX_ENDPOINT_URL | Yes | Primary Qwen 2.5 7B endpoint |
| EMBEDDING_MODEL | No | Default: nvidia/nv-embedqa-e5-v5 |
| NVIDIA_BASE_URL | No | Default: https://integrate.api.nvidia.com/v1 |
| LANGCHAIN_* | No | LangSmith tracing |
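
A quick way to check a configuration before running anything is to validate it up front. The sketch below uses the variable names and defaults listed above; the `Settings` class and `load_settings` function are hypothetical, and it assumes the variables are already exported into the environment rather than parsing the .env file itself.

```python
import os
from dataclasses import dataclass

REQUIRED = ("NVIDIA_API_KEY", "VERTEX_ENDPOINT_URL")


@dataclass(frozen=True)
class Settings:
    nvidia_api_key: str
    vertex_endpoint_url: str
    embedding_model: str
    nvidia_base_url: str


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default) and apply defaults."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        nvidia_api_key=env["NVIDIA_API_KEY"],
        vertex_endpoint_url=env["VERTEX_ENDPOINT_URL"],
        embedding_model=env.get("EMBEDDING_MODEL", "nvidia/nv-embedqa-e5-v5"),
        nvidia_base_url=env.get("NVIDIA_BASE_URL", "https://integrate.api.nvidia.com/v1"),
    )


# Placeholder values for demonstration only.
settings = load_settings(
    {"NVIDIA_API_KEY": "placeholder-key", "VERTEX_ENDPOINT_URL": "https://example.invalid/endpoint"}
)
print(settings.embedding_model)
```

Failing fast on a missing key gives a clearer error than a rejected API call later in the index build.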

3. Build the vector index

python scripts/build_index.py > build_output_utf8.txt
.\scripts\parse_output.ps1

Verify data/faiss.index and data/metadata.json were created.

4. Run the CLI agent

python src/mcp_agent.py "How do I create a chat completion using OpenAI?"

5. Run the FastAPI service

python -m uvicorn src.main:app --host 127.0.0.1 --port 8000

Open http://127.0.0.1:8000/docs for interactive API docs.


Testing Examples

CLI queries

# OpenAI — chat completions
python src/mcp_agent.py "How do I create a chat completion using the OpenAI API?"

# Stripe — customers and charges
python src/mcp_agent.py "I need to list customers and create a $50 charge with Stripe."

# Stripe — invoices
python src/mcp_agent.py "How do I retrieve a customer invoice from Stripe?"

REST API

Health check

Invoke-RestMethod -Uri "http://127.0.0.1:8000/health" -Method Get

Expected response shape:

{
  "status": "healthy",
  "agent_initialized": true,
  "tools_count": 2,
  "tools": ["search_api_catalog", "get_endpoint_details"]
}

Query

$body = @{ query = "Find me a chat completion API" } | ConvertTo-Json
Invoke-RestMethod -Uri "http://127.0.0.1:8000/query" -Method Post -Body $body -ContentType "application/json"

curl

curl -X POST http://127.0.0.1:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "How do I create a charge in Stripe?"}'
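
Python (standard library)

The same request can be issued from Python without extra dependencies. This is a sketch under the assumption that the server is reachable at the address used above; `build_query_request` is a hypothetical helper, and because the response schema is not documented here, the commented-out call simply prints whatever JSON comes back.

```python
import json
import urllib.request


def build_query_request(
    query: str, base_url: str = "http://127.0.0.1:8000"
) -> urllib.request.Request:
    """Prepare a POST /query request with a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("How do I create a charge in Stripe?")
print(request.get_method(), request.full_url)

# With the server running, send the request like this:
# with urllib.request.urlopen(request, timeout=60) as response:
#     print(json.loads(response.read()))
```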

Automated smoke test

With the server already running:

python query_service.py

Or start server, test, and stop automatically:

python run_test_sequence.py

Verify NVIDIA NIM connectivity

python test_nvidia.py

MCP Tools

| Tool | NIM calls | Description |
| --- | --- | --- |
| search_api_catalog | 1 per invocation | Semantic search; returns top matches with api_name, path, method, summary, score |
| get_endpoint_details | 0 | Full endpoint spec lookup by exact api_name + path |

The agent system prompt enforces: search first → fetch details → synthesize recommendation.
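
The detail lookup needs no model call because it is an exact match over the loaded metadata. The sketch below shows the idea with an in-memory list; `lookup_endpoint` and the sample records are hypothetical (field names follow the metadata.json description above), and the actual tool in src/api_catalog_mcp.py may return a different shape.

```python
import json

# Illustrative records only; real data comes from data/metadata.json.
METADATA = [
    {
        "api_name": "Openai API",
        "path": "/v1/chat/completions",
        "method": "POST",
        "summary": "Create a chat completion",
    },
    {
        "api_name": "Stripe API",
        "path": "/v1/charges",
        "method": "POST",
        "summary": "Create a charge",
    },
]


def lookup_endpoint(api_name: str, path: str, records=METADATA) -> str:
    """Return the full record as JSON text, matching api_name and path exactly."""
    for record in records:
        if record["api_name"] == api_name and record["path"] == path:
            return json.dumps(record, indent=2)
    return json.dumps({"error": f"No endpoint found for {api_name} {path}"})


print(lookup_endpoint("Openai API", "/v1/chat/completions"))
print(lookup_endpoint("Openai API", "/v1/missing"))
```

Exact matching is why the search step should run first: it supplies the precise api_name and path strings that the lookup expects.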


Observability

When LANGCHAIN_TRACING_V2=true, traces appear in LangSmith under project mcp-api-catalog-recommender. Inspect the trace tree to verify tool-call order and LLM fallback behavior.


Troubleshooting

See TESTING.md for Windows-specific issues (pywintypes, port conflicts, FAISS file locks).


License

MIT (OpenAPI source specs retain their original licenses.)
