Genkit MCP

Official

by firebase

GEMINI.md•17.3 KiB

# web-endpoints-hello — Sample Guidelines

## Overview

This is a **self-contained, template-ready** Genkit endpoints sample. It demonstrates all the ways to expose Genkit flows: REST (ASGI) and gRPC. It can be copied out of the monorepo and used as a standalone project starter.

## Self-Contained Design

All scripts and dependencies are local — the sample does **not** reference files outside its directory:

- `scripts/_common.sh` — Shared shell utilities (local copy)
- `scripts/jaeger.sh` — Jaeger container management (podman preferred, docker fallback)
- `scripts/generate_proto.sh` — Regenerate gRPC stubs from proto definition
- `scripts/eject.sh` — Eject from monorepo into standalone project (pins deps, updates CI)
- `setup.sh` — Installs all development tools (uv, just, podman/docker, genkit CLI)
- `Containerfile` — Distroless container image (multi-stage, nonroot)
- `deploy_*.sh` — Platform-specific deployment scripts
- `run.sh` — Main entry point for running the app (REST + gRPC, passes `--debug`)

### Using as a Template

```bash
cp -r web-endpoints-hello my-project
cd my-project
./scripts/eject.sh                      # Auto-detect version, pin deps, update CI
./scripts/eject.sh --version 0.5.0      # Pin to a specific version
./scripts/eject.sh --name my-project    # Also rename the project
./scripts/eject.sh --dry-run            # Preview changes without modifying files
```

The eject script handles all monorepo isolation automatically:

1. Pins `genkit` and `genkit-plugin-*` dependencies to a release version
2. Updates `working-directory` in `.github/workflows/*.yml` from monorepo path to `.`
3. Renames the project (optional, via `--name`)
4. Regenerates the lockfile (`uv lock`)

Then install and run:

```bash
cp local.env.example .local.env   # Configure local dev overrides
just dev                          # Start app + Jaeger
```

## Development Workflow

The dev workflow is designed to be seamless:

1. `./setup.sh` — One-time setup: installs uv, just, podman/docker, genkit CLI
2. `just dev` — Auto-starts Jaeger (uses podman or docker), then the app
3. `just stop` — Kills all services (app, DevUI, Jaeger)

### Key Commands

| Command | What it does |
|---------|-------------|
| `just dev` | Start app + Jaeger (with tracing, passes `--debug`) |
| `just dev-litestar` | Same, with Litestar framework |
| `just dev-quart` | Same, with Quart framework |
| `just prod` | Multi-worker production server (gunicorn) |
| `just stop` | Stop all services |
| `just test` | Run pytest |
| `just coverage` | Run tests with coverage (terminal + HTML) |
| `just coverage-open` | Run coverage and open HTML report |
| `just lint` | Run all lint checks (mirrors workspace `bin/lint`) |
| `just eject` | Eject from monorepo into standalone project |
| `just eject-dry-run` | Preview eject changes |
| `./run.sh` | Start app only (no Jaeger, passes `--debug`) |

## Architecture

```
src/
├── __init__.py          # Package docstring
├── app_init.py          # Genkit instance + cloud telemetry auto-detection
├── asgi.py              # ASGI app factory for gunicorn (multi-worker)
├── cache.py             # TTL + LRU response cache (stampede protection)
├── circuit_breaker.py   # Async-safe circuit breaker for LLM API protection
├── config.py            # Settings via pydantic-settings + CLI args (secure defaults)
├── connection.py        # Connection pool / keep-alive tuning
├── flows.py             # Genkit flow definitions (with cache + breaker)
├── generated/           # Protobuf + gRPC stubs (auto-generated)
├── grpc_server.py       # gRPC service + logging/rate-limit interceptors
├── log_config.py        # Structured logging (Rich/JSON + structlog + secret masking)
├── main.py              # Entry point: resilience → security → start servers
├── rate_limit.py        # Token-bucket rate limiting (ASGI + gRPC)
├── resilience.py        # Shared cache + circuit breaker singletons
├── schemas.py           # Pydantic models with Field constraints
├── security.py          # ASGI security middleware stack (see below)
├── sentry_init.py       # Optional Sentry error tracking
├── server.py            # ASGI server helpers (granian/uvicorn/hypercorn)
├── telemetry.py         # OpenTelemetry setup + framework instrumentation
└── frameworks/
    ├── fastapi_app.py   # FastAPI adapter (debug gates Swagger UI)
    ├── litestar_app.py  # Litestar adapter (debug gates OpenAPI docs)
    └── quart_app.py     # Quart adapter
gunicorn.conf.py         # Gunicorn config for multi-worker production
protos/
└── genkit_sample.proto  # gRPC service definition
```

## Frameworks & Servers

- **REST Frameworks**: FastAPI (default), Litestar, Quart — selected via `--framework`
- **ASGI Servers**: uvicorn (default), granian, hypercorn — selected via `--server`
- **gRPC Server**: runs in parallel on `:50051` (disable with `--no-grpc`)
- Each framework adapter in `src/frameworks/` provides a `create_app(ai, *, debug)` factory

## Tracing

OpenTelemetry is a **required** dependency (not optional). `just dev` auto-starts Jaeger and passes `--otel-endpoint http://localhost:4318` so every request produces a trace visible at `http://localhost:16686`.

## Testing

Tests live in `tests/` and require `pythonpath = ["."]` in `pyproject.toml` (already configured) so `from src.* import ...` works from any working directory.

```bash
just test              # Run all tests
uv run pytest tests/   # Same, without just
```

## Performance & Resilience

- **Response cache** — In-memory TTL + LRU cache for idempotent flows (`src/cache.py`). Per-key `asyncio.Lock` coalescing prevents cache stampedes. Configurable via `CACHE_TTL`, `CACHE_MAX_SIZE`, `CACHE_ENABLED`.
- **Circuit breaker** — Async-safe protection against cascading LLM API failures (`src/circuit_breaker.py`). States: CLOSED → OPEN → HALF_OPEN. Gated half-open probes. Configurable via `CB_FAILURE_THRESHOLD`, `CB_RECOVERY_TIMEOUT`.
- **Connection tuning** — Keep-alive (75s) exceeds LB idle timeout (60s) to prevent 502s. LLM timeout (120s) prevents indefinite hangs. Pool sizes tuned via env vars.
- **Multi-worker** — `gunicorn.conf.py` + `src/asgi.py` for multi-process production deployments. `just prod` shortcut. Worker recycling prevents memory leaks.
- **Request ID** — `X-Request-ID` header on every request/response, bound to structlog context for log correlation (`src/security.py`).
- **JSON logging** — `LOG_FORMAT=json` (production default) for log aggregators (`src/log_config.py`). Override to `console` in `local.env`.
- **Readiness probe** — Separate `/ready` endpoint for k8s readiness probes. Exempt from rate limiting.

## Security — Secure by Default

The sample follows a **secure-by-default** philosophy: every default is chosen so that a fresh deployment with zero configuration is locked down. Development convenience requires explicit opt-in via `--debug` or `DEBUG=true`.

### Debug mode

A single flag gates all development-only features:

| Feature | `debug=false` (default) | `debug=true` |
|---------|-----------------------|-------------|
| Swagger UI (`/docs`, `/redoc`) | Disabled | Enabled |
| OpenAPI schema (`/openapi.json`) | Disabled | Enabled |
| gRPC reflection | Disabled | Enabled |
| Content-Security-Policy | `default-src none` (strict) | Relaxed for Swagger CDN |
| CORS (when unconfigured) | Same-origin only | `*` (wildcard) |
| Log format (when unconfigured) | `json` (structured) | `console` (colored) |
| Trusted hosts warning | Logs warning at startup | Suppressed |

Activate: `--debug` CLI flag, `DEBUG=true` env var, or `run.sh` (passes `--debug` automatically).

**Never set `DEBUG=true` in production.** The `run.sh` dev script passes `--debug` automatically; production entry points (gunicorn, Cloud Run, Kubernetes) should never set it.

### `debug=False` security invariants

When modifying any code that uses the `debug` flag, verify that `debug=False` (production) **always** picks the more restrictive option. This checklist covers every location where `debug` is checked:

| Module | What `debug=False` does | What to verify |
|--------|------------------------|----------------|
| `security.py` `SecurityHeadersMiddleware` | Strict CSP: `default-src none` | Never use the relaxed CDN allowlist in production |
| `security.py` `ExceptionMiddleware` | Returns generic `"Internal server error"` | Never expose exception type or traceback to clients |
| `security.py` `apply_security_middleware` | CORS origins default to `[]` (same-origin) | Never fall back to `["*"]` when `debug=False` |
| `security.py` trusted hosts warning | Logs a warning when `TRUSTED_HOSTS` is empty | Warning fires in production, not in debug |
| `fastapi_app.py` | `docs_url=None`, `redoc_url=None`, `openapi_url=None` | Swagger UI and OpenAPI schema are disabled |
| `litestar_app.py` | `enabled_endpoints=set()` | All doc endpoints are disabled |
| `quart_app.py` | `debug` accepted but unused (no built-in Swagger) | No security impact; verify no future code adds a gate |
| `grpc_server.py` | gRPC reflection not registered | API schema not exposed to unauthenticated clients |
| `main.py` log format | Keeps `log_format="json"` (no colored console) | Never switch to `console` unless `debug=True` |
| `config.py` | `debug: bool = False` | Default is `False`; CLI uses `action="store_true"` |

**Rule:** Every `if debug:` block must enable a development convenience (not a security feature). Every `if not debug:` block must enforce a security restriction or emit a security warning. If a new feature needs `debug`, add it to this table and the debug mode matrix above.
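
The rule can be made concrete with a small, framework-free sketch. It is not the sample's actual code: `DebugGates` and `resolve_gates` are hypothetical names introduced here for illustration, and only the values (disabled docs URLs, `[]` vs `["*"]` for CORS, `json` vs `console` for logs, reflection off vs on) are taken from the tables above. The point is that every branch taken when `debug` is `False` resolves to the more restrictive option.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DebugGates:
    """Hypothetical container for the settings that the debug flag influences."""

    docs_url: Optional[str]
    redoc_url: Optional[str]
    openapi_url: Optional[str]
    cors_origins: List[str]
    log_format: str
    grpc_reflection: bool


def resolve_gates(
    debug: bool = False,
    cors_configured: Optional[List[str]] = None,
    log_format: Optional[str] = None,
) -> DebugGates:
    """Resolve debug-gated settings; explicit configuration always wins."""
    if cors_configured is not None:
        cors = list(cors_configured)  # explicit allowlist, used as given
    elif debug:
        cors = ["*"]  # development convenience only
    else:
        cors = []  # same-origin only: the restrictive default

    return DebugGates(
        # Doc endpoints exist only in debug mode.
        docs_url="/docs" if debug else None,
        redoc_url="/redoc" if debug else None,
        openapi_url="/openapi.json" if debug else None,
        cors_origins=cors,
        # Structured JSON unless debug (or an explicit override) says otherwise.
        log_format=log_format or ("console" if debug else "json"),
        # Reflection exposes the API schema, so it is debug-only.
        grpc_reflection=debug,
    )
```

Calling `resolve_gates()` with no arguments models a zero-configuration production start: no doc endpoints, no cross-origin access, JSON logs, no gRPC reflection. A test that asserts exactly this is one way to guard the invariants when new `debug` checks are added.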
### Secure defaults vs development overrides

| Setting | Production default | Dev override (`local.env`) |
|---------|-------------------|--------------------------|
| `DEBUG` | `false` | `true` |
| `CORS_ALLOWED_ORIGINS` | `""` (same-origin) | `*` |
| `LOG_FORMAT` | `json` | `console` |
| `TRUSTED_HOSTS` | `""` (warns at startup) | (empty OK in dev) |
| `RATE_LIMIT_DEFAULT` | `60/minute` | (same) |
| `MAX_BODY_SIZE` | `1048576` (1 MB) | (same) |

### Security features

| Feature | Module | What it does |
|---------|--------|-------------|
| **OWASP security headers** | `security.py` | CSP, X-Frame-Options, HSTS, Referrer-Policy, etc. via `secure` library |
| **CORS** | `security.py` | Same-origin by default; explicit allowlist in production |
| **Rate limiting** | `rate_limit.py` | Token-bucket per client IP (REST 429 + gRPC RESOURCE_EXHAUSTED) |
| **Body size limit** | `security.py` | 413 on oversized payloads before parsing (prevents memory exhaustion) |
| **Per-request timeout** | `security.py` | Returns 504 on expiry; configurable via settings/CLI |
| **Global exception handler** | `security.py` | Returns JSON 500; no tracebacks to clients in production |
| **Secret masking in logs** | `log_config.py` | structlog processor redacts API keys, tokens, passwords, DSNs |
| **HTTP access logging** | `security.py` | Logs method, path, status, duration for every request |
| **Server header suppression** | `security.py` | Removes `Server` header to prevent version fingerprinting |
| **Cache-Control: no-store** | `security.py` | Prevents intermediaries/browsers from caching API responses |
| **HSTS** | `security.py` | Conditional on HTTPS; configurable `max-age` |
| **GZip compression** | `security.py` | Via Starlette `GZipMiddleware`; configurable minimum size |
| **Input validation** | `schemas.py` | Pydantic `Field` constraints on all inputs (max_length, pattern, etc.) |
| **Request ID** | `security.py` | UUID4 generation/propagation, structlog binding, response echo |
| **Trusted hosts** | `security.py` | Host-header validation (warns if unconfigured in production) |
| **gRPC interceptors** | `grpc_server.py` | Logging + rate limiting + max message size + debug-only reflection |
| **Circuit breaker** | `circuit_breaker.py` | Fail fast on LLM API degradation (prevents cascading failures) |
| **Cache stampede protection** | `cache.py` | Per-key request coalescing (prevents thundering herd) |
| **Graceful shutdown** | `main.py` / `grpc_server.py` | SIGTERM handling with configurable grace period (default: 10s) |
| **Structured logging** | `log_config.py` | JSON by default (production); console override for dev; secret masking |
| **Sentry** | `sentry_init.py` | Optional error tracking (`SENTRY_DSN`); PII stripped |
| **Platform telemetry** | `app_init.py` | Auto-detects GCP/AWS/Azure; guarded `try/except ImportError` |
| **License checks** | `justfile` | `just licenses` validates dependency licenses via `liccheck` |
| **Vulnerability scanning** | `justfile` | `just audit` checks for CVEs via `pip-audit` + `pysentry-rs` |
| **Distroless container** | `Containerfile` | No shell, nonroot (uid 65534), ~50 MB, no package manager |

All middleware is framework-agnostic (pure ASGI) and applied in `apply_security_middleware()`.

### ASGI middleware stack order

Middleware is applied inside-out in `apply_security_middleware()`. The request-flow order is:

```
AccessLog → GZip → CORS → TrustedHost → Timeout → MaxBodySize
    → ExceptionHandler → SecurityHeaders → RequestId → App
```

### CORS allow_headers

The CORS middleware uses an **explicit allowlist** of headers, not `["*"]`:

```python
allow_headers=["Content-Type", "Authorization", "X-Request-ID"]
```

Wildcard `allow_headers` enables cache poisoning and header injection via CORS preflight — the explicit list only permits headers the API uses.
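
As a rough illustration of what "applied inside-out" means for the stack order listed under "ASGI middleware stack order", the sketch below wraps a bare ASGI callable in reverse request-flow order, so the first name in the list ends up outermost. It uses only the standard library. `make_middleware` and `apply_in_request_order` are hypothetical helpers for this example, not functions from `security.py`, and each middleware here does nothing except record its name.

```python
import asyncio

# Request-flow order as documented for apply_security_middleware().
REQUEST_FLOW = [
    "AccessLog", "GZip", "CORS", "TrustedHost", "Timeout",
    "MaxBodySize", "ExceptionHandler", "SecurityHeaders", "RequestId",
]


def make_middleware(name, trace):
    """Return a wrapper that records `name` before delegating to the inner app."""
    def wrap(app):
        async def middleware(scope, receive, send):
            trace.append(name)  # runs on the way in, before the inner layer
            await app(scope, receive, send)
        return middleware
    return wrap


def apply_in_request_order(app, names, trace):
    """Wrap inside-out: the last name is applied first, so names[0] is outermost."""
    for name in reversed(names):
        app = make_middleware(name, trace)(app)
    return app


async def demo():
    trace = []

    async def app(scope, receive, send):
        trace.append("App")

    wrapped = apply_in_request_order(app, REQUEST_FLOW, trace)
    # A minimal fake scope; receive/send are unused by this toy app.
    await wrapped({"type": "http"}, None, None)
    return trace
```

Running `asyncio.run(demo())` yields the names in request-flow order followed by `App`. Wrapping in list order instead of reversed order would invert the stack, which is an easy mistake to make when adding a new middleware.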
### Platform telemetry auto-detection

Auto-detects cloud platform by checking environment signals:

| Platform | Signal | Notes |
|----------|--------|-------|
| GCP (Cloud Run) | `K_SERVICE` | Always triggers |
| GCP (GCE/GKE) | `GCE_METADATA_HOST` | Always triggers |
| GCP (explicit) | `GOOGLE_CLOUD_PROJECT` + `GENKIT_TELEMETRY_GCP=1` | Requires opt-in (GOOGLE_CLOUD_PROJECT alone is too common on dev machines) |
| AWS | `AWS_EXECUTION_ENV` | Always triggers |
| Azure | `CONTAINER_APP_NAME` | Always triggers |
| Generic OTLP | `OTEL_EXPORTER_OTLP_ENDPOINT` | Fallback |

## Threading, Asyncio & Event-Loop Audit Checklist

When modifying any concurrency-related code in this sample (cache, circuit breaker, rate limiter, middleware), check every item below. These are real bugs found during code audits.

### Lock types

- **Never use `threading.Lock`/`RLock` in async code** — blocks the event loop. All locks in this sample use `asyncio.Lock`.
- **Third-party sync libraries may use threading locks internally.** This is why `circuit_breaker.py` and `cache.py` use custom implementations instead of wrapping `pybreaker` or `aiocache` — see docstrings for details.

### Time functions

- **Use `time.monotonic()` for intervals/durations**, not `time.time()` or `datetime.now()`. Wall-clock time is subject to NTP jumps.
- **Clamp `retry_after`** to `[0, 3600]` to guard against clock anomalies.
- **Call time functions once** and reuse the value when needed in multiple expressions.

### Race conditions

- **Cache stampede prevention** — `cache.py` uses per-key `asyncio.Lock` coalescing so only one coroutine executes the expensive LLM call per cache key. Without this, concurrent misses for the same key all trigger duplicate LLM API calls.
- **Half-open probe gating** — `circuit_breaker.py` tracks `_half_open_calls` inside the async lock so only `half_open_max_calls` probes are allowed in flight. Without this, all concurrent callers that arrive during the half-open window would probe simultaneously.
- **Avoid `exists()` + `delete()`** — use a single `delete()` or check-and-delete inside one lock acquisition to prevent TOCTOU races.

### Blocking I/O

- **Never call sync network I/O from async code.** All rate limiting, caching, and circuit breaking in this sample use in-memory data structures (sub-microsecond, safe on the event loop). If switching to Redis/Memcached backends, wrap calls in `asyncio.to_thread()`.

### OSS library decisions

| Area | Decision | Why |
|------|----------|-----|
| **Circuit breaker** | Custom (`circuit_breaker.py`) | `pybreaker` is sync-only, uses `threading.RLock`, requires private API access, uses wall-clock time |
| **Cache** | Custom (`cache.py`) | `aiocache` has no LRU, no stampede prevention, weak types, same line count |
| **Rate limiter** | Custom (`rate_limit.py`) | `limits` is sync-only, uses `time.time()`, fixed-window allows boundary bursts |
| **Security headers** | OSS (`secure` library) | Tracks OWASP recommendations, header deprecations (X-XSS-Protection), evolving browser standards |

See the module docstrings in each file for detailed rationale.

## Code Quality

`pyproject.toml` includes full linter and type checker configs — they work both inside the monorepo and when the sample is copied out as a standalone project:

| Tool | Purpose |
|------|---------|
| **Ruff** | Linting + formatting (isort, security, async, type annotations) |
| **ty** | Astral's type checker (strict, blocks on errors) |
| **Pyright** | Microsoft's type checker (basic mode) |
| **Pyrefly** | Meta's type checker (strict, warnings-as-errors) |

```bash
just lint        # Run all checks (mirrors workspace bin/lint)
just typecheck   # Type checkers only (ty, pyrefly, pyright)
just fmt         # Format code with ruff
```

`just lint` includes: ruff check, ruff format, ty, pyrefly, pyright, shellcheck, addlicense, pysentry-rs, liccheck, and `uv lock --check`.
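
## Appendix: sketch of per-key coalescing

Several items from the audit checklist (an `asyncio.Lock` instead of a threading lock, `time.monotonic()` for expiry, and a single delete instead of `exists()` + `delete()`) can be seen together in a small TTL + LRU cache with per-key coalescing. This is a minimal sketch under assumptions, not the contents of `src/cache.py`: the class name `CoalescingTTLCache`, its method names, and its defaults are hypothetical, and it never prunes its lock table, which a real implementation would need to address.

```python
import asyncio
import time
from collections import OrderedDict


class CoalescingTTLCache:
    """Illustrative TTL + LRU cache; one coroutine computes each missing key."""

    def __init__(self, ttl=60.0, max_size=1024):
        self._ttl = ttl
        self._max_size = max_size
        self._entries = OrderedDict()  # key -> (expires_at, value)
        self._locks = {}               # key -> asyncio.Lock (never pruned here)

    def _get_fresh(self, key):
        """Return (hit, value); expired entries are dropped with one delete."""
        entry = self._entries.get(key)
        if entry is None:
            return False, None
        expires_at, value = entry
        if time.monotonic() >= expires_at:  # monotonic clock: immune to NTP jumps
            del self._entries[key]
            return False, None
        self._entries.move_to_end(key)      # mark as most recently used
        return True, value

    async def get_or_compute(self, key, compute):
        hit, value = self._get_fresh(key)
        if hit:
            return value
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:
            # Re-check: another coroutine may have filled the key while we waited.
            hit, value = self._get_fresh(key)
            if hit:
                return value
            value = await compute()
            self._entries[key] = (time.monotonic() + self._ttl, value)
            self._entries.move_to_end(key)
            while len(self._entries) > self._max_size:
                self._entries.popitem(last=False)  # evict least recently used
        return value


async def demo():
    cache = CoalescingTTLCache(ttl=30.0)
    calls = 0

    async def slow_llm_call():  # stand-in for an expensive LLM request
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)
        return "response"

    results = await asyncio.gather(
        *(cache.get_or_compute("prompt", slow_llm_call) for _ in range(10))
    )
    return calls, results
```

In `demo()`, ten concurrent requests for the same key result in a single call to `slow_llm_call`; the other nine wait on the key's lock and then read the cached value. Removing the second `_get_fresh` check inside the lock would make all ten callers run the computation one after another, which is the stampede the checklist warns about.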
