# Evaluation in Genkit

This sample demonstrates the evaluation features of the Genkit Python SDK.

Note: This sample focuses on Genkit's evaluation features, using the official Genkit Evaluators plugin. If you are interested in writing your own custom evaluator, see the `custom/test_evaluator` defined in `src/index.py` (a simplified sketch also appears in the appendix at the end of this README).

## Set up and start the sample

```bash
# Start the Genkit Dev UI
genkit start -- uv run samples/evaluator-demo/src/index.py

# This command should output the link to the Genkit Dev UI.
```

The rest of the commands in this guide can be run in a separate terminal or directly in the Dev UI.

### Initial Setup

```bash
# Index "docs/cat-handbook.pdf" to start
# testing Genkit evaluation features. Please see
# src/setup.py for more details.
genkit flow:run setup
```

## Evaluations

### Running Evaluations via CLI

Use the `eval:flow` command to run a flow against a dataset and evaluate the outputs (a sketch of the dataset file format is in the appendix):

```bash
# Evaluate with a specific evaluator
genkit eval:flow pdf_qa --input data/cat_adoption_questions.json --evaluator=custom/test_evaluator

# Evaluate with multiple evaluators
genkit eval:flow pdf_qa --input data/cat_adoption_questions.json --evaluator=genkitEval/faithfulness --evaluator=genkitEval/maliciousness

# Evaluate with all available evaluators (omit the --evaluator flag)
genkit eval:flow pdf_qa --input data/cat_adoption_questions.json
```

### Running Evaluations in Dev UI

1. Navigate to the **Evaluations** tab in the Dev UI
2. Click **"Run Evaluation"** or **"New Evaluation"**
3. Configure:
   - **Flow**: Select the flow to evaluate (e.g., `pdf_qa`)
   - **Dataset**: Upload or select a JSON file (e.g., `data/cat_adoption_questions.json`)
   - **Evaluators**: Select one or more evaluators:
     - `custom/test_evaluator` - Random evaluator for testing (fast, no LLM calls)
     - `genkitEval/faithfulness` - Checks whether the output is faithful to the retrieved context
     - `genkitEval/maliciousness` - Detects harmful content
     - `genkitEval/answer_relevancy` - Checks whether the answer is relevant to the question
4. Click **"Run"**
5. View the results in the Evaluations tab

### Programmatic Evaluation

The `dog_facts_eval` flow demonstrates running evaluations from code. See `src/eval_in_code.py` for implementation details; a simplified sketch of the pattern is in the appendix.

```bash
# Run programmatic evaluation
genkit flow:run dog_facts_eval
```

**Note:** The `dog_facts_eval` flow evaluates 20 test cases with the faithfulness metric, making 40 LLM API calls. This takes approximately 5 minutes to complete.

## Available Flows

- **setup**: Indexes the default PDF document (`docs/cat-handbook.pdf`) into the vector store
- **index_pdf**: Indexes a specified PDF file (defaults to `docs/cat-wiki.pdf`)
- **pdf_qa**: RAG flow that answers questions based on the indexed PDF documents; requires the `setup` flow to have been run first
- **simple_structured**: Simple flow with structured input/output (sketched in the appendix)
- **simple_echo**: Simple echo flow
- **dog_facts_eval**: Programmatic evaluation flow that applies the faithfulness metric to a dog-facts dataset

## Reference

For more details on using Genkit evaluations, please refer to the official [Genkit documentation](https://firebase.google.com/docs/genkit/evaluation).
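
## Appendix: Example Sketches

The snippets below are illustrative sketches only; stand-in names, types, and schemas are hypothetical, and the authoritative implementations live under `src/`.

### A minimal custom evaluator

The `custom/test_evaluator` used above scores outputs randomly, with no LLM calls. A plain-Python sketch of that idea follows; the `Score` type and function signature here are hypothetical stand-ins, and the actual evaluator and its registration with Genkit are in `src/index.py`:

```python
import random
from dataclasses import dataclass


@dataclass
class Score:
    """Hypothetical stand-in for an evaluator score."""

    score: float
    status: str  # "PASS" or "FAIL"


def test_evaluator(input: str, output: str) -> Score:
    """Return a random score; handy for exercising the evaluation
    pipeline quickly without spending any LLM calls."""
    value = random.random()
    return Score(score=value, status="PASS" if value >= 0.5 else "FAIL")
```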
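
### Dataset file format

`eval:flow` reads test cases from a JSON file. A minimal hypothetical dataset for `pdf_qa` might look like the following, with an optional `reference` field for evaluators that compare against a ground truth; check the Genkit documentation for the exact schema expected by your Genkit version:

```json
[
  { "input": "How do I introduce a new cat to my home?" },
  {
    "input": "What should adult cats eat?",
    "reference": "Adult cats need a balanced diet of protein and fat."
  }
]
```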
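
### Programmatic evaluation, sketched

A simplified sketch of the pattern in `src/eval_in_code.py`: load a dataset, run each case through a flow, and score each output. The two stand-in functions below are placeholders; the real sample runs its RAG flow and the Genkit faithfulness evaluator, whose LLM calls account for the roughly five-minute runtime:

```python
import asyncio
import json


async def answer_question(question: str) -> str:
    """Stand-in for the real flow under evaluation."""
    return f"A placeholder answer to: {question}"


async def score_faithfulness(question: str, answer: str) -> float:
    """Stand-in for the faithfulness evaluator (the real one calls an LLM)."""
    return 1.0


async def run_eval(dataset_path: str) -> list[dict]:
    """Run every test case through the flow and collect scored results."""
    with open(dataset_path) as f:
        cases = json.load(f)
    results = []
    for case in cases:
        answer = await answer_question(case["input"])
        score = await score_faithfulness(case["input"], answer)
        results.append({"input": case["input"], "output": answer, "score": score})
    return results


if __name__ == "__main__":
    for row in asyncio.run(run_eval("data/cat_adoption_questions.json")):
        print(row)
```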
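
### A structured flow

The `simple_structured` flow pairs a typed input with a typed output. A hypothetical sketch, assuming the Python SDK's `Genkit` class, its `@ai.flow()` decorator, and Pydantic models for the schema; the actual flow definitions and plugin configuration are in `src/index.py`:

```python
from pydantic import BaseModel

from genkit.ai import Genkit

ai = Genkit()  # plugins and model configuration omitted; see src/index.py


class EchoInput(BaseModel):
    """Structured input: a message and a repeat count."""

    message: str
    repeat: int = 1


@ai.flow()
async def simple_structured(data: EchoInput) -> str:
    """Echo the message the requested number of times."""
    return " ".join([data.message] * data.repeat)
```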
