# Writing your own evaluators

This sample shows how to write your own suite of custom evaluators. It includes evaluators that use an LLM as well as a simple regex matcher, along with small test datasets that demonstrate how to use them.

## The Bring Your Own (BYO) custom evaluator plugin

To use a new evaluator, you need to define a custom evaluator plugin and register it with Genkit. This sample defines the plugin as a function, `byoEval`, in `src/index.ts`.

## Non-LLM Evaluators

### Regex

Location: `src/regex`
Name: `byo/regex_match_{name}`
Output: numeric

The regex evaluator is an example that does not use an LLM. It also demonstrates how to create a factory method that can be parameterized.
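
The factory idea can be sketched in plain TypeScript. This is a minimal illustration, not the code in `src/regex`: the `Datapoint` and `Evaluator` shapes, the `createRegexEvaluator` helper, and both patterns are hypothetical, and the sketch does not use Genkit's evaluator API. It assumes the numeric score is `1` for a match and `0` otherwise.

```typescript
// Hypothetical shapes for illustration only; these are not Genkit's types.
interface Datapoint {
  input: unknown;
  output: string;
}

interface Evaluator {
  name: string;
  evaluate(datapoint: Datapoint): number;
}

// Factory: each call returns an independent evaluator bound to one pattern.
function createRegexEvaluator(name: string, pattern: RegExp): Evaluator {
  return {
    name: `byo/regex_match_${name}`,
    // Assumed scoring: 1 when the output matches the pattern, 0 otherwise.
    evaluate: (datapoint) => (pattern.test(datapoint.output) ? 1 : 0),
  };
}

// Deliberately simple patterns, for demonstration only.
const urlMatcher = createRegexEvaluator('url', /https?:\/\/\S+/);
const usPhoneMatcher = createRegexEvaluator(
  'us_phone',
  /\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/
);
```

Parameterizing by name and pattern means one implementation can back several registered evaluators, such as `byo/regex_match_url` and `byo/regex_match_us_phone`.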

## LLM Evaluators

### PII Detection

Location: `src/pii`
Name: `byo/pii_detection`
Output: boolean

An evaluator that attempts to detect PII in your output.

### Funniness

Location: `src/funniness`
Name: `byo/funniness`
Output: enum/categorization (FUNNY_JOKE, NOT_FUNNY_JOKE, OFFENSIVE_JOKE, NOT_A_JOKE)

An evaluator that attempts to judge whether a given statement is a joke and whether it is funny.
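
For an enum-style score, it can help to validate the model's reply against the allowed categories before reporting it. The following is a minimal sketch that assumes the model replies with a bare category name; `parseFunniness` is a hypothetical helper and is not taken from `src/funniness`.

```typescript
// The four categories listed for this evaluator.
const FUNNINESS_VALUES = [
  'FUNNY_JOKE',
  'NOT_FUNNY_JOKE',
  'OFFENSIVE_JOKE',
  'NOT_A_JOKE',
] as const;

type Funniness = (typeof FUNNINESS_VALUES)[number];

// Hypothetical helper: normalizes a raw reply and rejects anything
// outside the known categories instead of passing it through as a score.
function parseFunniness(raw: string): Funniness {
  const normalized = raw.trim().toUpperCase();
  const match = FUNNINESS_VALUES.find((value) => value === normalized);
  if (match === undefined) {
    throw new Error(`Unexpected funniness verdict: ${raw}`);
  }
  return match;
}
```

Rejecting unknown values keeps a malformed model reply from being recorded as a category.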

### Deliciousness

Location: `src/deliciousness`
Name: `byo/deliciousness`
Output: string (yes, no, maybe)

An evaluator that attempts to judge whether a given statement is delicious, literally or metaphorically.

## Build and start the app

```posix-terminal
pnpm build
```

Or, if you need to, build everything:

```posix-terminal
cd ../../../; pnpm build; pnpm pack:all; cd -
```

Start the testapp:

```posix-terminal
genkit start -- pnpm dev
```

## Test your evaluator

Note: Run these commands in a separate terminal.

Regex:

```posix-terminal
genkit eval:run datasets/regex_dataset.json --evaluators=byo/regex_match_url,byo/regex_match_us_phone
```

PII Detection:

```posix-terminal
genkit eval:run datasets/pii_detection_dataset.json --evaluators=byo/pii_detection
```

Funniness:

```posix-terminal
genkit eval:run datasets/funniness_dataset.json --evaluators=byo/funniness
```

Deliciousness:

```posix-terminal
genkit eval:run datasets/deliciousness_dataset.json --evaluators=byo/deliciousness
```

Note: The evaluators implemented in this plugin do not consider the `input` provided to the model as part of the evaluation. Therefore, many of the test datasets in this testapp have `input` set to the placeholder `"input"`. If you are implementing an evaluator that uses the input provided to the model, you must supply the actual input in this field.
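
As a rough illustration of the placeholder convention, the sketch below builds two dataset entries and serializes them to JSON. The `DatasetEntry` type and the sample sentences are hypothetical, and the `output` field name is an assumption; check the files under `datasets/` for the exact shape.

```typescript
// Hypothetical entry shape; only `input` is named in this README,
// and the `output` field is an assumption.
interface DatasetEntry {
  input: unknown;
  output: string;
}

const entries: DatasetEntry[] = [
  // `input` is the literal placeholder because these evaluators ignore it.
  { input: 'input', output: 'Visit https://example.com for details.' },
  { input: 'input', output: 'No link in this sentence.' },
];

// Serialized form that could be saved as a dataset file.
const datasetJson = JSON.stringify(entries, null, 2);
```

An evaluator that does read the input would need each entry's `input` replaced with the real value sent to the model.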

## See your results

Navigate to the `Evaluations` section in the Dev UI.