Server Details

MCP server providing access to the Scorecard API to evaluate and optimize LLM systems.

Status: Healthy
Last Tested: 2026-04-03 22:36
Transport: Streamable HTTP
Repository: scorecard-ai/scorecard-node
GitHub Stars: 0

Available Tools

33 tools

create_metrics

Create a new Metric for evaluating system outputs. The structure of a metric depends on the evalType and outputType of the metric.

Parameters

No parameters

create_projects

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Create a new Project.

Response Schema

{
  $ref: '#/$defs/project',
  $defs: {
    project: {
      type: 'object',
      description: 'A Project in the Scorecard system.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Project.'
        },
        description: {
          type: 'string',
          description: 'The description of the Project.'
        },
        name: {
          type: 'string',
          description: 'The name of the Project.'
        }
      },
      required: [
        'id',
        'description',
        'name'
      ]
    }
  }
}

Parameters

Name	Required	Description
`name`	Yes	The name of the Project.
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
`description`	Yes	The description of the Project.
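
As a rough illustration of the parameters above, the sketch below builds the JSON-RPC 2.0 `tools/call` request that an MCP client would send for `create_projects`. The helper name `build_tool_call`, the request id, the project name and description, and the `.id` filter are placeholder choices made for this example, not values taken from the Scorecard documentation.

```python
import json


def build_tool_call(tool_name, arguments, request_id=1):
    """Build an MCP `tools/call` JSON-RPC request body as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tool_call(
    "create_projects",
    {
        "name": "Support bot evals",          # required (placeholder value)
        "description": "Regression checks",   # required (placeholder value)
        "jq_filter": ".id",                   # optional: keep only the new Project's ID
    },
)
print(json.dumps(request, indent=2))
```

With the `.id` filter, the response should shrink to the ID string alone, since `id` is a top-level field of the `project` schema shown above.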

create_records

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Create a new Record in a Run.

Response Schema

{
  $ref: '#/$defs/record',
  $defs: {
    record: {
      type: 'object',
      description: 'A record of a system execution in the Scorecard system.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Record.'
        },
        expected: {
          type: 'object',
          description: 'The expected outputs for the Testcase.',
          additionalProperties: true
        },
        inputs: {
          type: 'object',
          description: 'The actual inputs sent to the system, which should match the system\'s input schema.',
          additionalProperties: true
        },
        outputs: {
          type: 'object',
          description: 'The actual outputs from the system.',
          additionalProperties: true
        },
        runId: {
          type: 'string',
          description: 'The ID of the Run containing this Record.'
        },
        testcaseId: {
          type: 'string',
          description: 'The ID of the Testcase.'
        }
      },
      required: [
        'id',
        'expected',
        'inputs',
        'outputs',
        'runId'
      ]
    }
  }
}

Parameters

Name	Required	Description
`runId`	Yes
`inputs`	Yes	The actual inputs sent to the system, which should match the system's input schema.
`outputs`	Yes	The actual outputs from the system.
`expected`	Yes	The expected outputs for the Testcase.
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
`testcaseId`	No	The ID of the Testcase.
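
The required and optional columns in the table above can be checked on the client side before the call is made. The following is a minimal sketch of such a check; the function name `check_record_args` and the sample values are hypothetical, and the server remains the authority on what it accepts.

```python
# Parameter names copied from the create_records table above.
REQUIRED = {"runId", "inputs", "outputs", "expected"}
OPTIONAL = {"testcaseId", "jq_filter"}


def check_record_args(args):
    """Return a list of problems found in a create_records argument dict."""
    provided = set(args)
    problems = [f"missing required parameter: {name}" for name in sorted(REQUIRED - provided)]
    problems += [f"unknown parameter: {name}" for name in sorted(provided - REQUIRED - OPTIONAL)]
    return problems


complete = {
    "runId": "135",  # placeholder Run ID
    "inputs": {"question": "What is 2 + 2?"},
    "outputs": {"answer": "4"},
    "expected": {"idealAnswer": "4"},
    "jq_filter": ".id",
}
print(check_record_args(complete))         # no problems
print(check_record_args({"runId": "135"}))  # three missing parameters
```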

create_runs

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Create a new Run.

Response Schema

{
  $ref: '#/$defs/run',
  $defs: {
    run: {
      type: 'object',
      description: 'A Run in the Scorecard system.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Run.'
        },
        metricIds: {
          type: 'array',
          description: 'The IDs of the metrics this Run is using.',
          items: {
            type: 'string'
          }
        },
        metricVersionIds: {
          type: 'array',
          description: 'The IDs of the metric versions this Run is using.',
          items: {
            type: 'string'
          }
        },
        numExpectedRecords: {
          type: 'number',
          description: 'The number of expected records in the Run. Determined by the number of testcases in the Run\'s Testset at the time of Run creation.'
        },
        numRecords: {
          type: 'number',
          description: 'The number of records in the Run.'
        },
        numScores: {
          type: 'number',
          description: 'The number of completed scores in the Run so far.'
        },
        status: {
          type: 'string',
          description: 'The status of the Run.',
          enum: [
            'pending',
            'awaiting_execution',
            'running_execution',
            'awaiting_scoring',
            'running_scoring',
            'awaiting_human_scoring',
            'completed'
          ]
        },
        systemId: {
          type: 'string',
          description: 'The ID of the system this Run is using.'
        },
        systemVersionId: {
          type: 'string',
          description: 'The ID of the system version this Run is using.'
        },
        testsetId: {
          type: 'string',
          description: 'The ID of the Testset this Run is testing.'
        }
      },
      required: [
        'id',
        'metricIds',
        'metricVersionIds',
        'numExpectedRecords',
        'numRecords',
        'numScores',
        'status',
        'systemId',
        'systemVersionId',
        'testsetId'
      ]
    }
  }
}

Parameters

Name	Required	Description
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
`metricIds`	Yes	The IDs of the metrics this Run is using.
`projectId`	Yes
`testsetId`	No	The ID of the Testset this Run is testing.
`systemVersionId`	No	The ID of the system version this Run is using.

create_testcases

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Create multiple Testcases in the specified Testset.

Response Schema

{
  $ref: '#/$defs/testcase_create_response',
  $defs: {
    testcase_create_response: {
      type: 'object',
      properties: {
        items: {
          type: 'array',
          items: {
            $ref: '#/$defs/testcase'
          }
        }
      },
      required: [
        'items'
      ]
    },
    testcase: {
      type: 'object',
      description: 'A test case in the Scorecard system. Contains JSON data that is validated against the schema defined by its Testset.\nThe `inputs` and `expected` fields are derived from the `data` field based on the Testset\'s `fieldMapping`, and include all mapped fields, including those with validation errors.\nTestcases are stored regardless of validation results, with any validation errors included in the `validationErrors` field.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Testcase.'
        },
        expected: {
          type: 'object',
          description: 'Derived from data based on the Testset\'s fieldMapping. Contains all fields marked as expected outputs, including those with validation errors.',
          additionalProperties: true
        },
        inputs: {
          type: 'object',
          description: 'Derived from data based on the Testset\'s fieldMapping. Contains all fields marked as inputs, including those with validation errors.',
          additionalProperties: true
        },
        jsonData: {
          type: 'object',
          description: 'The JSON data of the Testcase, which is validated against the Testset\'s schema.',
          additionalProperties: true
        },
        testsetId: {
          type: 'string',
          description: 'The ID of the Testset this Testcase belongs to.'
        },
        validationErrors: {
          type: 'array',
          description: 'Validation errors found in the Testcase data. If present, the Testcase doesn\'t fully conform to its Testset\'s schema.',
          items: {
            type: 'object',
            properties: {
              message: {
                type: 'string',
                description: 'Human-readable error description.'
              },
              path: {
                type: 'string',
                description: 'JSON Pointer to the field with the validation error.'
              }
            },
            required: [
              'message',
              'path'
            ]
          }
        }
      },
      required: [
        'id',
        'expected',
        'inputs',
        'jsonData',
        'testsetId'
      ]
    }
  }
}

Parameters

Name	Required	Description
`items`	Yes	Testcases to create (max 100).
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
`testsetId`	Yes
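
Because `items` is capped at 100 Testcases per call, a larger dataset has to be split across several calls. The sketch below shows one way to do the batching. The `jsonData` key inside each item is an assumption borrowed from the response schema (the request item shape is not shown on this page), and the Testset ID is a placeholder.

```python
MAX_ITEMS_PER_CALL = 100  # limit stated in the `items` parameter description


def batches(items, size=MAX_ITEMS_PER_CALL):
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 250 hypothetical Testcases; the `jsonData` wrapper is an assumed item shape.
testcases = [{"jsonData": {"question": f"q{i}"}} for i in range(250)]

# One argument dict per create_testcases call.
calls = [
    {"testsetId": "246", "items": batch, "jq_filter": ".items[].id"}
    for batch in batches(testcases)
]
print([len(call["items"]) for call in calls])
```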

create_testsets

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Create a new Testset for a Project. The Testset will be created in the Project specified in the path.

Response Schema

{
  $ref: '#/$defs/testset',
  $defs: {
    testset: {
      type: 'object',
      description: 'A collection of Testcases that share the same schema.\nEach Testset defines the structure of its Testcases through a JSON schema.\nThe `fieldMapping` object maps top-level keys of the Testcase schema to their roles (input/expected output).\nFields not mentioned in the `fieldMapping` during creation or update are treated as metadata.\n\n## JSON Schema validation constraints supported:\n\n- **Required fields** - Fields listed in the schema\'s `required` array must be present in Testcases.\n- **Type validation** - Values must match the specified type (string, number, boolean, null, integer, object, array).\n- **Enum validation** - Values must be one of the options specified in the `enum` array.\n- **Object property validation** - Properties of objects must conform to their defined schemas.\n- **Array item validation** - Items in arrays must conform to the `items` schema.\n- **Logical composition** - Values must conform to at least one schema in the `anyOf` array.\n\nTestcases that fail validation will still be stored, but will include `validationErrors` detailing the issues.\nExtra fields in the Testcase data that are not in the schema will be stored but are ignored during validation.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Testset.'
        },
        description: {
          type: 'string',
          description: 'The description of the Testset.'
        },
        fieldMapping: {
          type: 'object',
          description: 'Maps top-level keys of the Testcase schema to their roles (input/expected output). Unmapped fields are treated as metadata.',
          properties: {
            expected: {
              type: 'array',
              description: 'Fields that represent expected outputs.',
              items: {
                type: 'string'
              }
            },
            inputs: {
              type: 'array',
              description: 'Fields that represent inputs to the AI system.',
              items: {
                type: 'string'
              }
            },
            metadata: {
              type: 'array',
              description: 'Fields that are not inputs or expected outputs.',
              items: {
                type: 'string'
              }
            }
          },
          required: [
            'expected',
            'inputs',
            'metadata'
          ]
        },
        jsonSchema: {
          type: 'object',
          description: 'The JSON schema for each Testcase in the Testset.',
          additionalProperties: true
        },
        name: {
          type: 'string',
          description: 'The name of the Testset.'
        }
      },
      required: [
        'id',
        'description',
        'fieldMapping',
        'jsonSchema',
        'name'
      ]
    }
  }
}

Parameters

Name	Required	Description
`name`	Yes	The name of the Testset.
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
`projectId`	Yes
`jsonSchema`	Yes	The JSON schema for each Testcase in the Testset.
`description`	Yes	The description of the Testset.
`fieldMapping`	Yes	Maps top-level keys of the Testcase schema to their roles (input/expected output). Unmapped fields are treated as metadata.
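
To make the role of `fieldMapping` more concrete, here is a small sketch of how a Testcase's top-level keys could be split into inputs, expected outputs, and metadata. It mirrors the behaviour described in the schema text (unmapped fields are treated as metadata) but is not Scorecard's implementation; the function name and the sample field names are made up for illustration.

```python
def split_by_field_mapping(data, field_mapping):
    """Split one Testcase's data into inputs, expected outputs, and metadata."""
    inputs = {key: data[key] for key in field_mapping["inputs"] if key in data}
    expected = {key: data[key] for key in field_mapping["expected"] if key in data}
    mapped = set(field_mapping["inputs"]) | set(field_mapping["expected"])
    # Top-level keys not mapped as inputs or expected outputs count as metadata.
    metadata = {key: value for key, value in data.items() if key not in mapped}
    return inputs, expected, metadata


# Hypothetical mapping and Testcase data.
field_mapping = {"inputs": ["question"], "expected": ["idealAnswer"], "metadata": []}
data = {"question": "What is 2 + 2?", "idealAnswer": "4", "difficulty": "easy"}

inputs, expected, metadata = split_by_field_mapping(data, field_mapping)
print(inputs)
print(expected)
print(metadata)
```

Here `difficulty` ends up as metadata because it appears in neither the `inputs` nor the `expected` list.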

delete_metrics

Idempotent

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Delete a specific Metric by ID. The metric will be removed from metric groups and monitors.

Response Schema

{
  $ref: '#/$defs/metric_delete_response',
  $defs: {
    metric_delete_response: {
      type: 'object',
      properties: {
        success: {
          type: 'boolean',
          description: 'Whether the deletion was successful.'
        }
      },
      required: [
        'success'
      ]
    }
  }
}

Parameters

Name	Required	Description
`metricId`	Yes
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).

delete_records

Idempotent

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Delete a specific Record by ID.

Response Schema

{
  $ref: '#/$defs/record_delete_response',
  $defs: {
    record_delete_response: {
      type: 'object',
      properties: {
        success: {
          type: 'boolean',
          description: 'Whether the deletion was successful.'
        }
      },
      required: [
        'success'
      ]
    }
  }
}

Parameters

Name	Required	Description
`recordId`	Yes
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).

delete_systems

Idempotent

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Delete a system definition by ID. This will not delete associated system versions.

Response Schema

{
  $ref: '#/$defs/system_delete_response',
  $defs: {
    system_delete_response: {
      type: 'object',
      properties: {
        success: {
          type: 'boolean',
          description: 'Whether the deletion was successful.'
        }
      },
      required: [
        'success'
      ]
    }
  }
}

Parameters

Name	Required	Description
`systemId`	Yes
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).

delete_testcases

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Delete multiple Testcases by their IDs.

Response Schema

{
  $ref: '#/$defs/testcase_delete_response',
  $defs: {
    testcase_delete_response: {
      type: 'object',
      properties: {
        success: {
          type: 'boolean',
          description: 'Whether the deletion was successful.'
        }
      },
      required: [
        'success'
      ]
    }
  }
}

Parameters

Name	Required	Description
`ids`	Yes	IDs of Testcases to delete.
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).

delete_testsets

Idempotent

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Delete Testset

Response Schema

{
  $ref: '#/$defs/testset_delete_response',
  $defs: {
    testset_delete_response: {
      type: 'object',
      properties: {
        success: {
          type: 'boolean',
          description: 'Whether the deletion was successful.'
        }
      },
      required: [
        'success'
      ]
    }
  }
}

Parameters

Name	Required	Description
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
`testsetId`	Yes

get_metrics

Read-only

Retrieve a specific Metric by ID.

Parameters

Name	Required	Description
`metricId`	Yes

get_runs

Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Retrieve a specific Run by ID.

Response Schema

{
  $ref: '#/$defs/run',
  $defs: {
    run: {
      type: 'object',
      description: 'A Run in the Scorecard system.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Run.'
        },
        metricIds: {
          type: 'array',
          description: 'The IDs of the metrics this Run is using.',
          items: {
            type: 'string'
          }
        },
        metricVersionIds: {
          type: 'array',
          description: 'The IDs of the metric versions this Run is using.',
          items: {
            type: 'string'
          }
        },
        numExpectedRecords: {
          type: 'number',
          description: 'The number of expected records in the Run. Determined by the number of testcases in the Run\'s Testset at the time of Run creation.'
        },
        numRecords: {
          type: 'number',
          description: 'The number of records in the Run.'
        },
        numScores: {
          type: 'number',
          description: 'The number of completed scores in the Run so far.'
        },
        status: {
          type: 'string',
          description: 'The status of the Run.',
          enum: [
            'pending',
            'awaiting_execution',
            'running_execution',
            'awaiting_scoring',
            'running_scoring',
            'awaiting_human_scoring',
            'completed'
          ]
        },
        systemId: {
          type: 'string',
          description: 'The ID of the system this Run is using.'
        },
        systemVersionId: {
          type: 'string',
          description: 'The ID of the system version this Run is using.'
        },
        testsetId: {
          type: 'string',
          description: 'The ID of the Testset this Run is testing.'
        }
      },
      required: [
        'id',
        'metricIds',
        'metricVersionIds',
        'numExpectedRecords',
        'numRecords',
        'numScores',
        'status',
        'systemId',
        'systemVersionId',
        'testsetId'
      ]
    }
  }
}

Parameters

Name	Required	Description
`runId`	Yes
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
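
A Run's `status`, `numRecords`, and `numExpectedRecords` fields are enough to tell how far along it is. The sketch below summarizes a `get_runs` response under two assumptions that this page does not state outright: that `completed` is the only terminal status, and that record progress is a meaningful ratio. The sample Run and the helper name are invented for the example.

```python
# A jq filter that would keep only the fields used below (valid jq object construction).
PROGRESS_FILTER = "{status: .status, numRecords: .numRecords, numExpectedRecords: .numExpectedRecords}"


def run_progress(run):
    """Summarize a Run dict: whether it is completed and the fraction of records present."""
    expected = run["numExpectedRecords"]
    fraction = run["numRecords"] / expected if expected else 0.0
    return {"done": run["status"] == "completed", "recordFraction": fraction}


# Hypothetical filtered response from get_runs.
sample_run = {"status": "running_execution", "numRecords": 5, "numExpectedRecords": 10}
print(run_progress(sample_run))
```

A client could call `get_runs` with this filter at intervals and stop once `done` is true.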

get_systems

Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Retrieve a specific system by ID.

Response Schema

{
  $ref: '#/$defs/system',
  $defs: {
    system: {
      type: 'object',
      description: 'A System Under Test (SUT).\n\nSystems are templates - to run evaluations, pair them with a SystemVersion that provides specific\nparameter values.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the system.'
        },
        description: {
          type: 'string',
          description: 'The description of the system.'
        },
        name: {
          type: 'string',
          description: 'The name of the system. Unique within the project.'
        },
        productionVersion: {
          $ref: '#/$defs/system_version'
        },
        versions: {
          type: 'array',
          description: 'The versions of the system.',
          items: {
            type: 'object',
            description: 'A SystemVersion defines the specific settings for a System Under Test.\n\nSystem versions contain parameter values that determine system behavior during evaluation.\nThey are immutable snapshots - once created, they never change.\n\nWhen running evaluations, you reference a specific systemVersionId to establish which system version to test.',
            properties: {
              id: {
                type: 'string',
                description: 'The ID of the system version.'
              },
              name: {
                type: 'string',
                description: 'The name of the system version.'
              }
            },
            required: [
              'id',
              'name'
            ]
          }
        }
      },
      required: [
        'id',
        'description',
        'name',
        'productionVersion',
        'versions'
      ]
    },
    system_version: {
      type: 'object',
      description: 'A SystemVersion defines the specific settings for a System Under Test.\n\nSystem versions contain parameter values that determine system behavior during evaluation.\nThey are immutable snapshots - once created, they never change.\n\nWhen running evaluations, you reference a specific systemVersionId to establish which system version to test.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the system version.'
        },
        config: {
          type: 'object',
          description: 'The configuration of the system version.',
          additionalProperties: true
        },
        name: {
          type: 'string',
          description: 'The name of the system version.'
        },
        systemId: {
          type: 'string',
          description: 'The ID of the system the system version belongs to.'
        }
      },
      required: [
        'id',
        'config',
        'name',
        'systemId'
      ]
    }
  }
}

Parameters

Name	Required	Description
`systemId`	Yes
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
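
Since a Run references a specific `systemVersionId`, a common next step after `get_systems` is to pick a version ID out of the response. The sketch below does this on a hand-written sample that follows the `system` schema above; all IDs, names, and config values are invented, and the helper name is hypothetical.

```python
# Hypothetical get_systems response shaped like the `system` schema.
system = {
    "id": "sys_1",
    "name": "chatbot",
    "description": "Example system",
    "productionVersion": {
        "id": "ver_2",
        "name": "v2",
        "config": {"temperature": 0.2},
        "systemId": "sys_1",
    },
    "versions": [{"id": "ver_1", "name": "v1"}, {"id": "ver_2", "name": "v2"}],
}


def version_ids_by_name(system):
    """Map each version name to its ID, using the `versions` array."""
    return {version["name"]: version["id"] for version in system["versions"]}


production_id = system["productionVersion"]["id"]
print(production_id)
print(version_ids_by_name(system))
```

If only the production version ID is needed, a `jq_filter` of `.productionVersion.id` should return it directly and skip the rest of the payload.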

get_systems_versions

Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Retrieve a specific system version by ID.

Response Schema

{
  $ref: '#/$defs/system_version',
  $defs: {
    system_version: {
      type: 'object',
      description: 'A SystemVersion defines the specific settings for a System Under Test.\n\nSystem versions contain parameter values that determine system behavior during evaluation.\nThey are immutable snapshots - once created, they never change.\n\nWhen running evaluations, you reference a specific systemVersionId to establish which system version to test.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the system version.'
        },
        config: {
          type: 'object',
          description: 'The configuration of the system version.',
          additionalProperties: true
        },
        name: {
          type: 'string',
          description: 'The name of the system version.'
        },
        systemId: {
          type: 'string',
          description: 'The ID of the system the system version belongs to.'
        }
      },
      required: [
        'id',
        'config',
        'name',
        'systemId'
      ]
    }
  }
}

Parameters

Name	Required	Description
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
`systemVersionId`	Yes

get_testcases

Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Retrieve a specific Testcase by ID.

Response Schema

{
  $ref: '#/$defs/testcase',
  $defs: {
    testcase: {
      type: 'object',
      description: 'A test case in the Scorecard system. Contains JSON data that is validated against the schema defined by its Testset.\nThe `inputs` and `expected` fields are derived from the `data` field based on the Testset\'s `fieldMapping`, and include all mapped fields, including those with validation errors.\nTestcases are stored regardless of validation results, with any validation errors included in the `validationErrors` field.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Testcase.'
        },
        expected: {
          type: 'object',
          description: 'Derived from data based on the Testset\'s fieldMapping. Contains all fields marked as expected outputs, including those with validation errors.',
          additionalProperties: true
        },
        inputs: {
          type: 'object',
          description: 'Derived from data based on the Testset\'s fieldMapping. Contains all fields marked as inputs, including those with validation errors.',
          additionalProperties: true
        },
        jsonData: {
          type: 'object',
          description: 'The JSON data of the Testcase, which is validated against the Testset\'s schema.',
          additionalProperties: true
        },
        testsetId: {
          type: 'string',
          description: 'The ID of the Testset this Testcase belongs to.'
        },
        validationErrors: {
          type: 'array',
          description: 'Validation errors found in the Testcase data. If present, the Testcase doesn\'t fully conform to its Testset\'s schema.',
          items: {
            type: 'object',
            properties: {
              message: {
                type: 'string',
                description: 'Human-readable error description.'
              },
              path: {
                type: 'string',
                description: 'JSON Pointer to the field with the validation error.'
              }
            },
            required: [
              'message',
              'path'
            ]
          }
        }
      },
      required: [
        'id',
        'expected',
        'inputs',
        'jsonData',
        'testsetId'
      ]
    }
  }
}

Parameters

Name	Required	Description
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
`testcaseId`	Yes
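
Because Testcases are stored even when they fail validation, a caller may want to inspect `validationErrors` after fetching one. The sketch below formats those errors for display. It treats `validationErrors` as optional, since the schema does not list it as required; the sample Testcase, its error message, and the helper name are invented for illustration.

```python
def format_validation_errors(testcase):
    """Return one 'path: message' line per validation error (empty list if none)."""
    errors = testcase.get("validationErrors") or []
    return [f"{error['path']}: {error['message']}" for error in errors]


# Hypothetical get_testcases response with a single validation error.
testcase = {
    "id": "248",
    "testsetId": "246",
    "jsonData": {"question": "What is 2 + 2?"},
    "inputs": {"question": "What is 2 + 2?"},
    "expected": {},
    "validationErrors": [
        {"path": "/idealAnswer", "message": "Required field is missing"},
    ],
}
for line in format_validation_errors(testcase):
    print(line)
```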

get_testsets

Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Get Testset

Response Schema

{
  $ref: '#/$defs/testset',
  $defs: {
    testset: {
      type: 'object',
      description: 'A collection of Testcases that share the same schema.\nEach Testset defines the structure of its Testcases through a JSON schema.\nThe `fieldMapping` object maps top-level keys of the Testcase schema to their roles (input/expected output).\nFields not mentioned in the `fieldMapping` during creation or update are treated as metadata.\n\n## JSON Schema validation constraints supported:\n\n- **Required fields** - Fields listed in the schema\'s `required` array must be present in Testcases.\n- **Type validation** - Values must match the specified type (string, number, boolean, null, integer, object, array).\n- **Enum validation** - Values must be one of the options specified in the `enum` array.\n- **Object property validation** - Properties of objects must conform to their defined schemas.\n- **Array item validation** - Items in arrays must conform to the `items` schema.\n- **Logical composition** - Values must conform to at least one schema in the `anyOf` array.\n\nTestcases that fail validation will still be stored, but will include `validationErrors` detailing the issues.\nExtra fields in the Testcase data that are not in the schema will be stored but are ignored during validation.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Testset.'
        },
        description: {
          type: 'string',
          description: 'The description of the Testset.'
        },
        fieldMapping: {
          type: 'object',
          description: 'Maps top-level keys of the Testcase schema to their roles (input/expected output). Unmapped fields are treated as metadata.',
          properties: {
            expected: {
              type: 'array',
              description: 'Fields that represent expected outputs.',
              items: {
                type: 'string'
              }
            },
            inputs: {
              type: 'array',
              description: 'Fields that represent inputs to the AI system.',
              items: {
                type: 'string'
              }
            },
            metadata: {
              type: 'array',
              description: 'Fields that are not inputs or expected outputs.',
              items: {
                type: 'string'
              }
            }
          },
          required: [
            'expected',
            'inputs',
            'metadata'
          ]
        },
        jsonSchema: {
          type: 'object',
          description: 'The JSON schema for each Testcase in the Testset.',
          additionalProperties: true
        },
        name: {
          type: 'string',
          description: 'The name of the Testset.'
        }
      },
      required: [
        'id',
        'description',
        'fieldMapping',
        'jsonSchema',
        'name'
      ]
    }
  }
}

Parameters

Name	Required	Description	Default
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
`testsetId`	Yes
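
The `fieldMapping` roles described in the Testset schema can be illustrated with a small local sketch. It assumes a Testset response has already been fetched; the mapping and Testcase data below are hypothetical, and `split_by_field_mapping` is an illustrative helper, not part of the Scorecard API.

```python
from typing import Any


def split_by_field_mapping(
    json_data: dict[str, Any], field_mapping: dict[str, list[str]]
) -> dict[str, dict[str, Any]]:
    """Split Testcase data into inputs, expected outputs, and metadata."""
    inputs = {k: json_data[k] for k in field_mapping.get("inputs", []) if k in json_data}
    expected = {k: json_data[k] for k in field_mapping.get("expected", []) if k in json_data}
    # Fields not mapped as inputs or expected outputs are treated as metadata.
    mapped = set(inputs) | set(expected)
    metadata = {k: v for k, v in json_data.items() if k not in mapped}
    return {"inputs": inputs, "expected": expected, "metadata": metadata}


# Hypothetical fieldMapping and Testcase data, for illustration only.
mapping = {"inputs": ["question"], "expected": ["answer"], "metadata": []}
case = {"question": "2 + 2?", "answer": "4", "difficulty": "easy"}
parts = split_by_field_mapping(case, mapping)
print(parts)
```

Here `difficulty` ends up under `metadata` because the mapping never mentions it.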

list_annotations

Read-only

List all annotations (ratings and comments) for a specific Record. Annotations include thumbs up/down ratings and text comments left by users.

Parameters

Name	Required	Description	Default
`recordId`	Yes	The ID of the Record to list annotations for.
`jq_filter`	No	A jq filter to apply to the response. For example: ".data[].comment" to get only comments.

list_metrics

Read-only

List Metrics configured for the specified Project. Metrics are returned in reverse chronological order.

Parameters

Name	Required	Description
`limit`	No	Maximum number of items to return (1-100). Use with `cursor` for pagination through large sets.
`cursor`	No	Cursor for pagination. Pass the `nextCursor` from the previous response to get the next page of results.
`projectId`	Yes

list_projects

Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Retrieve a paginated list of all Projects. Projects are ordered by creation date, with oldest Projects first.

Response Schema

{
  type: 'object',
  properties: {
    data: {
      type: 'array',
      items: {
        $ref: '#/$defs/project'
      }
    },
    hasMore: {
      type: 'boolean'
    },
    nextCursor: {
      type: 'string'
    },
    total: {
      type: 'integer'
    }
  },
  required: [
    'data',
    'hasMore',
    'nextCursor'
  ],
  $defs: {
    project: {
      type: 'object',
      description: 'A Project in the Scorecard system.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Project.'
        },
        description: {
          type: 'string',
          description: 'The description of the Project.'
        },
        name: {
          type: 'string',
          description: 'The name of the Project.'
        }
      },
      required: [
        'id',
        'description',
        'name'
      ]
    }
  }
}

Parameters

Name	Required	Description
`limit`	No	Maximum number of items to return (1-100). Use with `cursor` for pagination through large sets.
`cursor`	No	Cursor for pagination. Pass the `nextCursor` from the previous response to get the next page of results.
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
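
The `limit` and `cursor` parameters, together with `hasMore` and `nextCursor` in the response, suggest a simple paging loop. The sketch below assumes a `call_tool(name, arguments)` function supplied by your MCP client (hypothetical here) and uses an in-memory stand-in so it runs on its own. The stand-in's offset-style cursor is an assumption made for the demo; real cursors should be treated as opaque strings.

```python
from typing import Any, Callable, Iterator

ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def iter_items(
    call_tool: ToolCaller, tool: str, arguments: dict[str, Any], limit: int = 100
) -> Iterator[dict[str, Any]]:
    """Yield every item from a cursor-paginated list tool."""
    cursor = None
    while True:
        args = {**arguments, "limit": limit}
        if cursor is not None:
            args["cursor"] = cursor
        page = call_tool(tool, args)
        yield from page["data"]
        if not page["hasMore"]:
            return
        # Pass the previous response's nextCursor to fetch the next page.
        cursor = page["nextCursor"]


def fake_call_tool(name: str, args: dict[str, Any]) -> dict[str, Any]:
    """In-memory stand-in for a real MCP client call (illustration only)."""
    projects = [{"id": str(i), "name": f"p{i}", "description": ""} for i in range(5)]
    start = int(args.get("cursor", "0"))
    end = start + args["limit"]
    return {"data": projects[start:end], "hasMore": end < len(projects), "nextCursor": str(end)}


names = [p["name"] for p in iter_items(fake_call_tool, "list_projects", {}, limit=2)]
print(names)
```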

list_records

Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Retrieve a paginated list of Records for a Run, including all scores for each record.

Response Schema

{
  type: 'object',
  properties: {
    data: {
      type: 'array',
      items: {
        $ref: '#/$defs/record_list_response'
      }
    },
    hasMore: {
      type: 'boolean'
    },
    nextCursor: {
      type: 'string'
    },
    total: {
      type: 'integer'
    }
  },
  required: [
    'data',
    'hasMore',
    'nextCursor'
  ],
  $defs: {
    record_list_response: {
      allOf: [
        {
          $ref: '#/$defs/record'
        }
      ],
      description: 'A record with all its associated scores.'
    },
    record: {
      type: 'object',
      description: 'A record of a system execution in the Scorecard system.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Record.'
        },
        expected: {
          type: 'object',
          description: 'The expected outputs for the Testcase.',
          additionalProperties: true
        },
        inputs: {
          type: 'object',
          description: 'The actual inputs sent to the system, which should match the system\'s input schema.',
          additionalProperties: true
        },
        outputs: {
          type: 'object',
          description: 'The actual outputs from the system.',
          additionalProperties: true
        },
        runId: {
          type: 'string',
          description: 'The ID of the Run containing this Record.'
        },
        testcaseId: {
          type: 'string',
          description: 'The ID of the Testcase.'
        }
      },
      required: [
        'id',
        'expected',
        'inputs',
        'outputs',
        'runId'
      ]
    }
  }
}

Parameters

Name	Required	Description
`limit`	No	Maximum number of items to return (1-100). Use with `cursor` for pagination through large sets.
`runId`	Yes
`cursor`	No	Cursor for pagination. Pass the `nextCursor` from the previous response to get the next page of results.
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
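
As a sketch of how the `jq_filter` advice might be applied, the snippet below builds an MCP `tools/call` JSON-RPC request for `list_records` that keeps only two fields of each record. The run ID and request ID are placeholders, `build_tool_call` is an illustrative helper, and transport details such as headers and session handling are left out.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


request = build_tool_call(
    1,
    "list_records",
    {
        "runId": "run_123",  # placeholder ID, not a real Run
        "limit": 50,
        # Keep only the id and testcaseId of each record in the data array.
        "jq_filter": ".data[] | {id, testcaseId}",
    },
)
print(json.dumps(request, indent=2))
```

Because `testcaseId` is not in the schema's `required` list, the filter yields `null` for records that lack it.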

list_runs

Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Retrieve a paginated list of all Runs for a Project. Runs are ordered by creation date, most recent first.

Response Schema

{
  type: 'object',
  properties: {
    data: {
      type: 'array',
      items: {
        $ref: '#/$defs/run'
      }
    },
    hasMore: {
      type: 'boolean'
    },
    nextCursor: {
      type: 'string'
    },
    total: {
      type: 'integer'
    }
  },
  required: [
    'data',
    'hasMore',
    'nextCursor'
  ],
  $defs: {
    run: {
      type: 'object',
      description: 'A Run in the Scorecard system.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Run.'
        },
        metricIds: {
          type: 'array',
          description: 'The IDs of the metrics this Run is using.',
          items: {
            type: 'string'
          }
        },
        metricVersionIds: {
          type: 'array',
          description: 'The IDs of the metric versions this Run is using.',
          items: {
            type: 'string'
          }
        },
        numExpectedRecords: {
          type: 'number',
          description: 'The number of expected records in the Run. Determined by the number of testcases in the Run\'s Testset at the time of Run creation.'
        },
        numRecords: {
          type: 'number',
          description: 'The number of records in the Run.'
        },
        numScores: {
          type: 'number',
          description: 'The number of completed scores in the Run so far.'
        },
        status: {
          type: 'string',
          description: 'The status of the Run.',
          enum: [
            'pending',
            'awaiting_execution',
            'running_execution',
            'awaiting_scoring',
            'running_scoring',
            'awaiting_human_scoring',
            'completed'
          ]
        },
        systemId: {
          type: 'string',
          description: 'The ID of the system this Run is using.'
        },
        systemVersionId: {
          type: 'string',
          description: 'The ID of the system version this Run is using.'
        },
        testsetId: {
          type: 'string',
          description: 'The ID of the Testset this Run is testing.'
        }
      },
      required: [
        'id',
        'metricIds',
        'metricVersionIds',
        'numExpectedRecords',
        'numRecords',
        'numScores',
        'status',
        'systemId',
        'systemVersionId',
        'testsetId'
      ]
    }
  }
}

Parameters

Name	Required	Description
`limit`	No	Maximum number of items to return (1-100). Use with `cursor` for pagination through large sets.
`cursor`	No	Cursor for pagination. Pass the `nextCursor` from the previous response to get the next page of results.
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
`projectId`	Yes
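
A Run's `status`, `numRecords`, and `numExpectedRecords` fields are enough for a rough progress summary. The following is a minimal sketch over a hypothetical Run object; `run_progress` is an illustrative helper, and treating `numRecords / numExpectedRecords` as a progress fraction is an assumption rather than documented behavior.

```python
from typing import Any


def run_progress(run: dict[str, Any]) -> dict[str, Any]:
    """Summarize a Run as a completion flag and a fraction of records present."""
    expected = run["numExpectedRecords"]
    # Guard against an empty Testset, where no records are expected.
    fraction = run["numRecords"] / expected if expected else 0.0
    return {"done": run["status"] == "completed", "fraction": min(fraction, 1.0)}


# Hypothetical Run values, for illustration only.
example_run = {"status": "running_execution", "numExpectedRecords": 10, "numRecords": 4}
progress = run_progress(example_run)
print(progress)
```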

list_systems

Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Retrieve a paginated list of all systems. Systems are ordered by creation date.

Response Schema

{
  type: 'object',
  properties: {
    data: {
      type: 'array',
      items: {
        $ref: '#/$defs/system'
      }
    },
    hasMore: {
      type: 'boolean'
    },
    nextCursor: {
      type: 'string'
    },
    total: {
      type: 'integer'
    }
  },
  required: [
    'data',
    'hasMore',
    'nextCursor'
  ],
  $defs: {
    system: {
      type: 'object',
      description: 'A System Under Test (SUT).\n\nSystems are templates - to run evaluations, pair them with a SystemVersion that provides specific\nparameter values.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the system.'
        },
        description: {
          type: 'string',
          description: 'The description of the system.'
        },
        name: {
          type: 'string',
          description: 'The name of the system. Unique within the project.'
        },
        productionVersion: {
          $ref: '#/$defs/system_version'
        },
        versions: {
          type: 'array',
          description: 'The versions of the system.',
          items: {
            type: 'object',
            description: 'A SystemVersion defines the specific settings for a System Under Test.\n\nSystem versions contain parameter values that determine system behavior during evaluation.\nThey are immutable snapshots - once created, they never change.\n\nWhen running evaluations, you reference a specific systemVersionId to establish which system version to test.',
            properties: {
              id: {
                type: 'string',
                description: 'The ID of the system version.'
              },
              name: {
                type: 'string',
                description: 'The name of the system version.'
              }
            },
            required: [
              'id',
              'name'
            ]
          }
        }
      },
      required: [
        'id',
        'description',
        'name',
        'productionVersion',
        'versions'
      ]
    },
    system_version: {
      type: 'object',
      description: 'A SystemVersion defines the specific settings for a System Under Test.\n\nSystem versions contain parameter values that determine system behavior during evaluation.\nThey are immutable snapshots - once created, they never change.\n\nWhen running evaluations, you reference a specific systemVersionId to establish which system version to test.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the system version.'
        },
        config: {
          type: 'object',
          description: 'The configuration of the system version.',
          additionalProperties: true
        },
        name: {
          type: 'string',
          description: 'The name of the system version.'
        },
        systemId: {
          type: 'string',
          description: 'The ID of the system the system version belongs to.'
        }
      },
      required: [
        'id',
        'config',
        'name',
        'systemId'
      ]
    }
  }
}

Parameters

Name	Required	Description
`limit`	No	Maximum number of items to return (1-100). Use with `cursor` for pagination through large sets.
`cursor`	No	Cursor for pagination. Pass the `nextCursor` from the previous response to get the next page of results.
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
`projectId`	Yes
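
Since evaluations reference a specific `systemVersionId`, a client may want to resolve one from a system object returned by this tool. The sketch below assumes the response shape shown in the schema above; `resolve_version_id` and the sample IDs are hypothetical.

```python
from typing import Any, Optional


def resolve_version_id(system: dict[str, Any], version_name: Optional[str] = None) -> str:
    """Return the ID of the named version, or of the production version by default."""
    if version_name is None:
        return system["productionVersion"]["id"]
    for version in system["versions"]:
        if version["name"] == version_name:
            return version["id"]
    raise KeyError(f"no version named {version_name!r} in system {system['id']}")


# Hypothetical system object, for illustration only.
system = {
    "id": "sys_1",
    "name": "chatbot",
    "description": "",
    "productionVersion": {"id": "ver_2", "name": "v2", "config": {}, "systemId": "sys_1"},
    "versions": [{"id": "ver_1", "name": "v1"}, {"id": "ver_2", "name": "v2"}],
}
default_id = resolve_version_id(system)
older_id = resolve_version_id(system, "v1")
print(default_id, older_id)
```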

list_testcases

Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Retrieve a paginated list of Testcases belonging to a Testset.

Response Schema

{
  type: 'object',
  properties: {
    data: {
      type: 'array',
      items: {
        $ref: '#/$defs/testcase'
      }
    },
    hasMore: {
      type: 'boolean'
    },
    nextCursor: {
      type: 'string'
    },
    total: {
      type: 'integer'
    }
  },
  required: [
    'data',
    'hasMore',
    'nextCursor'
  ],
  $defs: {
    testcase: {
      type: 'object',
      description: 'A test case in the Scorecard system. Contains JSON data that is validated against the schema defined by its Testset.\nThe `inputs` and `expected` fields are derived from the `data` field based on the Testset\'s `fieldMapping`, and include all mapped fields, including those with validation errors.\nTestcases are stored regardless of validation results, with any validation errors included in the `validationErrors` field.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Testcase.'
        },
        expected: {
          type: 'object',
          description: 'Derived from data based on the Testset\'s fieldMapping. Contains all fields marked as expected outputs, including those with validation errors.',
          additionalProperties: true
        },
        inputs: {
          type: 'object',
          description: 'Derived from data based on the Testset\'s fieldMapping. Contains all fields marked as inputs, including those with validation errors.',
          additionalProperties: true
        },
        jsonData: {
          type: 'object',
          description: 'The JSON data of the Testcase, which is validated against the Testset\'s schema.',
          additionalProperties: true
        },
        testsetId: {
          type: 'string',
          description: 'The ID of the Testset this Testcase belongs to.'
        },
        validationErrors: {
          type: 'array',
          description: 'Validation errors found in the Testcase data. If present, the Testcase doesn\'t fully conform to its Testset\'s schema.',
          items: {
            type: 'object',
            properties: {
              message: {
                type: 'string',
                description: 'Human-readable error description.'
              },
              path: {
                type: 'string',
                description: 'JSON Pointer to the field with the validation error.'
              }
            },
            required: [
              'message',
              'path'
            ]
          }
        }
      },
      required: [
        'id',
        'expected',
        'inputs',
        'jsonData',
        'testsetId'
      ]
    }
  }
}

Parameters

Name	Required	Description
`limit`	No	Maximum number of items to return (1-100). Use with `cursor` for pagination through large sets.
`cursor`	No	Cursor for pagination. Pass the `nextCursor` from the previous response to get the next page of results.
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
`testsetId`	Yes
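
Because Testcases are stored even when they fail validation, a client may want to surface the ones that carry `validationErrors`. This is a minimal sketch over hypothetical `list_testcases` results; `invalid_testcases` is an illustrative helper.

```python
from typing import Any


def invalid_testcases(testcases: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each failing Testcase ID to its 'path: message' error strings."""
    report: dict[str, list[str]] = {}
    for testcase in testcases:
        # validationErrors is optional, so treat a missing field as no errors.
        errors = testcase.get("validationErrors") or []
        if errors:
            report[testcase["id"]] = [f'{e["path"]}: {e["message"]}' for e in errors]
    return report


# Hypothetical Testcases, for illustration only.
cases = [
    {"id": "tc_1", "jsonData": {"question": "hi"}},
    {
        "id": "tc_2",
        "jsonData": {"question": 42},
        "validationErrors": [{"path": "/question", "message": "must be a string"}],
    },
]
report = invalid_testcases(cases)
print(report)
```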

list_testsets

Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Retrieve a paginated list of Testsets belonging to a Project.

Response Schema

{
  type: 'object',
  properties: {
    data: {
      type: 'array',
      items: {
        $ref: '#/$defs/testset'
      }
    },
    hasMore: {
      type: 'boolean'
    },
    nextCursor: {
      type: 'string'
    },
    total: {
      type: 'integer'
    }
  },
  required: [
    'data',
    'hasMore',
    'nextCursor'
  ],
  $defs: {
    testset: {
      type: 'object',
      description: 'A collection of Testcases that share the same schema.\nEach Testset defines the structure of its Testcases through a JSON schema.\nThe `fieldMapping` object maps top-level keys of the Testcase schema to their roles (input/expected output).\nFields not mentioned in the `fieldMapping` during creation or update are treated as metadata.\n\n## JSON Schema validation constraints supported:\n\n- **Required fields** - Fields listed in the schema\'s `required` array must be present in Testcases.\n- **Type validation** - Values must match the specified type (string, number, boolean, null, integer, object, array).\n- **Enum validation** - Values must be one of the options specified in the `enum` array.\n- **Object property validation** - Properties of objects must conform to their defined schemas.\n- **Array item validation** - Items in arrays must conform to the `items` schema.\n- **Logical composition** - Values must conform to at least one schema in the `anyOf` array.\n\nTestcases that fail validation will still be stored, but will include `validationErrors` detailing the issues.\nExtra fields in the Testcase data that are not in the schema will be stored but are ignored during validation.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Testset.'
        },
        description: {
          type: 'string',
          description: 'The description of the Testset.'
        },
        fieldMapping: {
          type: 'object',
          description: 'Maps top-level keys of the Testcase schema to their roles (input/expected output). Unmapped fields are treated as metadata.',
          properties: {
            expected: {
              type: 'array',
              description: 'Fields that represent expected outputs.',
              items: {
                type: 'string'
              }
            },
            inputs: {
              type: 'array',
              description: 'Fields that represent inputs to the AI system.',
              items: {
                type: 'string'
              }
            },
            metadata: {
              type: 'array',
              description: 'Fields that are not inputs or expected outputs.',
              items: {
                type: 'string'
              }
            }
          },
          required: [
            'expected',
            'inputs',
            'metadata'
          ]
        },
        jsonSchema: {
          type: 'object',
          description: 'The JSON schema for each Testcase in the Testset.',
          additionalProperties: true
        },
        name: {
          type: 'string',
          description: 'The name of the Testset.'
        }
      },
      required: [
        'id',
        'description',
        'fieldMapping',
        'jsonSchema',
        'name'
      ]
    }
  }
}

Parameters

Name	Required	Description
`limit`	No	Maximum number of items to return (1-100). Use with `cursor` for pagination through large sets.
`cursor`	No	Cursor for pagination. Pass the `nextCursor` from the previous response to get the next page of results.
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
`projectId`	Yes

search_docs

Read-only

Search the documentation for how to use the client to interact with the API.

Parameters

Name	Required	Description
`query`	Yes	The query to search for.
`detail`	No	The amount of detail to return.
`language`	Yes	The language for the SDK to search for.

update_metrics

Update an existing Metric. You must specify the evalType and outputType of the metric. The structure of a metric depends on the evalType and outputType of the metric.

Parameters

Name	Required	Description	Default
No parameters

update_systems

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Update an existing system. Only the fields provided in the request body will be updated. If a field is provided, the new content will replace the existing content. If a field is not provided, the existing content will remain unchanged.

Response Schema

{
  $ref: '#/$defs/system',
  $defs: {
    system: {
      type: 'object',
      description: 'A System Under Test (SUT).\n\nSystems are templates - to run evaluations, pair them with a SystemVersion that provides specific\nparameter values.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the system.'
        },
        description: {
          type: 'string',
          description: 'The description of the system.'
        },
        name: {
          type: 'string',
          description: 'The name of the system. Unique within the project.'
        },
        productionVersion: {
          $ref: '#/$defs/system_version'
        },
        versions: {
          type: 'array',
          description: 'The versions of the system.',
          items: {
            type: 'object',
            description: 'A SystemVersion defines the specific settings for a System Under Test.\n\nSystem versions contain parameter values that determine system behavior during evaluation.\nThey are immutable snapshots - once created, they never change.\n\nWhen running evaluations, you reference a specific systemVersionId to establish which system version to test.',
            properties: {
              id: {
                type: 'string',
                description: 'The ID of the system version.'
              },
              name: {
                type: 'string',
                description: 'The name of the system version.'
              }
            },
            required: [
              'id',
              'name'
            ]
          }
        }
      },
      required: [
        'id',
        'description',
        'name',
        'productionVersion',
        'versions'
      ]
    },
    system_version: {
      type: 'object',
      description: 'A SystemVersion defines the specific settings for a System Under Test.\n\nSystem versions contain parameter values that determine system behavior during evaluation.\nThey are immutable snapshots - once created, they never change.\n\nWhen running evaluations, you reference a specific systemVersionId to establish which system version to test.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the system version.'
        },
        config: {
          type: 'object',
          description: 'The configuration of the system version.',
          additionalProperties: true
        },
        name: {
          type: 'string',
          description: 'The name of the system version.'
        },
        systemId: {
          type: 'string',
          description: 'The ID of the system the system version belongs to.'
        }
      },
      required: [
        'id',
        'config',
        'name',
        'systemId'
      ]
    }
  }
}

Parameters

Name	Required	Description
`name`	No	The name of the system. Unique within the project.
`systemId`	Yes
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
`description`	No	The description of the system.
`productionVersionId`	No	The ID of the production version of the system.
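
Since only the fields provided in the request are updated, a client can build the arguments so that untouched fields are simply omitted. The sketch below is one way to do that; `build_update_args` and the system ID are hypothetical, and it assumes that leaving a key out, rather than sending `null`, is how a field is left unchanged.

```python
from typing import Any, Optional


def build_update_args(
    system_id: str,
    *,
    name: Optional[str] = None,
    description: Optional[str] = None,
    production_version_id: Optional[str] = None,
) -> dict[str, Any]:
    """Build update_systems arguments containing only the fields to change."""
    optional = {
        "name": name,
        "description": description,
        "productionVersionId": production_version_id,
    }
    args: dict[str, Any] = {"systemId": system_id}
    # Omit fields left as None so the existing content remains unchanged.
    args.update({key: value for key, value in optional.items() if value is not None})
    return args


update_args = build_update_args("sys_1", description="Customer support bot")
print(update_args)
```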

update_testcases

Idempotent

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Replace the data of an existing Testcase while keeping its ID.

Response Schema

{
  $ref: '#/$defs/testcase',
  $defs: {
    testcase: {
      type: 'object',
      description: 'A test case in the Scorecard system. Contains JSON data that is validated against the schema defined by its Testset.\nThe `inputs` and `expected` fields are derived from the `data` field based on the Testset\'s `fieldMapping`, and include all mapped fields, including those with validation errors.\nTestcases are stored regardless of validation results, with any validation errors included in the `validationErrors` field.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Testcase.'
        },
        expected: {
          type: 'object',
          description: 'Derived from data based on the Testset\'s fieldMapping. Contains all fields marked as expected outputs, including those with validation errors.',
          additionalProperties: true
        },
        inputs: {
          type: 'object',
          description: 'Derived from data based on the Testset\'s fieldMapping. Contains all fields marked as inputs, including those with validation errors.',
          additionalProperties: true
        },
        jsonData: {
          type: 'object',
          description: 'The JSON data of the Testcase, which is validated against the Testset\'s schema.',
          additionalProperties: true
        },
        testsetId: {
          type: 'string',
          description: 'The ID of the Testset this Testcase belongs to.'
        },
        validationErrors: {
          type: 'array',
          description: 'Validation errors found in the Testcase data. If present, the Testcase doesn\'t fully conform to its Testset\'s schema.',
          items: {
            type: 'object',
            properties: {
              message: {
                type: 'string',
                description: 'Human-readable error description.'
              },
              path: {
                type: 'string',
                description: 'JSON Pointer to the field with the validation error.'
              }
            },
            required: [
              'message',
              'path'
            ]
          }
        }
      },
      required: [
        'id',
        'expected',
        'inputs',
        'jsonData',
        'testsetId'
      ]
    }
  }
}

Parameters

Name	Required	Description
`jsonData`	Yes	The JSON data of the Testcase, which is validated against the Testset's schema.
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
`testcaseId`	Yes
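
Before replacing a Testcase's `jsonData`, it can help to check the data locally against the Testset's schema. The sketch below covers only top-level `required`, `type`, and `enum` checks and reports errors in the same `path`/`message` shape as `validationErrors`. It is not the server's validator: the message wording is invented, nested objects, arrays, and `anyOf` are not checked, and JSON Pointer escaping is ignored.

```python
from typing import Any

TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
    # bool is a subclass of int in Python, so exclude it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
}


def validate_top_level(json_data: dict[str, Any], json_schema: dict[str, Any]) -> list[dict[str, str]]:
    """Check top-level required, type, and enum constraints only."""
    errors = []
    for field in json_schema.get("required", []):
        if field not in json_data:
            errors.append({"path": f"/{field}", "message": "required field is missing"})
    for field, spec in json_schema.get("properties", {}).items():
        if field not in json_data:
            continue
        value = json_data[field]
        expected = spec.get("type")
        if isinstance(expected, str) and expected in TYPE_CHECKS and not TYPE_CHECKS[expected](value):
            errors.append({"path": f"/{field}", "message": f"expected type {expected}"})
        elif "enum" in spec and value not in spec["enum"]:
            errors.append({"path": f"/{field}", "message": "value is not one of the enum options"})
    return errors


# Hypothetical schema and data, for illustration only.
schema = {
    "type": "object",
    "properties": {
        "question": {"type": "string"},
        "level": {"type": "string", "enum": ["easy", "hard"]},
    },
    "required": ["question", "answer"],
}
found = validate_top_level({"question": 42, "level": "medium"}, schema)
print(found)
```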

update_testsets

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Update a Testset. Only the fields provided in the request body will be updated. If a field is provided, the new content will replace the existing content. If a field is not provided, the existing content will remain unchanged.

When updating the schema:

If field mappings are not provided and existing mappings reference fields that no longer exist, those mappings will be automatically removed.
To preserve all existing mappings, ensure all referenced fields remain in the updated schema.
For complete control, provide both schema and fieldMapping when updating the schema.
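
The pruning rule above can be modeled locally to preview which mappings would survive a schema change. This is a sketch of the described behavior, not the server's implementation; `prune_field_mapping` and the sample schema are hypothetical, and it assumes mapped fields correspond to top-level keys under the schema's `properties`.

```python
from typing import Any


def prune_field_mapping(
    field_mapping: dict[str, list[str]], json_schema: dict[str, Any]
) -> dict[str, list[str]]:
    """Drop mapped fields that no longer exist as top-level schema properties."""
    known = set(json_schema.get("properties", {}))
    return {
        role: [field for field in fields if field in known]
        for role, fields in field_mapping.items()
    }


# Hypothetical mapping and updated schema: 'context' was removed from the schema.
old_mapping = {"inputs": ["question", "context"], "expected": ["answer"], "metadata": []}
new_schema = {
    "type": "object",
    "properties": {"question": {"type": "string"}, "answer": {"type": "string"}},
}
pruned = prune_field_mapping(old_mapping, new_schema)
print(pruned)
```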

Response Schema

{
  $ref: '#/$defs/testset',
  $defs: {
    testset: {
      type: 'object',
      description: 'A collection of Testcases that share the same schema.\nEach Testset defines the structure of its Testcases through a JSON schema.\nThe `fieldMapping` object maps top-level keys of the Testcase schema to their roles (input/expected output).\nFields not mentioned in the `fieldMapping` during creation or update are treated as metadata.\n\n## JSON Schema validation constraints supported:\n\n- **Required fields** - Fields listed in the schema\'s `required` array must be present in Testcases.\n- **Type validation** - Values must match the specified type (string, number, boolean, null, integer, object, array).\n- **Enum validation** - Values must be one of the options specified in the `enum` array.\n- **Object property validation** - Properties of objects must conform to their defined schemas.\n- **Array item validation** - Items in arrays must conform to the `items` schema.\n- **Logical composition** - Values must conform to at least one schema in the `anyOf` array.\n\nTestcases that fail validation will still be stored, but will include `validationErrors` detailing the issues.\nExtra fields in the Testcase data that are not in the schema will be stored but are ignored during validation.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Testset.'
        },
        description: {
          type: 'string',
          description: 'The description of the Testset.'
        },
        fieldMapping: {
          type: 'object',
          description: 'Maps top-level keys of the Testcase schema to their roles (input/expected output). Unmapped fields are treated as metadata.',
          properties: {
            expected: {
              type: 'array',
              description: 'Fields that represent expected outputs.',
              items: {
                type: 'string'
              }
            },
            inputs: {
              type: 'array',
              description: 'Fields that represent inputs to the AI system.',
              items: {
                type: 'string'
              }
            },
            metadata: {
              type: 'array',
              description: 'Fields that are not inputs or expected outputs.',
              items: {
                type: 'string'
              }
            }
          },
          required: [
            'expected',
            'inputs',
            'metadata'
          ]
        },
        jsonSchema: {
          type: 'object',
          description: 'The JSON schema for each Testcase in the Testset.',
          additionalProperties: true
        },
        name: {
          type: 'string',
          description: 'The name of the Testset.'
        }
      },
      required: [
        'id',
        'description',
        'fieldMapping',
        'jsonSchema',
        'name'
      ]
    }
  }
}
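
The Testset description above lists the validation constraints applied to Testcases. The sketch below illustrates three of them (required fields, type validation, enum validation) for top-level fields only. It is an assumption-laden illustration, not Scorecard's validator: the function name `validate_testcase` is hypothetical, and the `{message, path}` error shape is borrowed from the Score schema in this listing because the Testcase error shape is not shown here.

```python
# JSON Schema type names mapped to Python checks. bool is handled explicitly
# because bool is a subclass of int in Python.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "null": lambda v: v is None,
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
}


def validate_testcase(data, schema):
    """Return a list of {message, path} errors for top-level fields."""
    errors = []
    # Required fields must be present.
    for name in schema.get("required", []):
        if name not in data:
            errors.append({"message": f"missing required field '{name}'", "path": f"/{name}"})
    # Fields not named in `properties` are never inspected, so extras are ignored.
    for name, rule in schema.get("properties", {}).items():
        if name not in data:
            continue
        value = data[name]
        expected = rule.get("type")
        if expected in TYPE_CHECKS and not TYPE_CHECKS[expected](value):
            errors.append({"message": f"expected type {expected}", "path": f"/{name}"})
        if "enum" in rule and value not in rule["enum"]:
            errors.append({"message": f"value not in enum {rule['enum']}", "path": f"/{name}"})
    return errors


# Hypothetical schema and Testcase.
schema = {
    "type": "object",
    "properties": {
        "question": {"type": "string"},
        "difficulty": {"type": "string", "enum": ["easy", "hard"]},
    },
    "required": ["question", "difficulty"],
}
testcase = {"difficulty": "medium", "note": "extra field, not in the schema"}

errors = validate_testcase(testcase, schema)
for err in errors:
    print(err["path"], err["message"])
```

As the description states, a Testcase that fails validation is still stored; the errors are attached to it rather than causing a rejection.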

Parameters

Name	Required	Description
`name`	No	The name of the Testset.
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
`testsetId`	Yes
`jsonSchema`	No	The JSON schema for each Testcase in the Testset.
`description`	No	The description of the Testset.
`fieldMapping`	No	Maps top-level keys of the Testcase schema to their roles (input/expected output). Unmapped fields are treated as metadata.
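
MCP clients invoke a tool by sending a JSON-RPC `tools/call` request whose `params` carry the tool name and its arguments. The sketch below builds such a request for this tool using the parameters in the table above. The Testset ID, field names, and the helper `build_tool_call` are hypothetical; the jq filter keeps only three fields of the response.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` JSON-RPC request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tool_call(
    1,
    "update_testsets",
    {
        "testsetId": "testset-id-placeholder",  # hypothetical ID
        # Schema and mapping are sent together for complete control.
        "jsonSchema": {
            "type": "object",
            "properties": {"question": {"type": "string"}, "answer": {"type": "string"}},
            "required": ["question"],
        },
        "fieldMapping": {"inputs": ["question"], "expected": ["answer"], "metadata": []},
        # Reduce the response to the fields that are actually needed.
        "jq_filter": "{id, name, fieldMapping}",
    },
)
print(json.dumps(request, indent=2))
```

Fields left out of the arguments (here `name` and `description`) remain unchanged, per the tool description.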

upsert_scores

Idempotent

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Create or update a Score for a given Record and MetricConfig. If a Score with the specified Record ID and MetricConfig ID already exists, it will be updated. Otherwise, a new Score will be created. The score provided should conform to the schema defined by the MetricConfig; otherwise, validation errors will be reported.
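
The create-or-update behaviour can be modelled as a store keyed by the pair (Record ID, MetricConfig ID). The following is a minimal in-memory sketch of that semantics, not the server's implementation; the function name `upsert_score`, the IDs, and the `value` key inside the score are hypothetical.

```python
def upsert_score(store, record_id, metric_config_id, score):
    """Insert or overwrite the Score keyed by (record_id, metric_config_id).

    Returns True if a new Score was created, False if one was updated.
    """
    key = (record_id, metric_config_id)
    created = key not in store
    store[key] = {
        "recordId": record_id,
        "metricConfigId": metric_config_id,
        "score": score,
    }
    return created


store = {}
first = upsert_score(store, "rec-1", "mc-1", {"value": 3})
second = upsert_score(store, "rec-1", "mc-1", {"value": 5})
print(first, second, len(store))  # True False 1
```

Repeating a call with the same arguments leaves the store in the same state, which is consistent with the tool being marked Idempotent.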

Response Schema

{
  $ref: '#/$defs/score',
  $defs: {
    score: {
      type: 'object',
      description: 'A Score represents the evaluation of a Record against a specific MetricConfig. The actual `score` is stored as flexible JSON. While any JSON is accepted, it is expected to conform to the output schema defined by the MetricConfig. Any discrepancies will be noted in the `validationErrors` field, but the Score will still be stored.',
      properties: {
        metricConfigId: {
          type: 'string',
          description: 'The ID of the MetricConfig this Score is for.'
        },
        recordId: {
          type: 'string',
          description: 'The ID of the Record this Score is for.'
        },
        score: {
          type: 'object',
          description: 'The score of the Record, as arbitrary JSON. This data should ideally conform to the output schema defined by the associated MetricConfig. If it doesn\'t, validation errors will be captured in the `validationErrors` field.',
          additionalProperties: true
        },
        validationErrors: {
          type: 'array',
          description: 'Validation errors found in the Score data. If present, the Score doesn\'t fully conform to its MetricConfig\'s schema.',
          items: {
            type: 'object',
            properties: {
              message: {
                type: 'string',
                description: 'Human-readable error description.'
              },
              path: {
                type: 'string',
                description: 'JSON Pointer to the field with the validation error.'
              }
            },
            required: [
              'message',
              'path'
            ]
          }
        }
      },
      required: [
        'metricConfigId',
        'recordId',
        'score'
      ]
    }
  }
}
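
Because a non-conforming Score is stored anyway, a caller may want to check `validationErrors` in the response. The helper below is a small sketch of doing so; the function name `describe_validation_errors` and the sample response values are hypothetical.

```python
def describe_validation_errors(score_response):
    """Return one 'path: message' line per validation error, or [] if none."""
    # `validationErrors` is not in the schema's `required` list, so it may be absent.
    errors = score_response.get("validationErrors") or []
    return [f"{err['path']}: {err['message']}" for err in errors]


# Hypothetical response in the shape of the schema above.
response = {
    "recordId": "rec-1",
    "metricConfigId": "mc-1",
    "score": {"value": "high"},
    "validationErrors": [{"path": "/value", "message": "expected a number"}],
}

lines = describe_validation_errors(response)
print(lines)
```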

Parameters

Name	Required	Description
`score`	Yes	The score of the Record, as arbitrary JSON. This data should ideally conform to the output schema defined by the associated MetricConfig. If it doesn't, validation errors will be captured in the `validationErrors` field.
`recordId`	Yes
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
`metricConfigId`	Yes

upsert_systems

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Create a new system. If a system with the same name already exists in the project, it is updated instead.

Response Schema

{
  $ref: '#/$defs/system',
  $defs: {
    system: {
      type: 'object',
      description: 'A System Under Test (SUT).\n\nSystems are templates - to run evaluations, pair them with a SystemVersion that provides specific\nparameter values.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the system.'
        },
        description: {
          type: 'string',
          description: 'The description of the system.'
        },
        name: {
          type: 'string',
          description: 'The name of the system. Unique within the project.'
        },
        productionVersion: {
          $ref: '#/$defs/system_version'
        },
        versions: {
          type: 'array',
          description: 'The versions of the system.',
          items: {
            type: 'object',
            description: 'A SystemVersion defines the specific settings for a System Under Test.\n\nSystem versions contain parameter values that determine system behavior during evaluation.\nThey are immutable snapshots - once created, they never change.\n\nWhen running evaluations, you reference a specific systemVersionId to establish which system version to test.',
            properties: {
              id: {
                type: 'string',
                description: 'The ID of the system version.'
              },
              name: {
                type: 'string',
                description: 'The name of the system version.'
              }
            },
            required: [
              'id',
              'name'
            ]
          }
        }
      },
      required: [
        'id',
        'description',
        'name',
        'productionVersion',
        'versions'
      ]
    },
    system_version: {
      type: 'object',
      description: 'A SystemVersion defines the specific settings for a System Under Test.\n\nSystem versions contain parameter values that determine system behavior during evaluation.\nThey are immutable snapshots - once created, they never change.\n\nWhen running evaluations, you reference a specific systemVersionId to establish which system version to test.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the system version.'
        },
        config: {
          type: 'object',
          description: 'The configuration of the system version.',
          additionalProperties: true
        },
        name: {
          type: 'string',
          description: 'The name of the system version.'
        },
        systemId: {
          type: 'string',
          description: 'The ID of the system the system version belongs to.'
        }
      },
      required: [
        'id',
        'config',
        'name',
        'systemId'
      ]
    }
  }
}

Parameters

Name	Required	Description
`name`	No	The name of the system. Should be unique within the project. Defaults to "Default system".
`config`	Yes	The configuration of the system.
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).
`projectId`	Yes
`description`	No	The description of the system.
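
Since a system is matched by its name within a project, and `name` has a default, two calls that omit `name` for the same project address the same system. The sketch below models this locally. It is an illustration under assumptions, not the server's logic: which fields are overwritten on update is not stated in this listing, so the sketch replaces `config` and only replaces `description` when one is supplied. The function name `upsert_system` and all IDs are hypothetical.

```python
DEFAULT_NAME = "Default system"  # default stated in the parameter table


def upsert_system(systems, project_id, config, name=DEFAULT_NAME, description=None):
    """Create or update the system identified by (project_id, name)."""
    key = (project_id, name)
    existing = systems.get(key)
    if existing is None:
        systems[key] = {"name": name, "config": config, "description": description or ""}
        return "created"
    # Assumption: an update replaces the config and any supplied description.
    existing["config"] = config
    if description is not None:
        existing["description"] = description
    return "updated"


systems = {}
results = [
    upsert_system(systems, "proj-1", {"model": "a"}),
    upsert_system(systems, "proj-1", {"model": "b"}),  # same project, same default name
    upsert_system(systems, "proj-2", {"model": "a"}),  # different project
]
print(results)
```

Passing an explicit `name` is the safer choice when a project holds more than one system, since the default name would otherwise make unrelated calls collide.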

upsert_systems_versions

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Create a new system version if it does not already exist. Does not set the created version to be the system's production version.

If there is already a system version with the same config, its name will be updated.
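
Here the matching key is the config rather than the name. The sketch below models that rule locally and is not the server's implementation. Its assumptions: two configs are "the same" when they are equal as JSON regardless of key order, and autogenerated names look like "Version N" (the real format is not documented here). The names `config_key` and `upsert_version` are hypothetical.

```python
import json


def config_key(config):
    """Canonical string for a config, so equal configs match regardless of key order."""
    return json.dumps(config, sort_keys=True)


def upsert_version(versions, config, name=None):
    """Return the version for `config`, creating it or renaming an existing one."""
    key = config_key(config)
    version = versions.get(key)
    if version is None:
        # Assumption: placeholder for the autogenerated name.
        version = {"config": config, "name": name or f"Version {len(versions) + 1}"}
        versions[key] = version
    elif name is not None:
        # Same config: only the name changes; the config itself is left untouched.
        version["name"] = name
    return version


versions = {}
v1 = upsert_version(versions, {"temperature": 0.2, "model": "m"})
v2 = upsert_version(versions, {"model": "m", "temperature": 0.2}, name="baseline")
print(len(versions), v2["name"])
```

In this model the second call does not create a new version. It renames the existing one, which matches the description of versions as immutable snapshots whose config never changes.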

Response Schema

{
  $ref: '#/$defs/system_version',
  $defs: {
    system_version: {
      type: 'object',
      description: 'A SystemVersion defines the specific settings for a System Under Test.\n\nSystem versions contain parameter values that determine system behavior during evaluation.\nThey are immutable snapshots - once created, they never change.\n\nWhen running evaluations, you reference a specific systemVersionId to establish which system version to test.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the system version.'
        },
        config: {
          type: 'object',
          description: 'The configuration of the system version.',
          additionalProperties: true
        },
        name: {
          type: 'string',
          description: 'The name of the system version.'
        },
        systemId: {
          type: 'string',
          description: 'The ID of the system the system version belongs to.'
        }
      },
      required: [
        'id',
        'config',
        'name',
        'systemId'
      ]
    }
  }
}

Parameters

Name	Required	Description
`name`	No	The name of the system version. If creating a new system version and the name isn't provided, it will be autogenerated.
`config`	Yes	The configuration of the system version.
`systemId`	Yes
`jq_filter`	No	A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/).

Verify Ownership

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [
    {
      "email": "your-email@example.com"
    }
  ]
}

The email address must match the email associated with your Glama account. Once verified, the connector will appear as claimed by you.
