Server Details

MCP server providing access to the Scorecard API to evaluate and optimize LLM systems.

Status: Healthy
Transport: Streamable HTTP
Repository: scorecard-ai/scorecard-node
GitHub Stars: 0

Available Tools

33 tools
create_metrics

Create a new Metric for evaluating system outputs. The structure of a metric depends on the evalType and outputType of the metric.

Parameters

No parameters

create_projects

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Create a new Project.

Response Schema

{
  $ref: '#/$defs/project',
  $defs: {
    project: {
      type: 'object',
      description: 'A Project in the Scorecard system.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Project.'
        },
        description: {
          type: 'string',
          description: 'The description of the Project.'
        },
        name: {
          type: 'string',
          description: 'The name of the Project.'
        }
      },
      required: [
        'id',
        'description',
        'name'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| name | Yes | The name of the Project. | |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
| description | Yes | The description of the Project. | |
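
As a rough sketch of how an MCP client might invoke this tool, the snippet below wraps the documented parameters in a JSON-RPC 2.0 `tools/call` request. The helper name `build_tool_call` and the example values are hypothetical; only `name`, `description`, and `jq_filter` come from the parameter list above, and `.id` is used because the response schema lists an `id` field.

```python
import json
from typing import Any, Dict


def build_tool_call(tool_name: str, arguments: Dict[str, Any], request_id: int = 1) -> Dict[str, Any]:
    """Wrap tool arguments in a JSON-RPC 2.0 `tools/call` request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tool_call(
    "create_projects",
    {
        "name": "Support bot evals",        # hypothetical example value
        "description": "Regression suite",  # hypothetical example value
        "jq_filter": ".id",                 # ask for the new Project's ID only
    },
)
print(json.dumps(request, indent=2))
```

With `jq_filter` set to `.id`, the response should carry only the new Project's ID rather than the whole object, which is the size reduction the tool description asks for.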
create_records

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Create a new Record in a Run.

Response Schema

{
  $ref: '#/$defs/record',
  $defs: {
    record: {
      type: 'object',
      description: 'A record of a system execution in the Scorecard system.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Record.'
        },
        expected: {
          type: 'object',
          description: 'The expected outputs for the Testcase.',
          additionalProperties: true
        },
        inputs: {
          type: 'object',
          description: 'The actual inputs sent to the system, which should match the system\'s input schema.',
          additionalProperties: true
        },
        outputs: {
          type: 'object',
          description: 'The actual outputs from the system.',
          additionalProperties: true
        },
        runId: {
          type: 'string',
          description: 'The ID of the Run containing this Record.'
        },
        testcaseId: {
          type: 'string',
          description: 'The ID of the Testcase.'
        }
      },
      required: [
        'id',
        'expected',
        'inputs',
        'outputs',
        'runId'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| runId | Yes | | |
| inputs | Yes | The actual inputs sent to the system, which should match the system's input schema. | |
| outputs | Yes | The actual outputs from the system. | |
| expected | Yes | The expected outputs for the Testcase. | |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
| testcaseId | No | The ID of the Testcase. | |
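
The argument object for this tool can be assembled along the following lines. This is a minimal sketch: the function name `build_create_record_args` and all example values are hypothetical, and the only facts taken from the listing are the parameter names and which of them are required.

```python
from typing import Any, Dict, Optional


def build_create_record_args(
    run_id: str,
    inputs: Dict[str, Any],
    outputs: Dict[str, Any],
    expected: Dict[str, Any],
    testcase_id: Optional[str] = None,
    jq_filter: Optional[str] = ".id",
) -> Dict[str, Any]:
    """Build arguments for `create_records`, leaving out optional fields that are unset."""
    args: Dict[str, Any] = {
        "runId": run_id,       # required
        "inputs": inputs,      # required
        "outputs": outputs,    # required
        "expected": expected,  # required
    }
    if testcase_id is not None:
        args["testcaseId"] = testcase_id  # optional link to a Testcase
    if jq_filter:
        args["jq_filter"] = jq_filter     # optional response filter
    return args


record_args = build_create_record_args(
    run_id="run-123",                   # hypothetical ID
    inputs={"question": "2 + 2?"},      # hypothetical payload
    outputs={"answer": "4"},
    expected={"answer": "4"},
)
print(sorted(record_args))
```

Leaving `testcaseId` out entirely, rather than sending a null value, keeps the payload limited to fields the listing describes.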
create_runs

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Create a new Run.

Response Schema

{
  $ref: '#/$defs/run',
  $defs: {
    run: {
      type: 'object',
      description: 'A Run in the Scorecard system.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Run.'
        },
        metricIds: {
          type: 'array',
          description: 'The IDs of the metrics this Run is using.',
          items: {
            type: 'string'
          }
        },
        metricVersionIds: {
          type: 'array',
          description: 'The IDs of the metric versions this Run is using.',
          items: {
            type: 'string'
          }
        },
        numExpectedRecords: {
          type: 'number',
          description: 'The number of expected records in the Run. Determined by the number of testcases in the Run\'s Testset at the time of Run creation.'
        },
        numRecords: {
          type: 'number',
          description: 'The number of records in the Run.'
        },
        numScores: {
          type: 'number',
          description: 'The number of completed scores in the Run so far.'
        },
        status: {
          type: 'string',
          description: 'The status of the Run.',
          enum: [
            'pending',
            'awaiting_execution',
            'running_execution',
            'awaiting_scoring',
            'running_scoring',
            'awaiting_human_scoring',
            'completed'
          ]
        },
        systemId: {
          type: 'string',
          description: 'The ID of the system this Run is using.'
        },
        systemVersionId: {
          type: 'string',
          description: 'The ID of the system version this Run is using.'
        },
        testsetId: {
          type: 'string',
          description: 'The ID of the Testset this Run is testing.'
        }
      },
      required: [
        'id',
        'metricIds',
        'metricVersionIds',
        'numExpectedRecords',
        'numRecords',
        'numScores',
        'status',
        'systemId',
        'systemVersionId',
        'testsetId'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
| metricIds | Yes | The IDs of the metrics this Run is using. | |
| projectId | Yes | | |
| testsetId | No | The ID of the Testset this Run is testing. | |
| systemVersionId | No | The ID of the system version this Run is using. | |
create_testcases

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Create multiple Testcases in the specified Testset.

Response Schema

{
  $ref: '#/$defs/testcase_create_response',
  $defs: {
    testcase_create_response: {
      type: 'object',
      properties: {
        items: {
          type: 'array',
          items: {
            $ref: '#/$defs/testcase'
          }
        }
      },
      required: [
        'items'
      ]
    },
    testcase: {
      type: 'object',
      description: 'A test case in the Scorecard system. Contains JSON data that is validated against the schema defined by its Testset.\nThe `inputs` and `expected` fields are derived from the `data` field based on the Testset\'s `fieldMapping`, and include all mapped fields, including those with validation errors.\nTestcases are stored regardless of validation results, with any validation errors included in the `validationErrors` field.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Testcase.'
        },
        expected: {
          type: 'object',
          description: 'Derived from data based on the Testset\'s fieldMapping. Contains all fields marked as expected outputs, including those with validation errors.',
          additionalProperties: true
        },
        inputs: {
          type: 'object',
          description: 'Derived from data based on the Testset\'s fieldMapping. Contains all fields marked as inputs, including those with validation errors.',
          additionalProperties: true
        },
        jsonData: {
          type: 'object',
          description: 'The JSON data of the Testcase, which is validated against the Testset\'s schema.',
          additionalProperties: true
        },
        testsetId: {
          type: 'string',
          description: 'The ID of the Testset this Testcase belongs to.'
        },
        validationErrors: {
          type: 'array',
          description: 'Validation errors found in the Testcase data. If present, the Testcase doesn\'t fully conform to its Testset\'s schema.',
          items: {
            type: 'object',
            properties: {
              message: {
                type: 'string',
                description: 'Human-readable error description.'
              },
              path: {
                type: 'string',
                description: 'JSON Pointer to the field with the validation error.'
              }
            },
            required: [
              'message',
              'path'
            ]
          }
        }
      },
      required: [
        'id',
        'expected',
        'inputs',
        'jsonData',
        'testsetId'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| items | Yes | Testcases to create (max 100). | |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
| testsetId | Yes | | |
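
Because `items` is capped at 100 Testcases per call, a larger import has to be split across several calls. The sketch below shows one way to do that. The helper `batched_create_args` is hypothetical, the shape of each item is not shown in this listing so the items are treated as opaque values, and the `.items[].id` filter is derived from the response schema above.

```python
from typing import Any, Dict, Iterator, List

MAX_ITEMS_PER_CALL = 100  # limit stated in the `items` parameter description


def batched_create_args(
    testset_id: str,
    items: List[Any],
    batch_size: int = MAX_ITEMS_PER_CALL,
) -> Iterator[Dict[str, Any]]:
    """Yield one `create_testcases` argument object per batch of at most `batch_size` items."""
    if not 1 <= batch_size <= MAX_ITEMS_PER_CALL:
        raise ValueError(f"batch_size must be between 1 and {MAX_ITEMS_PER_CALL}")
    for start in range(0, len(items), batch_size):
        yield {
            "testsetId": testset_id,
            "items": items[start:start + batch_size],
            "jq_filter": ".items[].id",  # return only the IDs of the created Testcases
        }


# 250 placeholder items stand in for real Testcase payloads.
calls = list(batched_create_args("testset-123", list(range(250))))
print([len(call["items"]) for call in calls])  # [100, 100, 50]
```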
create_testsets

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Create a new Testset for a Project. The Testset will be created in the Project specified in the path.

Response Schema

{
  $ref: '#/$defs/testset',
  $defs: {
    testset: {
      type: 'object',
      description: 'A collection of Testcases that share the same schema.\nEach Testset defines the structure of its Testcases through a JSON schema.\nThe `fieldMapping` object maps top-level keys of the Testcase schema to their roles (input/expected output).\nFields not mentioned in the `fieldMapping` during creation or update are treated as metadata.\n\n## JSON Schema validation constraints supported:\n\n- **Required fields** - Fields listed in the schema\'s `required` array must be present in Testcases.\n- **Type validation** - Values must match the specified type (string, number, boolean, null, integer, object, array).\n- **Enum validation** - Values must be one of the options specified in the `enum` array.\n- **Object property validation** - Properties of objects must conform to their defined schemas.\n- **Array item validation** - Items in arrays must conform to the `items` schema.\n- **Logical composition** - Values must conform to at least one schema in the `anyOf` array.\n\nTestcases that fail validation will still be stored, but will include `validationErrors` detailing the issues.\nExtra fields in the Testcase data that are not in the schema will be stored but are ignored during validation.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Testset.'
        },
        description: {
          type: 'string',
          description: 'The description of the Testset.'
        },
        fieldMapping: {
          type: 'object',
          description: 'Maps top-level keys of the Testcase schema to their roles (input/expected output). Unmapped fields are treated as metadata.',
          properties: {
            expected: {
              type: 'array',
              description: 'Fields that represent expected outputs.',
              items: {
                type: 'string'
              }
            },
            inputs: {
              type: 'array',
              description: 'Fields that represent inputs to the AI system.',
              items: {
                type: 'string'
              }
            },
            metadata: {
              type: 'array',
              description: 'Fields that are not inputs or expected outputs.',
              items: {
                type: 'string'
              }
            }
          },
          required: [
            'expected',
            'inputs',
            'metadata'
          ]
        },
        jsonSchema: {
          type: 'object',
          description: 'The JSON schema for each Testcase in the Testset.',
          additionalProperties: true
        },
        name: {
          type: 'string',
          description: 'The name of the Testset.'
        }
      },
      required: [
        'id',
        'description',
        'fieldMapping',
        'jsonSchema',
        'name'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| name | Yes | The name of the Testset. | |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
| projectId | Yes | | |
| jsonSchema | Yes | The JSON schema for each Testcase in the Testset. | |
| description | Yes | The description of the Testset. | |
| fieldMapping | Yes | Maps top-level keys of the Testcase schema to their roles (input/expected output). Unmapped fields are treated as metadata. | |
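
To make the role of `fieldMapping` concrete, the following sketch splits a Testcase's top-level keys into inputs, expected outputs, and metadata in the way the schema description suggests. It illustrates the documented behavior only and is not the server's implementation; the function name `split_by_field_mapping` and the sample data are hypothetical.

```python
from typing import Any, Dict, Tuple


def split_by_field_mapping(
    data: Dict[str, Any],
    field_mapping: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]]:
    """Split top-level keys of `data` into (inputs, expected, metadata)."""
    inputs = {k: data[k] for k in field_mapping.get("inputs", []) if k in data}
    expected = {k: data[k] for k in field_mapping.get("expected", []) if k in data}
    # Anything not mapped as an input or an expected output is treated as metadata.
    metadata = {k: v for k, v in data.items() if k not in inputs and k not in expected}
    return inputs, expected, metadata


field_mapping = {"inputs": ["question"], "expected": ["answer"], "metadata": []}
data = {"question": "2 + 2?", "answer": "4", "difficulty": "easy"}

inputs, expected, metadata = split_by_field_mapping(data, field_mapping)
print(inputs)    # {'question': '2 + 2?'}
print(expected)  # {'answer': '4'}
print(metadata)  # {'difficulty': 'easy'}
```

Here `difficulty` is not listed under `inputs` or `expected`, so it lands in the metadata bucket, matching the statement that unmapped fields are treated as metadata.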
delete_metrics
Idempotent

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Delete a specific Metric by ID. The metric will be removed from metric groups and monitors.

Response Schema

{
  $ref: '#/$defs/metric_delete_response',
  $defs: {
    metric_delete_response: {
      type: 'object',
      properties: {
        success: {
          type: 'boolean',
          description: 'Whether the deletion was successful.'
        }
      },
      required: [
        'success'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| metricId | Yes | | |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
delete_records
Idempotent

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Delete a specific Record by ID.

Response Schema

{
  $ref: '#/$defs/record_delete_response',
  $defs: {
    record_delete_response: {
      type: 'object',
      properties: {
        success: {
          type: 'boolean',
          description: 'Whether the deletion was successful.'
        }
      },
      required: [
        'success'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| recordId | Yes | | |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
delete_systems
Idempotent

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Delete a system definition by ID. This will not delete associated system versions.

Response Schema

{
  $ref: '#/$defs/system_delete_response',
  $defs: {
    system_delete_response: {
      type: 'object',
      properties: {
        success: {
          type: 'boolean',
          description: 'Whether the deletion was successful.'
        }
      },
      required: [
        'success'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| systemId | Yes | | |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
delete_testcases

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Delete multiple Testcases by their IDs.

Response Schema

{
  $ref: '#/$defs/testcase_delete_response',
  $defs: {
    testcase_delete_response: {
      type: 'object',
      properties: {
        success: {
          type: 'boolean',
          description: 'Whether the deletion was successful.'
        }
      },
      required: [
        'success'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| ids | Yes | IDs of Testcases to delete. | |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
delete_testsets
Idempotent

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Delete Testset

Response Schema

{
  $ref: '#/$defs/testset_delete_response',
  $defs: {
    testset_delete_response: {
      type: 'object',
      properties: {
        success: {
          type: 'boolean',
          description: 'Whether the deletion was successful.'
        }
      },
      required: [
        'success'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
| testsetId | Yes | | |
get_metrics
Read-only

Retrieve a specific Metric by ID.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| metricId | Yes | | |
get_runs
Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Retrieve a specific Run by ID.

Response Schema

{
  $ref: '#/$defs/run',
  $defs: {
    run: {
      type: 'object',
      description: 'A Run in the Scorecard system.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Run.'
        },
        metricIds: {
          type: 'array',
          description: 'The IDs of the metrics this Run is using.',
          items: {
            type: 'string'
          }
        },
        metricVersionIds: {
          type: 'array',
          description: 'The IDs of the metric versions this Run is using.',
          items: {
            type: 'string'
          }
        },
        numExpectedRecords: {
          type: 'number',
          description: 'The number of expected records in the Run. Determined by the number of testcases in the Run\'s Testset at the time of Run creation.'
        },
        numRecords: {
          type: 'number',
          description: 'The number of records in the Run.'
        },
        numScores: {
          type: 'number',
          description: 'The number of completed scores in the Run so far.'
        },
        status: {
          type: 'string',
          description: 'The status of the Run.',
          enum: [
            'pending',
            'awaiting_execution',
            'running_execution',
            'awaiting_scoring',
            'running_scoring',
            'awaiting_human_scoring',
            'completed'
          ]
        },
        systemId: {
          type: 'string',
          description: 'The ID of the system this Run is using.'
        },
        systemVersionId: {
          type: 'string',
          description: 'The ID of the system version this Run is using.'
        },
        testsetId: {
          type: 'string',
          description: 'The ID of the Testset this Run is testing.'
        }
      },
      required: [
        'id',
        'metricIds',
        'metricVersionIds',
        'numExpectedRecords',
        'numRecords',
        'numScores',
        'status',
        'systemId',
        'systemVersionId',
        'testsetId'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| runId | Yes | | |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
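
A caller tracking a Run's progress typically needs only a few of the fields in the Run schema. The sketch below assumes the response was requested with a filter such as `{status, numRecords, numExpectedRecords, numScores}` and turns it into a one-line summary. The function name `summarize_run` and the sample values are hypothetical; the status values are copied from the `enum` in the response schema.

```python
from typing import Any, Dict

# Status values copied from the Run schema's `enum`.
RUN_STATUSES = (
    "pending",
    "awaiting_execution",
    "running_execution",
    "awaiting_scoring",
    "running_scoring",
    "awaiting_human_scoring",
    "completed",
)


def summarize_run(run: Dict[str, Any]) -> str:
    """Return a short progress line for a Run object returned by `get_runs`."""
    status = run["status"]
    if status not in RUN_STATUSES:
        raise ValueError(f"unknown Run status: {status!r}")
    expected = run["numExpectedRecords"]
    done = run["numRecords"]
    # Guard against division by zero when the Run expects no records.
    percent = 100.0 * done / expected if expected else 0.0
    return f"{status}: {done}/{expected} records ({percent:.0f}%), {run['numScores']} scores"


sample_run = {
    "status": "running_scoring",
    "numExpectedRecords": 20,
    "numRecords": 15,
    "numScores": 30,
}
print(summarize_run(sample_run))  # running_scoring: 15/20 records (75%), 30 scores
```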
get_systems
Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Retrieve a specific system by ID.

Response Schema

{
  $ref: '#/$defs/system',
  $defs: {
    system: {
      type: 'object',
      description: 'A System Under Test (SUT).\n\nSystems are templates - to run evaluations, pair them with a SystemVersion that provides specific\nparameter values.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the system.'
        },
        description: {
          type: 'string',
          description: 'The description of the system.'
        },
        name: {
          type: 'string',
          description: 'The name of the system. Unique within the project.'
        },
        productionVersion: {
          $ref: '#/$defs/system_version'
        },
        versions: {
          type: 'array',
          description: 'The versions of the system.',
          items: {
            type: 'object',
            description: 'A SystemVersion defines the specific settings for a System Under Test.\n\nSystem versions contain parameter values that determine system behavior during evaluation.\nThey are immutable snapshots - once created, they never change.\n\nWhen running evaluations, you reference a specific systemVersionId to establish which system version to test.',
            properties: {
              id: {
                type: 'string',
                description: 'The ID of the system version.'
              },
              name: {
                type: 'string',
                description: 'The name of the system version.'
              }
            },
            required: [
              'id',
              'name'
            ]
          }
        }
      },
      required: [
        'id',
        'description',
        'name',
        'productionVersion',
        'versions'
      ]
    },
    system_version: {
      type: 'object',
      description: 'A SystemVersion defines the specific settings for a System Under Test.\n\nSystem versions contain parameter values that determine system behavior during evaluation.\nThey are immutable snapshots - once created, they never change.\n\nWhen running evaluations, you reference a specific systemVersionId to establish which system version to test.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the system version.'
        },
        config: {
          type: 'object',
          description: 'The configuration of the system version.',
          additionalProperties: true
        },
        name: {
          type: 'string',
          description: 'The name of the system version.'
        },
        systemId: {
          type: 'string',
          description: 'The ID of the system the system version belongs to.'
        }
      },
      required: [
        'id',
        'config',
        'name',
        'systemId'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| systemId | Yes | | |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
get_systems_versions
Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Retrieve a specific system version by ID.

Response Schema

{
  $ref: '#/$defs/system_version',
  $defs: {
    system_version: {
      type: 'object',
      description: 'A SystemVersion defines the specific settings for a System Under Test.\n\nSystem versions contain parameter values that determine system behavior during evaluation.\nThey are immutable snapshots - once created, they never change.\n\nWhen running evaluations, you reference a specific systemVersionId to establish which system version to test.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the system version.'
        },
        config: {
          type: 'object',
          description: 'The configuration of the system version.',
          additionalProperties: true
        },
        name: {
          type: 'string',
          description: 'The name of the system version.'
        },
        systemId: {
          type: 'string',
          description: 'The ID of the system the system version belongs to.'
        }
      },
      required: [
        'id',
        'config',
        'name',
        'systemId'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
| systemVersionId | Yes | | |
get_testcases
Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Retrieve a specific Testcase by ID.

Response Schema

{
  $ref: '#/$defs/testcase',
  $defs: {
    testcase: {
      type: 'object',
      description: 'A test case in the Scorecard system. Contains JSON data that is validated against the schema defined by its Testset.\nThe `inputs` and `expected` fields are derived from the `data` field based on the Testset\'s `fieldMapping`, and include all mapped fields, including those with validation errors.\nTestcases are stored regardless of validation results, with any validation errors included in the `validationErrors` field.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Testcase.'
        },
        expected: {
          type: 'object',
          description: 'Derived from data based on the Testset\'s fieldMapping. Contains all fields marked as expected outputs, including those with validation errors.',
          additionalProperties: true
        },
        inputs: {
          type: 'object',
          description: 'Derived from data based on the Testset\'s fieldMapping. Contains all fields marked as inputs, including those with validation errors.',
          additionalProperties: true
        },
        jsonData: {
          type: 'object',
          description: 'The JSON data of the Testcase, which is validated against the Testset\'s schema.',
          additionalProperties: true
        },
        testsetId: {
          type: 'string',
          description: 'The ID of the Testset this Testcase belongs to.'
        },
        validationErrors: {
          type: 'array',
          description: 'Validation errors found in the Testcase data. If present, the Testcase doesn\'t fully conform to its Testset\'s schema.',
          items: {
            type: 'object',
            properties: {
              message: {
                type: 'string',
                description: 'Human-readable error description.'
              },
              path: {
                type: 'string',
                description: 'JSON Pointer to the field with the validation error.'
              }
            },
            required: [
              'message',
              'path'
            ]
          }
        }
      },
      required: [
        'id',
        'expected',
        'inputs',
        'jsonData',
        'testsetId'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
| testcaseId | Yes | | |
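
Each entry in `validationErrors` carries a JSON Pointer in `path`. The sketch below resolves such a pointer and prints the offending value next to the message. It assumes the pointer is relative to the Testcase's `jsonData`, which this listing does not state; the helpers `resolve_pointer` and `describe_errors` and the sample Testcase are hypothetical.

```python
from typing import Any, Dict, List


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Resolve a JSON Pointer (RFC 6901) against a parsed JSON document."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a JSON Pointer must be empty or start with '/'")
    current = document
    for raw in pointer[1:].split("/"):
        # Unescape "~1" before "~0", as RFC 6901 requires.
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


def describe_errors(testcase: Dict[str, Any]) -> List[str]:
    """Format each validation error together with the value it points at, if any."""
    lines = []
    for err in testcase.get("validationErrors", []):
        try:
            value = repr(resolve_pointer(testcase["jsonData"], err["path"]))
        except (KeyError, IndexError, ValueError, TypeError):
            value = "<not present>"  # e.g. a required field that is missing
        lines.append(f"{err['path']}: {err['message']} (value: {value})")
    return lines


sample_testcase = {
    "jsonData": {"question": "2 + 2?", "answer": 4},
    "validationErrors": [{"message": "Expected string", "path": "/answer"}],
}
print(describe_errors(sample_testcase))  # ['/answer: Expected string (value: 4)']
```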
get_testsets
Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Get Testset

Response Schema

{
  $ref: '#/$defs/testset',
  $defs: {
    testset: {
      type: 'object',
      description: 'A collection of Testcases that share the same schema.\nEach Testset defines the structure of its Testcases through a JSON schema.\nThe `fieldMapping` object maps top-level keys of the Testcase schema to their roles (input/expected output).\nFields not mentioned in the `fieldMapping` during creation or update are treated as metadata.\n\n## JSON Schema validation constraints supported:\n\n- **Required fields** - Fields listed in the schema\'s `required` array must be present in Testcases.\n- **Type validation** - Values must match the specified type (string, number, boolean, null, integer, object, array).\n- **Enum validation** - Values must be one of the options specified in the `enum` array.\n- **Object property validation** - Properties of objects must conform to their defined schemas.\n- **Array item validation** - Items in arrays must conform to the `items` schema.\n- **Logical composition** - Values must conform to at least one schema in the `anyOf` array.\n\nTestcases that fail validation will still be stored, but will include `validationErrors` detailing the issues.\nExtra fields in the Testcase data that are not in the schema will be stored but are ignored during validation.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Testset.'
        },
        description: {
          type: 'string',
          description: 'The description of the Testset.'
        },
        fieldMapping: {
          type: 'object',
          description: 'Maps top-level keys of the Testcase schema to their roles (input/expected output). Unmapped fields are treated as metadata.',
          properties: {
            expected: {
              type: 'array',
              description: 'Fields that represent expected outputs.',
              items: {
                type: 'string'
              }
            },
            inputs: {
              type: 'array',
              description: 'Fields that represent inputs to the AI system.',
              items: {
                type: 'string'
              }
            },
            metadata: {
              type: 'array',
              description: 'Fields that are not inputs or expected outputs.',
              items: {
                type: 'string'
              }
            }
          },
          required: [
            'expected',
            'inputs',
            'metadata'
          ]
        },
        jsonSchema: {
          type: 'object',
          description: 'The JSON schema for each Testcase in the Testset.',
          additionalProperties: true
        },
        name: {
          type: 'string',
          description: 'The name of the Testset.'
        }
      },
      required: [
        'id',
        'description',
        'fieldMapping',
        'jsonSchema',
        'name'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
| testsetId | Yes | | |
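
The Testset description above says that `fieldMapping` assigns top-level Testcase keys to roles and that unmapped keys count as metadata. A minimal Python sketch of that split follows. It is a local illustration and not the Scorecard implementation: the function name `split_by_field_mapping` and the sample field names are hypothetical.

```python
from typing import Any


def split_by_field_mapping(
    data: dict[str, Any], field_mapping: dict[str, list[str]]
) -> dict[str, dict[str, Any]]:
    """Split top-level Testcase keys into inputs, expected, and metadata."""
    input_keys = set(field_mapping.get("inputs", []))
    expected_keys = set(field_mapping.get("expected", []))
    roles: dict[str, dict[str, Any]] = {"inputs": {}, "expected": {}, "metadata": {}}
    for key, value in data.items():
        if key in input_keys:
            roles["inputs"][key] = value
        elif key in expected_keys:
            roles["expected"][key] = value
        else:
            # Per the description above, unmapped fields are treated as metadata.
            roles["metadata"][key] = value
    return roles


# Hypothetical Testset mapping and Testcase data, for illustration only.
field_mapping = {"inputs": ["question"], "expected": ["idealAnswer"], "metadata": []}
testcase_data = {"question": "What is 2 + 2?", "idealAnswer": "4", "difficulty": "easy"}
parts = split_by_field_mapping(testcase_data, field_mapping)
```

Here `difficulty` is not listed in the mapping, so it ends up under `metadata`.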
list_annotations
Read-only

List all annotations (ratings and comments) for a specific Record. Annotations include thumbs up/down ratings and text comments left by users.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| recordId | Yes | The ID of the Record to list annotations for. | |
| jq_filter | No | A jq filter to apply to the response. For example: ".data[].comment" to get only comments. | |
list_metrics
Read-only

List Metrics configured for the specified Project. Metrics are returned in reverse chronological order.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| limit | No | Maximum number of items to return (1-100). Use with `cursor` for pagination through large sets. | |
| cursor | No | Cursor for pagination. Pass the `nextCursor` from the previous response to get the next page of results. | |
| projectId | Yes | | |
list_projects
Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Retrieve a paginated list of all Projects. Projects are ordered by creation date, with oldest Projects first.

Response Schema

{
  type: 'object',
  properties: {
    data: {
      type: 'array',
      items: {
        $ref: '#/$defs/project'
      }
    },
    hasMore: {
      type: 'boolean'
    },
    nextCursor: {
      type: 'string'
    },
    total: {
      type: 'integer'
    }
  },
  required: [
    'data',
    'hasMore',
    'nextCursor'
  ],
  $defs: {
    project: {
      type: 'object',
      description: 'A Project in the Scorecard system.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Project.'
        },
        description: {
          type: 'string',
          description: 'The description of the Project.'
        },
        name: {
          type: 'string',
          description: 'The name of the Project.'
        }
      },
      required: [
        'id',
        'description',
        'name'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| limit | No | Maximum number of items to return (1-100). Use with `cursor` for pagination through large sets. | |
| cursor | No | Cursor for pagination. Pass the `nextCursor` from the previous response to get the next page of results. | |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
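
The `limit` and `cursor` parameters, together with `hasMore` and `nextCursor` in the response schema, suggest a standard cursor loop. The Python sketch below shows one way to walk all pages. It makes several assumptions: `fake_list_projects` is a hypothetical in-memory stand-in for the real tool call, and its integer-offset cursor is invented for the demo (a real cursor should be treated as an opaque string).

```python
from typing import Any, Callable, Iterator, Optional

Page = dict[str, Any]


def iterate_pages(
    fetch_page: Callable[[Optional[str], int], Page], limit: int = 100
) -> Iterator[Any]:
    """Yield every item by following `nextCursor` until `hasMore` is false."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor, limit)
        yield from page["data"]
        if not page["hasMore"]:
            break
        cursor = page["nextCursor"]


# Hypothetical data standing in for the server's Projects.
PROJECTS = [{"id": str(i), "name": f"Project {i}", "description": ""} for i in range(1, 6)]


def fake_list_projects(cursor: Optional[str], limit: int) -> Page:
    """In-memory stand-in that mimics the paginated response shape."""
    start = int(cursor) if cursor else 0
    end = start + limit
    return {
        "data": PROJECTS[start:end],
        "hasMore": end < len(PROJECTS),
        "nextCursor": str(end),
        "total": len(PROJECTS),
    }


names = [project["name"] for project in iterate_pages(fake_list_projects, limit=2)]
```

With five Projects and `limit=2`, the loop issues three calls and stops when `hasMore` is false.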
list_records
Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Retrieve a paginated list of Records for a Run, including all scores for each record.

Response Schema

{
  type: 'object',
  properties: {
    data: {
      type: 'array',
      items: {
        $ref: '#/$defs/record_list_response'
      }
    },
    hasMore: {
      type: 'boolean'
    },
    nextCursor: {
      type: 'string'
    },
    total: {
      type: 'integer'
    }
  },
  required: [
    'data',
    'hasMore',
    'nextCursor'
  ],
  $defs: {
    record_list_response: {
      allOf: [
        {
          $ref: '#/$defs/record'
        }
      ],
      description: 'A record with all its associated scores.'
    },
    record: {
      type: 'object',
      description: 'A record of a system execution in the Scorecard system.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Record.'
        },
        expected: {
          type: 'object',
          description: 'The expected outputs for the Testcase.',
          additionalProperties: true
        },
        inputs: {
          type: 'object',
          description: 'The actual inputs sent to the system, which should match the system\'s input schema.',
          additionalProperties: true
        },
        outputs: {
          type: 'object',
          description: 'The actual outputs from the system.',
          additionalProperties: true
        },
        runId: {
          type: 'string',
          description: 'The ID of the Run containing this Record.'
        },
        testcaseId: {
          type: 'string',
          description: 'The ID of the Testcase.'
        }
      },
      required: [
        'id',
        'expected',
        'inputs',
        'outputs',
        'runId'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| limit | No | Maximum number of items to return (1-100). Use with `cursor` for pagination through large sets. | |
| runId | Yes | | |
| cursor | No | Cursor for pagination. Pass the `nextCursor` from the previous response to get the next page of results. | |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
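
Because this response wraps Records in a `data` array, a `jq_filter` for this tool would address `.data[]`. The Python sketch below builds a hypothetical argument set and emulates locally what the jq expression `[.data[] | {id, testcaseId}]` would select. The run ID, the sample response, and the helper `project_records` are all assumptions for illustration, and the sketch does not call the server.

```python
from typing import Any

# Hypothetical arguments for a list_records call; the run ID is a placeholder.
arguments = {
    "runId": "hypothetical-run-id",
    "limit": 50,
    # jq: collect one small object per Record instead of the full payload.
    "jq_filter": "[.data[] | {id, testcaseId}]",
}

# Hypothetical response shaped like the schema above.
sample_response = {
    "data": [
        {"id": "rec-1", "runId": "hypothetical-run-id", "testcaseId": "tc-1",
         "inputs": {"question": "What is 2 + 2?"}, "expected": {"answer": "4"},
         "outputs": {"answer": "4"}},
        {"id": "rec-2", "runId": "hypothetical-run-id",
         "inputs": {"question": "Capital of France?"}, "expected": {"answer": "Paris"},
         "outputs": {"answer": "Lyon"}},
    ],
    "hasMore": False,
    "nextCursor": "",
}


def project_records(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Local equivalent of the jq filter: keep only id and testcaseId."""
    # jq yields null for a missing key, which .get() mirrors with None.
    return [
        {"id": record["id"], "testcaseId": record.get("testcaseId")}
        for record in response["data"]
    ]


projected = project_records(sample_response)
```

`testcaseId` is not in the `required` list of the Record schema, so the second Record yields `None`, matching jq's `null`.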
list_runs
Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Retrieve a paginated list of all Runs for a Project. Runs are ordered by creation date, most recent first.

Response Schema

{
  type: 'object',
  properties: {
    data: {
      type: 'array',
      items: {
        $ref: '#/$defs/run'
      }
    },
    hasMore: {
      type: 'boolean'
    },
    nextCursor: {
      type: 'string'
    },
    total: {
      type: 'integer'
    }
  },
  required: [
    'data',
    'hasMore',
    'nextCursor'
  ],
  $defs: {
    run: {
      type: 'object',
      description: 'A Run in the Scorecard system.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Run.'
        },
        metricIds: {
          type: 'array',
          description: 'The IDs of the metrics this Run is using.',
          items: {
            type: 'string'
          }
        },
        metricVersionIds: {
          type: 'array',
          description: 'The IDs of the metric versions this Run is using.',
          items: {
            type: 'string'
          }
        },
        numExpectedRecords: {
          type: 'number',
          description: 'The number of expected records in the Run. Determined by the number of testcases in the Run\'s Testset at the time of Run creation.'
        },
        numRecords: {
          type: 'number',
          description: 'The number of records in the Run.'
        },
        numScores: {
          type: 'number',
          description: 'The number of completed scores in the Run so far.'
        },
        status: {
          type: 'string',
          description: 'The status of the Run.',
          enum: [
            'pending',
            'awaiting_execution',
            'running_execution',
            'awaiting_scoring',
            'running_scoring',
            'awaiting_human_scoring',
            'completed'
          ]
        },
        systemId: {
          type: 'string',
          description: 'The ID of the system this Run is using.'
        },
        systemVersionId: {
          type: 'string',
          description: 'The ID of the system version this Run is using.'
        },
        testsetId: {
          type: 'string',
          description: 'The ID of the Testset this Run is testing.'
        }
      },
      required: [
        'id',
        'metricIds',
        'metricVersionIds',
        'numExpectedRecords',
        'numRecords',
        'numScores',
        'status',
        'systemId',
        'systemVersionId',
        'testsetId'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| limit | No | Maximum number of items to return (1-100). Use with `cursor` for pagination through large sets. | |
| cursor | No | Cursor for pagination. Pass the `nextCursor` from the previous response to get the next page of results. | |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
| projectId | Yes | | |
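
The Run schema exposes a `status` enum alongside `numExpectedRecords` and `numRecords`, which is enough to report rough progress. The Python sketch below is one possible reading. It assumes that `completed` is the only finished state, which the schema does not state explicitly, and `summarize_run` and the sample Run are hypothetical.

```python
from typing import Any

# Status values copied from the `status` enum in the schema above.
RUN_STATUSES = (
    "pending",
    "awaiting_execution",
    "running_execution",
    "awaiting_scoring",
    "running_scoring",
    "awaiting_human_scoring",
    "completed",
)


def summarize_run(run: dict[str, Any]) -> dict[str, Any]:
    """Report whether a Run is finished and what share of records exist."""
    if run["status"] not in RUN_STATUSES:
        raise ValueError(f"unknown run status: {run['status']}")
    expected = run["numExpectedRecords"]
    # Guard against division by zero for Runs with no expected records.
    record_fraction = run["numRecords"] / expected if expected else 0.0
    return {
        "id": run["id"],
        # Assumption: only 'completed' counts as finished.
        "finished": run["status"] == "completed",
        "recordFraction": record_fraction,
    }


# Hypothetical Run, for illustration only.
sample_run = {
    "id": "hypothetical-run-id",
    "status": "running_scoring",
    "numExpectedRecords": 10,
    "numRecords": 5,
    "numScores": 3,
}
summary = summarize_run(sample_run)
```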
list_systems
Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Retrieve a paginated list of all systems. Systems are ordered by creation date.

Response Schema

{
  type: 'object',
  properties: {
    data: {
      type: 'array',
      items: {
        $ref: '#/$defs/system'
      }
    },
    hasMore: {
      type: 'boolean'
    },
    nextCursor: {
      type: 'string'
    },
    total: {
      type: 'integer'
    }
  },
  required: [
    'data',
    'hasMore',
    'nextCursor'
  ],
  $defs: {
    system: {
      type: 'object',
      description: 'A System Under Test (SUT).\n\nSystems are templates - to run evaluations, pair them with a SystemVersion that provides specific\nparameter values.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the system.'
        },
        description: {
          type: 'string',
          description: 'The description of the system.'
        },
        name: {
          type: 'string',
          description: 'The name of the system. Unique within the project.'
        },
        productionVersion: {
          $ref: '#/$defs/system_version'
        },
        versions: {
          type: 'array',
          description: 'The versions of the system.',
          items: {
            type: 'object',
            description: 'A SystemVersion defines the specific settings for a System Under Test.\n\nSystem versions contain parameter values that determine system behavior during evaluation.\nThey are immutable snapshots - once created, they never change.\n\nWhen running evaluations, you reference a specific systemVersionId to establish which system version to test.',
            properties: {
              id: {
                type: 'string',
                description: 'The ID of the system version.'
              },
              name: {
                type: 'string',
                description: 'The name of the system version.'
              }
            },
            required: [
              'id',
              'name'
            ]
          }
        }
      },
      required: [
        'id',
        'description',
        'name',
        'productionVersion',
        'versions'
      ]
    },
    system_version: {
      type: 'object',
      description: 'A SystemVersion defines the specific settings for a System Under Test.\n\nSystem versions contain parameter values that determine system behavior during evaluation.\nThey are immutable snapshots - once created, they never change.\n\nWhen running evaluations, you reference a specific systemVersionId to establish which system version to test.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the system version.'
        },
        config: {
          type: 'object',
          description: 'The configuration of the system version.',
          additionalProperties: true
        },
        name: {
          type: 'string',
          description: 'The name of the system version.'
        },
        systemId: {
          type: 'string',
          description: 'The ID of the system the system version belongs to.'
        }
      },
      required: [
        'id',
        'config',
        'name',
        'systemId'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| limit | No | Maximum number of items to return (1-100). Use with `cursor` for pagination through large sets. | |
| cursor | No | Cursor for pagination. Pass the `nextCursor` from the previous response to get the next page of results. | |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
| projectId | Yes | | |
list_testcases
Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Retrieve a paginated list of Testcases belonging to a Testset.

Response Schema

{
  type: 'object',
  properties: {
    data: {
      type: 'array',
      items: {
        $ref: '#/$defs/testcase'
      }
    },
    hasMore: {
      type: 'boolean'
    },
    nextCursor: {
      type: 'string'
    },
    total: {
      type: 'integer'
    }
  },
  required: [
    'data',
    'hasMore',
    'nextCursor'
  ],
  $defs: {
    testcase: {
      type: 'object',
      description: 'A test case in the Scorecard system. Contains JSON data that is validated against the schema defined by its Testset.\nThe `inputs` and `expected` fields are derived from the `data` field based on the Testset\'s `fieldMapping`, and include all mapped fields, including those with validation errors.\nTestcases are stored regardless of validation results, with any validation errors included in the `validationErrors` field.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Testcase.'
        },
        expected: {
          type: 'object',
          description: 'Derived from data based on the Testset\'s fieldMapping. Contains all fields marked as expected outputs, including those with validation errors.',
          additionalProperties: true
        },
        inputs: {
          type: 'object',
          description: 'Derived from data based on the Testset\'s fieldMapping. Contains all fields marked as inputs, including those with validation errors.',
          additionalProperties: true
        },
        jsonData: {
          type: 'object',
          description: 'The JSON data of the Testcase, which is validated against the Testset\'s schema.',
          additionalProperties: true
        },
        testsetId: {
          type: 'string',
          description: 'The ID of the Testset this Testcase belongs to.'
        },
        validationErrors: {
          type: 'array',
          description: 'Validation errors found in the Testcase data. If present, the Testcase doesn\'t fully conform to its Testset\'s schema.',
          items: {
            type: 'object',
            properties: {
              message: {
                type: 'string',
                description: 'Human-readable error description.'
              },
              path: {
                type: 'string',
                description: 'JSON Pointer to the field with the validation error.'
              }
            },
            required: [
              'message',
              'path'
            ]
          }
        }
      },
      required: [
        'id',
        'expected',
        'inputs',
        'jsonData',
        'testsetId'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| limit | No | Maximum number of items to return (1-100). Use with `cursor` for pagination through large sets. | |
| cursor | No | Cursor for pagination. Pass the `nextCursor` from the previous response to get the next page of results. | |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
| testsetId | Yes | | |
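
Each entry in `validationErrors` carries a `path` described as a JSON Pointer. The Python sketch below resolves such a pointer using the RFC 6901 unescaping rules, so the offending value can be shown next to its message. It assumes the pointer is relative to the Testcase's `jsonData`, which the schema does not confirm, and the sample Testcase is hypothetical.

```python
from typing import Any


def resolve_json_pointer(document: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON Pointer against a parsed JSON document."""
    if pointer == "":
        return document  # The empty pointer refers to the whole document.
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    current = document
    for raw_token in pointer[1:].split("/"):
        # RFC 6901: unescape '~1' to '/' first, then '~0' to '~'.
        token = raw_token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


# Hypothetical Testcase with one validation error, for illustration only.
testcase = {
    "id": "tc-1",
    "jsonData": {"question": "What is 2 + 2?", "tags": ["math", 7]},
    "validationErrors": [{"message": "must be string", "path": "/tags/1"}],
}

# Assumption: each path is relative to jsonData.
offending = [
    (error["path"], resolve_json_pointer(testcase["jsonData"], error["path"]))
    for error in testcase["validationErrors"]
]
```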
list_testsets
Read-only

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Retrieve a paginated list of Testsets belonging to a Project.

Response Schema

{
  type: 'object',
  properties: {
    data: {
      type: 'array',
      items: {
        $ref: '#/$defs/testset'
      }
    },
    hasMore: {
      type: 'boolean'
    },
    nextCursor: {
      type: 'string'
    },
    total: {
      type: 'integer'
    }
  },
  required: [
    'data',
    'hasMore',
    'nextCursor'
  ],
  $defs: {
    testset: {
      type: 'object',
      description: 'A collection of Testcases that share the same schema.\nEach Testset defines the structure of its Testcases through a JSON schema.\nThe `fieldMapping` object maps top-level keys of the Testcase schema to their roles (input/expected output).\nFields not mentioned in the `fieldMapping` during creation or update are treated as metadata.\n\n## JSON Schema validation constraints supported:\n\n- **Required fields** - Fields listed in the schema\'s `required` array must be present in Testcases.\n- **Type validation** - Values must match the specified type (string, number, boolean, null, integer, object, array).\n- **Enum validation** - Values must be one of the options specified in the `enum` array.\n- **Object property validation** - Properties of objects must conform to their defined schemas.\n- **Array item validation** - Items in arrays must conform to the `items` schema.\n- **Logical composition** - Values must conform to at least one schema in the `anyOf` array.\n\nTestcases that fail validation will still be stored, but will include `validationErrors` detailing the issues.\nExtra fields in the Testcase data that are not in the schema will be stored but are ignored during validation.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Testset.'
        },
        description: {
          type: 'string',
          description: 'The description of the Testset.'
        },
        fieldMapping: {
          type: 'object',
          description: 'Maps top-level keys of the Testcase schema to their roles (input/expected output). Unmapped fields are treated as metadata.',
          properties: {
            expected: {
              type: 'array',
              description: 'Fields that represent expected outputs.',
              items: {
                type: 'string'
              }
            },
            inputs: {
              type: 'array',
              description: 'Fields that represent inputs to the AI system.',
              items: {
                type: 'string'
              }
            },
            metadata: {
              type: 'array',
              description: 'Fields that are not inputs or expected outputs.',
              items: {
                type: 'string'
              }
            }
          },
          required: [
            'expected',
            'inputs',
            'metadata'
          ]
        },
        jsonSchema: {
          type: 'object',
          description: 'The JSON schema for each Testcase in the Testset.',
          additionalProperties: true
        },
        name: {
          type: 'string',
          description: 'The name of the Testset.'
        }
      },
      required: [
        'id',
        'description',
        'fieldMapping',
        'jsonSchema',
        'name'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| limit | No | Maximum number of items to return (1-100). Use with `cursor` for pagination through large sets. | |
| cursor | No | Cursor for pagination. Pass the `nextCursor` from the previous response to get the next page of results. | |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
| projectId | Yes | | |
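
The Testset description lists six supported validation constraints: required fields, type, enum, object properties, array items, and `anyOf`. The Python sketch below is a deliberately small validator covering just those six, producing `{path, message}` entries shaped like `validationErrors`. It is not Scorecard's validator. The message wording, the sample schema, and the sample data are assumptions, and the sketch expects `type` to be a single string.

```python
from typing import Any


def _matches_type(value: Any, expected: Any) -> bool:
    """Check a value against one JSON Schema type name."""
    if expected == "null":
        return value is None
    if expected == "boolean":
        return isinstance(value, bool)
    if expected == "string":
        return isinstance(value, str)
    if expected == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if expected == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if expected == "object":
        return isinstance(value, dict)
    if expected == "array":
        return isinstance(value, list)
    return True  # Anything else is left unchecked in this sketch.


def validate(value: Any, schema: dict[str, Any], path: str = "") -> list[dict[str, str]]:
    """Return a list of {path, message} errors; an empty list means valid."""
    errors: list[dict[str, str]] = []
    if "type" in schema and not _matches_type(value, schema["type"]):
        return [{"path": path, "message": f"expected type {schema['type']}"}]
    if "enum" in schema and value not in schema["enum"]:
        errors.append({"path": path, "message": "value is not one of the enum options"})
    # anyOf fails only when every alternative reports at least one error.
    if "anyOf" in schema and all(validate(value, option, path) for option in schema["anyOf"]):
        errors.append({"path": path, "message": "value matches no schema in anyOf"})
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append({"path": f"{path}/{name}", "message": "required field is missing"})
        # Keys absent from `properties` are ignored, as the description states.
        for name, subschema in schema.get("properties", {}).items():
            if name in value:
                errors.extend(validate(value[name], subschema, f"{path}/{name}"))
    if isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}/{index}"))
    return errors


# Hypothetical Testset schema and Testcase data, for illustration only.
schema = {
    "type": "object",
    "properties": {
        "question": {"type": "string"},
        "difficulty": {"type": "string", "enum": ["easy", "hard"]},
        "tags": {"type": "array", "items": {"type": "string"}},
        "answer": {"anyOf": [{"type": "string"}, {"type": "number"}]},
    },
    "required": ["question", "answer"],
}
data = {"question": "What is 2 + 2?", "difficulty": "medium", "tags": ["math", 7], "extra": True}
error_paths = [error["path"] for error in validate(data, schema)]
```

The extra key `extra` produces no error, mirroring the statement that fields outside the schema are stored but ignored during validation.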
search_docs
Read-only

Search for documentation for how to use the client to interact with the API.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| query | Yes | The query to search for. | |
| detail | No | The amount of detail to return. | |
| language | Yes | The language for the SDK to search for. | |
update_metrics

Update an existing Metric. You must specify the evalType and outputType of the metric. The structure of a metric depends on the evalType and outputType of the metric.

Parameters

No parameters

update_systems

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Update an existing system. Only the fields provided in the request body will be updated. If a field is provided, the new content will replace the existing content. If a field is not provided, the existing content will remain unchanged.

Response Schema

{
  $ref: '#/$defs/system',
  $defs: {
    system: {
      type: 'object',
      description: 'A System Under Test (SUT).\n\nSystems are templates - to run evaluations, pair them with a SystemVersion that provides specific\nparameter values.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the system.'
        },
        description: {
          type: 'string',
          description: 'The description of the system.'
        },
        name: {
          type: 'string',
          description: 'The name of the system. Unique within the project.'
        },
        productionVersion: {
          $ref: '#/$defs/system_version'
        },
        versions: {
          type: 'array',
          description: 'The versions of the system.',
          items: {
            type: 'object',
            description: 'A SystemVersion defines the specific settings for a System Under Test.\n\nSystem versions contain parameter values that determine system behavior during evaluation.\nThey are immutable snapshots - once created, they never change.\n\nWhen running evaluations, you reference a specific systemVersionId to establish which system version to test.',
            properties: {
              id: {
                type: 'string',
                description: 'The ID of the system version.'
              },
              name: {
                type: 'string',
                description: 'The name of the system version.'
              }
            },
            required: [
              'id',
              'name'
            ]
          }
        }
      },
      required: [
        'id',
        'description',
        'name',
        'productionVersion',
        'versions'
      ]
    },
    system_version: {
      type: 'object',
      description: 'A SystemVersion defines the specific settings for a System Under Test.\n\nSystem versions contain parameter values that determine system behavior during evaluation.\nThey are immutable snapshots - once created, they never change.\n\nWhen running evaluations, you reference a specific systemVersionId to establish which system version to test.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the system version.'
        },
        config: {
          type: 'object',
          description: 'The configuration of the system version.',
          additionalProperties: true
        },
        name: {
          type: 'string',
          description: 'The name of the system version.'
        },
        systemId: {
          type: 'string',
          description: 'The ID of the system the system version belongs to.'
        }
      },
      required: [
        'id',
        'config',
        'name',
        'systemId'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| name | No | The name of the system. Unique within the project. | |
| systemId | Yes | | |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
| description | No | The description of the system. | |
| productionVersionId | No | The ID of the production version of the system. | |
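
The update rule described for this tool (provided fields replace existing content, omitted fields stay unchanged) behaves like a shallow merge. The Python sketch below models that rule locally. `apply_partial_update` and the sample system are hypothetical, and the sketch says nothing about how the server stores or validates the change.

```python
from typing import Any


def apply_partial_update(existing: dict[str, Any], changes: dict[str, Any]) -> dict[str, Any]:
    """Return a copy where provided fields replace existing ones wholesale."""
    updated = dict(existing)  # Copy so the original object is left untouched.
    updated.update(changes)   # Only keys present in `changes` are replaced.
    return updated


# Hypothetical system, for illustration only.
system = {
    "id": "sys-1",
    "name": "Support bot",
    "description": "Answers billing questions.",
}

# Only `name` is provided, so `description` should survive unchanged.
renamed = apply_partial_update(system, {"name": "Support bot v2"})
```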
update_testcases
Idempotent

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Replace the data of an existing Testcase while keeping its ID.

Response Schema

{
  $ref: '#/$defs/testcase',
  $defs: {
    testcase: {
      type: 'object',
      description: 'A test case in the Scorecard system. Contains JSON data that is validated against the schema defined by its Testset.\nThe `inputs` and `expected` fields are derived from the `data` field based on the Testset\'s `fieldMapping`, and include all mapped fields, including those with validation errors.\nTestcases are stored regardless of validation results, with any validation errors included in the `validationErrors` field.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Testcase.'
        },
        expected: {
          type: 'object',
          description: 'Derived from data based on the Testset\'s fieldMapping. Contains all fields marked as expected outputs, including those with validation errors.',
          additionalProperties: true
        },
        inputs: {
          type: 'object',
          description: 'Derived from data based on the Testset\'s fieldMapping. Contains all fields marked as inputs, including those with validation errors.',
          additionalProperties: true
        },
        jsonData: {
          type: 'object',
          description: 'The JSON data of the Testcase, which is validated against the Testset\'s schema.',
          additionalProperties: true
        },
        testsetId: {
          type: 'string',
          description: 'The ID of the Testset this Testcase belongs to.'
        },
        validationErrors: {
          type: 'array',
          description: 'Validation errors found in the Testcase data. If present, the Testcase doesn\'t fully conform to its Testset\'s schema.',
          items: {
            type: 'object',
            properties: {
              message: {
                type: 'string',
                description: 'Human-readable error description.'
              },
              path: {
                type: 'string',
                description: 'JSON Pointer to the field with the validation error.'
              }
            },
            required: [
              'message',
              'path'
            ]
          }
        }
      },
      required: [
        'id',
        'expected',
        'inputs',
        'jsonData',
        'testsetId'
      ]
    }
  }
}
Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| jsonData | Yes | The JSON data of the Testcase, which is validated against the Testset's schema. | |
| jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). | |
| testcaseId | Yes | | |
update_testsets

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Update a Testset. Only the fields provided in the request body will be updated. If a field is provided, the new content will replace the existing content. If a field is not provided, the existing content will remain unchanged.

When updating the schema:

  • If field mappings are not provided and existing mappings reference fields that no longer exist, those mappings will be automatically removed

  • To preserve all existing mappings, ensure all referenced fields remain in the updated schema

  • For complete control, provide both schema and fieldMapping when updating the schema

Response Schema

{
  $ref: '#/$defs/testset',
  $defs: {
    testset: {
      type: 'object',
      description: 'A collection of Testcases that share the same schema.\nEach Testset defines the structure of its Testcases through a JSON schema.\nThe `fieldMapping` object maps top-level keys of the Testcase schema to their roles (input/expected output).\nFields not mentioned in the `fieldMapping` during creation or update are treated as metadata.\n\n## JSON Schema validation constraints supported:\n\n- **Required fields** - Fields listed in the schema\'s `required` array must be present in Testcases.\n- **Type validation** - Values must match the specified type (string, number, boolean, null, integer, object, array).\n- **Enum validation** - Values must be one of the options specified in the `enum` array.\n- **Object property validation** - Properties of objects must conform to their defined schemas.\n- **Array item validation** - Items in arrays must conform to the `items` schema.\n- **Logical composition** - Values must conform to at least one schema in the `anyOf` array.\n\nTestcases that fail validation will still be stored, but will include `validationErrors` detailing the issues.\nExtra fields in the Testcase data that are not in the schema will be stored but are ignored during validation.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the Testset.'
        },
        description: {
          type: 'string',
          description: 'The description of the Testset.'
        },
        fieldMapping: {
          type: 'object',
          description: 'Maps top-level keys of the Testcase schema to their roles (input/expected output). Unmapped fields are treated as metadata.',
          properties: {
            expected: {
              type: 'array',
              description: 'Fields that represent expected outputs.',
              items: {
                type: 'string'
              }
            },
            inputs: {
              type: 'array',
              description: 'Fields that represent inputs to the AI system.',
              items: {
                type: 'string'
              }
            },
            metadata: {
              type: 'array',
              description: 'Fields that are not inputs or expected outputs.',
              items: {
                type: 'string'
              }
            }
          },
          required: [
            'expected',
            'inputs',
            'metadata'
          ]
        },
        jsonSchema: {
          type: 'object',
          description: 'The JSON schema for each Testcase in the Testset.',
          additionalProperties: true
        },
        name: {
          type: 'string',
          description: 'The name of the Testset.'
        }
      },
      required: [
        'id',
        'description',
        'fieldMapping',
        'jsonSchema',
        'name'
      ]
    }
  }
}
Parameters

Name | Required | Description | Default
name | No | The name of the Testset. |
jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). |
testsetId | Yes | |
jsonSchema | No | The JSON schema for each Testcase in the Testset. |
description | No | The description of the Testset. |
fieldMapping | No | Maps top-level keys of the Testcase schema to their roles (input/expected output). Unmapped fields are treated as metadata. |

upsert_scores
Idempotent

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Create or update a Score for a given Record and MetricConfig. If a Score with the specified Record ID and MetricConfig ID already exists, it will be updated. Otherwise, a new Score will be created. The score provided should conform to the schema defined by the MetricConfig; otherwise, validation errors will be reported.
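The create-or-update behavior can be pictured as a store keyed by the pair (Record ID, MetricConfig ID). The following is a minimal in-memory sketch under stated assumptions, not the server's logic: `upsert_score` is a hypothetical helper, the IDs and the `{"value": ...}` score shape are invented, and it assumes an update replaces the stored score rather than merging into it.

```python
def upsert_score(store: dict, record_id: str, metric_config_id: str, score: dict) -> str:
    """Create or replace the Score identified by (recordId, metricConfigId)."""
    key = (record_id, metric_config_id)
    action = "updated" if key in store else "created"
    # Assumption: an update replaces the previous score payload entirely.
    store[key] = {
        "recordId": record_id,
        "metricConfigId": metric_config_id,
        "score": score,
    }
    return action


store = {}
first = upsert_score(store, "rec_1", "mc_1", {"value": 3})
second = upsert_score(store, "rec_1", "mc_1", {"value": 5})
print(first, second, len(store))
```

Because the key is the same in both calls, repeating the request never produces a second Score for the same Record and MetricConfig.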

Response Schema

{
  $ref: '#/$defs/score',
  $defs: {
    score: {
      type: 'object',
      description: 'A Score represents the evaluation of a Record against a specific MetricConfig. The actual `score` is stored as flexible JSON. While any JSON is accepted, it is expected to conform to the output schema defined by the MetricConfig. Any discrepancies will be noted in the `validationErrors` field, but the Score will still be stored.',
      properties: {
        metricConfigId: {
          type: 'string',
          description: 'The ID of the MetricConfig this Score is for.'
        },
        recordId: {
          type: 'string',
          description: 'The ID of the Record this Score is for.'
        },
        score: {
          type: 'object',
          description: 'The score of the Record, as arbitrary JSON. This data should ideally conform to the output schema defined by the associated MetricConfig. If it doesn\'t, validation errors will be captured in the `validationErrors` field.',
          additionalProperties: true
        },
        validationErrors: {
          type: 'array',
          description: 'Validation errors found in the Score data. If present, the Score doesn\'t fully conform to its MetricConfig\'s schema.',
          items: {
            type: 'object',
            properties: {
              message: {
                type: 'string',
                description: 'Human-readable error description.'
              },
              path: {
                type: 'string',
                description: 'JSON Pointer to the field with the validation error.'
              }
            },
            required: [
              'message',
              'path'
            ]
          }
        }
      },
      required: [
        'metricConfigId',
        'recordId',
        'score'
      ]
    }
  }
}
Parameters

Name | Required | Description | Default
score | Yes | The score of the Record, as arbitrary JSON. This data should ideally conform to the output schema defined by the associated MetricConfig. If it doesn't, validation errors will be captured in the `validationErrors` field. |
recordId | Yes | |
jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). |
metricConfigId | Yes | |

upsert_systems

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Create a new system. If one with the same name in the project exists, it updates it instead.

Response Schema

{
  $ref: '#/$defs/system',
  $defs: {
    system: {
      type: 'object',
      description: 'A System Under Test (SUT).\n\nSystems are templates - to run evaluations, pair them with a SystemVersion that provides specific\nparameter values.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the system.'
        },
        description: {
          type: 'string',
          description: 'The description of the system.'
        },
        name: {
          type: 'string',
          description: 'The name of the system. Unique within the project.'
        },
        productionVersion: {
          $ref: '#/$defs/system_version'
        },
        versions: {
          type: 'array',
          description: 'The versions of the system.',
          items: {
            type: 'object',
            description: 'A SystemVersion defines the specific settings for a System Under Test.\n\nSystem versions contain parameter values that determine system behavior during evaluation.\nThey are immutable snapshots - once created, they never change.\n\nWhen running evaluations, you reference a specific systemVersionId to establish which system version to test.',
            properties: {
              id: {
                type: 'string',
                description: 'The ID of the system version.'
              },
              name: {
                type: 'string',
                description: 'The name of the system version.'
              }
            },
            required: [
              'id',
              'name'
            ]
          }
        }
      },
      required: [
        'id',
        'description',
        'name',
        'productionVersion',
        'versions'
      ]
    },
    system_version: {
      type: 'object',
      description: 'A SystemVersion defines the specific settings for a System Under Test.\n\nSystem versions contain parameter values that determine system behavior during evaluation.\nThey are immutable snapshots - once created, they never change.\n\nWhen running evaluations, you reference a specific systemVersionId to establish which system version to test.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the system version.'
        },
        config: {
          type: 'object',
          description: 'The configuration of the system version.',
          additionalProperties: true
        },
        name: {
          type: 'string',
          description: 'The name of the system version.'
        },
        systemId: {
          type: 'string',
          description: 'The ID of the system the system version belongs to.'
        }
      },
      required: [
        'id',
        'config',
        'name',
        'systemId'
      ]
    }
  }
}
Parameters

Name | Required | Description | Default
name | No | The name of the system. Should be unique within the project. Default is "Default system" |
config | Yes | The configuration of the system. |
jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). |
projectId | Yes | |
description | No | The description of the system. |

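As a rough sketch of how the arguments for `upsert_systems` might be assembled, the helper below builds the argument object from the parameters listed above. `build_upsert_systems_args`, the project ID, and the config keys are hypothetical; only the parameter names come from the table. The default `jq_filter` of `{id, name}` is one possible choice that trims the response to two fields of the system object.

```python
def build_upsert_systems_args(project_id, config, name=None, description=None,
                              jq_filter="{id, name}"):
    """Assemble the argument object for an upsert_systems tool call."""
    # projectId and config are required; jq_filter is optional but recommended.
    args = {"projectId": project_id, "config": config, "jq_filter": jq_filter}
    # Optional fields are left out entirely when not supplied.
    if name is not None:
        args["name"] = name
    if description is not None:
        args["description"] = description
    return args


# Hypothetical values for illustration only.
system_args = build_upsert_systems_args(
    "proj_123",
    {"model": "example-model", "temperature": 0.2},
    name="Support bot",
)
print(system_args)
```

Leaving `name` out lets the documented default ("Default system") apply on the server side.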
upsert_systems_versions

When using this tool, always use the jq_filter parameter to reduce the response size and improve performance.

Only omit if you're sure you don't need the data.

Create a new system version if it does not already exist. Does not set the created version to be the system's production version.

If there is already a system version with the same config, its name will be updated.
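The matching-by-config rule can be sketched with a small in-memory model. This is an illustration under assumptions, not the server's behavior: `upsert_system_version` is a hypothetical helper, configs are compared by their key-sorted JSON form (the real comparison is not documented here), and the generated IDs and fallback names are placeholders.

```python
import json


def upsert_system_version(versions: list, config: dict, name=None) -> dict:
    """Return the version with this config, renaming it, or create a new one."""
    # Assumption: two configs are "the same" if their key-sorted JSON matches.
    key = json.dumps(config, sort_keys=True)
    for version in versions:
        if json.dumps(version["config"], sort_keys=True) == key:
            if name is not None:
                version["name"] = name  # existing version: only the name changes
            return version
    version = {
        "id": f"ver_{len(versions) + 1}",                 # placeholder ID scheme
        "name": name or f"Version {len(versions) + 1}",   # placeholder autogenerated name
        "config": config,
    }
    versions.append(version)
    return version


versions = []
a = upsert_system_version(versions, {"model": "example-model", "temperature": 0.2}, "baseline")
b = upsert_system_version(versions, {"temperature": 0.2, "model": "example-model"}, "baseline-renamed")
print(len(versions), b["name"])
```

Neither call touches a production pointer, mirroring the note that the created version is not made the system's production version.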

Response Schema

{
  $ref: '#/$defs/system_version',
  $defs: {
    system_version: {
      type: 'object',
      description: 'A SystemVersion defines the specific settings for a System Under Test.\n\nSystem versions contain parameter values that determine system behavior during evaluation.\nThey are immutable snapshots - once created, they never change.\n\nWhen running evaluations, you reference a specific systemVersionId to establish which system version to test.',
      properties: {
        id: {
          type: 'string',
          description: 'The ID of the system version.'
        },
        config: {
          type: 'object',
          description: 'The configuration of the system version.',
          additionalProperties: true
        },
        name: {
          type: 'string',
          description: 'The name of the system version.'
        },
        systemId: {
          type: 'string',
          description: 'The ID of the system the system version belongs to.'
        }
      },
      required: [
        'id',
        'config',
        'name',
        'systemId'
      ]
    }
  }
}
Parameters

Name | Required | Description | Default
name | No | The name of the system version. If creating a new system version and the name isn't provided, it will be autogenerated. |
config | Yes | The configuration of the system version. |
systemId | Yes | |
jq_filter | No | A jq filter to apply to the response to include certain fields. Consult the output schema in the tool description to see the fields that are available. For example: to include only the `name` field in every object of a results array, you can provide ".results[].name". For more information, see the [jq documentation](https://jqlang.org/manual/). |

Verify Ownership

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [
    {
      "email": "your-email@example.com"
    }
  ]
}

The email address must match the email associated with your Glama account. Once verified, the connector will appear as claimed by you.
