AI evaluation toolkit that measures inter-rater agreement (Fleiss' κ, Kendall's W) across multiple LLM providers. Evaluate prompt reliability, detect contested outputs, and track consensus trends over time.
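
To make the agreement statistic concrete, here is a minimal sketch of Fleiss' κ in plain Python. It is not taken from the toolkit: the function name `fleiss_kappa` and the input layout (one row per evaluated output, one column per label, each cell counting how many providers chose that label) are assumptions made for illustration only.

```python
import math


def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts (hypothetical helper).

    counts[i][j] is the number of raters (e.g. LLM providers) that assigned
    item i to category j. Every item must be rated by the same number of raters.
    """
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("need at least one rated item")
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")

    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]

    # Per-item agreement: fraction of rater pairs that agree on the item.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_bar = sum(p_i) / n_items          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance

    if math.isclose(p_e, 1.0):
        # All ratings landed in one category; kappa is undefined here.
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Three raters, two labels, four items, full agreement on every item.
    print(fleiss_kappa([[3, 0], [0, 3], [3, 0], [0, 3]]))
```

Values near 1 indicate strong agreement across raters, values near 0 indicate agreement no better than chance, and negative values indicate systematic disagreement.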