Introduction
Dingo is a data quality evaluation tool that automatically detects quality issues in your datasets. It provides a variety of built-in rules and model-based evaluation methods, and also supports custom evaluation methods. Dingo works with commonly used text and multimodal datasets, including pre-training, fine-tuning, and evaluation datasets. It can be used through a local CLI or an SDK, making it easy to integrate into evaluation platforms such as OpenCompass.
Architecture Diagram

Quick Start
Installation
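Assuming the package is published on PyPI as dingo-python (check PyPI for the exact distribution name), installation is a single pip command:

```bash
# Install from PyPI (package name assumed to be dingo-python)
pip install dingo-python
```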
Example Use Cases
1. Evaluate LLM chat data
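A minimal SDK sketch for this use case is shown below. The module paths, the InputArgs/Executor entry points, and the configuration keys are assumptions based on the SDK style described in this README; consult the API documentation for the exact interface.

```python
from dingo.config import InputArgs   # assumed module path
from dingo.exec import Executor      # assumed module path

# Hypothetical config: score LLM chat records stored as prompt/response pairs in a local JSONL file.
input_data = {
    "input_path": "chat_data.jsonl",
    "dataset": {"source": "local", "format": "jsonl"},
    "executor": {"eval_group": "default", "result_save": {"bad": True}},
}

executor = Executor.exec_map["local"](InputArgs(**input_data))
print(executor.execute())
```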
2. Evaluate Dataset
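The same pattern can point at a dataset hosted on Hugging Face; again, the field names below are illustrative rather than the exact schema.

```python
from dingo.config import InputArgs   # assumed module path
from dingo.exec import Executor      # assumed module path

# Hypothetical config: evaluate a Hugging Face dataset with the built-in default rule group.
input_data = {
    "input_path": "tatsu-lab/alpaca",                              # any text dataset id on the Hub
    "dataset": {"source": "hugging_face", "format": "plaintext"},
    "executor": {"eval_group": "default", "result_save": {"bad": True}},
}

result = Executor.exec_map["local"](InputArgs(**input_data)).execute()
print(result)
```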
Command Line Interface
Evaluate with Rule Sets
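A possible invocation looks like the following; the flag names are assumptions, so run the CLI with --help to confirm the exact options.

```bash
# Rule-based evaluation of a local JSONL file with the default rule group (flags illustrative)
python -m dingo.run.cli \
  --input_path data.jsonl \
  --dataset local \
  --data_format jsonl \
  -e default \
  --save_data True
```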
Evaluate with LLM (e.g., GPT-4o)
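LLM-based evaluation additionally needs model credentials, typically supplied through a config file; the flags and config file name below are illustrative.

```bash
# LLM-based evaluation using an OpenAI-compatible model such as GPT-4o (flags illustrative)
python -m dingo.run.cli \
  --input_path data.jsonl \
  --dataset local \
  --data_format jsonl \
  -e openai \
  --custom_config openai_config.json \
  --save_data True
```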
GUI Visualization
After evaluation (with result_save.bad=True), a frontend page will be automatically generated. To manually start the frontend:
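One plausible command, assuming the visualization entry point lives under dingo.run (verify against the CLI documentation):

```bash
# Launch the visualization frontend on a finished run (module path assumed)
python -m dingo.run.vsl --input output_directory
```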
Here, output_directory is the directory containing the evaluation results, including the summary.json file.

Online Demo
Try Dingo in our online demo on Hugging Face 🤗
Local Demo
Try Dingo locally:
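One way to run it, assuming a demo script is shipped with the repository (the clone URL and entry point below are placeholders):

```bash
git clone <dingo-repo-url>      # substitute the project's GitHub URL
cd dingo
pip install -r requirements.txt
python app.py                   # hypothetical demo entry point; see the repository README
```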

Google Colab Demo
Experience Dingo interactively with our Google Colab notebook:
MCP Server
Dingo includes an experimental Model Context Protocol (MCP) server. For details on running the server and integrating it with clients like Cursor, please see the dedicated documentation:
Video Demonstration
To help you get started quickly with Dingo MCP, we've created a video walkthrough:
https://github.com/user-attachments/assets/aca26f4c-3f2e-445e-9ef9-9331c4d7a37b
This video demonstrates step-by-step how to use Dingo MCP server with Cursor.
Data Quality Metrics
Dingo provides comprehensive data quality assessment through both rule-based and prompt-based evaluation metrics. These metrics cover multiple quality dimensions including effectiveness, completeness, similarity, security, and more.
📊 View Complete Metrics Documentation →
Our evaluation system includes:
Text Quality Assessment Metrics: Pre-training data quality evaluation using DataMan methodology and enhanced multi-dimensional assessment
SFT Data Assessment Metrics: Honest, Helpful, Harmless evaluation for supervised fine-tuning data
Classification Metrics: Topic categorization and content classification
Multimodality Assessment Metrics: Image classification and relevance evaluation
Rule-Based Quality Metrics: Automated quality checks using heuristic rules for effectiveness and similarity detection
Factuality Assessment Metrics: Two-stage factuality evaluation based on GPT-5 System Card
and more
Most metrics are backed by academic sources to ensure objectivity and scientific rigor.
Using LLM Assessment in Evaluation
To use these assessment prompts in your evaluations, specify them in your configuration:
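For illustration, a configuration selecting an LLM judge and a specific assessment prompt might look roughly like this; the key names and prompt identifier are placeholders, not the exact schema.

```json
{
  "llm_config": {
    "model": "gpt-4o",
    "key": "YOUR_API_KEY",
    "api_url": "https://api.openai.com/v1"
  },
  "prompt_list": ["PromptTextQuality"]
}
```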
You can customize these prompts to focus on specific quality dimensions or to adapt to particular domain requirements. When combined with appropriate LLM models, these prompts enable comprehensive evaluation of data quality across multiple dimensions.
Hallucination Detection & RAG System Evaluation
For detailed guidance on using Dingo's hallucination detection capabilities, including HHEM-2.1-Open local inference and LLM-based evaluation:
📖 View Hallucination Detection Guide →
Factuality Assessment
For comprehensive guidance on using Dingo's two-stage factuality evaluation system:
📖 View Factuality Assessment Guide →
Rule Groups
Dingo provides pre-configured rule groups for different types of datasets:
| Group | Use Case | Example Rules |
| --- | --- | --- |
| default | General text quality | Core heuristic rules for general text quality, etc. |
| sft | Fine-tuning datasets | Rules from the default group plus hallucination detection |
| rag | RAG system evaluation | Response-consistency and hallucination checks |
| hallucination | Hallucination detection | LLM-based hallucination evaluation |
| pretrain | Pre-training datasets | Comprehensive set of 20+ rules |
To use a specific rule group:
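For example, selecting the sft group in an SDK configuration might look like this (entry points and keys are illustrative, as in the Quick Start sketches):

```python
from dingo.config import InputArgs   # assumed module path
from dingo.exec import Executor      # assumed module path

# Pick any group name from the table above (keys illustrative)
input_data = {
    "input_path": "sft_data.jsonl",
    "dataset": {"source": "local", "format": "jsonl"},
    "executor": {"eval_group": "sft", "result_save": {"bad": True}},
}

Executor.exec_map["local"](InputArgs(**input_data)).execute()
```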
Feature Highlights
Multi-source & Multi-modal Support
Data Sources: Local files, Hugging Face datasets, S3 storage
Data Types: Pre-training, fine-tuning, and evaluation datasets
Data Modalities: Text and image
Rule-based & Model-based Evaluation
Built-in Rules: 20+ general heuristic evaluation rules
LLM Integration: OpenAI, Kimi, and local models (e.g., Llama3)
Hallucination Detection: HHEM-2.1-Open local model and GPT-based evaluation
RAG System Evaluation: Response consistency and context alignment assessment
Custom Rules: Easily extend with your own rules and models
Security Evaluation: Perspective API integration
Flexible Usage
Interfaces: CLI and SDK options
Integration: Easy integration with other platforms
Execution Engines: Local and Spark
Comprehensive Reporting
Quality Metrics: 7-dimensional quality assessment
Traceability: Detailed reports for anomaly tracking
User Guide
Custom Rules, Prompts, and Models
If the built-in rules don't meet your requirements, you can create custom ones:
Custom Rule Example
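A sketch of what a custom rule could look like, assuming a registration decorator, a BaseRule base class, and a ModelRes result object along the lines of the built-in rules; the module paths and signatures are assumptions, so mirror an existing rule from the source tree when writing your own.

```python
from dingo.model import Model                  # assumed module path
from dingo.model.rule.base import BaseRule     # assumed module path
from dingo.model.modelres import ModelRes      # assumed module path

@Model.rule_register("QUALITY_BAD_RELEVANCE", ["default"])  # assumed decorator: metric type + target groups
class RuleNoPlaceholderText(BaseRule):
    """Flag records that still contain template placeholder text."""

    @classmethod
    def eval(cls, input_data) -> ModelRes:
        res = ModelRes()
        if "lorem ipsum" in input_data.content.lower():
            res.error_status = True
            res.type = "QUALITY_BAD_RELEVANCE"
            res.name = cls.__name__
            res.reason = ["Found placeholder text 'lorem ipsum'"]
        return res
```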
Custom LLM Integration
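Similarly, a custom LLM backend is likely registered and subclassed from an OpenAI-style base class; the names below are assumptions for illustration only.

```python
from dingo.model import Model                        # assumed module path
from dingo.model.llm.base_openai import BaseOpenAI   # assumed OpenAI-compatible base class

@Model.llm_register("MyLocalLlama")  # assumed registration hook
class MyLocalLlama(BaseOpenAI):
    """Route evaluation prompts to a local OpenAI-compatible endpoint (e.g. a Llama3 server).

    The model name, API URL, and key are normally taken from the evaluation config.
    """
```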
See more examples in:
Execution Engines
Local Execution
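A local run executes everything in-process; this sketch reuses the assumed InputArgs/Executor entry points from the Quick Start.

```python
from dingo.config import InputArgs   # assumed module path
from dingo.exec import Executor      # assumed module path

input_data = {
    "input_path": "data.jsonl",
    "dataset": {"source": "local", "format": "jsonl"},
    "executor": {"eval_group": "default", "result_save": {"bad": True}},
}

# "local" selects the in-process engine (key name assumed)
summary = Executor.exec_map["local"](InputArgs(**input_data)).execute()
```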
Spark Execution
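For large datasets the same configuration can presumably be handed to the Spark engine; the executor key and the way a SparkSession and input DataFrame are supplied are assumptions, so check the engine documentation.

```python
from dingo.config import InputArgs   # assumed module path
from dingo.exec import Executor      # assumed module path

# Hypothetical Spark run: same configuration style as the local engine, different executor key.
# A running SparkSession (and the input DataFrame) must be provided as described in the engine docs.
input_data = {
    "input_path": "data.jsonl",
    "dataset": {"source": "local", "format": "jsonl"},
    "executor": {"eval_group": "pretrain", "result_save": {"bad": True}},
}

summary = Executor.exec_map["spark"](InputArgs(**input_data)).execute()
```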
Evaluation Reports
After evaluation, Dingo generates:
Summary Report (summary.json): Overall metrics and scores
Detailed Reports: Specific issues for each rule violation
Report Description:
score: num_good / total
type_ratio: the count of a given type / total, e.g. QUALITY_BAD_COMPLETENESS / total
name_ratio: the count of a given name / total, e.g. QUALITY_BAD_COMPLETENESS-RuleColonEnd / total
Example summary:
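The exact fields depend on the run, but given the score / type_ratio / name_ratio definitions above, a summary.json might look roughly like this (the numbers are made up for illustration):

```json
{
  "task_name": "example_eval",
  "total": 1000,
  "num_good": 920,
  "num_bad": 80,
  "score": 92.0,
  "type_ratio": {
    "QUALITY_BAD_COMPLETENESS": 0.05
  },
  "name_ratio": {
    "QUALITY_BAD_COMPLETENESS-RuleColonEnd": 0.05
  }
}
```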
Future Plans
Richer graphic and text evaluation indicators
Audio and video data modality evaluation
Small model evaluation (fasttext, QuRating)
Data diversity evaluation
Limitations
The current built-in detection rules and model methods focus on common data quality problems. For specialized evaluation needs, we recommend customizing detection rules.
Acknowledgments
Contribution
We appreciate all the contributors for their efforts to improve and enhance Dingo. Please refer to the Contribution Guide for guidance on contributing to the project.
License
This project is released under the Apache 2.0 open source license.
This project uses fasttext for some functionality including language detection. fasttext is licensed under the MIT License, which is compatible with our Apache 2.0 license and provides flexibility for various usage scenarios.
Citation
If you find this project useful, please consider citing our tool: