Skip to main content
Glama

Dingo MCP Server

by MigoXLab
slimpajama_data_evaluated_by_rule.md4.54 kB
# Slimpajama Dataset ## Dataset Introduction This dataset aims to evaluate the accuracy of the built-in rules in dingo. Therefore, the open-source dataset Slimpajama was selected, and data was extracted from it to construct the test set. | Field Name | Description | |--------------|-------------------------------------------------------------------------------| | data_id | Data ID, without special meaning, can be modified according to user needs | | content | Data to be tested | | language | Language type | | error_status | Data status, True for negative examples, False for positive examples | | type_list | Negative example types for negative data, empty list for positive data | | name_list | Negative example names for negative data, empty list for positive data | | reason_list | Negative example descriptions for negative data, empty list for positive data | Links: https://huggingface.co/datasets/chupei/slimpajama_badcase_rule https://huggingface.co/datasets/chupei/slimpajama_goodcase_rule ### Dataset Composition | Type | Count | |-------------------------------------------------|-------| | Positive examples | 82 | | Negative examples: RuleAlphaWords | 27 | | Negative examples: RuleCapitalWords | 26 | | Negative examples: RuleCharNumber | 5 | | Negative examples: RuleDocRepeat | 17 | | Negative examples: RuleHtmlEntity | 3 | | Negative examples: RuleLineEndWithEllipsis | 5 | | Negative examples: RuleLineEndWithTerminal | 5 | | Negative examples: RuleLineStartWithBulletpoint | 6 | | Negative examples: RuleLoremIpsum | 5 | | Negative examples: RuleMeanWordLength | 12 | | Negative examples: RuleNoPunc | 7 | | Negative examples: RuleSentenceNumber | 8 | | Negative examples: RuleSpecialCharacter | 4 | | Negative examples: RuleStopWord | 24 | | Negative examples: RuleSymbolWordRatio | 5 | | Negative examples: RuleUniqueWords | 7 | | Negative examples: RuleWordNumber | 7 | ## Rules Introduction This test uses the built-in **pretrain** as the eval_group. For specific rules included, please refer to: [Group Introduction](../../groups.md).<br> For rules within the group, please refer to: [Rules Introduction](../../rules.md). ## Evaluation Results ### Definitions After evaluation, both positive and negative data will generate corresponding summary files. Therefore, the results need to be defined with clear concepts. | Term | Description | |----------|--------------------------------------------------------------------------------| | TP | True Positive: Number of positive examples correctly identified | | FP | False Positive: Number of negative examples incorrectly identified as positive | | TN | True Negative: Number of negative examples correctly identified | | FN | False Negative: Number of positive examples incorrectly identified as negative | | Accuracy | TP / (TP + FP) Ratio of positive examples in the identified positives | | Recall | TP / (TP + FN) Ratio of positive examples correctly identified | | F1 | (Accuracy + Recall) / 2 | ### Results Display | Dataset Name | TP | FP | TN | FN | Accuracy% | Recall% | F1 | |--------------|----|----|-----|----|-----------|---------|------| | slimpajama | 78 | 5 | 103 | 4 | 94 | 95 | 94.5 | ## Evaluation Method Translate this markdown into English. ```python from dingo.config import InputArgs from dingo.exec import Executor input_data = { "eval_group": "pretrain", "input_path": "chupei/slimpajama_badcase_rule", "save_data": True, "save_correct": True, "save_raw": True, "max_workers": 10, "batch_size": 10, "data_format": "jsonl", "column_content": "content", } input_args = InputArgs(**input_data) executor = Executor.exec_map["local"](input_args) result = executor.execute() print(result) ```

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/MigoXLab/dingo'

If you have feedback or need assistance with the MCP directory API, please join our Discord server