Entity Identification

Identifies whether two sets of data come from the same entity.

This is an MCP (Model Context Protocol) server.

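Data Comparison Tool

This tool compares two sets of data, evaluating both exact equality of values and semantic equivalence. It uses text normalization and a language model to judge whether the data come from the same entity.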

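Features

Text normalization: converts text to lowercase, removes punctuation, and normalizes whitespace.
Value comparison: compares values both exactly and semantically (list order is ignored).
JSON traversal: iterates over each key in the JSON objects and compares the corresponding values.
Language model integration: uses a generative language model to assess semantic similarity and give a final judgment on whether the data come from the same entity.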
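
The first two features can be illustrated with a minimal sketch. The function names mirror the ones this README documents, but the bodies are assumptions rather than the tool's actual code; in particular, the "semantic" check here is only equality after normalization, and the return shape (a pair of booleans) is hypothetical.

```python
import re
import string


def normalize_text(text):
    """Lowercase, strip ASCII punctuation, and collapse whitespace (assumed behavior)."""
    text = str(text).lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def compare_values(val1, val2):
    """Return (exact_match, normalized_match); this return shape is hypothetical."""
    exact = val1 == val2
    if isinstance(val1, list) and isinstance(val2, list):
        # Sort the normalized elements so that list order is ignored.
        left = sorted(normalize_text(v) for v in val1)
        right = sorted(normalize_text(v) for v in val2)
        normalized = left == right
    else:
        normalized = normalize_text(val1) == normalize_text(val2)
    return exact, normalized


print(normalize_text("  John   DOE! "))                    # john doe
print(compare_values("John Doe", "john doe"))              # (False, True)
print(compare_values(["reading", "hiking"], ["hiking", "reading"]))  # (False, True)
```

Normalization alone does not equate "123 Main St" with "123 Main Street"; cases like that are where the language model is expected to help.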

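Installation

To use this tool, make sure the required dependencies are installed. You can install them with pip. The genai.GenerativeModel class used in the example comes from the google-generativeai package:

pip install google-generativeai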

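Usage

Functions

normalize_text(text):
- Normalizes the input text by converting it to lowercase, removing punctuation, and normalizing whitespace.
compare_values(val1, val2):
- Compares two values both exactly and semantically.
- If the values are lists, the semantic comparison ignores element order.
compare_json(json1, json2):
- Compares two JSON objects key by key.
- Uses compare_values to evaluate the values for each key.
- Integrates a language model to assess semantic similarity and reach a final judgment.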
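
The key-by-key traversal can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: the name compare_json_sketch, the pluggable compare argument, and the shape of the returned report are all hypothetical, and no language model is called.

```python
def compare_json_sketch(json1, json2, compare=lambda a, b: a == b):
    """Compare two dicts key by key and return a per-key report (hypothetical shape)."""
    report = {}
    # Visit every key that appears in either object, in a stable order.
    for key in sorted(set(json1) | set(json2)):
        entry = {"value1": json1.get(key), "value2": json2.get(key)}
        if key not in json1 or key not in json2:
            entry["status"] = "missing"
        elif compare(json1[key], json2[key]):
            entry["status"] = "match"
        else:
            entry["status"] = "mismatch"
        report[key] = entry
    return report


a = {"name": "John Doe", "city": "Anytown", "age": 30}
b = {"name": "john doe", "city": "Anytown", "phone": "555-0100"}

# A case-insensitive comparator stands in for the tool's value comparison.
report = compare_json_sketch(a, b, compare=lambda x, y: str(x).lower() == str(y).lower())
for key, entry in report.items():
    print(key, entry["status"])
```

A report of this kind can be serialized with json.dumps and passed to the model as context, which is how the example in this README uses the output of compare_json.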

Example

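import json
import os

# Provided by the google-generativeai package
import google.generativeai as genai

# compare_json is the function provided by this tool (see the function list);
# it must be defined or imported before this example runs.

# Configure the client. Reading the key from GOOGLE_API_KEY is an assumption of this example.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Define your JSON objects
json1 = {
    "name": "John Doe",
    "address": "123 Main St, Anytown, USA",
    "hobbies": ["reading", "hiking", "coding"]
}

json2 = {
    "name": "john doe",
    "address": "123 Main Street, Anytown, USA",
    "hobbies": ["coding", "hiking", "reading"]
}

# Compare the JSON objects
comparison_results = compare_json(json1, json2)

# Generate the final matching result. The prompt is in Chinese and means:
# "Taking all of this information together, can you conclude that the two
# sets of data come from the same entity?"
model1 = genai.GenerativeModel("gemini-2.0-flash-thinking-exp")
prompt = "综合这些信息，你认为可以判断两个数据来自同一主体吗？" + json.dumps(comparison_results, ensure_ascii=False, indent=4)
result_matching = model1.generate_content(prompt)
print(result_matching.text)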

Contributing

Contributions are welcome! Feel free to report issues or submit pull requests.

License

This project is released under the MIT License. See the LICENSE file for details.

Contact

If you have any questions or suggestions, please get in touch:

Email: u3588064@connect.hku.hk
GitHub: u3588064@connect.hku.hk

WeChat: [QR code image: qrcode_for_gh_643efb7db5bc_344(1)]
