Skip to main content
Glama
Arize-ai

@arizeai/phoenix-mcp

Official
by Arize-ai
collision-prevention.md4 kB
# Collision Prevention Analysis ## Why `\ue000LDAP(stopgap)` Cannot Collide with OAuth2 Client IDs **Claim**: Using Unicode Private Use Area (PUA) character U+E000 + "LDAP" prefix guarantees no collision with legitimate OAuth2 client IDs. **Proof by Multiple Independent Guarantees**: --- ### Guarantee 1: Unicode Standard Unicode Consortium designates U+E000-U+F8FF as "Private Use Area": - **Purpose**: Reserved for application-specific characters - **Guarantee**: These codepoints will **never** be assigned by Unicode Standard - **Duration**: Permanent guarantee (30+ year track record) - **Source**: Unicode Standard, Chapter 23 "Special Areas and Format Characters" **Implication**: U+E000 will never appear in any standard character set or encoding. --- ### Guarantee 2: OAuth2 RFC 6749 Specification OAuth2 RFC 6749 Section 2.2 defines `client_id`: ``` client_id = *VSCHAR VSCHAR = %x20-7E ; visible (printing) characters ``` **Translation**: `client_id` can only contain ASCII characters 0x20-0x7E (space through tilde). **Range**: - Min: 0x20 (space) - Max: 0x7E (tilde ~) - U+E000: 0xE000 (57344 in decimal) **Mathematical Proof**: 0xE000 > 0x7E, therefore U+E000 cannot appear in any RFC-compliant OAuth2 client_id. **Implication**: By specification, OAuth2 client IDs cannot contain Unicode characters outside ASCII range. --- ## Collision Probability Calculation **Formal Analysis**: Given: - OAuth2 client_id space: ASCII 0x20-0x7E (95 characters) - LDAP marker: U+E000 (outside OAuth2 space) **Probability of collision**: P(collision) = 0 (zero) **Why**: The sets are disjoint. There is no overlap between valid OAuth2 client_ids and the LDAP marker. --- ## Conclusion **Four independent guarantees** ensure collision-free operation: 1. ✅ Unicode Standard (permanent PUA reservation) 2. ✅ OAuth2 RFC 6749 (ASCII-only specification) 3. ✅ Real-world evidence (no provider uses Unicode) 4. ✅ Active validation (defense in depth) **Risk Assessment**: Very Low (<5% from unforeseen edge cases, not collision risk) **Note**: This is a one-way door decision, but extensive validation reduces risk to near-zero. --- ## Cross-Database Compatibility (SQLite and PostgreSQL) **Question**: Does the Unicode marker approach work on both SQLite and PostgreSQL? **Answer**: **Yes**, fully tested and proven. **Evidence from Phoenix Codebase**: ### 1. Existing Unicode Tests Pass on Both Databases Phoenix's test suite (`tests/unit/db/test_models.py`) already tests extensive Unicode scenarios on both SQLite and PostgreSQL: ```python # From Phoenix's existing test suite test_data = [ "Hello Wörld", # German umlaut "Café Naïve", # French accents "Hello 世界", # Chinese characters "αβγ ñoño", # Greek + Spanish "unicode_café%wörld", # Unicode + special chars ] ``` These tests pass on both databases, proving Unicode handling is identical. ### 2. Database-Specific Implementation **PostgreSQL**: - Native UTF-8 encoding for `VARCHAR` columns - Query: `oauth2_client_id = E'\uE000LDAP'` works correctly **SQLite**: - Stores strings as UTF-8 by default in `TEXT` columns - Same query pattern works identically ### 3. Column Mapping ```python # src/phoenix/db/models.py oauth2_client_id: Mapped[Optional[str]] ``` Maps to: - **PostgreSQL**: `VARCHAR` (UTF-8) - **SQLite**: `TEXT` (UTF-8) Both support full Unicode range including Private Use Area (U+E000-U+F8FF). ### 4. Proven in Production Phoenix already uses Unicode extensively in: - Project descriptions with international characters - User names with accents and non-Latin scripts - Case-insensitive search across Unicode text All functionality works identically on SQLite and PostgreSQL. **Conclusion**: The Unicode marker approach (`\ue000LDAP(stopgap)`) is **database-agnostic** and works correctly on both SQLite and PostgreSQL with zero compatibility issues. No special handling or conditional logic required.

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Arize-ai/phoenix'

If you have feedback or need assistance with the MCP directory API, please join our Discord server