google-workspace-unlimited

qdrant_text_indexing_strategy.md•7.52 KiB

# Qdrant Text Indexing Strategy for Card Building

## Overview

Our card building system stores component metadata in Qdrant. Beyond vector embeddings, we can leverage Qdrant's full-text indexing features to enable:

- Faster keyword searches
- Tolerance for variations in user input
- Phrase matching for multi-word component names
- Language-aware stemming for documentation search

## Current Architecture

Our `mcp_gchat_cards_v7` collection has these payload fields that could benefit:

| Field | Current Type | Content Example | Text Index Candidate? |
|-------|-------------|-----------------|----------------------|
| `name` | keyword | "DecoratedText" | Yes - for component lookup |
| `full_path` | keyword | "card_framework.v2.widgets.DecoratedText" | Yes - for path search |
| `docstring` | text | "A widget that displays..." | **High value** |
| `relationships.nl_descriptions` | text | "decorated text with icon, button..." | **High value** |
| `relationships.child_classes` | list | ["Icon", "Button", "OnClick"] | No - array filter already works |

## Feature Analysis

### 1. ASCII Folding (v1.16.0+)

**What it does**: Converts Unicode characters to ASCII equivalents (ã→a, é→e)

**Use case for us**:
- Users might search "cafe" when docs contain "café"
- Component descriptions might use special chars: "naïve implementation"

**Where to apply**:

```python
from qdrant_client import models

# `client` is assumed to be an existing QdrantClient instance.
# Apply to the docstring and nl_descriptions fields.
client.create_payload_index(
    collection_name="mcp_gchat_cards_v7",
    field_name="docstring",
    field_schema=models.TextIndexParams(
        type=models.TextIndexType.TEXT,
        tokenizer=models.TokenizerType.WORD,
        ascii_folding=True,  # Fold accented characters to ASCII
    ),
)
```

**Impact**: Low priority for our use case - component names are ASCII-only.

---

### 2. Stemming (Snowball)

**What it does**: Reduces words to a common stem (running/runs → "run")

**Use case for us**:
- Search "buttons" matches "button"
- Search "selecting" matches "selection", "selected"
- Search "clicking" matches "click", "clicked"

Stemming applies per token. With the `word` tokenizer, a CamelCase name such as `OnClick` or `ButtonList` is indexed as a single token, so these matches rely on the words appearing separately in the indexed text (for example "decorated text with icon, button" in `relationships.nl_descriptions`).

**Where to apply**:

```python
# On docstring and nl_descriptions for natural language search
client.create_payload_index(
    collection_name="mcp_gchat_cards_v7",
    field_name="relationships.nl_descriptions",
    field_schema=models.TextIndexParams(
        type=models.TextIndexType.TEXT,
        tokenizer=models.TokenizerType.WORD,
        stemmer=models.SnowballParams(
            type=models.Snowball.SNOWBALL,
            language=models.SnowballLanguage.ENGLISH,
        ),
    ),
)
```

**Impact**: **High value** - Helps match inflected user queries like "add icons that can be clicked" to components that have "icon" and "click" in their relationship descriptions.

---

### 3. Stopwords

**What it does**: Filters common words (the, is, at, which, on)

**Use case for us**:
- Ignore "the", "a", "with" in searches
- Focus on meaningful terms: "button with icon" → search "button icon"

**Where to apply**:

```python
client.create_payload_index(
    collection_name="mcp_gchat_cards_v7",
    field_name="docstring",
    field_schema=models.TextIndexParams(
        type=models.TextIndexType.TEXT,
        tokenizer=models.TokenizerType.WORD,
        stopwords=models.StopwordsSet(
            languages=[models.Language.ENGLISH],
            custom=[
                "widget",     # Too generic in our context
                "component",  # Too generic
                "gchat",      # Domain-specific noise
            ],
        ),
    ),
)
```

**Impact**: Medium - Helps with longer docstring searches, reduces noise.

---

### 4. Phrase Search

**What it does**: Matches exact word sequences ("machine learning" as a phrase)

**Use case for us**:
- Match "decorated text" exactly, not "text" AND "decorated" separately
- Match "button list" vs just "button" or "list"
- Match "overflow menu" as exact phrase

Phrase matching only helps when the indexed value contains the words as separate tokens. A CamelCase `name` such as "DecoratedText" is indexed as the single token "decoratedtext", so a phrase query for "decorated text" will not match it unless the name is also stored in a space-separated form.

**Where to apply**:

```python
client.create_payload_index(
    collection_name="mcp_gchat_cards_v7",
    field_name="name",  # Component names
    field_schema=models.TextIndexParams(
        type=models.TextIndexType.TEXT,
        tokenizer=models.TokenizerType.WORD,
        lowercase=True,
        phrase_matching=True,  # Required for phrase queries
    ),
)

# Search with a phrase. MatchPhrase requires the terms in order and adjacent;
# MatchText only requires that all terms are present.
client.scroll(
    collection_name="mcp_gchat_cards_v7",
    scroll_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="name",
                match=models.MatchPhrase(phrase="Decorated Text"),
            )
        ]
    ),
)
```

**Impact**: **High value** - Critical for multi-word component names.

---

## Recommended Index Configuration

```python
from qdrant_client import QdrantClient, models


def create_text_indices(client: QdrantClient, collection_name: str) -> None:
    """Create optimized text indices for card component search."""

    # 1. Component name index - phrase matching for multi-word names
    client.create_payload_index(
        collection_name=collection_name,
        field_name="name",
        field_schema=models.TextIndexParams(
            type=models.TextIndexType.TEXT,
            tokenizer=models.TokenizerType.WORD,
            lowercase=True,
            phrase_matching=True,
        ),
    )

    # 2. Docstring index - stemming + stopwords for NL search
    client.create_payload_index(
        collection_name=collection_name,
        field_name="docstring",
        field_schema=models.TextIndexParams(
            type=models.TextIndexType.TEXT,
            tokenizer=models.TokenizerType.WORD,
            lowercase=True,
            stemmer=models.SnowballParams(
                type=models.Snowball.SNOWBALL,
                language=models.SnowballLanguage.ENGLISH,
            ),
            stopwords=models.Language.ENGLISH,
            ascii_folding=True,
        ),
    )

    # 3. Relationship descriptions - stemming for NL matching
    client.create_payload_index(
        collection_name=collection_name,
        field_name="relationships.nl_descriptions",
        field_schema=models.TextIndexParams(
            type=models.TextIndexType.TEXT,
            tokenizer=models.TokenizerType.WORD,
            lowercase=True,
            stemmer=models.SnowballParams(
                type=models.Snowball.SNOWBALL,
                language=models.SnowballLanguage.ENGLISH,
            ),
        ),
    )
```

---

## Multi-Module Considerations

For supporting multiple modules (Gmail, Sheets, etc.), we should:

1. **Add module prefix to component names** in payload:
   - `gchat:Button` vs `gmail:Button`
   - Already supported by SymbolGenerator's prefix system

2. **Create separate collections** per module:
   - `mcp_gchat_cards_v7`
   - `mcp_gmail_messages_v7`
   - Keeps indices focused and performant

3. **Or use filtered searches** with a `module` field:

   ```python
   client.scroll(
       collection_name="mcp_gchat_cards_v7",  # or a shared multi-module collection
       scroll_filter=models.Filter(
           must=[
               models.FieldCondition(key="module", match=models.MatchValue(value="gchat")),
               models.FieldCondition(key="name", match=models.MatchText(text="Button")),
           ]
       ),
   )
   ```

---

## Implementation Priority

| Feature | Priority | Effort | Benefit |
|---------|----------|--------|---------|
| Phrase matching on `name` | High | Low | Critical for multi-word components |
| Stemming on `nl_descriptions` | High | Low | Better NL query matching |
| Stopwords on `docstring` | Medium | Low | Cleaner searches |
| ASCII folding | Low | Low | Edge case handling |
| Multi-module support | High | Medium | Required for Gmail, Sheets, etc. |

---

## Next Steps

1. Add text index creation to `scripts/initialize_v7_collection.py`
2. Test phrase search for component lookup
3. Benchmark stemmed vs non-stemmed NL queries
4. Plan multi-module collection strategy
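
To make the four features in the Feature Analysis section concrete, here is a minimal pure-Python sketch of what each option does to text before it is matched. It is an illustration only and not Qdrant's implementation: the `STOPWORDS` set, `naive_stem` (a toy suffix stripper standing in for Snowball), and all function names are hypothetical and chosen for this example.

```python
import re
import unicodedata

# Small illustrative stopword list (assumption); real English lists are larger.
STOPWORDS = {"the", "a", "an", "is", "at", "which", "on", "with"}


def fold_ascii(text: str) -> str:
    """Strip diacritics: decompose to NFKD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_stem(token: str) -> str:
    """Toy suffix stripper standing in for Snowball; NOT the real algorithm."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list[str]:
    """Lowercase, fold to ASCII, split on non-alphanumerics, drop stopwords, stem."""
    folded = fold_ascii(text.lower())
    tokens = re.findall(r"[a-z0-9]+", folded)
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


def contains_phrase(doc_tokens: list[str], phrase_tokens: list[str]) -> bool:
    """True if phrase_tokens occur consecutively and in order within doc_tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i : i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )
```

Under these assumptions, `analyze("A café with the Buttons")` yields `["cafe", "button"]`, while `analyze("OnClick")` yields the single token `["onclick"]`, which is why CamelCase names do not benefit from stemming or phrase matching unless the words are stored separately. `contains_phrase(analyze("decorated text with icon"), analyze("decorated text"))` is true, and the reversed phrase "text decorated" is not matched.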
