search_within_kurum_yonetmelik
Search within specific Turkish institutional regulations using advanced query operators to find relevant articles without loading entire documents, returning only matching content sorted by relevance.
Instructions
Search for a keyword within a specific Institutional Regulation's articles with advanced query operators.
This tool is optimized for large regulations. Instead of loading the entire regulation into context, it:
1. Fetches the full content
2. Splits it into individual articles (madde)
3. Returns only the articles that match the search query
4. Sorts results by relevance score (based on match count)
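The four steps above can be sketched in a few lines. This is a minimal illustration rather than the server's code: `split_articles`, `search`, and the `ARTICLE_HEADER` pattern are hypothetical names, and the sketch assumes every article begins on a line starting with "Madde N" or "MADDE N".

```python
import re

# Assumption: each article starts on its own line with "Madde <number>".
ARTICLE_HEADER = re.compile(r"^(?:MADDE|Madde)\s+(\d+)", re.MULTILINE)


def split_articles(text: str) -> list:
    """Split regulation text into articles keyed by their madde number."""
    headers = list(ARTICLE_HEADER.finditer(text))
    articles = []
    for i, header in enumerate(headers):
        # An article runs until the next header, or to the end of the text.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        articles.append({
            "madde_no": header.group(1),
            "content": text[header.start():end].strip(),
        })
    return articles


def search(text: str, keyword: str, max_results: int = 25) -> list:
    """Return matching articles, most occurrences of the keyword first."""
    needle = keyword.lower()
    scored = []
    for article in split_articles(text):
        count = article["content"].lower().count(needle)
        if count > 0:
            scored.append((count, article))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [dict(article, match_count=count) for count, article in scored[:max_results]]


sample = (
    "Madde 1 - Amaç: ihracat kontrol.\n"
    "Madde 2 - kontrol ve kontrol usulleri.\n"
    "Madde 3 - Tanımlar."
)
results = search(sample, "kontrol")
```

Only the articles that contain the keyword are returned, so the third article never reaches the caller's context.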
Query Syntax (operators must be uppercase):
- Simple keyword: kontrol
- Exact phrase: "ihracat kontrol"
- AND operator: nükleer AND ihracat (both terms must be present)
- OR operator: denetim OR teftiş (at least one term must be present)
- NOT operator: kontrol NOT iptal (first term present, second must not be)
- Combinations: "ihracat kontrol" AND nükleer NOT silah
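One way to evaluate queries like these is sketched below. The server's own `_matches_query` helper is not shown on this page, so `parse_query` and `matches` are hypothetical, and two behaviors are assumptions: OR binds loosest (it separates alternative groups), and unquoted words with no operator between them are all required.

```python
import re

# A token is either a quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def parse_query(query: str) -> list:
    """Return OR-groups; each group is (required_terms, excluded_terms)."""
    groups = [([], [])]
    negate = False
    for token in TOKEN.finditer(query):
        phrase, word = token.group(1), token.group(2)
        if phrase is None and word in ("AND", "OR", "NOT"):
            if word == "OR":
                groups.append(([], []))  # start a new alternative
            negate = word == "NOT"       # NOT applies to the next term only
            continue
        term = phrase if phrase is not None else word
        groups[-1][1 if negate else 0].append(term)
        negate = False
    return groups


def matches(content: str, query: str, case_sensitive: bool = False) -> tuple:
    """Return (matched, score); score counts occurrences of required terms."""
    text = content if case_sensitive else content.lower()
    best = 0
    # Operators are detected before lowercasing, so they must be uppercase.
    for required, excluded in parse_query(query):
        if not case_sensitive:
            required = [t.lower() for t in required]
            excluded = [t.lower() for t in excluded]
        if not required:
            continue
        if all(t in text for t in required) and not any(t in text for t in excluded):
            best = max(best, sum(text.count(t) for t in required))
    return best > 0, best
```

For example, `matches("kontrol iptal edilir", "kontrol NOT iptal")` rejects the article because the excluded term is present, while `matches("denetim yapılır", "denetim OR teftiş")` accepts it on the first alternative.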
Returns formatted text with:
- Article number and title
- Relevance score (higher = more matches)
- Full article content for matching articles
Example use cases:
- Search for "nükleer" in Nuclear Export Regulation (42641)
- Search for "disiplin AND ceza" in disciplinary regulations
- Search for "görev OR yetki" in organizational regulations
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| mevzuat_no | Yes | The regulation number to search within (e.g., '42641', '42638', '42613') | |
| keyword | Yes | Search query supporting advanced operators: simple keyword ("kontrol"), exact phrase ("ihracat kontrol"), AND/OR/NOT operators (nükleer AND ihracat, denetim OR teftiş, kontrol NOT iptal). Operators must be uppercase. | |
| mevzuat_tertip | No | Regulation series from search results (e.g., '5') | 5 |
| case_sensitive | No | Whether to match case when searching | False |
| max_results | No | Maximum number of matching articles to return (1-50) | 25 |
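As an illustration of the schema, the sketch below builds an arguments payload and applies the same defaults and the 1-50 bound on `max_results`. `build_arguments` is a hypothetical client-side helper, not part of the server; how the payload is sent depends on the MCP client in use.

```python
def build_arguments(
    mevzuat_no: str,
    keyword: str,
    mevzuat_tertip: str = "5",
    case_sensitive: bool = False,
    max_results: int = 25,
) -> dict:
    """Build the tool arguments, mirroring the defaults in the table above."""
    if not mevzuat_no or not keyword:
        raise ValueError("mevzuat_no and keyword are required")
    if not 1 <= max_results <= 50:
        raise ValueError("max_results must be between 1 and 50")
    return {
        "mevzuat_no": mevzuat_no,
        "keyword": keyword,
        "mevzuat_tertip": mevzuat_tertip,
        "case_sensitive": case_sensitive,
        "max_results": max_results,
    }


# Example: search the Nuclear Export Regulation for articles with both terms.
arguments = build_arguments("42641", "nükleer AND ihracat", max_results=10)
```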
Implementation Reference
- mevzuat_mcp_server.py:1472-1561 (handler) The primary handler function for the 'search_within_kurum_yonetmelik' tool, registered via @app.tool(). Fetches full regulation content (mevzuat_tur=7 for Kurum Yönetmeliği), searches articles using imported helpers, and returns formatted results.

```python
async def search_within_kurum_yonetmelik(
    mevzuat_no: str = Field(
        ...,
        description="The regulation number to search within (e.g., '42641', '42638', '42613')"
    ),
    keyword: str = Field(
        ...,
        description='Search query supporting advanced operators: simple keyword ("kontrol"), exact phrase ("ihracat kontrol"), AND/OR/NOT operators (nükleer AND ihracat, denetim OR teftiş, kontrol NOT iptal). Operators must be uppercase.'
    ),
    mevzuat_tertip: str = Field(
        "5",
        description="Regulation series from search results (e.g., '5')"
    ),
    case_sensitive: bool = Field(
        False,
        description="Whether to match case when searching (default: False)"
    ),
    max_results: int = Field(
        25,
        ge=1,
        le=50,
        description="Maximum number of matching articles to return (1-50, default: 25)"
    )
) -> str:
    """
    Search for a keyword within a specific Institutional Regulation's articles with advanced query operators.

    This tool is optimized for large regulations. Instead of loading the entire regulation into context, it:
    1. Fetches the full content
    2. Splits it into individual articles (madde)
    3. Returns only the articles that match the search query
    4. Sorts results by relevance score (based on match count)

    Query Syntax (operators must be uppercase):
    - Simple keyword: kontrol
    - Exact phrase: "ihracat kontrol"
    - AND operator: nükleer AND ihracat (both terms must be present)
    - OR operator: denetim OR teftiş (at least one term must be present)
    - NOT operator: kontrol NOT iptal (first term present, second must not be)
    - Combinations: "ihracat kontrol" AND nükleer NOT silah

    Returns formatted text with:
    - Article number and title
    - Relevance score (higher = more matches)
    - Full article content for matching articles

    Example use cases:
    - Search for "nükleer" in Nuclear Export Regulation (42641)
    - Search for "disiplin AND ceza" in disciplinary regulations
    - Search for "görev OR yetki" in organizational regulations
    """
    logger.info(f"Tool 'search_within_kurum_yonetmelik' called: {mevzuat_no}, keyword: '{keyword}'")

    try:
        # Get full content
        content_result = await mevzuat_client.get_content(
            mevzuat_no=mevzuat_no,
            mevzuat_tur=7,  # Kurum Yönetmeliği
            mevzuat_tertip=mevzuat_tertip
        )

        if content_result.error_message:
            return f"Error fetching regulation content: {content_result.error_message}"

        # Search within articles
        matches = search_articles_by_keyword(
            markdown_content=content_result.markdown_content,
            keyword=keyword,
            case_sensitive=case_sensitive,
            max_results=max_results
        )

        result = ArticleSearchResult(
            mevzuat_no=mevzuat_no,
            mevzuat_tur=7,
            keyword=keyword,
            total_matches=len(matches),
            matching_articles=matches
        )

        if len(matches) == 0:
            return f"No articles found containing '{keyword}' in Kurum Yönetmeliği {mevzuat_no}"

        return format_search_results(result)

    except Exception as e:
        logger.exception(f"Error in tool 'search_within_kurum_yonetmelik' for {mevzuat_no}")
        return f"An unexpected error occurred while searching Kurum Yönetmeliği {mevzuat_no}: {str(e)}"
```
- article_search.py:176-252 (helper) Key helper function that parses markdown into articles, applies advanced query matching (AND/OR/NOT/exact phrases) using _matches_query, computes relevance scores, and returns top matching MaddeMatch objects.

```python
def search_articles_by_keyword(
    markdown_content: str,
    keyword: str,
    case_sensitive: bool = False,
    max_results: int = 50
) -> List[MaddeMatch]:
    """
    Search for keyword within articles with support for advanced operators.

    Query syntax:
    - Simple keyword: "yatırımcı"
    - Exact phrase: "mali sıkıntı"
    - AND operator: yatırımcı AND tazmin
    - OR operator: yatırımcı OR müşteri
    - NOT operator: yatırımcı NOT kurum
    - Combinations: "mali sıkıntı" AND yatırımcı NOT kurum

    Args:
        markdown_content: Full legislation content in markdown
        keyword: Search query with optional operators (AND, OR, NOT, "exact phrase")
        case_sensitive: Whether to match case
        max_results: Maximum number of matching articles to return

    Returns:
        List of matching articles sorted by relevance (score based on match count)
    """
    articles = split_into_articles(markdown_content)
    matches = []

    for article in articles:
        content = article['madde_content']

        # Check if article matches query
        matches_query, score = _matches_query(content, keyword, case_sensitive)

        if matches_query and score > 0:
            # Generate preview (first occurrence of a search term)
            search_content = content if case_sensitive else content.lower()
            search_keyword = keyword if case_sensitive else keyword.lower()

            # Try to find first quoted phrase or first word
            preview_terms = re.findall(r'"([^"]*)"', search_keyword)
            if not preview_terms:
                # Use first word (excluding operators)
                words = re.split(r'\s+(?:AND|OR|NOT)\s+', search_keyword)
                preview_terms = [w.strip() for w in words if w.strip() and w.strip() not in ('AND', 'OR', 'NOT')]

            preview = ""
            if preview_terms:
                first_term = preview_terms[0] if case_sensitive else preview_terms[0].lower()
                if first_term in search_content:
                    keyword_pos = search_content.find(first_term)
                    start = max(0, keyword_pos - 100)
                    end = min(len(content), keyword_pos + len(first_term) + 100)
                    preview = content[start:end]
                    if start > 0:
                        preview = "..." + preview
                    if end < len(content):
                        preview = preview + "..."

            if not preview:
                preview = content[:200] + "..."

            matches.append(MaddeMatch(
                madde_no=article['madde_no'],
                madde_title=article['madde_title'],
                madde_content=content,
                match_count=score,
                preview=preview
            ))

    # Sort by score (most relevant first)
    matches.sort(key=lambda x: x.match_count, reverse=True)

    return matches[:max_results]
```
- article_search.py:19-26 (schema) Pydantic model used to structure the search results before formatting, containing the list of matching articles.

```python
class ArticleSearchResult(BaseModel):
    """Search results within a legislation."""
    mevzuat_no: str
    mevzuat_tur: int
    keyword: str
    total_matches: int
    matching_articles: List[MaddeMatch]
```
- article_search.py:254-272 (helper) Helper function that formats the ArticleSearchResult into a human-readable string output returned by the tool.

```python
def format_search_results(result: ArticleSearchResult) -> str:
    """Format search results as readable text."""
    output = []
    output.append(f"Keyword: '{result.keyword}'")
    output.append(f"Total matching articles: {result.total_matches}")
    output.append("")

    for i, match in enumerate(result.matching_articles, 1):
        output.append(f"=== MADDE {match.madde_no} ===")
        if match.madde_title:
            output.append(f"Title: {match.madde_title}")
        output.append(f"Matches: {match.match_count}")
        output.append("")
        output.append("Full content:")
        output.append(match.madde_content)
        output.append("")

    return "\n".join(output)
```
- mevzuat_client.py:557-725 (helper) Supporting method in MevzuatApiClientNew that retrieves the full markdown content of the regulation (used with mevzuat_tur=7 for Kurum Yönetmeliği), handling downloads and conversions.

```python
async def get_content(
    self,
    mevzuat_no: str,
    mevzuat_tur: int = 1,
    mevzuat_tertip: str = "3",
    resmi_gazete_tarihi: Optional[str] = None
) -> MevzuatArticleContent:
    """
    Download and extract content from legislation.
    Tries HTML scraping first (most reliable), then falls back to file downloads.
    For Presidential Decisions (tur=20) and Circulars (tur=22), skip HTML and go directly to PDF.

    Args:
        mevzuat_no: Legislation number
        mevzuat_tur: Legislation type code (1=Kanun, 20=CB Kararı, 22=CB Genelgesi, etc.)
        mevzuat_tertip: Series number
        resmi_gazete_tarihi: Official Gazette date (DD/MM/YYYY) - required for CB Genelgesi (tur=22)
    """
    # CB Kararları (tur=20) and CB Genelgesi (tur=22) are PDF-only, skip HTML scraping
    if mevzuat_tur not in [20, 22]:
        # Try HTML scraping first (most reliable method for other types)
        result = await self.get_content_from_html(mevzuat_no, mevzuat_tur, mevzuat_tertip)
        if result.markdown_content:
            return result
        logger.info(f"HTML scraping returned no content for {mevzuat_no}, trying file downloads")
    else:
        if mevzuat_tur == 20:
            logger.info("CB Kararı detected (tur=20), skipping HTML scraping, going directly to PDF")
        elif mevzuat_tur == 22:
            logger.info("CB Genelgesi detected (tur=22), skipping HTML scraping, going directly to PDF")

    cache_key = f"doc:{mevzuat_tur}.{mevzuat_tertip}.{mevzuat_no}" if self._cache_enabled else None
    if cache_key and self._cache:
        cached_content = self._cache.get(cache_key)
        if cached_content:
            logger.debug(f"Cache hit: {mevzuat_no}")
            return MevzuatArticleContent(
                madde_id=mevzuat_no,
                mevzuat_id=mevzuat_no,
                markdown_content=cached_content
            )

    # Construct URLs based on mevzuat type
    if mevzuat_tur == 22:
        # CB Genelgesi - special PDF URL format
        if not resmi_gazete_tarihi:
            return MevzuatArticleContent(
                madde_id=mevzuat_no,
                mevzuat_id=mevzuat_no,
                markdown_content="",
                error_message="resmi_gazete_tarihi is required for CB Genelgesi (tur=22)"
            )
        # Convert DD/MM/YYYY to YYYYMMDD
        parts = resmi_gazete_tarihi.split('/')
        if len(parts) == 3:
            date_str = f"{parts[2]}{parts[1].zfill(2)}{parts[0].zfill(2)}"
        else:
            return MevzuatArticleContent(
                madde_id=mevzuat_no,
                mevzuat_id=mevzuat_no,
                markdown_content="",
                error_message=f"Invalid date format: {resmi_gazete_tarihi}. Expected DD/MM/YYYY"
            )
        pdf_url = self.GENELGE_PDF_URL_TEMPLATE.format(date=date_str, no=mevzuat_no)
        doc_url = None  # Genelge has no DOC version
    elif mevzuat_tur == 20:
        # CB Kararı - PDF only, no DOC
        doc_url = None  # CB Kararları have no DOC version
        pdf_url = self.PDF_URL_TEMPLATE.format(tur=mevzuat_tur, tertip=mevzuat_tertip, no=mevzuat_no)
    else:
        doc_url = self.DOC_URL_TEMPLATE.format(tur=mevzuat_tur, tertip=mevzuat_tertip, no=mevzuat_no)
        pdf_url = self.PDF_URL_TEMPLATE.format(tur=mevzuat_tur, tertip=mevzuat_tertip, no=mevzuat_no)

    # Try DOC first (skip for CB Genelgesi and CB Kararı which have no DOC version)
    if doc_url:
        try:
            logger.info(f"Trying DOC: {doc_url}")
            # Use separate headers for document download
            doc_headers = {
                'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
                'Accept': 'application/msword, */*',
            }
            response = await self._http_client.get(doc_url, headers=doc_headers)
            response.raise_for_status()
            doc_bytes = response.content
            logger.info(f"Downloaded DOC: {len(doc_bytes)} bytes")

            # DOC files from mevzuat.gov.tr are actually HTML
            if len(doc_bytes) < 100:
                logger.warning(f"DOC file too small ({len(doc_bytes)} bytes), likely empty")
                raise Exception("DOC file is empty or too small")

            doc_stream = io.BytesIO(doc_bytes)
            result = self._md_converter.convert_stream(doc_stream, file_extension=".doc")
            markdown_content = result.text_content.strip() if result and result.text_content else ""

            if markdown_content:
                logger.info(f"DOC conversion successful for {mevzuat_no}")
                if cache_key and self._cache:
                    self._cache.put(cache_key, markdown_content)
                return MevzuatArticleContent(
                    madde_id=mevzuat_no,
                    mevzuat_id=mevzuat_no,
                    markdown_content=markdown_content
                )
        except Exception as e:
            logger.info(f"DOC failed, trying PDF: {e}")

    # Try PDF fallback
    try:
        logger.info(f"Trying PDF: {pdf_url}")
        # For CB Kararı (tur=20) and CB Genelgesi (tur=22), ensure we have session cookies
        if mevzuat_tur in [20, 22]:
            await self._ensure_session()
            # Create temporary client with cookies to avoid deprecation warning
            async with httpx.AsyncClient(
                headers=self.HEADERS,
                cookies=self._cookies,
                timeout=self._http_client.timeout,
                follow_redirects=True
            ) as temp_client:
                response = await temp_client.get(pdf_url)
        else:
            response = await self._http_client.get(pdf_url)
        response.raise_for_status()
        pdf_bytes = response.content

        markdown_content = ""
        # For CB Kararı (tur=20) and CB Genelgesi (tur=22), use Mistral OCR (handles images + text)
        if mevzuat_tur in [20, 22] and self._mistral_client:
            doc_type = "CB Kararı" if mevzuat_tur == 20 else "CB Genelgesi"
            logger.info(f"Using Mistral OCR for {doc_type} PDF")
            markdown_content = await self._ocr_pdf_with_mistral(pdf_bytes, pdf_url)
            # Fallback to markitdown if OCR fails
            if not markdown_content:
                logger.warning("Mistral OCR failed, falling back to markitdown")
                pdf_stream = io.BytesIO(pdf_bytes)
                result = self._md_converter.convert_stream(pdf_stream, file_extension=".pdf")
                markdown_content = result.text_content.strip() if result and result.text_content else ""
        else:
            # Use markitdown for other types
            pdf_stream = io.BytesIO(pdf_bytes)
            result = self._md_converter.convert_stream(pdf_stream, file_extension=".pdf")
            markdown_content = result.text_content.strip() if result and result.text_content else ""

        if markdown_content:
            logger.info(f"PDF conversion successful for {mevzuat_no}")
            if cache_key and self._cache:
                self._cache.put(cache_key, markdown_content)
            return MevzuatArticleContent(
                madde_id=mevzuat_no,
                mevzuat_id=mevzuat_no,
                markdown_content=markdown_content
            )
    except Exception as e:
        logger.error(f"PDF also failed: {e}")

    return MevzuatArticleContent(
        madde_id=mevzuat_no,
        mevzuat_id=mevzuat_no,
        markdown_content="",
        error_message=f"Both DOC and PDF download/conversion failed for {mevzuat_tur}.{mevzuat_tertip}.{mevzuat_no}"
    )
```
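The HTML, then DOC, then PDF order used by get_content follows a common "first source that yields content wins" pattern. The sketch below isolates that pattern under simplifying assumptions: `first_non_empty` and the `fake_*` fetchers are hypothetical stand-ins, and caching, session cookies, and OCR are left out.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple

# A fetcher is any zero-argument coroutine function that returns text.
Fetcher = Callable[[], Awaitable[str]]


async def first_non_empty(sources: List[Tuple[str, Fetcher]]) -> Tuple[Optional[str], str]:
    """Try each source in order; return the first one that yields content."""
    for name, fetch in sources:
        try:
            content = (await fetch()).strip()
        except Exception:
            # A failed source is skipped, like the DOC-to-PDF fallback.
            continue
        if content:
            return name, content
    return None, ""


async def fake_html() -> str:
    return ""  # scraping succeeded but found nothing


async def fake_doc() -> str:
    raise RuntimeError("DOC file is empty or too small")


async def fake_pdf() -> str:
    return "Madde 1 - Amaç"


def demo() -> Tuple[Optional[str], str]:
    """Run the chain with the stand-in fetchers."""
    return asyncio.run(
        first_non_empty([("html", fake_html), ("doc", fake_doc), ("pdf", fake_pdf)])
    )
```

Here the empty HTML result and the failing DOC download are both skipped, so `demo()` returns the PDF content.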