
Google Threat Intelligence MCP Server

by googleSandy

search_digital_threat_monitoring

Search historical documents from surface, deep, and dark web sources using Lucene syntax to find threats, leaks, and malicious content.

Instructions

Search for historical data in Digital Threat Monitoring (DTM) using Lucene syntax.

Digital Threat Monitoring is a collection of documents from surface, deep, and dark web sources.

To filter by document type or threat type, include the conditions within the query string using the fields __type and label_threat, respectively. Combine multiple conditions using Lucene boolean operators (AND, OR, NOT).

Examples of filtering in the query:

  • Single document type: (__type:forum_post) AND (body:security)

  • Multiple document types: (__type:(forum_post OR paste)) AND (body:security)

  • Single threat type: (label_threat:information-security/malware) AND (body:exploit)

  • Multiple threat types: (label_threat:(information-security/malware OR information-security/phishing)) AND (body:exploit)

  • Combined: (__type:document_analysis) AND (label_threat:information-security/information-leak/credentials) AND (body:password)
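The filter examples above all follow one pattern, which can be sketched as a small query builder. This is only an illustration: the helper names any_of and build_query are hypothetical and not part of the server. Field values are inserted as-is, mirroring the examples.

```python
from typing import Sequence


def any_of(field: str, values: Sequence[str]) -> str:
    """Return a Lucene clause that matches any of `values` in `field`."""
    if not values:
        raise ValueError("values must not be empty")
    if len(values) == 1:
        return f"({field}:{values[0]})"
    return f"({field}:({' OR '.join(values)}))"


def build_query(
    text_clause: str,
    doc_types: Sequence[str] = (),
    threat_types: Sequence[str] = (),
) -> str:
    """Combine optional __type and label_threat filters with a text clause."""
    clauses = []
    if doc_types:
        clauses.append(any_of("__type", doc_types))
    if threat_types:
        clauses.append(any_of("label_threat", threat_types))
    clauses.append(f"({text_clause})")
    # Boolean operators must be uppercase.
    return " AND ".join(clauses)


print(build_query("body:security", doc_types=["forum_post", "paste"]))
# (__type:(forum_post OR paste)) AND (body:security)
```

The resulting string would be passed as the query argument.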

Important Considerations for Effective Querying:

  • Date/Time Filtering (since and until):

    • Input parameters since and until filter documents by their creation/modification time.

    • These must be strings in RFC3339 format, specifically ending with 'Z' to denote UTC.

    • Example: '2025-04-23T00:00:00Z'

  • Pagination for More Than 25 Results:

    • A single API call returns at most size results (maximum 25).

    • To retrieve more results, you must paginate:

      1. Make your initial search request.

      2. The response dictionary will contain a key named page.

      3. If this page key holds a non-empty string value, there are more results available.

      4. To fetch the next page, make a subsequent API call. This call MUST include the exact same parameters as your original request (query, size, since, until, truncate, sanitize), PLUS the page parameter set to the token value received in the previous response's page field.

      5. Continue this process, using the new page token from each response, until the page field is absent or empty in the response, indicating the end of the results.
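The date format and the pagination steps above can be sketched together as follows. The search callable is an assumption: it stands in for however your client invokes the tool, and fake_search exists only so the sketch runs on its own. The "docs", "page", and "error" keys are the ones the handler returns.

```python
import asyncio
from datetime import datetime, timezone
from typing import Any, Awaitable, Callable

# Hypothetical: any async callable that accepts the tool's arguments and
# returns the tool's response dictionary.
SearchFn = Callable[..., Awaitable[dict[str, Any]]]


def to_rfc3339_utc(dt: datetime) -> str:
    """Format an aware datetime as RFC3339 in UTC with a trailing 'Z'."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone first")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


async def fetch_all(
    search: SearchFn, max_pages: int = 20, **params: Any
) -> list[dict[str, Any]]:
    """Repeat the same request, adding the latest page token each time."""
    docs: list[dict[str, Any]] = []
    page = None
    for _ in range(max_pages):  # safety cap, an arbitrary choice
        call_args = dict(params)  # identical parameters on every call
        if page:
            call_args["page"] = page
        response = await search(**call_args)
        if "error" in response:
            raise RuntimeError(response["error"])
        docs.extend(response.get("docs", []))
        page = response.get("page")
        if not page:  # absent or empty: no more results
            break
    return docs


async def fake_search(query, size=10, since=None, until=None, page=None):
    """Stand-in for the real tool: two pages of canned results."""
    pages = {
        None: {"docs": [{"id": 1}, {"id": 2}], "page": "token-2"},
        "token-2": {"docs": [{"id": 3}]},
    }
    return pages[page]


if __name__ == "__main__":
    since = to_rfc3339_utc(datetime(2025, 4, 23, tzinfo=timezone.utc))
    results = asyncio.run(
        fetch_all(fake_search, query="body:security", size=2, since=since)
    )
    print(len(results))  # 3
```

The max_pages cap is a design choice for the sketch, so that a broad query cannot loop indefinitely.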

Tokenization:

  • DTM breaks documents into tokens.

  • Example: "some-domain.com" -> "some", "domain", "com".

  • Wildcard/Regex queries match single tokens, not phrases.
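To see why a wildcard cannot span several tokens, here is a rough approximation. DTM's actual tokenizer is not documented here, so splitting on non-alphanumeric characters and lowercasing are assumptions made only to reproduce the example above; approx_tokens and wildcard_hits are hypothetical names.

```python
import re
from fnmatch import fnmatchcase


def approx_tokens(text: str) -> list[str]:
    """Assumed tokenization: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def wildcard_hits(pattern: str, text: str) -> list[str]:
    """Return the individual tokens of `text` that match `pattern`."""
    return [t for t in approx_tokens(text) if fnmatchcase(t, pattern)]


print(approx_tokens("some-domain.com"))               # ['some', 'domain', 'com']
print(wildcard_hits("dom*", "some-domain.com"))       # ['domain']
print(wildcard_hits("some-dom*", "some-domain.com"))  # []
```

The last call finds nothing because no single token starts with "some-dom", even though the original string does.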

Special Characters:

  • Escape with \: + - & | ! ( ) { } [ ] ^ " ~ * ? : / and space.

  • Example: To find "(1+1):2", query \(1\+1\)\:2
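A minimal sketch of escaping a literal value with a backslash before placing it in a query. The helper name escape_lucene is hypothetical. It covers only the characters listed above and assumes the input contains no backslashes of its own. Apply it to values, not to a whole query, or the query's own operators would be escaped too.

```python
# Characters listed above as needing a backslash escape (including space).
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:/ ')


def escape_lucene(value: str) -> str:
    """Prefix every listed special character in `value` with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in value)


print(escape_lucene("(1+1):2"))  # \(1\+1\)\:2
```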

Case Sensitivity:

  • DTM entity values are often lowercased.

  • Boolean operators (AND, OR, NOT) MUST be UPPERCASE.

Domain Search Nuances:

  • Use wildcards/regex on fields like doc.domain.

  • Example: doc.domain:google.*.dev

  • Avoid pattern searches on group_network.

Performance Limit:

  • Searches time out after 60 seconds.

  • For broad or complex queries, it is highly recommended to use the since and until parameters to add time delimiters. This narrows the search scope and helps prevent timeouts.
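One way to apply this advice is to split a long period into shorter since/until windows and run one search per window. The sketch below only generates the window boundaries; time_windows is a hypothetical helper, and whether the API treats the boundaries as inclusive is not stated here, so adjacent windows may need de-duplication.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator


def time_windows(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[tuple[str, str]]:
    """Yield consecutive (since, until) pairs as RFC3339 UTC strings."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    cursor = start.astimezone(timezone.utc)
    stop = end.astimezone(timezone.utc)
    while cursor < stop:
        nxt = min(cursor + step, stop)
        yield cursor.strftime(fmt), nxt.strftime(fmt)
        cursor = nxt


for since, until in time_windows(
    datetime(2025, 4, 1, tzinfo=timezone.utc),
    datetime(2025, 4, 3, tzinfo=timezone.utc),
    timedelta(days=1),
):
    print(since, until)
# 2025-04-01T00:00:00Z 2025-04-02T00:00:00Z
# 2025-04-02T00:00:00Z 2025-04-03T00:00:00Z
```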

Noise Reduction:

  • Use typed entities for higher precision.

  • Example: organization:"Acme Corp"

  • Prefer typed entities over free text searches.

The following fields and their meanings can be used to compose a query using Lucene syntax (including combining them with AND, OR, and NOT operators along with parentheses):

  • author.identity.name - The handle used by the forum post author

  • subject - The subject line of the forum post

  • body - The body text of the content

  • inet_location.url - The URL where the content was found

  • language - The content language

  • title - The title of the web page

  • channel.name - The Telegram channel name

  • domain - A DNS domain name

  • cve - A CVE entry by ID

__type: one of the following

  • web_content_publish - General website content

  • domain_discovery - Newly discovered domain names

  • forum_post - Darkweb forum posts

  • message - Chat messages like Telegram

  • paste - Paste site content like Pastebin

  • shop_listing - Items for sale on the dark web

  • email_analysis - Suspicious emails

  • tweet - Tweets from Twitter on cybersecurity topics.

  • document_analysis - Documents (PDF, Office, text) from VirusTotal, including malicious and corporate confidential files.

label_threat: one of the following

  • information-security/anonymization - Anonymization

  • information-security/apt - Advanced Persistent Threat

  • information-security/botnet - Botnet

  • information-security/compromised - Compromised Infrastructure

  • information-security/doxing - Personal Information Disclosure

  • information-security/exploit - Exploits

  • information-security/phishing - Phishing

  • information-security/information-leak - Information Leak

  • information-security/information-leak/confidential - Confidential Information Leak

  • information-security/information-leak/credentials - Credential Leak

  • information-security/information-leak/payment-cards - Credit Card Leak

  • information-security/malicious-activity - Malicious Activity

  • information-security/malicious-infrastructure - Malicious Infrastructure

  • information-security/malware - Malware

  • information-security/malware/ransomware - Ransomware

  • information-security/malware/ransomware-victim-listing - Ransomware Victim Listing

  • information-security/security-research - Security Research

  • information-security/spam - Spam

Args:

  • query (required): The Lucene-like query string for your document search.

  • size (optional): The number of results to return in each page (0 to 25). Defaults to 10.

  • since (optional): The timestamp to search for documents since (RFC3339 format).

  • until (optional): The timestamp to search for documents until (RFC3339 format).

  • page (optional): The page ID to fetch the page for. This is only used when paginating through pages after the first page of results.

  • truncate (optional): The number of characters (as a string) to truncate all document fields to in the response (e.g., '500').

  • sanitize (optional): If true (default), any HTML content in the document fields is sanitized to remove links, scripts, etc.

Returns: A dictionary containing the list of documents found and search metadata.
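As a rough illustration of the constraints listed under Args, the sketch below checks a set of arguments before a call and drops unset ones. The helper build_arguments is hypothetical, and the timestamp pattern is a simplified check for the 'Z'-terminated form, not a full RFC3339 validator.

```python
import re
from typing import Any, Optional

# Simplified pattern for timestamps such as 2025-04-23T00:00:00Z.
_RFC3339_UTC = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$")


def build_arguments(
    query: str,
    size: int = 10,
    since: Optional[str] = None,
    until: Optional[str] = None,
    page: Optional[str] = None,
    truncate: Optional[int] = None,
    sanitize: bool = True,
) -> dict[str, Any]:
    """Validate the documented constraints and return the tool arguments."""
    if not query.strip():
        raise ValueError("query is required")
    if not 0 <= size <= 25:
        raise ValueError("size must be between 0 and 25")
    for name, value in (("since", since), ("until", until)):
        if value is not None and not _RFC3339_UTC.match(value):
            raise ValueError(f"{name} must be RFC3339 ending in 'Z'")
    arguments = {
        "query": query,
        "size": size,
        "since": since,
        "until": until,
        "page": page,
        # The tool expects truncate as a string, e.g. '500'.
        "truncate": None if truncate is None else str(truncate),
        "sanitize": sanitize,
    }
    return {k: v for k, v in arguments.items() if v is not None}


print(build_arguments("body:security", size=5, truncate=500))
# {'query': 'body:security', 'size': 5, 'truncate': '500', 'sanitize': True}
```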

Input Schema

Name      Required  Description  Default
query     Yes
size      No
since     No
until     No
page      No
truncate  No
sanitize  No
api_key   No

Implementation Reference

  • The tool is registered with FastMCP via the @server.tool() decorator. No explicit name is passed, so the tool name is taken from the function name, search_digital_threat_monitoring.
    @server.tool()
  • The async function search_digital_threat_monitoring implements the full tool logic: builds params, POSTs to /dtm/docs/search with a Lucene query, handles timeouts/errors, sanitizes responses, and manages pagination via the link header.
    async def search_digital_threat_monitoring(
        query: str,
        ctx: Context,
        size: int = 10,
        since: str = None,
        until: str = None,
        page: str = None,
        truncate: str = None,
        sanitize: bool = True,
        api_key: str = None,
    ) -> dict:
      """Search for historical data in Digital Threat Monitoring (DTM) using Lucene syntax.
    
      Digital Threat Monitoring is a collection of documents from surface, deep, and dark web sources.
    
      To filter by document type or threat type, include the conditions within the `query` string
      using the fields `__type` and `label_threat`, respectively. Combine multiple conditions
      using Lucene boolean operators (AND, OR, NOT).
    
      Examples of filtering in the query:
      - Single document type: `(__type:forum_post) AND (body:security)`
      - Multiple document types: `(__type:(forum_post OR paste)) AND (body:security)`
      - Single threat type: `(label_threat:information-security/malware) AND (body:exploit)`
      - Multiple threat types: `(label_threat:(information-security/malware OR information-security/phishing)) AND (body:exploit)`
      - Combined: `(__type:document_analysis) AND (label_threat:information-security/information-leak/credentials) AND (body:password)`
    
      Important Considerations for Effective Querying:
    
      -   **Date/Time Filtering (`since` and `until`)**:
        -   Input parameters `since` and `until` filter documents by their creation/modification time.
        -   These must be strings in RFC3339 format, specifically ending with 'Z' to denote UTC.
        -   Example: `'2025-04-23T00:00:00Z'`
    
      -   **Pagination for More Than 25 Results**:
          -   A single API call returns at most `size` results (maximum 25).
          -   To retrieve more results, you must paginate:
              1.  Make your initial search request.
              2.  The response dictionary will contain a key named `page`.
              3.  If this `page` key holds a non-empty string value, there are more results available.
              4.  To fetch the next page, make a subsequent API call. This call MUST include the *exact same parameters* as your original request (query, size, since, until, truncate, sanitize), PLUS the `page` parameter set to the token value received in the previous response's `page` field.
              5.  Continue this process, using the new `page` token from each response, until the `page` field is absent or empty in the response, indicating the end of the results.
    
      Tokenization:
      - DTM breaks documents into tokens.
      - Example: "some-domain.com" -> "some", "domain", "com".
      - Wildcard/Regex queries match single tokens, not phrases.
    
      Special Characters:
      - Escape with \: ` + - & | ! ( ) { } [ ] ^ " ~ * ? : / ` and space.
      - Example: To find "(1+1):2", query \(1\+1\)\:2
    
      Case Sensitivity:
      - DTM entity values are often lowercased.
      - Boolean operators (AND, OR, NOT) MUST be UPPERCASE.
    
      Domain Search Nuances:
      - Use wildcards/regex on fields like `doc.domain`.
      - Example: doc.domain:google.*.dev
      - Avoid pattern searches on `group_network`.
    
      Performance Limit:
      - Searches time out after 60 seconds.
      - For broad or complex queries, it is highly recommended to use the `since` and `until` parameters to add time delimiters. This narrows the search scope and helps prevent timeouts.
    
      Noise Reduction:
      - Use typed entities for higher precision.
      - Example: organization:"Acme Corp"
      - Prefer typed entities over free text searches.
      
      The following fields and their meanings can be used to compose a query using Lucene syntax (including combining them with AND, OR, and NOT operators along with parentheses):
      * author.identity.name - The handle used by the forum post author
      * subject - The subject line of the forum post
      * body - The body text of the content
      * inet_location.url - The URL where the content was found
      * language - The content language
      * title - The title of the web page 
      * channel.name - The Telegram channel name
      * domain - A DNS domain name
      * cve - A CVE entry by ID 
    
      __type: one of the following
      * web_content_publish - General website content
      * domain_discovery - Newly discovered domain names
      * forum_post - Darkweb forum posts
      * message - Chat messages like Telegram
      * paste - Paste site content like Pastebin
      * shop_listing - Items for sale on the dark web
      * email_analysis - Suspicious emails
      * tweet - Tweets from Twitter on cybersecurity topics.
      * document_analysis - Documents (PDF, Office, text) from VirusTotal, including malicious and corporate confidential files.
    
      label_threat: one of the following
      * information-security/anonymization - Anonymization
      * information-security/apt - Advanced Persistent Threat
      * information-security/botnet - Botnet
      * information-security/compromised - Compromised Infrastructure
      * information-security/doxing - Personal Information Disclosure
      * information-security/exploit - Exploits
      * information-security/phishing - Phishing
      * information-security/information-leak - Information Leak
      * information-security/information-leak/confidential - Confidential Information Leak
      * information-security/information-leak/credentials - Credential Leak
      * information-security/information-leak/payment-cards - Credit Card Leak
      * information-security/malicious-activity - Malicious Activity
      * information-security/malicious-infrastructure - Malicious Infrastructure
      * information-security/malware - Malware
      * information-security/malware/ransomware - Ransomware
      * information-security/malware/ransomware-victim-listing - Ransomware Victim Listing
      * information-security/security-research - Security Research
      * information-security/spam - Spam
    
      Args:
        query (required): The Lucene-like query string for your document search.
        size (optional): The number of results to return in each page (0 to 25). Defaults to 10.
        since (optional): The timestamp to search for documents since (RFC3339 format).
        until (optional): The timestamp to search for documents until (RFC3339 format).
        page (optional): The page ID to fetch the page for. This is only used when paginating through pages greater than the first page of results.
        truncate (optional): The number of characters (as a string) to truncate all documents fields in the response (e.g., '500').
        sanitize (optional): If true (default), any HTML content in the document fields are sanitized to remove links, scripts, etc.
    
      Returns:
        A dictionary containing the list of documents found and search metadata.
      """
      async with vt_client(ctx, api_key=api_key) as client:
        params = {
            "size": size,
            "since": since,
            "until": until,
            "page": page,
            "truncate": truncate,
            "sanitize": str(sanitize).lower(),
        }
        params = {k: v for k, v in params.items() if v is not None}
        path = f"/dtm/docs/search?{urllib.parse.urlencode(params)}"
    
        try:
          res = await client.post_async(
              path=path, json_data={"query": query}
          )
    
          if "text/html" in res.headers.get("Content-Type", ""):
            response_text = await res.text_async()
            if "request timed out" in response_text.lower():
              return {"error": "The request timed out. Please try reducing the scope of your query by using `since` and `until` parameters to add time delimiters"}
            logging.error(response_text)
            return {"error": f"API returned an HTML error page instead of JSON: {response_text}"}
          
          res_json = await res.json_async()
        except (asyncio.TimeoutError, TimeoutError): # Catch both
          return {"error": "The request timed out. Please try reducing the scope of your query by using `since` and `until` parameters to add time delimiters"}
        except json.JSONDecodeError as json_error:
          logging.error(f"Failed to parse JSON response: {json_error}")
          return {"error": f"Failed to parse server response: {json_error}."}
        except Exception as e:
          logging.error(f"An unexpected error occurred: {e} (type: {type(e)})")
          return {"error": f"An unexpected error occurred: {e}"}
    
        # Remove unnecessary information
        if "docs" in res_json:
          for i in range(len(res_json["docs"])):
            res_json["docs"][i].pop("__meta", None)
            res_json["docs"][i].pop("entities", None)
           
        link_header = res.headers.get("link")
        if link_header and 'rel="next"' in link_header:
            try:
                url_part = link_header.split(';')[0].strip().strip('<>')
                query_string = urllib.parse.urlparse(url_part).query
                next_page = urllib.parse.parse_qs(query_string).get('page', [None])[0]
                if next_page:
                    res_json["page"] = next_page
            except (IndexError, AttributeError):
                # Could not parse link header, proceed without it
                pass
    
        return utils.sanitize_response(res_json)
  • The function signature defines the input schema: query (str, required), size (int, default 10), since, until, page, truncate, and api_key (optional str, default None), and sanitize (bool, default True). The ctx (Context) parameter is not listed in the input schema. The return type is dict.
    async def search_digital_threat_monitoring(
        query: str,
        ctx: Context,
        size: int = 10,
        since: str = None,
        until: str = None,
        page: str = None,
        truncate: str = None,
        sanitize: bool = True,
        api_key: str = None,
    ) -> dict:
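  • The sanitize_response helper is called at the end of the handler to recursively drop None values and empty strings from the response. As written, empty dictionaries and lists are kept.
    import typing

    def sanitize_response(data: typing.Any) -> typing.Any:
      """Recursively drops None values and empty strings from a response.

      Empty dictionaries and lists are returned as-is, not removed.
      """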
      if isinstance(data, dict):
        sanitized_dict = {}
        for key, value in data.items():
          sanitized_value = sanitize_response(value)
          if sanitized_value is not None:
            sanitized_dict[key] = sanitized_value
        return sanitized_dict
      elif isinstance(data, list):
        sanitized_list = []
        for item in data:
          sanitized_item = sanitize_response(item)
          if sanitized_item is not None:
            sanitized_list.append(sanitized_item)
        return sanitized_list
      elif isinstance(data, str):
        return data if data else None
      else:
        return data
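The handler's pagination step, reading the next page token from the link response header, can be isolated as a small function. This sketch mirrors the parsing in the handler above. The function name next_page_token and the example URL are hypothetical.

```python
import urllib.parse
from typing import Optional


def next_page_token(link_header: Optional[str]) -> Optional[str]:
    """Extract the `page` query parameter from a rel="next" link header."""
    if not link_header or 'rel="next"' not in link_header:
        return None
    # The target URL is the first ';'-separated part, wrapped in <...>.
    url_part = link_header.split(";")[0].strip().strip("<>")
    query_string = urllib.parse.urlparse(url_part).query
    return urllib.parse.parse_qs(query_string).get("page", [None])[0]


header = '<https://example.invalid/dtm/docs/search?size=10&page=abc123>; rel="next"'
print(next_page_token(header))  # abc123
print(next_page_token(None))    # None
```

Like the handler, this assumes the header carries a single link. A header with several comma-separated links would need a fuller parser.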
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully discloses tokenization, special character escaping, case sensitivity, domain search nuances, timeout limits, and noise reduction strategies.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is long but well-structured with sections for filtering, pagination, tokenization, etc. Could be slightly more concise, but the detail justifies the length.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers all aspects of a complex search tool with 8 parameters, no output schema, and no annotations. Includes pagination, date formatting, and performance tips, making it self-contained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, but the description compensates by explaining all parameters (query syntax, size, since/until format, page token, truncate, sanitize). Provides examples and defaults.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it searches historical data in Digital Threat Monitoring using Lucene syntax. It distinguishes from sibling tools that focus on other data types like threat reports or IOCs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides extensive guidance on when to use, including filtering by document/threat type, pagination, and performance tips. Lacks explicit 'when not to use' alternatives, but context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

