Glama
jkingsman

https://github.com/jkingsman/qanon-mcp-server

word_cloud_by_post_ids

Analyze word frequency in QAnon posts by ID range to identify common terms and themes for sociological research.

Instructions

Generate a word cloud analysis showing the most common words used in posts within a specified ID range.

Args:
    start_id: Starting post ID
    end_id: Ending post ID
    min_word_length: Minimum length of words to include (default: 3)
    max_words: Maximum number of words to return (default: 100)

Input Schema

Name             Required  Description                          Default
start_id         Yes       Starting post ID                     —
end_id           Yes       Ending post ID                       —
min_word_length  No        Minimum length of words to include   3
max_words        No        Maximum number of words to return    100
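
Since the schema marks only `start_id` and `end_id` as required, a client is expected to fall back to the defaults for the other two parameters. The following is a minimal sketch of that fallback logic; `validate_args` is a hypothetical helper for illustration, not part of the server:

```python
def validate_args(args: dict) -> dict:
    # start_id and end_id are required; the rest use the schema defaults shown above
    missing = {"start_id", "end_id"} - args.keys()
    if missing:
        raise ValueError(f"missing required arguments: {sorted(missing)}")
    return {
        "start_id": args["start_id"],
        "end_id": args["end_id"],
        "min_word_length": args.get("min_word_length", 3),
        "max_words": args.get("max_words", 100),
    }

print(validate_args({"start_id": 1, "end_id": 500}))
# → {'start_id': 1, 'end_id': 500, 'min_word_length': 3, 'max_words': 100}
```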

Implementation Reference

  • The primary handler function for the 'word_cloud_by_post_ids' tool. It selects posts within the specified ID range, extracts their text, generates a word cloud using the helper function, and returns a formatted analysis including date range and post count. The @mcp.tool() decorator registers this function as an MCP tool.
    # Excerpt note: this handler depends on module-level state — `posts` (the
    # loaded dataset) and `datetime` (from the datetime module) must be in scope.
    @mcp.tool()
    def word_cloud_by_post_ids(
        start_id: int, end_id: int, min_word_length: int = 3, max_words: int = 100
    ) -> str:
        """
        Generate a word cloud analysis showing the most common words used in posts within a specified ID range.
    
        Args:
            start_id: Starting post ID
            end_id: Ending post ID
            min_word_length: Minimum length of words to include (default: 3)
            max_words: Maximum number of words to return (default: 100)
        """
        if start_id > end_id:
            return "Error: start_id must be less than or equal to end_id."
    
        # Collect posts within the ID range
        selected_posts = []
        for post in posts:
            post_id = post.get("post_metadata", {}).get("id", 0)
            if start_id <= post_id <= end_id:
                selected_posts.append(post)
    
        if not selected_posts:
            return f"No posts found with IDs between {start_id} and {end_id}."
    
        # Extract post texts
        post_texts = [post.get("text", "") for post in selected_posts]
    
        # Generate word cloud
        cloud = generate_word_cloud(post_texts, min_word_length, max_words)
    
        # Add additional information
        earliest_id = min(
            post.get("post_metadata", {}).get("id", 0) for post in selected_posts
        )
        latest_id = max(
            post.get("post_metadata", {}).get("id", 0) for post in selected_posts
        )
    
        earliest_date = min(
            post.get("post_metadata", {}).get("time", 0) for post in selected_posts
        )
        latest_date = max(
            post.get("post_metadata", {}).get("time", 0) for post in selected_posts
        )
    
        earliest_date_str = (
            datetime.fromtimestamp(earliest_date).strftime("%Y-%m-%d")
            if earliest_date
            else "Unknown"
        )
        latest_date_str = (
            datetime.fromtimestamp(latest_date).strftime("%Y-%m-%d")
            if latest_date
            else "Unknown"
        )
    
        result = f"Word Cloud Analysis for Post IDs {earliest_id} to {latest_id}\n"
        result += f"Date Range: {earliest_date_str} to {latest_date_str}\n"
        result += f"Total Posts Analyzed: {len(selected_posts)}\n\n"
        result += cloud
    
        return result
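
The handler's ID-range filter and date formatting can be exercised against sample data. The post dicts below mirror the `post_metadata` shape the handler iterates over; the IDs and Unix timestamps are illustrative:

```python
from datetime import datetime

# Sample posts in the shape the handler expects (values are illustrative)
posts = [
    {"post_metadata": {"id": 1, "time": 1509229200}, "text": "first post"},
    {"post_metadata": {"id": 2, "time": 1509315600}, "text": "second post"},
    {"post_metadata": {"id": 5, "time": 1509920400}, "text": "outside the range"},
]

start_id, end_id = 1, 3
selected = [
    p for p in posts
    if start_id <= p.get("post_metadata", {}).get("id", 0) <= end_id
]

# Same min/strftime pattern the handler uses for its "Date Range" line
earliest = min(p["post_metadata"]["time"] for p in selected)
date_str = datetime.fromtimestamp(earliest).strftime("%Y-%m-%d")
print(f"{len(selected)} posts selected, earliest date {date_str}")
```

Note that `datetime.fromtimestamp` interprets timestamps in the local timezone, so the rendered date can shift by a day depending on where the server runs.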
  • Helper utility function that processes a list of post texts to generate word frequency statistics, excluding stopwords and short words, and formats the output as a textual word cloud with frequency bars and percentages.
    # Excerpt note: depends on module-level imports — `re`, `Counter` (from
    # collections), and `List` (from typing) must be in scope.
    def generate_word_cloud(
        post_texts: List[str], min_word_length: int = 3, max_words: int = 100
    ) -> str:
        """
        Generate a word cloud analysis from a list of post texts.
    
        Args:
            post_texts: List of text content from posts
            min_word_length: Minimum length of words to include (default: 3)
            max_words: Maximum number of words to return (default: 100)
    
        Returns:
            Formatted string with word frequency analysis
        """
        # Common words to exclude (stopwords)
        stopwords = {
            "the", "and", "a", "to", "of", "in", "is", "that", "for", "on",
            "with", "as", "by", "at", "from", "be", "this", "was", "are", "an",
            "it", "not", "or", "have", "has", "had", "but", "what", "all", "were",
            "when", "there", "can", "been", "one", "do", "did", "who", "you", "your",
            "they", "their", "them", "will", "would", "could", "should", "which", "his", "her",
            "she", "he", "we", "our", "us", "i", "me", "my", "im", "ive",
            "myself", "its", "it's", "about", "some", "then", "than", "into",
        }
    
        # Combine all texts, replacing literal "\n" escape sequences with spaces
        combined_text = " ".join([text.replace("\\n", " ") for text in post_texts if text])
    
        # Remove URLs
        combined_text = re.sub(r"https?://\S+", "", combined_text)
    
        # Remove special characters and convert to lowercase
        combined_text = re.sub(r"[^\w\s]", " ", combined_text.lower())
    
        # Split into words and count frequencies
        words = combined_text.split()
    
        # Filter out stopwords and short words
        filtered_words = [
            word for word in words if word not in stopwords and len(word) >= min_word_length
        ]
    
        # Count word frequencies
        word_counts = Counter(filtered_words)
    
        # Get the most common words
        most_common = word_counts.most_common(max_words)
    
        # Format the result
        if not most_common:
            return "No significant words found in the selected posts."
    
        total_words = sum(count for _, count in most_common)
    
        result = f"Word Cloud Analysis (top {len(most_common)} words from {total_words} total filtered words):\n\n"
    
        # Calculate the maximum frequency for scaling
        max_freq = most_common[0][1]
    
        # Create a visual representation of word frequencies
        for word, count in most_common:
            # Calculate percentage of total
            percentage = (count / total_words) * 100
            # Scale the bar length
            bar_length = int((count / max_freq) * 30)
            bar = "█" * bar_length
            result += f"{word}: {count} ({percentage:.1f}%) {bar}\n"
    
        return result
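Because `generate_word_cloud` is a pure function over its inputs, its pipeline (escape cleanup, URL and punctuation stripping, stopword/length filtering, frequency counting) can be illustrated with a trimmed, self-contained sketch. The sample posts and the reduced stopword set here are illustrative only:

```python
import re
from collections import Counter

def mini_word_cloud(post_texts, min_word_length=3, max_words=5):
    stopwords = {"the", "and", "to", "of", "in", "is"}
    combined = " ".join(t.replace("\\n", " ") for t in post_texts if t)
    combined = re.sub(r"https?://\S+", "", combined)      # strip URLs
    combined = re.sub(r"[^\w\s]", " ", combined.lower())  # strip punctuation
    words = [
        w for w in combined.split()
        if w not in stopwords and len(w) >= min_word_length
    ]
    freqs = Counter(words).most_common(max_words)
    # Same bar scaling as the full helper: longest bar = 30 blocks
    for word, count in freqs:
        print(f"{word}: {count} " + "█" * int(count / freqs[0][1] * 30))
    return freqs

mini_word_cloud(["The storm is coming.", "Trust the plan. The storm."])
```

One quirk worth noting: because the punctuation pass runs after lowercasing and splits on apostrophes, a stopword like `"it's"` in the full helper can never match — the text only ever contains `it` and `s` by that point.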
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions 'Generate a word cloud analysis' but doesn't disclose behavioral traits such as whether this is a read-only operation, potential performance impacts, rate limits, or what the output format looks like (e.g., image, text list). For a tool with no annotations, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, starting with the core purpose and then detailing parameters. Every sentence earns its place, but it could be slightly more concise by integrating the parameter explanations more seamlessly rather than as a separate 'Args' block.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations, no output schema, and 4 parameters, the description is partially complete. It covers the purpose and parameters well but lacks information on behavioral aspects and output format. For a tool of this complexity, it should do more to compensate for the missing structured data.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description includes an 'Args' section that explains all four parameters with clear semantics, such as 'Starting post ID' and 'Minimum length of words to include.' Since schema description coverage is 0%, this fully compensates by adding meaning beyond the bare schema, making the parameters well-understood.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Generate a word cloud analysis') and resource ('posts within a specified ID range'), distinguishing it from sibling tools like 'word_cloud_by_date_range' which uses date ranges instead of ID ranges. The verb+resource combination is precise and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by specifying 'posts within a specified ID range,' suggesting this tool is for analyzing posts by their IDs rather than by date, author, or other criteria. However, it doesn't explicitly state when to use this vs. alternatives like 'word_cloud_by_date_range' or when not to use it, leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
