word_cloud_by_post_ids
Generates a text-based word cloud (a ranked word-frequency list with bars and percentages) for posts within a specified ID range, to help surface common themes and language patterns.
Instructions
Generate a word cloud analysis showing the most common words used in posts within a specified ID range.
Args:
start_id: Starting post ID
end_id: Ending post ID
min_word_length: Minimum length of words to include (default: 3)
max_words: Maximum number of words to return (default: 100)
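To make the two optional arguments concrete, here is a minimal sketch of how a length cutoff and a top-N cap shape a frequency list. It is a toy stand-in rather than the tool's own code: `top_words` and the sample text are hypothetical, and stopword filtering is left out for brevity.

```python
from collections import Counter


def top_words(text: str, min_word_length: int = 3, max_words: int = 100):
    """Hypothetical helper: count words, keeping only those long enough."""
    words = text.lower().split()
    kept = [w for w in words if len(w) >= min_word_length]
    # most_common(n) returns at most n (word, count) pairs, highest count first
    return Counter(kept).most_common(max_words)


sample = "trust the plan trust the map go to it"
print(top_words(sample, min_word_length=3, max_words=2))
print(top_words(sample, min_word_length=4, max_words=5))
```

Raising `min_word_length` drops short tokens before counting, while `max_words` only truncates the ranked list afterwards.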
Input Schema
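| Name | Required | Description | Default |
|---|---|---|---|
| end_id | Yes | Ending post ID | |
| max_words | No | Maximum number of words to return | 100 |
| min_word_length | No | Minimum length of words to include | 3 |
| start_id | Yes | Starting post ID | |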
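The table corresponds to a JSON Schema of roughly the following shape. This is a hand-written approximation based on the function signature, not the schema the server actually emits; the `check_args` helper is hypothetical and only checks required keys and integer types.

```python
# Approximate schema, reconstructed by hand from the signature (assumption).
INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "start_id": {"type": "integer"},
        "end_id": {"type": "integer"},
        "min_word_length": {"type": "integer", "default": 3},
        "max_words": {"type": "integer", "default": 100},
    },
    "required": ["start_id", "end_id"],
}


def check_args(args: dict) -> dict:
    """Hypothetical validator: enforce required keys, int types, apply defaults."""
    for name in INPUT_SCHEMA["required"]:
        if name not in args:
            raise ValueError(f"missing required argument: {name}")
    resolved = {}
    for name, spec in INPUT_SCHEMA["properties"].items():
        value = args.get(name, spec.get("default"))
        # bool is a subclass of int in Python, so reject it explicitly
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError(f"{name} must be an integer")
        resolved[name] = value
    return resolved


print(check_args({"start_id": 1, "end_id": 50}))
```

Omitting `min_word_length` or `max_words` falls back to the defaults listed in the table.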
Implementation Reference
- src/qanon_mcp/__init__.py:757-820 (handler): The main handler function for the 'word_cloud_by_post_ids' tool, decorated with @mcp.tool() for automatic registration. It collects posts by ID range, extracts their texts, calls the helper to generate the word cloud, and formats the output with stats.

  ```python
  @mcp.tool()
  def word_cloud_by_post_ids(
      start_id: int, end_id: int, min_word_length: int = 3, max_words: int = 100
  ) -> str:
      """
      Generate a word cloud analysis showing the most common words used in posts within a specified ID range.

      Args:
          start_id: Starting post ID
          end_id: Ending post ID
          min_word_length: Minimum length of words to include (default: 3)
          max_words: Maximum number of words to return (default: 100)
      """
      if start_id > end_id:
          return "Error: start_id must be less than or equal to end_id."

      # Collect posts within the ID range
      selected_posts = []
      for post in posts:
          post_id = post.get("post_metadata", {}).get("id", 0)
          if start_id <= post_id <= end_id:
              selected_posts.append(post)

      if not selected_posts:
          return f"No posts found with IDs between {start_id} and {end_id}."

      # Extract post texts
      post_texts = [post.get("text", "") for post in selected_posts]

      # Generate word cloud
      cloud = generate_word_cloud(post_texts, min_word_length, max_words)

      # Add additional information
      earliest_id = min(
          post.get("post_metadata", {}).get("id", 0) for post in selected_posts
      )
      latest_id = max(
          post.get("post_metadata", {}).get("id", 0) for post in selected_posts
      )

      earliest_date = min(
          post.get("post_metadata", {}).get("time", 0) for post in selected_posts
      )
      latest_date = max(
          post.get("post_metadata", {}).get("time", 0) for post in selected_posts
      )

      earliest_date_str = (
          datetime.fromtimestamp(earliest_date).strftime("%Y-%m-%d")
          if earliest_date
          else "Unknown"
      )
      latest_date_str = (
          datetime.fromtimestamp(latest_date).strftime("%Y-%m-%d")
          if latest_date
          else "Unknown"
      )

      result = f"Word Cloud Analysis for Post IDs {earliest_id} to {latest_id}\n"
      result += f"Date Range: {earliest_date_str} to {latest_date_str}\n"
      result += f"Total Posts Analyzed: {len(selected_posts)}\n\n"
      result += cloud

      return result
  ```
- src/qanon_mcp/__init__.py:625-755 (helper): Supporting helper function used by word_cloud_by_post_ids (and similar tools) to process post texts. It filters stopwords, counts word frequencies, and generates a textual word cloud visualization with bars and percentages.

  ```python
  def generate_word_cloud(
      post_texts: List[str], min_word_length: int = 3, max_words: int = 100
  ) -> str:
      """
      Generate a word cloud analysis from a list of post texts.

      Args:
          post_texts: List of text content from posts
          min_word_length: Minimum length of words to include (default: 3)
          max_words: Maximum number of words to return (default: 100)

      Returns:
          Formatted string with word frequency analysis
      """
      # Common words to exclude (stopwords)
      stopwords = {
          "the", "and", "a", "to", "of", "in", "is", "that", "for", "on",
          "with", "as", "by", "at", "from", "be", "this", "was", "are", "an",
          "it", "not", "or", "have", "has", "had", "but", "what", "all", "were",
          "when", "there", "can", "been", "one", "do", "did", "who", "you", "your",
          "they", "their", "them", "will", "would", "could", "should", "which",
          "his", "her", "she", "he", "we", "our", "us", "i", "me", "my",
          "im", "ive", "myself", "its", "it's", "about", "some", "then", "than",
          "into",
      }

      # Combine all texts and replace literal \n with actual newlines
      combined_text = " ".join([text.replace("\\n", " ") for text in post_texts if text])

      # Remove URLs
      combined_text = re.sub(r"https?://\S+", "", combined_text)

      # Remove special characters and convert to lowercase
      combined_text = re.sub(r"[^\w\s]", " ", combined_text.lower())

      # Split into words and count frequencies
      words = combined_text.split()

      # Filter out stopwords and short words
      filtered_words = [
          word
          for word in words
          if word not in stopwords and len(word) >= min_word_length
      ]

      # Count word frequencies
      word_counts = Counter(filtered_words)

      # Get the most common words
      most_common = word_counts.most_common(max_words)

      # Format the result
      if not most_common:
          return "No significant words found in the selected posts."

      total_words = sum(count for _, count in most_common)
      result = f"Word Cloud Analysis (top {len(most_common)} words from {total_words} total filtered words):\n\n"

      # Calculate the maximum frequency for scaling
      max_freq = most_common[0][1]

      # Create a visual representation of word frequencies
      for word, count in most_common:
          # Calculate percentage of total
          percentage = (count / total_words) * 100

          # Scale the bar length
          bar_length = int((count / max_freq) * 30)
          bar = "█" * bar_length

          result += f"{word}: {count} ({percentage:.1f}%) {bar}\n"

      return result
  ```
- src/qanon_mcp/__init__.py:757 (registration): The @mcp.tool() decorator registers the word_cloud_by_post_ids function as an MCP tool in the FastMCP server.

  ```python
  @mcp.tool()
  ```
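
The handler's inclusive range filter and the helper's bar scaling can be tried on toy data. The following is a simplified sketch, not the repository code: the `posts` list is invented sample data in the shape the handler reads (`post_metadata.id`, `text`), and `select_posts` and `render_bars` are hypothetical names. Stopword and URL handling are omitted.

```python
from collections import Counter

# Invented sample data in the shape the handler expects (assumption).
posts = [
    {"post_metadata": {"id": 1}, "text": "alpha beta"},
    {"post_metadata": {"id": 2}, "text": "beta gamma"},
    {"post_metadata": {"id": 5}, "text": "delta"},
]


def select_posts(posts, start_id, end_id):
    """Keep posts whose ID falls in the inclusive range [start_id, end_id]."""
    return [
        p for p in posts
        if start_id <= p.get("post_metadata", {}).get("id", 0) <= end_id
    ]


def render_bars(counts, width=30):
    """Scale each bar against the most frequent word; counts must be non-empty."""
    ranked = counts.most_common()
    total = sum(c for _, c in ranked)
    max_freq = ranked[0][1]
    lines = []
    for word, count in ranked:
        bar = "█" * int((count / max_freq) * width)
        lines.append(f"{word}: {count} ({count / total * 100:.1f}%) {bar}")
    return lines


selected = select_posts(posts, 1, 2)
counts = Counter(" ".join(p["text"] for p in selected).split())
for line in render_bars(counts):
    print(line)
```

The most frequent word always gets the full-width bar, and every other bar is proportional to it, so bar lengths compare words with each other rather than with the total.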