Data Planning Agent
An MCP (Model Context Protocol) agent that transforms high-level business intents into structured Data Product Requirement Prompts (Data PRPs) through AI-powered conversational refinement.
Overview
The Data Planning Agent is the first component in a multi-agent system for automated Business Intelligence dashboard generation. It helps data scientists and analysts gather comprehensive requirements by:
Starting with a vague business intent
Refining through AI-guided clarifying questions
Generating a structured, machine-readable Data PRP document
The output Data PRP serves as input for the Data Discovery Agent, enabling automated data source identification and analysis.
Features
AI-Powered Conversations: Uses Gemini 2.5 Pro for intelligent requirement gathering
Smart Questioning: Asks up to 4 focused questions at a time, biased toward multiple choice for efficiency
Structured Output: Generates standardized Data PRP markdown documents
Flexible Storage: Supports both GCS (gs://) and local file paths
Organizational Context: Load custom context files to tailor agent behavior to your organization
MCP Integration: Full MCP server implementation (stdio + HTTP transports)
Interactive CLI: Test conversations directly from the command line
Cursor Compatible: Works seamlessly as a Cursor MCP server
Installation
Prerequisites
Python 3.10 or higher
Poetry for dependency management
Gemini API key
Setup
Clone the repository:
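For example; the repository URL and directory name below are placeholders to adapt to your checkout:

```bash
git clone <repository-url>
cd data-planning-agent
```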
Install dependencies using Poetry:
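From the project root:

```bash
poetry install
```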
Create a .env file from the example:
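Assuming .env.example sits at the repository root (it is referenced under Configuration below):

```bash
cp .env.example .env
```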
Configure your environment variables in .env:
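A minimal sketch; only the variable names shown here appear elsewhere in this README, and all values are illustrative:

```bash
# .env
GEMINI_API_KEY=your-gemini-api-key   # required
MCP_TRANSPORT=stdio                  # stdio (Cursor) or http
# CONTEXT_DIR=./context              # optional organizational context (local path or gs:// URI)
# MAX_CONVERSATION_TURNS=10          # optional; 10 is illustrative, not the documented default
```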
Usage
Interactive CLI Mode
The easiest way to test the Planning Agent:
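The exact entry point isn't shown in this copy of the README, so treat the module-style invocation below as an assumption based on the src/data_planning_agent/cli/ package listed under Architecture:

```bash
poetry run python -m data_planning_agent.cli
```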
This launches an interactive session that guides you through:
Entering your initial business intent
Answering clarifying questions
Generating and saving the final Data PRP
MCP Server Mode (for Cursor Integration)
Run as an MCP server for integration with Cursor:
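Also hedged, based on the src/data_planning_agent/mcp/ package; MCP_TRANSPORT selects stdio (for Cursor) or http (standalone server):

```bash
MCP_TRANSPORT=stdio poetry run python -m data_planning_agent.mcp
```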
Using with Cursor
Add this configuration to your ~/.cursor/mcp.json:
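A representative entry, not the project's verbatim configuration; the command, args, and cwd values are assumptions to adapt to your checkout:

```json
{
  "mcpServers": {
    "data-planning-agent": {
      "command": "poetry",
      "args": ["run", "python", "-m", "data_planning_agent.mcp"],
      "cwd": "/absolute/path/to/data-planning-agent",
      "env": {
        "GEMINI_API_KEY": "your-gemini-api-key",
        "MCP_TRANSPORT": "stdio"
      }
    }
  }
}
```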
Then use these MCP tools in Cursor:
1. start_planning_session
Start a new planning session:
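For example (the argument name is an assumption; the tool takes your initial business intent):

```json
{
  "intent": "I want a dashboard that tracks weekly sales performance by region"
}
```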
Returns a session ID and initial clarifying questions.
2. continue_conversation
Continue the conversation with responses:
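A sketch, assuming the tool takes the session ID returned by start_planning_session plus your answers:

```json
{
  "session_id": "<session-id>",
  "responses": "1) Sales managers  2) Weekly  3) Salesforce and BigQuery  4) Win rate, pipeline velocity"
}
```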
Returns follow-up questions or completion notification.
3. generate_data_prp
Generate the final Data PRP:
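Likewise hedged; the argument names are assumptions, and the output location can be a local path or a gs:// URI:

```json
{
  "session_id": "<session-id>",
  "output_path": "gs://my-bucket/prps/"
}
```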
Returns the generated Data PRP markdown and file location.
Example Conversation Flow
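An illustrative, condensed exchange; the actual question wording and count come from Gemini and will vary:

```text
You:   "I need visibility into our sales pipeline."
Agent: 1. Who is the primary audience? (a) Executives (b) Sales managers (c) Analysts
       2. What reporting cadence matters most? (a) Daily (b) Weekly (c) Monthly
       3. Which systems hold the source data?
       4. Are there KPIs you already track?
You:   "b, b, Salesforce and BigQuery, win rate and pipeline velocity"
Agent: asks any remaining follow-ups, then indicates the session is ready
You:   call generate_data_prp to produce and save the Data PRP markdown
```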
Data PRP Output Format
The generated Data PRP follows this structure:
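The authoritative template is produced by the PRPGenerator and isn't reproduced in this copy; the skeleton below is only a hypothetical illustration of the kind of sections a Data PRP contains:

```markdown
<!-- hypothetical skeleton; the real section list is defined by the agent's template -->
# Data Product Requirement Prompt: <title>

## Business Intent
## Audience and Stakeholders
## Key Questions and Metrics
## Data Requirements
## Constraints and Assumptions
```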
Organizational Context
The Planning Agent can be customized to your organization by loading context files that influence all AI interactions.
What is Organizational Context?
Context files are markdown documents that provide the AI with:
Company-specific terminology and standards
Standard operating procedures (SOPs)
Data governance policies
Technical constraints
Communication preferences
How to Use Context
Create a context directory (local or GCS):

```bash
mkdir ./context
```

Add markdown files with your organizational knowledge:

```bash
# context/01_organization.md
# context/02_sop.md
# context/03_constraints.md
```

Configure the agent to use your context:

```bash
# .env
CONTEXT_DIR=./context
# or for GCS:
# CONTEXT_DIR=gs://my-bucket/planning-context/
```

Files are loaded automatically when the agent starts.
Example Context Files
See the context.example/ directory for real examples:
01_organization.md: Organizational background, team structure, communication style
02_sop.md: Standard operating procedures, terminology standards, data governance
03_constraints.md: Technical constraints, preferred analysis patterns, budget considerations
Benefits
Consistency: Agent uses your terminology and follows your SOPs
Governance: Automatically applies your data governance policies
Efficiency: No need to repeat organizational context in every conversation
Flexibility: Update context files without changing code
Context Behavior
Context is prepended to all AI prompts (initial questions, follow-ups, PRP generation)
Context is hidden from users - it silently guides agent behavior
Context is optional - agent works normally without it
Multiple files are concatenated alphabetically
Supports both local and GCS storage
Configuration
All configuration is managed through environment variables. See .env.example for the complete list:
| Variable | Description | Default |
|---|---|---|
| GEMINI_API_KEY | Gemini API key (required) | - |
| | Gemini model to use | |
| | Default output directory | |
| CONTEXT_DIR | Context directory (local or GCS) | None |
| MCP_TRANSPORT | Transport mode (stdio or http) | |
| | HTTP server host | |
| | HTTP server port | |
| MAX_CONVERSATION_TURNS | Max conversation turns | |
| | Logging level | |
Architecture
Components
MCP Server (src/data_planning_agent/mcp/)
  Stdio and HTTP transports
  JSON-RPC 2.0 protocol
  SSE support for real-time updates
Clients (src/data_planning_agent/clients/)
  GeminiClient: Gemini API wrapper for conversations
  StorageClient: GCS and local file I/O
Core Logic (src/data_planning_agent/core/)
  ConversationManager: Session state management
  RequirementRefiner: Conversation orchestration
  PRPGenerator: Data PRP markdown generation
Models (src/data_planning_agent/models/)
  PlanningSession: Session data model
  DataProductRequirementPrompt: PRP schema
CLI (src/data_planning_agent/cli/)
  Interactive command-line interface
Integration with Data Discovery Agent
Testing
Run tests with pytest:
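Via Poetry; the coverage invocation assumes pytest-cov is installed and uses the package name as the target:

```bash
poetry run pytest
poetry run pytest --cov=data_planning_agent   # coverage reporting (assumes pytest-cov)
```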
Development
Code Quality
Format code with Black:
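For example, formatting the whole project through Poetry:

```bash
poetry run black .
```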
Lint with Ruff:
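And for linting, using Ruff's standard check command:

```bash
poetry run ruff check .
```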
Project Structure
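A sketch inferred from the components listed under Architecture; only the src/ subpackages, context.example/, and .env.example are confirmed above, and the remaining entries are typical, assumed layout:

```text
data-planning-agent/
├── src/data_planning_agent/
│   ├── mcp/        # MCP server (stdio + HTTP transports)
│   ├── clients/    # GeminiClient, StorageClient
│   ├── core/       # ConversationManager, RequirementRefiner, PRPGenerator
│   ├── models/     # PlanningSession, DataProductRequirementPrompt
│   └── cli/        # interactive command-line interface
├── context.example/   # example organizational context files
├── tests/             # pytest suite (assumed location)
├── .env.example
└── pyproject.toml
```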
Troubleshooting
Common Issues
Issue: GEMINI_API_KEY not set
Solution: Ensure your .env file contains a valid Gemini API key
Issue: Session timeout or max turns reached
Solution: Increase MAX_CONVERSATION_TURNS in .env
Issue: GCS write permission denied
Solution: Ensure your GCP credentials have write access to the bucket
Issue: Cursor can't connect to MCP server
Solution: Check that MCP_TRANSPORT=stdio is set and that the cwd path in your Cursor configuration is correct
License
Apache License 2.0 - See LICENSE for details.
Contributing
Contributions are welcome! Please:
Fork the repository
Create a feature branch
Add tests for new functionality
Ensure all tests pass
Submit a pull request
Related Projects
Data Discovery Agent - Discovers relevant datasets
Query Generation Agent - Generates SQL queries
Data Discovery Infrastructure - GCP infrastructure
Support
For issues, questions, or contributions, please open an issue on GitHub.