# CKAN Analysis Components Examples
This directory contains examples demonstrating the analysis components of the CKAN MCP server.
## Files
### `analysis_demo.py` (Recommended)
A simplified, working demonstration of all three analysis components:
- **RelevanceScorer**: Scores datasets based on query relevance
- **UpdateFrequencyAnalyzer**: Categorizes dataset update patterns
- **SummaryBuilder**: Creates structured summaries of CKAN data
### `analysis.py`
A more comprehensive demonstration with detailed examples that access real-world data. Some of its advanced features may require troubleshooting, so start with `analysis_demo.py`.
## Running the Examples
### Prerequisites
Set up environment variables for Toronto Open Data access:
```bash
export CKAN_BASE_URL='https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action'
export CKAN_SITE_URL='https://ckan0.cf.opendata.inter.prod-toronto.ca'
```
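
Before running the examples, it can help to confirm that both variables are visible to Python. The following is a minimal sketch; `load_ckan_settings` and `REQUIRED_VARS` are hypothetical names used for illustration, not part of the server's code.

```python
import os

# The two variables named in the prerequisites above.
REQUIRED_VARS = ("CKAN_BASE_URL", "CKAN_SITE_URL")


def load_ckan_settings(env=None) -> dict:
    """Return the CKAN settings, raising if any variable is missing or empty.

    Hypothetical helper: `env` defaults to os.environ but accepts any mapping,
    which makes the check easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Strip trailing slashes so later URL joins do not produce "//".
    return {name: env[name].rstrip("/") for name in REQUIRED_VARS}
```

Calling `load_ckan_settings()` with no arguments reads the current shell environment and fails early with a clear message if an `export` was skipped.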
### Run the Demo
```bash
# Activate virtual environment
source venv/bin/activate
# Run the simplified demo (recommended)
python examples/analysis_demo.py
```
## Analysis Components Overview
### 1. RelevanceScorer
The RelevanceScorer ranks datasets based on how well they match a query using weighted scoring:
**Scoring Components:**
- **Title matches**: 15 points (highest weight)
- **Description matches**: 7 points
- **Tag matches**: 5 points
- **Organization matches**: 3 points
- **Resource matches**: 2 points (lowest weight)
**Example Output:**
```
Query: 'traffic' -> Score: 29
Title match: ✓ (+15)
Description match: ✓ (+7)
Tag match: ✓ (+5)
Resource match: ✓ (+2)
```
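
The weighted scoring above can be sketched as follows. This is an illustration under stated assumptions, not the server's implementation: `Weights` and `score_dataset` are hypothetical names, a "match" is assumed to be a case-insensitive substring test, and each field is assumed to contribute its weight at most once. The field names (`title`, `notes`, `tags`, `organization`, `resources`) follow the standard CKAN package dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Hypothetical stand-in holding the weights listed above."""
    title: int = 15
    description: int = 7
    tags: int = 5
    organization: int = 3
    resource: int = 2


def score_dataset(dataset: dict, query: str, weights: Weights = Weights()) -> int:
    """Add a field's weight once if the query appears anywhere in that field."""
    q = query.strip().lower()
    if not q:
        return 0

    def contains(text) -> bool:
        # Assumed matching rule: case-insensitive substring.
        return q in (text or "").lower()

    score = 0
    if contains(dataset.get("title")):
        score += weights.title
    if contains(dataset.get("notes")):  # CKAN keeps the description in "notes"
        score += weights.description
    if any(contains(tag.get("name")) for tag in dataset.get("tags", [])):
        score += weights.tags
    if contains((dataset.get("organization") or {}).get("title")):
        score += weights.organization
    if any(contains(res.get("name")) for res in dataset.get("resources", [])):
        score += weights.resource
    return score


sample = {
    "title": "Traffic Volumes",
    "notes": "Hourly traffic counts at intersections.",
    "tags": [{"name": "traffic"}, {"name": "transportation"}],
    "organization": {"title": "City of Toronto"},
    "resources": [{"name": "traffic-volumes.csv"}],
}
print(score_dataset(sample, "traffic"))  # 15 + 7 + 5 + 2 = 29
```

The organization name does not contain the query, so its 3 points are not added, which reproduces the total of 29 in the example output.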
### 2. UpdateFrequencyAnalyzer
The UpdateFrequencyAnalyzer categorizes dataset update patterns in one of two ways:
**Methods:**
- **Explicit patterns**: Reads the `refresh_rate` field (daily, weekly, monthly, etc.)
- **Inferred patterns**: Compares the `metadata_modified` timestamp against configurable thresholds
**Frequency Categories:**

Explicit (from `refresh_rate`):
- `DAILY` - Real-time or daily updates
- `WEEKLY` - Weekly updates
- `MONTHLY` - Monthly updates
- `QUARTERLY` - Quarterly updates
- `ANNUALLY` - Annual updates
- `IRREGULAR` - As-needed updates

Inferred (from `metadata_modified`):
- `FREQUENT` - Last updated within 14 days
- `MONTHLY` - Last updated within 45 days
- `QUARTERLY` - Last updated within 120 days
- `INFREQUENT` - Last updated 120+ days ago

**Thresholds (Customizable):**
- Frequent: 14 days
- Monthly: 45 days
- Quarterly: 120 days
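
The two methods and the thresholds above can be sketched as follows. The function name `categorize_update_frequency`, the `EXPLICIT_RATES` mapping, the exact `refresh_rate` strings, the precedence of the explicit field over inference, and the inclusive boundary handling are all assumptions made for illustration; the real analyzer may differ.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumed mapping from refresh_rate strings to categories (illustrative only).
EXPLICIT_RATES = {
    "daily": "DAILY",
    "weekly": "WEEKLY",
    "monthly": "MONTHLY",
    "quarterly": "QUARTERLY",
    "annually": "ANNUALLY",
}


def categorize_update_frequency(
    dataset: dict,
    now: Optional[datetime] = None,
    frequent_days: int = 14,
    monthly_days: int = 45,
    quarterly_days: int = 120,
) -> Optional[str]:
    """Prefer an explicit refresh_rate; otherwise infer from metadata_modified."""
    rate = (dataset.get("refresh_rate") or "").strip().lower()
    if rate in EXPLICIT_RATES:
        return EXPLICIT_RATES[rate]

    modified = dataset.get("metadata_modified")
    if not modified:
        return None  # nothing to infer from

    timestamp = datetime.fromisoformat(modified)
    if timestamp.tzinfo is None:
        # CKAN timestamps usually carry no offset; treat them as UTC here.
        timestamp = timestamp.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    age_days = (now - timestamp).days

    if age_days <= frequent_days:
        return "FREQUENT"
    if age_days <= monthly_days:
        return "MONTHLY"
    if age_days <= quarterly_days:
        return "QUARTERLY"
    return "INFREQUENT"
```

Passing `now` explicitly keeps the function deterministic, which is convenient when trying out different threshold values.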
### 3. SummaryBuilder
The SummaryBuilder creates structured, truncated summaries of CKAN data:
**Package Summary:**
- Truncated description (200 chars max)
- Key metadata (created, modified dates)
- Resource counts (total vs datastore-enabled)
- Dataset URL generation
- Organization and tag information
**Resource Summary:**
- Resource metadata (format, size, datastore status)
- Last modified information
- Datastore analysis (fields, record counts, sample data)
**Example Output:**
```
Package Summary:
--------------------
ID: traffic-volumes-toronto
Title: Traffic Volumes - Toronto Transportation
Description: This dataset contains traffic volume counts...
Organization: City of Toronto
Tags: ['transportation', 'traffic', 'real-time', 'api']
Resource Count: 2
Datastore Resources: 1
URL: https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/traffic-volumes-toronto
```
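
A package summary like the one above could be assembled roughly as follows. This is a sketch, not the actual SummaryBuilder: `summarize_package` is a hypothetical function, the `...` truncation marker is an assumption, and the input keys (`name`, `notes`, `tags`, `organization`, `resources`, `datastore_active`) are the standard CKAN package fields.

```python
def summarize_package(pkg: dict, site_url: str, max_chars: int = 200) -> dict:
    """Build a compact summary dict from a CKAN package dictionary."""
    description = (pkg.get("notes") or "").strip()
    if len(description) > max_chars:
        # Reserve three characters for the ellipsis so the result stays within max_chars.
        description = description[: max_chars - 3].rstrip() + "..."

    resources = pkg.get("resources", [])
    slug = pkg.get("name") or pkg.get("id")  # CKAN dataset URLs accept either
    return {
        "id": slug,
        "title": pkg.get("title"),
        "description": description,
        "organization": (pkg.get("organization") or {}).get("title"),
        "tags": [tag["name"] for tag in pkg.get("tags", [])],
        "created": pkg.get("metadata_created"),
        "modified": pkg.get("metadata_modified"),
        "resource_count": len(resources),
        "datastore_resources": sum(1 for r in resources if r.get("datastore_active")),
        "url": f"{site_url.rstrip('/')}/dataset/{slug}",
    }


package = {
    "name": "traffic-volumes-toronto",
    "title": "Traffic Volumes - Toronto Transportation",
    "notes": "This dataset contains traffic volume counts. " * 10,
    "organization": {"title": "City of Toronto"},
    "tags": [{"name": "transportation"}, {"name": "traffic"}],
    "resources": [{"datastore_active": True}, {"datastore_active": False}],
}
summary = summarize_package(package, "https://ckan0.cf.opendata.inter.prod-toronto.ca")
```

Here `site_url` corresponds to the `CKAN_SITE_URL` value from the prerequisites, which is why that variable is set separately from the API base URL.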
## Configuration
The analysis components use configurable weights and thresholds:
```python
# Relevance scoring weights
RelevanceWeights(
    title=15,        # Title match weight
    description=7,   # Description match weight
    tags=5,          # Tag match weight
    organization=3,  # Organization match weight
    resource=2,      # Resource match weight
)

# Frequency analysis thresholds
FrequencyThresholds(
    frequent_days=14,    # Days to consider "frequent"
    monthly_days=45,     # Days to consider "monthly"
    quarterly_days=120,  # Days to consider "quarterly"
)
```
## Usage in Practice
These analysis components are used throughout the CKAN MCP server to:
1. **Rank search results** - RelevanceScorer ensures the most relevant datasets appear first
2. **Provide update insights** - UpdateFrequencyAnalyzer helps users understand data freshness
3. **Generate summaries** - SummaryBuilder creates concise, readable summaries for AI assistants
The components work together to provide intelligent data discovery and analysis capabilities for open data portals.