# š Release v2.20.0: Complete Monitoring Suite
Major release introducing comprehensive URL monitoring capabilities with GUI dashboard, scheduled automation, and 4x performance improvements.
---
## šÆ Key Features
### š GUI URL Monitoring Dashboard
A complete monitoring interface in the Streamlit admin GUI:
- **Metrics Dashboard**: Real-time display of monitored sites, unique sources, and last check time
- **Dual Check Modes**:
- Quick (Fast): Last-Modified headers only (~1-2s per site)
- Full (Comprehensive): Structure discovery with new/missing page detection (~10-60s per site)
- **Organized Results Display**: 5 tabs for Changed, New Pages, Missing, Unchanged, and Session Statistics
- **Bulk Actions**: Re-scrape all, remove all, view stats per site
- **Individual Actions**: Re-scrape, remove, or open URL for each document
- **Export Functionality**: Download results as JSON for analysis
**Usage**: Launch admin GUI with `streamlit run admin_gui.py`, navigate to "š URL Monitoring"
### š
Scheduled Monitoring System
Complete automation for hands-off monitoring:
- **monitor_daily.py**: Quick daily checks using Last-Modified headers
- **monitor_weekly.py**: Comprehensive weekly checks with structure discovery
- **monitor_fast.py**: High-performance async version (4.2x faster)
- **Centralized Configuration**: `monitor_config.json` for all settings
- **Windows Task Scheduler Integration**:
- `setup_monitoring_tasks.bat`: Install scheduled tasks
- `remove_monitoring_tasks.bat`: Uninstall tasks
- Wrapper scripts for daily/weekly runs
- **Exit Codes**: 0=success, 2=changes detected, 3=failures for CI/CD integration
- **JSON Output**: Timestamped results with per-site statistics
### ā” Performance Optimizations (4.2x Speedup)
Dramatic performance improvements through async/await:
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| **Total Time** | 7.70s | 1.83s | **4.2x faster** |
| **Throughput** | 4.4 docs/sec | 18.0 docs/sec | **4.1x faster** |
| **Per Document** | 0.227s | 0.055s | **4.1x faster** |
**Implementation:**
- Async/await with `aiohttp` for concurrent HTTP requests
- Configurable concurrency (default: 10 simultaneous requests)
- Semaphore-based rate limiting to prevent overwhelming servers
- Real-time progress bars with `tqdm`
- Graceful error handling for network issues
**Benchmarking Tools:**
- `benchmark_url_monitoring.py`: Comprehensive performance measurement suite
- Track quick check, HTTP requests, and URL discovery performance
- JSON output for historical comparison
---
## š Bug Fixes
- **Datetime Comparison**: Fixed "can't compare offset-naive and offset-aware datetimes" errors
- **Security Path Validation**: Project directory automatically included in allowed paths
- **Unicode Console Output**: Replaced Unicode characters with ASCII for Windows compatibility
---
## š¦ Dependencies Added
- `requests>=2.31.0`: HTTP library for synchronous requests
- `aiohttp>=3.9.0`: Async HTTP client for concurrent operations
- `tqdm>=4.65.0`: Progress bars for real-time feedback
---
## š Documentation
### New Documentation
- **WEB_MONITORING_GUIDE.md**: Complete guide to URL monitoring features (500+ lines)
- **MONITORING_SETUP.md**: Setup and automation guide (400+ lines)
- **PERFORMANCE.md**: Performance analysis, benchmarks, and best practices (400+ lines)
### Updated Documentation
- **README.md**: Updated with v2.20.0 features and usage examples
- **.gitignore**: Enhanced with monitoring output patterns
---
## š Quick Start
### Install/Upgrade
```bash
# Clone or pull latest
git clone https://github.com/MichaelTroelsen/tdz-c64-knowledge.git
cd tdz-c64-knowledge
# Install dependencies
pip install -e .
```
### Use GUI Dashboard
```bash
streamlit run admin_gui.py
# Navigate to "š URL Monitoring"
```
### Run Fast Monitoring
```bash
# Quick async check (4.2x faster)
python monitor_fast.py
# With custom concurrency
python monitor_fast.py --concurrent 20
# Daily synchronous check
python monitor_daily.py --output daily_results.json
# Weekly comprehensive check
python monitor_weekly.py --output weekly_results.json
```
### Setup Automation (Windows)
```bash
# Run as Administrator
setup_monitoring_tasks.bat
# This creates:
# - Daily task: Runs at 2:00 AM
# - Weekly task: Runs Sundays at 3:00 AM
```
### Run Benchmarks
```bash
# Measure your system's performance
python benchmark_url_monitoring.py --output my_benchmark.json
```
---
## š Performance Details
### Baseline (Synchronous)
- 34 documents checked in 7.70 seconds
- Sequential HTTP requests
- 4.4 documents/second throughput
### Optimized (Async)
- 33 documents checked in 1.83 seconds
- Concurrent HTTP requests (10 simultaneous)
- 18.0 documents/second throughput
- **4.2x faster overall**
### Scaling Recommendations
| Documents | Concurrency | Est. Time |
|-----------|-------------|-----------|
| 1-50 | 10 | 2-5s |
| 50-100 | 15-20 | 3-7s |
| 100-500 | 20-30 | 10-20s |
| 500+ | 30-50 | 20-40s |
See [PERFORMANCE.md](PERFORMANCE.md) for detailed analysis.
---
## š ļø Technical Details
### Architecture Changes
- New async monitoring pipeline using `asyncio` and `aiohttp`
- Semaphore-based concurrency control
- Connection pooling via `ClientSession`
- Progress tracking with `tqdm.asyncio`
### GUI Integration
- New "š URL Monitoring" page in admin interface
- Real-time check execution with Streamlit
- Session state management for results
- Pandas DataFrame integration for table displays
### Monitoring Features
- Scrape session grouping by base URL
- Last-Modified header tracking
- New page discovery via website crawling
- Missing page detection (404 handling)
- Configurable depth and page limits
- Automatic database timestamp updates
---
## š Files Added/Modified
### New Files (14)
- `MONITORING_SETUP.md` - Setup guide
- `PERFORMANCE.md` - Performance documentation
- `monitor_daily.py` - Daily monitoring script
- `monitor_weekly.py` - Weekly monitoring script
- `monitor_fast.py` - High-performance async version
- `monitor_config.json` - Configuration file
- `benchmark_url_monitoring.py` - Benchmarking suite
- `setup_monitoring_tasks.bat` - Windows scheduler setup
- `remove_monitoring_tasks.bat` - Task removal
- `run_monitor_daily.bat` - Daily wrapper
- `run_monitor_weekly.bat` - Weekly wrapper
- Plus additional support files
### Modified Files (4)
- `admin_gui.py` - Added URL Monitoring page (+337 lines)
- `pyproject.toml` - Added dependencies
- `README.md` - Updated with v2.20.0 features
- `.gitignore` - Enhanced patterns
**Total**: ~3,800 lines of code added
---
## š” Migration Notes
### From v2.19.0 or earlier
1. **Install new dependencies**:
```bash
pip install -e . # Installs aiohttp, tqdm
```
2. **No breaking changes** - All existing functionality preserved
3. **Optional**: Set up scheduled monitoring
```bash
# Windows (as admin)
setup_monitoring_tasks.bat
# Linux
# Add to crontab (see MONITORING_SETUP.md)
```
4. **New features available immediately**:
- GUI dashboard in admin interface
- Fast async monitoring scripts
- Benchmarking tools
---
## š Acknowledgments
Built with:
- [Streamlit](https://streamlit.io/) - Admin GUI framework
- [aiohttp](https://docs.aiohttp.org/) - Async HTTP client
- [tqdm](https://tqdm.github.io/) - Progress bars
- [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/) - HTML parsing
---
## š Full Changelog
**Features:**
- GUI URL Monitoring Dashboard with comprehensive interface
- Scheduled monitoring system (daily/weekly)
- High-performance async monitoring (4.2x faster)
- Benchmarking suite for performance measurement
- Progress indicators with real-time feedback
- Windows Task Scheduler integration
- Per-site statistics and session grouping
- Export functionality (JSON)
**Bug Fixes:**
- Fixed datetime timezone comparison errors
- Enhanced security path validation
- Windows console Unicode compatibility
**Documentation:**
- Complete monitoring guide (WEB_MONITORING_GUIDE.md)
- Setup and automation guide (MONITORING_SETUP.md)
- Performance analysis (PERFORMANCE.md)
**Dependencies:**
- Added: requests, aiohttp, tqdm
**Performance:**
- 4.2x speedup for URL checking operations
- Configurable concurrent request limits
- Optimized network operations
---
## š® Future Plans
Potential enhancements for future releases:
- Distributed monitoring across workers
- ML-based anomaly detection
- Email/Slack notification integration
- Caching with TTL for HEAD requests
- Adaptive concurrency based on response times
---
**Full Documentation**: [README.md](README.md)
**Monitoring Guide**: [WEB_MONITORING_GUIDE.md](WEB_MONITORING_GUIDE.md)
**Setup Guide**: [MONITORING_SETUP.md](MONITORING_SETUP.md)
**Performance**: [PERFORMANCE.md](PERFORMANCE.md)
---
š¤ Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>