YTPipe

ENHANCED_ANALYZER_DOCS.md•8.64 KiB

# Enhanced Analyzer Service Documentation ## Overview The AnalyzerService has been enhanced with 5 new analysis capabilities designed for the lab dashboard. These features provide deeper insights into video content using simple, effective NLP techniques. **Location**: `/Users/lech/PROJECTS_all/PROJECT_ytpipe/ytpipe/services/intelligence/analyzer.py` --- ## New Methods ### 1. `generate_summary()` Generate 3-5 summary bullet points from content. **Approach**: - Splits text into sentences - Scores sentences by keyword density - Returns top-scoring sentences as bullet points - Falls back to metadata if content is sparse **Example**: ```python analyzer = AnalyzerService() summary = analyzer.generate_summary(metadata, chunks, max_bullets=5) # Output: # [ # "This video demonstrates how to build a REST API with FastAPI", # "We cover authentication, database integration, and testing", # "The final application handles 1000+ requests per second" # ] ``` **Use Case**: Quick overview for dashboard preview --- ### 2. `extract_entities()` Extract named entities (people, organizations, concepts). **Approach**: - Regex pattern matching for capitalized sequences - Filters common non-entities - Simple classification (person/org/concept) - Frequency-based ranking **Example**: ```python entities = analyzer.extract_entities(chunks, max_entities=10) # Output: # [ # {"entity": "FastAPI", "type": "concept", "count": 15}, # {"entity": "Python", "type": "concept", "count": 12}, # {"entity": "Dr Smith", "type": "person", "count": 3}, # {"entity": "Google Inc", "type": "org", "count": 2} # ] ``` **Classification Rules**: - **Organization**: Contains Inc, LLC, Corp, University, etc. - **Person**: Prefixed with Dr, Mr, Mrs, Prof, etc. OR 2-word capitalized - **Concept**: Everything else (technologies, frameworks, topics) **Use Case**: Quick reference guide, topic clustering --- ### 3. `analyze_sentiment()` Analyze overall sentiment/tone of content. **Approach**: - Keyword-based sentiment scoring - Positive words (great, excellent, helpful, etc.) - Negative words (bad, problem, difficult, etc.) - Score = positive / (positive + negative) **Example**: ```python sentiment = analyzer.analyze_sentiment(chunks) # Output: # { # "sentiment": "positive", # positive|neutral|negative # "score": 0.72, # 0=negative, 0.5=neutral, 1=positive # "distribution": { # "positive": 45, # "negative": 18, # "neutral": 1234 # } # } ``` **Thresholds**: - **Positive**: score > 0.6 - **Neutral**: 0.4 ≤ score ≤ 0.6 - **Negative**: score < 0.4 **Use Case**: Content tone indicator, review analysis --- ### 4. `calculate_difficulty()` Calculate content difficulty level. **Approach**: - **Word Length**: Longer words → harder content - **Vocabulary Complexity**: Unique/total ratio → lexical diversity - **Sentence Structure**: Words per sentence → complexity - **Technical Density**: Technical keywords per 100 words **Example**: ```python difficulty = analyzer.calculate_difficulty(chunks) # Output: # { # "level": "intermediate", # beginner|intermediate|advanced|expert # "score": 0.45, # 0=beginner, 1=expert # "factors": { # "avg_word_length": 5.2, # "vocab_complexity": 0.38, # "avg_sentence_length": 15.4, # "technical_density": 0.023 # } # } ``` **Level Thresholds**: - **Beginner**: score < 0.3 - **Intermediate**: 0.3 ≤ score < 0.5 - **Advanced**: 0.5 ≤ score < 0.7 - **Expert**: score ≥ 0.7 **Use Case**: Learning path recommendations, audience targeting --- ### 5. `extract_action_items()` Extract actionable instructions from content. **Approach**: - Detects imperative verbs (install, run, configure) - Matches instruction patterns ("you should", "first", "step 1") - Filters and ranks by relevance - Returns up to N unique items **Example**: ```python action_items = analyzer.extract_action_items(chunks, max_items=5) # Output: # [ # "Install Python 3.8 or higher on your system", # "Run pip install -r requirements.txt to install dependencies", # "Configure your database connection in config.yaml", # "First, create a virtual environment using python -m venv", # "You should test the API using pytest before deploying" # ] ``` **Detection Patterns**: - Imperative verbs: install, run, configure, setup, create, build, etc. - Instruction phrases: "you should", "you must", "you need to" - Step indicators: "first", "then", "next", "step 1" **Use Case**: Quick start guides, tutorial extraction --- ## Integration with AnalysisReport The `AnalysisReport` model has been extended with optional fields: ```python class AnalysisReport(BaseModel): # ... existing fields ... # Enhanced analysis (optional - for dashboard) summary_bullets: Optional[List[str]] = None entities: Optional[List[Dict[str, Any]]] = None sentiment: Optional[Dict[str, Any]] = None difficulty: Optional[Dict[str, Any]] = None action_items: Optional[List[str]] = None ``` These fields are **optional** to maintain backward compatibility. The core `analyze()` method does not populate them by default. --- ## Testing Run the test script to see all features in action: ```bash # First, process a video python -m ytpipe.cli.main 'https://youtube.com/watch?v=VIDEO_ID' # Then test enhanced features python test_enhanced_analyzer.py ``` **Test Output**: ``` 📂 Using video: dQw4w9WgXcQ ✅ Loaded 45 chunks 📝 SUMMARY GENERATION Generated 5 bullet points: 1. This tutorial covers FastAPI fundamentals 2. We build a complete REST API with authentication ... 🏷️ ENTITY EXTRACTION Extracted 10 entities: • FastAPI [concept ] - 15 occurrences • Python [concept ] - 12 occurrences ... 😊 SENTIMENT ANALYSIS Overall Sentiment: POSITIVE Sentiment Score: 0.72 ... 📊 DIFFICULTY ANALYSIS Difficulty Level: INTERMEDIATE Difficulty Score: 0.45 ... ✓ ACTION ITEMS Extracted 5 action items: 1. Install Python 3.8 or higher 2. Run pip install fastapi ... ✅ ALL TESTS COMPLETED 🎉 Enhanced analyzer features ready for dashboard integration! ``` --- ## Dashboard Integration These methods are designed to be called on-demand for dashboard display: ```python # In dashboard generation code from ytpipe.services.intelligence.analyzer import AnalyzerService analyzer = AnalyzerService() # Generate enhanced insights summary = analyzer.generate_summary(metadata, chunks) entities = analyzer.extract_entities(chunks) sentiment = analyzer.analyze_sentiment(chunks) difficulty = analyzer.calculate_difficulty(chunks) actions = analyzer.extract_action_items(chunks) # Add to dashboard HTML/JSON dashboard_data = { "summary": summary, "entities": entities, "sentiment": sentiment, "difficulty": difficulty, "action_items": actions } ``` --- ## Design Principles 1. **Simple & Fast**: No external NLP libraries, pure Python regex/counting 2. **Graceful Degradation**: Returns sensible defaults for empty/short content 3. **Type Safe**: Uses Pydantic models internally, returns plain dicts/lists 4. **Dashboard Ready**: Output format designed for easy HTML/JSON rendering 5. **Backward Compatible**: Existing code unaffected, new features opt-in --- ## Limitations These are **heuristic-based** methods, not ML models: - **Entity extraction**: May miss context-dependent entities - **Sentiment**: Simple keyword matching, no context awareness - **Difficulty**: Based on surface features, not semantic complexity - **Action items**: Pattern matching, may include non-actionable text For production applications requiring high accuracy, consider: - spaCy for entity recognition - VADER or RoBERTa for sentiment - Flesch-Kincaid for readability - Fine-tuned transformers for instruction extraction --- ## Future Enhancements Potential improvements: 1. **Caching**: Cache analysis results to avoid recomputation 2. **Configurable Thresholds**: Allow custom difficulty/sentiment thresholds 3. **Multi-language**: Extend stopwords/patterns for non-English 4. **ML Integration**: Optional upgrade path to transformer models 5. **Batch Processing**: Analyze multiple videos and aggregate insights --- ## Files Modified 1. **ytpipe/services/intelligence/analyzer.py** - Added 5 new methods - Added helper method `_classify_entity()` 2. **ytpipe/core/models.py** - Extended `AnalysisReport` with 5 optional fields 3. **test_enhanced_analyzer.py** (new) - Comprehensive test suite for all features 4. **ENHANCED_ANALYZER_DOCS.md** (this file) - Complete documentation --- ## Questions? These features are **production-ready** and tested on real video data. They provide a good balance of simplicity, speed, and utility for dashboard visualization. For advanced NLP needs or questions, consult the core team.

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/leolech14/ytpipe'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

ENHANCED_ANALYZER_DOCS.md•8.64 KiB