SpeechPulse
Voice Emotion Understanding MCP Server
SpeechPulse analyzes speech audio to detect emotions, assess urgency, and detect sarcasm using prosodic features (pitch, energy, rhythm). The Lite tier is built on the Python standard library alone, so it has no ML dependencies.
Features
Emotion Detection: Recognizes 7 emotions (happy, excited, angry, sad, tired, anxious, neutral) using coefficient of variation (CV) thresholds
Urgency Assessment: 4-level urgency detection (low, medium, high, critical) based on speaking patterns
Sarcasm Detection: Identifies sarcasm by comparing text sentiment with audio emotion
Zero ML Dependencies: Lite tier uses pure Python standard library (no numpy/scipy/librosa)
MCP Compatible: Exposes tools via Model Context Protocol for integration with Claude Desktop and other MCP clients
Installation
From PyPI (when published)
pip install speechpulse
From Source
git clone https://github.com/sophieMiao/speechpulse.git
cd speechpulse
pip install -e ".[dev]"
Quick Start
As MCP Server
Add to your MCP client configuration (e.g., Claude Desktop):
{
  "mcpServers": {
    "speechpulse": {
      "command": "python",
      "args": ["-m", "speechpulse"],
      "env": {
        "SPEECHPULSE_TIER": "lite"
      }
    }
  }
}
As Python Library
from speechpulse.analyzer import SpeechAnalyzer
# Initialize analyzer
analyzer = SpeechAnalyzer()
# Analyze emotion
result = analyzer.analyze("path/to/audio.wav")
print(f"Primary emotion: {result['emotion']['primary']}")
# Assess urgency
urgency = analyzer.assess_urgency("path/to/audio.wav")
print(f"Urgency level: {urgency.level}")
# Detect sarcasm (requires text in Lite tier)
sarcasm = analyzer.detect_sarcasm(
    "path/to/audio.wav",
    text="这真是太棒了",  # "This is really great"
)
print(f"Is sarcastic: {sarcasm.is_sarcastic}")
# Full analysis
full = analyzer.full_analysis("path/to/audio.wav", text="我受够了!")  # "I've had enough!"
print(full['summary'])
print(full['interpretation'])
CLI Usage
# Start MCP server with stdio transport (default)
python -m speechpulse
# Start with SSE transport
python -m speechpulse --transport sse --port 8080
# Enable verbose logging
python -m speechpulse -v
MCP Tools
analyze_audio
Analyze audio for emotion and basic features.
Parameters:
audio_path (string, required): Path to WAV audio file
text (string, optional): Transcription text for context
Returns: Emotion detection results, speaker state, and raw audio features
assess_urgency
Assess urgency level from audio prosody.
Parameters:
audio_path (string, required): Path to audio file
text (string, optional): Text for keyword-based urgency detection
Returns: Urgency score, level, and reasoning
detect_sarcasm
Detect sarcasm by comparing text sentiment with audio emotion.
Parameters:
audio_path (string, required): Path to audio file
text (string, optional): Transcription text (recommended)
Returns: Sarcasm detection result with confidence and indicators
full_analysis
Perform complete analysis (emotion + urgency + sarcasm).
Parameters:
audio_path (string, required): Path to audio file
text (string, optional): Transcription text
Returns: Complete analysis with summary and interpretation
health_check
Check server health and capabilities.
Returns: Status, version, tier, and available capabilities
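MCP clients invoke the tools above with a JSON-RPC 2.0 `tools/call` request. As a rough illustration, the sketch below builds such a request for analyze_audio. `build_tool_call` is a hypothetical helper and the audio path is a placeholder; in practice an MCP client such as Claude Desktop constructs and sends this message for you.

```python
import json


def build_tool_call(name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool.

    Hypothetical helper for illustration; not part of SpeechPulse.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Tool and parameter names come from the list above; the path is a placeholder.
request = build_tool_call(
    "analyze_audio",
    {"audio_path": "/path/to/call.wav", "text": "optional transcription"},
)
print(json.dumps(request, indent=2))
```

The same helper would work for the other tools by changing the name and arguments.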
Architecture
speechpulse/
├── types.py # Core data types (AudioFeatures, EmotionResult, etc.)
├── config.py # Configuration management
├── utils.py # Audio loading and processing utilities
├── audio_features.py # Feature extraction (pitch, energy, etc.)
├── emotion.py # CV-based emotion rule engine
├── urgency.py # Urgency assessment logic
├── sarcasm.py # Sarcasm detection
├── analyzer.py # Main analysis pipeline
├── server.py # MCP server implementation
├── asr.py # ASR stub (Standard/Pro tier)
└── ml_emotion.py # ML emotion stub (Pro tier)
Technical Details
Audio Processing
Pure Python: Uses only the wave, struct, math, and array modules
Format Support: WAV files with 8/16/24/32-bit PCM
Resampling: Linear interpolation to 16kHz
Framing: 32ms frames with 50% overlap, Hamming window
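The resampling and framing steps above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the code in utils.py: the function names are hypothetical, samples are assumed to be mono floats, and trailing samples that do not fill a whole frame are dropped.

```python
import math


def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample by linear interpolation between neighbouring samples."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


def hamming_window(length):
    """Standard Hamming window coefficients."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (length - 1))
            for i in range(length)]


def frame_signal(samples, sample_rate=16000, frame_ms=32, overlap=0.5):
    """Split samples into overlapping, Hamming-windowed frames."""
    frame_len = int(sample_rate * frame_ms / 1000)   # 512 samples at 16 kHz
    hop = max(1, int(frame_len * (1.0 - overlap)))   # 256 samples at 50% overlap
    window = hamming_window(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append([samples[start + i] * window[i] for i in range(frame_len)])
    return frames


# One second of a 440 Hz tone at 32 kHz, used as stand-in input.
tone = [math.sin(2 * math.pi * 440 * n / 32000) for n in range(32000)]
frames = frame_signal(resample_linear(tone, 32000))
print(len(frames), len(frames[0]))
```

Linear interpolation is cheap but applies no anti-aliasing filter, so downsampling audio with strong high-frequency content this way can introduce aliasing.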
Feature Extraction
Pitch: Autocorrelation-based F0 detection (50-500 Hz range)
Energy: RMS energy per frame
Zero Crossing Rate: Voiced/unvoiced discrimination
Silence Ratio: Pause pattern analysis
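To make these four features concrete, here is a minimal pure-Python sketch of each. It is an illustration rather than the contents of audio_features.py: the function names, the 0.3 voicing threshold, and the 0.01 silence threshold are assumptions chosen for the example.

```python
import math


def rms_energy(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def estimate_pitch(frame, sample_rate=16000, fmin=50.0, fmax=500.0):
    """Estimate F0 in Hz from the autocorrelation peak; 0.0 means unvoiced."""
    min_lag = int(sample_rate / fmax)                       # shortest period searched
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)  # longest period searched
    energy = sum(x * x for x in frame)
    if energy == 0 or max_lag <= min_lag:
        return 0.0
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    # Assumed voicing check: the peak must be a sizeable share of frame energy.
    if best_lag == 0 or best_corr / energy < 0.3:
        return 0.0
    return sample_rate / best_lag


def silence_ratio(frames, threshold=0.01):
    """Fraction of frames whose RMS energy falls below an assumed threshold."""
    silent = sum(1 for frame in frames if rms_energy(frame) < threshold)
    return silent / len(frames)


# A 32 ms frame of a 200 Hz sine at 16 kHz should come out close to 200 Hz.
frame = [math.sin(2 * math.pi * 200 * n / 16000) for n in range(512)]
print(round(estimate_pitch(frame), 1))
```

Restricting the lag search to the 50-500 Hz band keeps the estimator from locking onto very short or very long periods that fall outside typical speech pitch.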
Emotion Recognition
Emotion rules rely on the coefficient of variation (CV = std/mean), which avoids gender bias while maintaining discriminative power.
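A short sketch shows why CV is a reasonable choice here: scaling every pitch value by a constant, as a rough stand-in for a higher or lower voice, leaves CV unchanged. The helper name is hypothetical, and the use of the population standard deviation is an assumption; the project may compute it differently.

```python
import statistics


def coefficient_of_variation(values):
    """CV = population standard deviation / mean; 0.0 when undefined."""
    if len(values) < 2:
        return 0.0
    mean = statistics.fmean(values)
    if mean == 0:
        return 0.0
    return statistics.pstdev(values) / mean


# Same relative pitch movement at two different base pitches (Hz).
low_voice = [100.0, 120.0, 110.0, 130.0]
high_voice = [2.0 * f0 for f0 in low_voice]

print(coefficient_of_variation(low_voice))
print(coefficient_of_variation(high_voice))
```

Both calls print the same value, whereas the raw standard deviation of the second list is twice that of the first. A rule over such features looks like this: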
# Example: Happy emotion rule (using coefficient of variation)
"happy": {
    "conditions": [
        ("pitch_cv", ">", 0.15),    # High pitch variation (lively)
        ("energy_mean", ">", 0.3),  # Moderate-high energy
        ("energy_cv", ">", 0.2),    # Energy fluctuation
    ],
    "weight": 0.8,
}
Urgency Assessment
Based on 5 factors:
Speaking rate (fast/medium/slow)
Volume level (high/medium/low)
Pitch variation (high/medium/low)
Pause pattern (few/normal/many pauses)
Keyword detection (when text provided)
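One way the five factors could be combined into the four urgency levels is sketched below. This is an assumption-heavy illustration, not the logic in urgency.py: the per-factor scores, the keyword list, the level cut-offs, and the idea that few pauses signal more urgency are all hypothetical.

```python
# Hypothetical scores: how strongly each factor value suggests urgency.
FACTOR_SCORES = {
    "speaking_rate": {"slow": 0, "medium": 1, "fast": 2},
    "volume": {"low": 0, "medium": 1, "high": 2},
    "pitch_variation": {"low": 0, "medium": 1, "high": 2},
    "pauses": {"many": 0, "normal": 1, "few": 2},
}
URGENT_KEYWORDS = ("urgent", "immediately", "emergency")  # hypothetical list


def score_urgency(factors, text=None):
    """Combine categorical factors into a 0-1 score and one of four levels."""
    score = sum(FACTOR_SCORES[name][factors[name]] for name in FACTOR_SCORES)
    max_score = 2 * len(FACTOR_SCORES)
    if text is not None:
        # The keyword factor only counts when a transcription is provided.
        max_score += 2
        if any(word in text.lower() for word in URGENT_KEYWORDS):
            score += 2
    ratio = score / max_score
    if ratio >= 0.85:
        level = "critical"
    elif ratio >= 0.6:
        level = "high"
    elif ratio >= 0.35:
        level = "medium"
    else:
        level = "low"
    return {"score": round(ratio, 2), "level": level}


print(score_urgency(
    {"speaking_rate": "fast", "volume": "high",
     "pitch_variation": "high", "pauses": "few"},
    text="This is an emergency",
))
```

Normalizing by the maximum attainable score keeps results comparable whether or not text is supplied.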
Tiers
Lite Tier (Current)
✅ Rule-based emotion recognition
✅ Prosodic urgency assessment
✅ Keyword-based sarcasm detection
✅ Pure Python (no ML dependencies)
❌ No ASR (provide text manually)
❌ WAV format only
Standard Tier (Planned)
ASR with faster-whisper
Additional audio formats (MP3, FLAC, etc.)
Speaker diarization
Pro Tier (Planned)
Qwen2-Audio integration
Context-aware emotion analysis
Nuanced emotion detection
Real-time streaming
Development
Setup
# Clone repository
git clone https://github.com/sophieMiao/speechpulse.git
cd speechpulse
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"
Running Tests
# Run all tests
python -m pytest tests/
# Run specific test file
python tests/test_all.py
# Run integration tests
python tests/test_integration.py
Demo
# Run demo script
python examples/demo.py
Configuration
Environment variables:
| Variable | Default | Description |
| SPEECHPULSE_TIER | | Service tier (lite/standard/pro) |
| | | Target sample rate |
| | | Analysis frame size |
| | | Frame hop size |
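As an illustration of how the tier setting might be consumed, the sketch below reads and validates SPEECHPULSE_TIER, the variable used in the MCP client configuration above. `read_tier` is a hypothetical helper, and falling back to "lite" when the variable is unset is an assumption rather than documented behavior.

```python
import os

VALID_TIERS = ("lite", "standard", "pro")


def read_tier(env=None):
    """Return the configured tier, falling back to an assumed 'lite' default."""
    env = os.environ if env is None else env
    tier = env.get("SPEECHPULSE_TIER", "lite").strip().lower()
    if tier not in VALID_TIERS:
        raise ValueError(
            f"SPEECHPULSE_TIER must be one of {VALID_TIERS}, got {tier!r}"
        )
    return tier


print(read_tier({"SPEECHPULSE_TIER": "lite"}))
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise in tests without touching the real environment.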
Limitations
Lite tier requires text for sarcasm detection: Provide transcription via the text parameter
WAV format only: Convert other formats to WAV before analysis
Rule-based emotions: ML-based nuanced emotion detection in Pro tier
Optimized for Chinese/English: Full multilingual support in Pro tier
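Because the Lite tier accepts only PCM WAV input, it can help to check a file before sending it for analysis. The following sketch uses the standard library `wave` module; `describe_wav` is a hypothetical helper, not part of SpeechPulse.

```python
import wave

SUPPORTED_SAMPLE_WIDTHS = (1, 2, 3, 4)  # bytes per sample: 8/16/24/32-bit PCM


def describe_wav(path):
    """Return basic WAV parameters, or raise ValueError if the file is unusable."""
    try:
        with wave.open(path, "rb") as wav:
            width = wav.getsampwidth()
            info = {
                "channels": wav.getnchannels(),
                "sample_rate": wav.getframerate(),
                "bits": width * 8,
                "duration_s": wav.getnframes() / wav.getframerate(),
            }
    except (wave.Error, EOFError) as exc:
        raise ValueError(f"{path} is not a readable PCM WAV file: {exc}") from exc
    if width not in SUPPORTED_SAMPLE_WIDTHS:
        raise ValueError(f"{path} has an unsupported sample width of {width} bytes")
    return info
```

Calling `describe_wav("path/to/audio.wav")` on a valid file returns a small dict of parameters; a file in another format raises `ValueError`, which signals that it needs converting to WAV first.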
Contributing
Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit changes (git commit -m 'Add amazing feature')
Push to branch (git push origin feature/amazing-feature)
Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
Built with the MCP SDK
Inspired by prosodic analysis research in speech emotion recognition
CV approach based on gender-fair emotion recognition research
Support
GitHub Issues: https://github.com/sophieMiao/speechpulse/issues
Made with ❤️ for voice emotion understanding