# Speech Biomarker Analysis: Methodology and Clinical Validation
**Version:** 1.1
**Date:** January 2025
**Authors:** Boris Djordjevic (199 Longevity); enhanced implementation based on the foundation by Ryan Boyle (ipvr9)
**Status:** Legacy Documentation
> **Note:** This document describes the v0.4-0.5 complex biomarker system. For the current simplified approach, see [SPEECH_VITALITY_INDEX.md](./SPEECH_VITALITY_INDEX.md). The complex system has been replaced with the Speech Vitality Index (SVI), which provides a single, reliable score from quality conversations only.
## Executive Summary
This document describes a rigorous statistical methodology for extracting speech biomarkers from conversational data recorded via the Limitless Pendant. The system provides clinically relevant metrics (speech rate, pause patterns, and vocabulary complexity) with proper confidence intervals, statistical significance testing, and population-based percentile rankings.
## 1. Introduction and Rationale
### 1.1 Clinical Background
Speech pattern analysis has emerged as a powerful tool for health monitoring and cognitive assessment. Research over the past 40 years has established that speech biomarkers can serve as early indicators for:
- **Cognitive decline** (detectable up to 6 years before clinical diagnosis)
- **Neurological conditions** (Parkinson's disease, multiple sclerosis, Alzheimer's disease)
- **Mental health conditions** (depression, anxiety, bipolar disorder)
- **Acute cognitive states** (fatigue, stress, medication effects)
### 1.2 Advantages of Speech Biomarkers
1. **Non-invasive and continuous monitoring**
2. **Early detection capabilities** before physical symptoms manifest
3. **Objective measurement** reducing subjective assessment bias
4. **Ecological validity** through natural conversation analysis
5. **Real-time assessment** for immediate health insights
### 1.3 Previous Limitations
Traditional speech biomarker implementations suffer from:
- Arbitrary thresholds without statistical validation
- Lack of confidence intervals and uncertainty quantification
- No population-based comparisons
- Insufficient sample size considerations
- Missing statistical significance testing
## 2. Methodology
### 2.1 Data Source and Structure
**Input:** Limitless Pendant lifelogs containing structured conversation data
**Speaker Identification:** User segments identified by `speakerIdentifier === "user"`
**Content Structure:** Each segment contains:
- `content`: Transcribed speech text
- `startOffsetMs`/`endOffsetMs`: Precise timing information
- `speakerName`: Speaker identification
- `speakerIdentifier`: User classification
### 2.2 Data Quality Assessment
#### 2.2.1 Validation Criteria
Segments are validated against multiple quality criteria:
```typescript
// Quality validation: returns a rejection reason, or null if the segment passes.
// The non-speech pattern shown here is illustrative; the real expression lives in the codebase.
function validateSegment(content: string, wordCount: number, durationMs: number): string | null {
  const nonSpeechPatterns = /^(?:um+|uh+|hm+|mm+|yeah|ok(?:ay)?)[\s.!?]*$/i;
  if (wordCount < 3) return "too_few_words";
  if (durationMs < 500) return "too_short_duration";
  const wordsPerMinute = (wordCount / durationMs) * 60_000;
  if (wordsPerMinute > 400 || wordsPerMinute < 30) return "unrealistic_speech_rate";
  if (content.length < 10) return "too_brief_content";
  if (nonSpeechPatterns.test(content)) return "minimal_speech";
  return null;
}
```
#### 2.2.2 Outlier Detection
Statistical outlier removal using Interquartile Range (IQR) method:
- Q1 = 25th percentile, Q3 = 75th percentile
- IQR = Q3 - Q1
- Outliers: values < Q1 - 1.5×IQR or > Q3 + 1.5×IQR
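A minimal TypeScript sketch of this filter, applied to per-segment speech rates; the function names are illustrative rather than the repository's actual API:
```typescript
// Illustrative IQR-based outlier filter over per-segment speech rates (WPM).
function percentileOf(sorted: number[], p: number): number {
  // Linear interpolation between the closest ranks of an ascending-sorted array.
  const idx = (sorted.length - 1) * p;
  const lo = Math.floor(idx);
  const hi = Math.ceil(idx);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (idx - lo);
}

function removeIqrOutliers(values: number[]): number[] {
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = percentileOf(sorted, 0.25);
  const q3 = percentileOf(sorted, 0.75);
  const iqr = q3 - q1;
  const lower = q1 - 1.5 * iqr;
  const upper = q3 + 1.5 * iqr;
  return values.filter((v) => v >= lower && v <= upper);
}
```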
#### 2.2.3 Quality Scoring
```
Quality Score = (Valid Segments / Total Segments) × (1 - Outlier Rate)
```
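As a small sketch of the same computation, assuming `outlierRate` is the fraction of otherwise-valid segments flagged by the IQR filter above:
```typescript
// Quality score in the range 0-1; returns 0 when no segments were captured.
function qualityScore(validCount: number, totalCount: number, outlierRate: number): number {
  return totalCount === 0 ? 0 : (validCount / totalCount) * (1 - outlierRate);
}
```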
### 2.3 Core Biomarker Extraction
#### 2.3.1 Speech Rate Analysis
**Primary Metric:** Words per minute (WPM)
**Standard Calculation:**
```
WPM = (Word Count / Duration in milliseconds) × 60,000
```
**Effective Speech Rate Calculation (v0.5.1+):**
To better reflect natural conversational patterns, the system now calculates an effective speech rate that excludes long pauses:
```
1. Group segments by conversation/lifelog
2. Calculate speaking time including only pauses < 2 seconds
3. Effective WPM = (Total Words / Effective Speaking Time) × 60,000
```
This approach provides a more accurate measure of actual speaking rate by excluding extended pauses that occur naturally in monologues or when the speaker is thinking.
**Population Norms:** 120-180 WPM (Tauroza & Allison, 1990)
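The sketch below illustrates both the standard and the effective calculation, assuming each segment exposes `startOffsetMs`, `endOffsetMs`, and a precomputed word count; the `TimedSegment` shape and function names are illustrative:
```typescript
interface TimedSegment {
  startOffsetMs: number;
  endOffsetMs: number;
  wordCount: number;
}

// Standard WPM for a single segment.
function segmentWpm(seg: TimedSegment): number {
  return (seg.wordCount / (seg.endOffsetMs - seg.startOffsetMs)) * 60_000;
}

// Effective WPM over one conversation: inter-segment pauses count toward
// speaking time only when they are shorter than the 2-second cutoff.
function effectiveWpm(segments: TimedSegment[], pauseCutoffMs = 2_000): number {
  const sorted = [...segments].sort((a, b) => a.startOffsetMs - b.startOffsetMs);
  let speakingTimeMs = 0;
  let totalWords = 0;
  sorted.forEach((seg, i) => {
    speakingTimeMs += seg.endOffsetMs - seg.startOffsetMs;
    totalWords += seg.wordCount;
    if (i > 0) {
      const pause = seg.startOffsetMs - sorted[i - 1].endOffsetMs;
      if (pause > 0 && pause < pauseCutoffMs) speakingTimeMs += pause;
    }
  });
  return speakingTimeMs > 0 ? (totalWords / speakingTimeMs) * 60_000 : 0;
}
```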
#### 2.3.2 Pause Pattern Analysis
**Calculation:** Time gaps between consecutive speech segments
```
Pause Duration = Next Segment Start Time - Current Segment End Time
```
**Classification:**
- Short pause: < 500ms
- Medium pause: 500ms - 2s
- Long pause: 2s - 5s
- Extended pause: > 5s
**Clinical Significance:** Long pauses (>2s) associated with cognitive processing delays
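A short sketch of the gap computation and classification, reusing the illustrative `TimedSegment` shape from the speech-rate sketch above (segments are assumed to be sorted by start time):
```typescript
type PauseClass = "short" | "medium" | "long" | "extended";

// Gaps between consecutive segments, in milliseconds.
function pauseDurations(segments: TimedSegment[]): number[] {
  return segments.slice(1).map((seg, i) => seg.startOffsetMs - segments[i].endOffsetMs);
}

// Classify one pause using the thresholds listed above.
function classifyPause(pauseMs: number): PauseClass {
  if (pauseMs < 500) return "short";
  if (pauseMs < 2_000) return "medium";
  if (pauseMs < 5_000) return "long";
  return "extended";
}
```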
#### 2.3.3 Vocabulary Complexity
**Enhanced Calculation:** Type-Token Ratio (TTR) combined with word length
```
TTR = Unique Words / Total Words
Word Length = Average characters per word
Complexity Score = (TTR × 10) + (Word Length × 0.5)
```
**Rationale:** TTR measures lexical diversity; word length indicates linguistic sophistication
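A minimal sketch of the combined score; the regex tokenizer here is a simplification of the WinkNLP processing the implementation relies on:
```typescript
// Type-Token Ratio plus average word length, weighted as described above.
function vocabularyComplexity(text: string): number {
  const words = text.toLowerCase().match(/[a-z']+/g) ?? [];
  if (words.length === 0) return 0;
  const ttr = new Set(words).size / words.length;
  const avgWordLength = words.reduce((sum, w) => sum + w.length, 0) / words.length;
  return ttr * 10 + avgWordLength * 0.5;
}
```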
### 2.4 Statistical Methodology
#### 2.4.1 Confidence Intervals
95% confidence intervals calculated using t-distribution for small samples:
```
CI = mean ± t(α/2, df) × (s / √n)
```
Where:
- t(α/2, df) = critical t-value for α=0.05, degrees of freedom = n-1
- s = sample standard deviation
- n = sample size
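A compact TypeScript version using a coarse lookup table of critical t-values; a statistics library would supply exact quantiles in practice:
```typescript
// Two-sided 95% critical t-values at selected degrees of freedom. Between table
// rows the next-smaller df is used, which makes the interval slightly conservative.
const T_CRIT_95: Array<[df: number, t: number]> = [
  [1, 12.706], [2, 4.303], [3, 3.182], [4, 2.776], [5, 2.571],
  [10, 2.228], [20, 2.086], [30, 2.042], [60, 2.0], [120, 1.98],
];

function tCritical95(df: number): number {
  let t = 12.706; // fallback for df < 1 (should not occur with n >= 2)
  for (const [tableDf, tableT] of T_CRIT_95) {
    if (df >= tableDf) t = tableT;
  }
  return t;
}

// 95% confidence interval for the mean of `values` (assumes n >= 2).
function confidenceInterval95(values: number[]): { mean: number; lower: number; upper: number } {
  const n = values.length;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1);
  const standardError = Math.sqrt(variance / n);
  const margin = tCritical95(n - 1) * standardError;
  return { mean, lower: mean - margin, upper: mean + margin };
}
```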
#### 2.4.2 Trend Analysis
Linear regression analysis for temporal trends:
```
y = β₀ + β₁x + ε
```
Where:
- y = speech rate values
- x = time points (hours since first measurement)
- β₁ = slope (WPM change per hour)
**Statistical Significance:** p < 0.05 for slope coefficient
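An ordinary least squares sketch matching the formulas in Appendix A; the p-value for the slope (a t-test of β₁ against its standard error) is omitted and would come from a statistics routine:
```typescript
// Fit y = b0 + b1*x by least squares, where x is hours since the first
// measurement and y is speech rate in WPM. Assumes at least two distinct x values.
function linearTrend(x: number[], y: number[]): { slope: number; intercept: number; r2: number } {
  const n = x.length;
  const meanX = x.reduce((a, b) => a + b, 0) / n;
  const meanY = y.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - meanX) * (y[i] - meanY);
    sxx += (x[i] - meanX) ** 2;
  }
  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;
  const ssTot = y.reduce((sum, v) => sum + (v - meanY) ** 2, 0);
  const ssRes = y.reduce((sum, v, i) => sum + (v - (intercept + slope * x[i])) ** 2, 0);
  return { slope, intercept, r2: ssTot === 0 ? 0 : 1 - ssRes / ssTot };
}
```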
#### 2.4.3 Population Percentile Rankings
Comparison against simulated normal distributions:
- Speech Rate: N(150, 30) - mean 150 WPM, SD 30
- Pause Duration: N(1.2, 0.5) - mean 1.2s, SD 0.5s
- Vocabulary Complexity: N(6.0, 1.5) - estimated population norm
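Percentile ranks against these reference distributions follow from the normal CDF; the sketch below uses the Abramowitz & Stegun approximation to the error function, with the reference parameters listed above:
```typescript
// Error function via Abramowitz & Stegun formula 7.1.26 (max error ~1.5e-7).
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Percentile rank (0-100) of `value` against a normal reference N(mean, sd).
function percentileRank(value: number, mean: number, sd: number): number {
  const z = (value - mean) / sd;
  return 100 * 0.5 * (1 + erf(z / Math.SQRT2));
}

// Example: a measured 170 WPM against the N(150, 30) speech-rate reference.
const speechRatePercentile = percentileRank(170, 150, 30); // ~75th percentile
```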
### 2.5 Temporal Analysis
#### 2.5.1 Circadian Pattern Detection
**Method:** Analysis of variance (ANOVA) across hourly groups
**Test Statistic:** F-ratio comparing between-group to within-group variance
**Significance:** p < 0.05 indicates significant time-of-day effects
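A minimal one-way ANOVA sketch that returns the F-ratio across hourly groups; converting F to a p-value requires the F(k-1, N-k) distribution and is left to a statistics library:
```typescript
// One-way ANOVA F-ratio. Each entry of `groups` holds the speech-rate values
// observed in one hour of the day; empty groups should be filtered out first.
function oneWayAnovaF(groups: number[][]): number {
  const all = groups.flat();
  const grandMean = all.reduce((a, b) => a + b, 0) / all.length;
  let ssBetween = 0;
  let ssWithin = 0;
  for (const g of groups) {
    const groupMean = g.reduce((a, b) => a + b, 0) / g.length;
    ssBetween += g.length * (groupMean - grandMean) ** 2;
    ssWithin += g.reduce((sum, v) => sum + (v - groupMean) ** 2, 0);
  }
  const dfBetween = groups.length - 1;
  const dfWithin = all.length - groups.length;
  return (ssBetween / dfBetween) / (ssWithin / dfWithin);
}
```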
#### 2.5.2 Weekly Trend Analysis
**Grouping:** Segments aggregated by calendar week
**Metrics:** Weekly mean speech rate with confidence intervals
**Comparison:** Week-over-week statistical testing
## 3. Clinical Interpretation Framework
### 3.1 Reliability Assessment
**High Reliability:**
- ≥100 valid segments
- Quality score ≥80%
- Adequate for clinical-grade trend detection
**Medium Reliability:**
- 30-99 valid segments
- Quality score 60-79%
- Suitable for monitoring with caution
**Low Reliability:**
- <30 valid segments
- Quality score <60%
- Insufficient for clinical interpretation
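These tiers map onto a small classifier; the sketch below assumes the quality score is expressed as a fraction between 0 and 1 and that both criteria must hold for a tier:
```typescript
type Reliability = "high" | "medium" | "low";

// Reliability tier from sample size and quality score, per the thresholds above.
function assessReliability(validSegments: number, qualityScore: number): Reliability {
  if (validSegments >= 100 && qualityScore >= 0.8) return "high";
  if (validSegments >= 30 && qualityScore >= 0.6) return "medium";
  return "low";
}
```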
### 3.2 Population Comparison
**Speech Rate Interpretation:**
- >85th percentile: Above average (may indicate high processing speed)
- 15-85th percentile: Normal range
- <15th percentile: Below average (monitor for decline)
### 3.3 Trend Significance
**Effect Size Classification:**
- Large effect: |slope| > 5 WPM/hour (clinically significant)
- Medium effect: |slope| 2-5 WPM/hour (monitor closely)
- Small effect: |slope| < 2 WPM/hour (likely normal variation)
## 4. Algorithm Implementation
### 4.1 Core Analysis Pipeline
```
1. extractAndValidateSegments(lifelogs)
├── Filter user segments (speakerIdentifier === "user")
├── Apply quality validation criteria
├── Remove outliers using IQR method
└── Generate quality metrics
2. calculateStatisticalMetrics(validSegments)
├── Speech rate: mean, confidence intervals, standard error
├── Pause duration: gaps between segments
├── Vocabulary complexity: TTR + word length
└── Words per turn: segment word counts
3. performTrendAnalysis(segments)
├── Linear regression: time vs. speech rate
├── Calculate R², p-value, confidence intervals
└── Assess statistical significance
4. generatePopulationComparisons(metrics)
├── Compare against normal distributions
├── Calculate percentile rankings
└── Provide clinical context
5. assessTimePatterns(segments)
├── Group by hour of day
├── ANOVA for circadian effects
└── Weekly trend analysis
```
### 4.2 Data Quality Pipeline
```
1. Segment Validation
├── Word count ≥ 3
├── Duration ≥ 500ms
├── Realistic speech rate (30-400 WPM)
├── Minimum content length ≥ 10 characters
└── Exclude non-speech utterances
2. Outlier Detection
├── Calculate Q1, Q3, IQR for speech rates
├── Remove values outside [Q1-1.5×IQR, Q3+1.5×IQR]
└── Flag segments with quality issues
3. Reliability Assessment
├── Quality score = (valid/total) × (1-outlier_rate)
├── Sample size adequacy check
└── Generate data collection recommendations
```
## 5. Evidence Base and References
### 5.1 Speech Rate Research
**Tauroza, S., & Allison, D. (1990).** Speech rates in British English. *Applied Linguistics*, 11(1), 90-105.
- Established normal speech rate range: 120-180 WPM
- Foundation for population comparison norms
**Tsanas, A., et al. (2012).** Accurate telemonitoring of Parkinson's disease progression by noninvasive speech tests. *IEEE Transactions on Biomedical Engineering*, 59(12), 3398-3408.
- Speech rate as early biomarker for neurological conditions
- Validation of automated speech analysis
### 5.2 Cognitive Assessment Literature
**König, A., et al. (2015).** Automatic speech analysis for the assessment of patients with predementia and Alzheimer's disease. *Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring*, 1(1), 112-124.
- Speech patterns as cognitive decline indicators
- Clinical validation of automated analysis
**Weiner, J., et al. (2017).** Language changes in Alzheimer's disease: A systematic review. *Alzheimer's Research & Therapy*, 9(1), 1-17.
- Comprehensive review of speech biomarkers
- Clinical significance of vocabulary complexity
### 5.3 Statistical Methodology
**Cohen, J. (1988).** Statistical power analysis for the behavioral sciences. Lawrence Erlbaum Associates.
- Effect size interpretation guidelines
- Sample size requirements for reliable detection
**Field, A. (2013).** Discovering statistics using IBM SPSS statistics. Sage Publications.
- Confidence interval calculation methods
- Trend analysis statistical procedures
## 6. Validation and Quality Assurance
### 6.1 Technical Validation
**Unit Testing:** Comprehensive test suite covering:
- Statistical calculation accuracy
- Confidence interval precision
- Outlier detection effectiveness
- Data quality assessment reliability
**Integration Testing:** End-to-end pipeline validation with:
- Simulated data with known properties
- Edge case handling (minimal data, extreme values)
- Cross-platform compatibility verification
### 6.2 Clinical Validation Framework
**Planned Validation Studies:**
1. **Concurrent Validity:** Correlation with established clinical speech assessments
2. **Test-Retest Reliability:** Stability of measurements over short intervals
3. **Criterion Validity:** Association with health outcomes and biomarkers
4. **Longitudinal Validation:** Long-term tracking of speech pattern changes
### 6.3 Ethical Considerations
**Privacy Protection:**
- All analysis performed locally
- No health data transmitted to external servers
- User control over data retention and sharing
**Clinical Use Guidelines:**
- Results for monitoring and tracking only
- Not for diagnostic purposes without clinical validation
- Recommendations for healthcare provider consultation
## 7. Future Enhancements
### 7.1 Short-term Improvements
- **Phonetic Analysis:** Voice quality and articulation metrics
- **Emotional Prosody:** Mood indicators from speech patterns
- **Multi-language Support:** Analysis across different languages
### 7.2 Long-term Research Directions
- **Machine Learning Integration:** Advanced pattern recognition
- **Multi-modal Analysis:** Combining speech with other biomarkers
- **Predictive Modeling:** Early warning systems for health changes
## 8. Conclusion
This speech biomarker analysis system represents a significant advancement in digital health monitoring by providing:
1. **Statistical Rigor:** Proper confidence intervals, significance testing, and population comparisons
2. **Clinical Relevance:** Evidence-based metrics with established health associations
3. **Quality Assurance:** Comprehensive data validation and reliability assessment
4. **Transparency:** Open methodology suitable for peer review and validation
The system transforms conversational data into clinically meaningful insights while maintaining scientific standards appropriate for health monitoring applications.
---
## Appendix A: Statistical Formulas
### Confidence Interval Calculation
```
CI = x̄ ± t(α/2,n-1) × (s/√n)
Where:
x̄ = sample mean
t(α/2,n-1) = critical t-value
s = sample standard deviation
n = sample size
α = significance level (0.05 for 95% CI)
```
### Linear Regression for Trends
```
β₁ = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²
β₀ = ȳ - β₁x̄
R² = 1 - (SSres / SStot)
Where SSres = Σ(yi - ŷi)², SStot = Σ(yi - ȳ)²
```
### Type-Token Ratio
```
TTR = |{unique words}| / |{total words}|
```
## Appendix B: Implementation Notes
### Performance Characteristics
- **Processing Speed:** ~650,000 tokens/second (using WinkNLP)
- **Memory Usage:** <80MB for typical analysis
- **Scalability:** Linear with number of speech segments
### Dependencies
- **WinkNLP:** Fast, lightweight NLP processing
- **Statistical Libraries:** Custom implementation for clinical requirements
- **TypeScript:** Type safety and maintainability
---
**Document Version Control:**
- v1.0 (June 4, 2025): Initial methodology documentation
- v1.1 (June 4, 2025): Added effective speech rate calculation methodology (excludes long pauses >2s)
- Ready for peer review and clinical validation
**Contact Information:**
- Technical Implementation: Boris Djordjevic, 199 Longevity
- Foundation Framework: Ryan Boyle (ipvr9)
- Repository: https://github.com/199-biotechnologies/limitless-bettermcp