🧬 Harness Health Engineering

A local AI lab that helps you discover what actually improves your life.

Physiology streams in from Whoop · lived experience goes in by text · voice · selfie · an on-device agent runs n-of-1 experiments and tells you what actually makes life better, all as plain Markdown you own.

Live demo

↑ a black-and-white product page · cryptoyoginya.github.io/harness-health-engineering



🧭 Explore

⭐ The one idea
🔬 The engine
🆚 vs Whoop journal
🔄 How it works
📥 Three ways in
🤳 Photo diaries
🧱 Architecture
🛠️ Technology
🚀 Deploy your own

The one idea

Every health gadget you've owned optimised a number. Recovery. HRV. Steps. And somewhere along the way the number became the point, and your actual life (whether you felt good, did meaningful work, saw people you love) fell out of frame. That's Goodhart's law wearing a fitness band: when a measure becomes the target, it stops measuring anything that matters.

This project flips it. The top-level metric here is not a body score. It is one honest question:

⭐ North-star: "Is my life actually better?"

Everything else (recovery, sleep, HRV, supplements, training load) is demoted to what it really is: an instrument in service of that question. The system will happily tell you that your body looks great this week and your life doesn't, and then help you fix the right thing. No wearable can say that, because no wearable knows what your good life looks like.

Built for people who have tried enough supplements, trackers, routines, and protocols to know that the hard question is not "what is optimal?" but "what is optimal for me, in the life I actually live?". Not for people who want motivation, streaks, badges, or a prettier sleep chart.


The engine: n-of-1 experiments 🔬

Correlations are guesses. "You sleep worse when you drink" can't tell you if the drink did it or if a hard day caused both. So instead of guessing, the harness runs n-of-1 trials: single-subject experiments, the methodology personalised medicine uses to decide whether something works for one specific person. Every experiment is pre-registered:

| Step | Rule |
| --- | --- |
| Hypothesis | a specific causal claim |
| One variable | change exactly one thing (the rule everyone breaks) |
| Baseline | a measured "before" |
| Duration + criterion | written before the data, so you can't fool yourself |
| Verdict | merge → it becomes a standing rule, or revert → drop it and log why |

This is the difference between "I tried a thing once" and knowledge that compounds. The payoff is a sentence no tracker will ever give you:

"Creatine moved nothing for you in three weeks: stop paying for it."

"Caffeine before 14:00 bought you +35 min of deep sleep: it's a rule now."

One variable at a time. A clock. A criterion. A verdict that becomes a rule. That's the whole game, and it's why this gets smarter every week instead of just logging more.

stateDiagram-v2
    direction LR
    [*] --> proposed: /exp new · or agent proposes
    proposed --> active: baseline + criterion set
    active --> active: /exp extend
    active --> merged: criterion met → becomes a rule
    active --> reverted: missed / stopped → logged why
    merged --> [*]
    reverted --> [*]

You drive it from anywhere: /exp new to start, /exp extend to give it more time, /exp stop to call it, or just ask the agent to design a tighter one. Either way it ends in a verdict, not a vibe.
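The lifecycle in the state diagram can be sketched as a small state machine. This is a minimal illustration, not the project's code: the state names come from the diagram, while the event names (`activate`, `extend`, `merge`, `revert`) and the record fields are assumptions made for the example.

```javascript
// Hypothetical sketch of the experiment lifecycle shown in the state diagram.
// State names come from the diagram; event and field names are assumptions.
const TRANSITIONS = {
  proposed: { activate: "active" },
  active: { extend: "active", merge: "merged", revert: "reverted" },
  merged: {},
  reverted: {},
};

function newExperiment({ hypothesis, variable }) {
  return { state: "proposed", hypothesis, variable, baseline: null, criterion: null, days: null, log: [] };
}

function apply(exp, event, payload = {}) {
  const next = TRANSITIONS[exp.state][event];
  if (!next) throw new Error(`cannot ${event} from ${exp.state}`);
  if (event === "activate") {
    // Pre-registration: baseline, duration, and criterion must be set before data comes in.
    const { baseline, criterion, days } = payload;
    if (baseline == null || !criterion || !days) {
      throw new Error("baseline, criterion and days are required");
    }
    return { ...exp, state: next, baseline, criterion, days };
  }
  if (event === "extend") return { ...exp, days: exp.days + payload.days };
  if (event === "revert") {
    // A revert always records why the change was dropped.
    return { ...exp, state: next, log: [...exp.log, payload.reason ?? "no reason given"] };
  }
  return { ...exp, state: next };
}
```

Because `merged` and `reverted` have no outgoing transitions, any further event on a finished experiment throws, which mirrors the "ends in a verdict" rule.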

Why it's different from Whoop's journal

Whoop's journaling is genuinely good, and it has a ceiling. Here's where this goes that it structurally can't:

| | Wearable journal | Harness Health Engineering |
| --- | --- | --- |
| Top metric | a body score | your life quality |
| Evidence | correlation | n-of-1 causation → rules |
| Scope | body only | body × work × people × supplements × bloodwork |
| Output | dashboards | decisions and experiments |
| Reasoning | a black box | on-device, cited, interrogable in plain language |
| Memory | a feed | a versioned record you can ask "why was March hard?" |
| Guardrail | - | flags metric-tyranny: proxy up, life flat → that's a fail |

The discipline is the product. The data is just raw material. The full scientific rationale (n-of-1 design, surrogate-endpoint failure, evidence labelling, confounding), plus a clinician-facing brief with references, is in METHODOLOGY.md.

How it works

flowchart TD
    W["Whoop API v2"] -->|"OAuth2 · auto 9:00 · serialized refresh"| SYNC["sync.mjs"]
    H(["You · text · voice · selfie"]) -->|"events · mood · energy · social · work · body"| BOT["Telegram bot<br/>capture layer · on-device"]
    SYNC -->|"recovery · HRV · sleep · strain"| RAW["01_raw · daily record"]
    BOT --> RAW
    RAW --> SRC["02_sources · weekly notes<br/>FACT / INFERENCE"]
    SRC --> SYN["04_synthesis · patterns + life-quality"]
    SYN --> EXP["05_decisions · n-of-1 experiments<br/>one variable · criterion · verdict"]
    EXP --> RULE["CLAUDE.md · rules that stuck"]
    EXP -.->|"merge / revert"| RAW
    NS["⭐ north-star:<br/>is life better?"] --- SYN

    AGENT["on-device agent<br/>RAG · cited synthesis · MCP"] --- RAW
    AGENT --- SYN
    AGENT --- EXP

The loop: Signal → Ingest → Source note → Synthesis → Experiment → Verdict → Rule → repeat. Objective body data arrives on its own; you add a 40-second diary; on Sundays the agent scores your week against the north-star and finds what's actually moving it.
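The "serialized refresh" label on the Whoop edge of the flowchart refers to refreshing the OAuth2 token one caller at a time. A common way to get that behaviour is a single-flight wrapper, sketched below. This is an illustration only: `createTokenManager` and `refreshFn` are hypothetical names, and `refreshFn` stands in for the real refresh request, which this README does not show.

```javascript
// Hypothetical single-flight token refresh: concurrent callers share one refresh.
// `refreshFn` is a stand-in for the real OAuth2 refresh call (not shown in this README).
// It must return a promise of { value, expiresAt } with expiresAt in epoch milliseconds.
function createTokenManager(refreshFn) {
  let token = null;    // last token received
  let inFlight = null; // promise shared by everyone waiting on the current refresh

  return async function getToken(now = Date.now()) {
    if (token && token.expiresAt > now) return token.value;
    if (!inFlight) {
      inFlight = refreshFn()
        .then((fresh) => {
          token = fresh;
          return fresh;
        })
        .finally(() => {
          inFlight = null; // allow the next refresh once this one settles
        });
    }
    return (await inFlight).value;
  };
}
```

The point of the design is that two callers arriving with an expired token trigger one refresh request rather than two, which matters with providers that invalidate a refresh token as soon as it is used.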

What it tracks

  • Daily (auto): recovery, HRV, resting HR, sleep, strain, all from Whoop.

  • Daily (you, ~40 s, by text, voice, or selfie): events (your impressions journal), mood, energy, social, work, movement, supplements.

  • Weekly: one integral "is life better?" score, 1–5.

  • Quarterly: six life dimensions (emotion, connection, body, meaning, autonomy, growth).

Three ways in 📥

Lived experience is messy, and you shouldn't have to sit at a keyboard to capture it. The bot takes three modalities; mix them freely, several per day:

| Mode | How | Processing | Privacy |
| --- | --- | --- | --- |
| ✍️ Text | type a line | appended to today's record | local |
| 🎙️ Voice | send a voice note | transcribed on-device (Whisper ONNX) → text | audio never leaves the machine |
| 🤳 Selfie | send a photo | archived to a local visual diary, reviewed on demand by the agent | biometric, never committed to git |

A photo is auto-classified on-device (CLIP, zero-shot; the image never leaves the machine) into one of three local diaries, each read against its own canon. All three are descriptive, never diagnostic. No caption needed; a caption simply overrides the guess:

| Type (auto) | Diary | Canon the agent uses |
| --- | --- | --- |
| selfie (default) | face | skin / fluid / affect signals over time |
| food | meals | plate composition · protein · fibre · processing · timing |
| stool | gut | Bristol Stool Scale: type, colour, frequency |

Every message lands as a timestamped line - 14:30 … in 01_raw/health/YYYY-MM-DD.md, so the shape of the day is preserved, not flattened into one average.
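As a rough illustration of that layout, the helper below builds the target path and the line for one captured message. It is a sketch under assumptions: `rawEntry` is a hypothetical name, local time is assumed, and the exact line format beyond the `- HH:MM` prefix is a guess.

```javascript
// Hypothetical helper: where a captured message goes and what the line looks like.
// Assumes local time and a "- HH:MM text" line; only the path pattern and the
// "- 14:30" prefix come from the README.
function rawEntry(text, date = new Date()) {
  const pad = (n) => String(n).padStart(2, "0");
  const day = `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`;
  const time = `${pad(date.getHours())}:${pad(date.getMinutes())}`;
  return {
    path: `01_raw/health/${day}.md`,
    line: `- ${time} ${text.trim()}\n`,
  };
}
```

In a real bot the result could be written with `appendFile(path, line)` from `node:fs/promises`, so several entries per day accumulate in order rather than overwriting each other.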

A gentle daily rhythm. A morning body brief, a nudge or two with a question rotating across life domains, an evening recap; /quiet mutes the lot. Commands set the rest: /north your direction · /habit what's already standing (15k steps, omega for years, so it's treated as baseline, not re-discovered) · /exp to run, extend, or /exp slip to log a lapse on an n-of-1 (one active at a time, truly one variable, so the effect is attributable) · /week · /month · /year for horizons.

Numbers stay where they belong: only Whoop signals are summarised numerically. Mood, energy, and meaning are never averaged into a score: a single honest "is life better?" is more valid than an arithmetic of parts.

Three photo diaries, one on-device eye 🤳

Your face, your plate, and your gut all leave visible traces of how you live, and a wearable sees none of them. The harness turns photos into longitudinal signals, while staying strictly on the safe side of the line: it describes and compares over time, it never diagnoses.

The on-device eye: how routing works 👁️

Send any photo: no caption, no menu. The bot identifies what it is with CLIP zero-shot image classification running locally (ONNX, the same on-device stack as the voice transcriber). The image never leaves the machine; nothing is uploaded.

flowchart TD
    P["📷 photo<br/>(Telegram)"] --> DL["bot: download"]
    DL --> CLIP{"on-device CLIP<br/>zero-shot · ONNX"}
    CLIP -->|face| S1["photos/ · face"]
    CLIP -->|meal| S2["photos/food/ · food"]
    CLIP -->|toilet| S3["photos/stool/ · gut"]
    S1 --> STORE["local · gitignored<br/>only a filename ref in the daily log"]
    S2 --> STORE
    S3 --> STORE
    STORE -.->|"on request: 'review my …'"| AGENT["agent vision<br/>(strong model)"]
    AGENT --> OBS["dated observations Β· local"]
    OBS --> HYP["hypothesis"] --> EXP["n-of-1 experiment"]

  • Selfie is the safe default: the classifier must clear a confidence margin to file a photo as food or stool; otherwise it stays in the neutral face diary.

  • A caption always wins: write "food" / "stool" / "selfie" to override the guess.

  • Storage is the only automatic step. Content is never auto-analysed; the deep read happens only when you ask the agent ("review my selfies / food / stool"), keeping a strong model's quality without sending anything anywhere by default. (Warm classification ≈ 150 ms.)
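The three routing rules above can be sketched as one function. This is an assumption-laden illustration, not the bot's code: `routePhoto` is a hypothetical name, `scores` is assumed to be a map of zero-shot probabilities keyed by `selfie` / `food` / `stool`, "clearing a confidence margin" is read here as "beating the runner-up by at least `margin`", and the default of 0.2 is invented for the example.

```javascript
// Hypothetical routing rule. Diary names come from the README; the margin value
// and the "beat the runner-up" interpretation are assumptions.
const DIARIES = { selfie: "face", food: "food", stool: "gut" };

function routePhoto(scores, caption = "", margin = 0.2) {
  // 1. A caption always wins.
  const override = caption.trim().toLowerCase();
  if (Object.hasOwn(DIARIES, override)) return DIARIES[override];

  // 2. Food or stool is chosen only when it clearly beats every other label.
  const [top, runnerUp] = Object.entries(scores).sort((a, b) => b[1] - a[1]);
  const clear =
    top !== undefined &&
    top[0] !== "selfie" &&
    Object.hasOwn(DIARIES, top[0]) &&
    top[1] - (runnerUp?.[1] ?? 0) >= margin;

  // 3. Anything uncertain stays in the neutral face diary.
  return clear ? DIARIES[top[0]] : "face";
}
```

Defaulting to the face diary means a misclassification can only under-file a food or stool photo, never mislabel a selfie as something else.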

1 · Face: the visual diary

What the agent reads from a face, over time rather than in one shot:

| Group | Signals | May reflect | Cross-checked against |
| --- | --- | --- | --- |
| 💧 Fluid / puffiness | under-eye bags, facial fullness, lid heaviness | water retention, fatigue | sleep, salt/alcohol at night, cycle phase, stress |
| 🎨 Skin tone | redness/flush, sallowness, pallor, blotchiness | vascular reaction, tiredness | alcohol, heat/exertion, recovery, hydration |
| 🧴 Texture / breakouts | spot count & location, shine vs dryness | hormonal pattern, hydration | cycle phase, sugar/dairy (hypothesis), stress, sleep |
| 👁️ Eyes | sclera redness, dark circles, clarity of gaze | tiredness, irritation | sleep, alcohol, screens, allergy |
| 🌳 Affect / vitality | jaw/brow tension, downturned vs lit-up | mood, energy | mood/energy 1–5, "lived as wanted?" |

2 · Food: read against the plate canon

Method: the plate canon, roughly ½ vegetables · ¼ protein · ¼ complex carbs · plus healthy fat.

| Signal | May reflect | Cross-checked against |
| --- | --- | --- |
| protein present? | satiety, stable glucose | energy, afternoon sugar cravings |
| fibre / veg share | gut transit, fullness | stool, energy |
| processing level (whole vs ultra-processed) | inflammation (hypothesis) | mood, energy |
| refined sugar / fast carbs | glucose swings | energy, sleep, skin |
| meal timing (late eating) | overnight recovery | sleep, next-day recovery |

No calorie counting: it is unreliable from a photo, so the agent reads composition and timing, not numbers. Meals are described neutrally, never moralised; an eating-disorder guardrail keeps the focus on food → how you feel, not control or guilt.

3 · Gut: the Bristol Stool Scale

Method: the clinical Bristol Stool Scale (type 1–7, with 3–4 as the healthy middle) plus colour.

| Read | May reflect | Cross-checked against |
| --- | --- | --- |
| type 1–2 (hard lumps) | slow transit, constipation-leaning | water, fibre, magnesium, travel |
| type 3–4 | normal | - |
| type 6–7 (loose / watery) | fast transit | trigger foods, stress, FODMAPs, caffeine |
| colour (pale-clay · black-tarry · red) | bile flow · possible bleed | red-flag → doctor |

Black/tarry stool or visible blood → see a doctor, not a diary entry; no interpretation attempted.
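The table and the red-flag rule above amount to a lookup in which the red flag is checked first. The sketch below is illustrative only: `readStool` and its field names are hypothetical, and type 5 is left unclassified because the table does not cover it.

```javascript
// Hypothetical lookup mirroring the table above; descriptive only, never a diagnosis.
const RED_FLAG_COLOURS = new Set(["pale-clay", "black-tarry", "red"]);

function readStool({ type, colour = null, visibleBlood = false }) {
  // Red flags short-circuit everything else: no interpretation is attempted.
  if (visibleBlood || RED_FLAG_COLOURS.has(colour)) return "red flag: see a doctor";
  if (!Number.isInteger(type) || type < 1 || type > 7) {
    throw new RangeError("Bristol type must be an integer from 1 to 7");
  }
  if (type <= 2) return "slow transit, constipation-leaning";
  if (type <= 4) return "normal";
  if (type === 5) return "unclassified"; // the table above does not cover type 5
  return "fast transit";
}
```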

Hard limits, across all three diaries

  • Not a diagnosis: any worrying sign resolves to "see a doctor," never an interpretation.

  • Constitution ≠ trend: innate features (dark circles, face shape) are constants, not changes.

  • Shooting noise: light, angle, makeup, and time of day distort more than physiology; mismatched shots → low confidence.

  • Biology lags: skin breakouts (and gut shifts) surface days after a trigger; never pinned to "yesterday."

  • One shot ≠ a pattern: value is the trend; the strongest evidence is paired shots, "morning after X vs morning without X", which is already almost an n-of-1.

Architecture: the knowledge pyramid 🧱

Everything is plain Markdown in a layered pyramid: raw signal at the base, decisions at the top. Each layer only consumes the one below it, so evidence flows upward and nothing is asserted without a trail back to its source.

| Layer | Holds | Example |
| --- | --- | --- |
| ⭐ 00_context | north-star, metrics, the domain frame | "is life better?", leading vs lagging metrics |
| 📥 01_raw | daily records (you + Whoop) | 2026-06-13.md, photos, voice notes |
| 🔖 02_sources | weekly notes, evidence-labelled | FACT / INFERENCE per week |
| 📚 03_wiki | what you've learned, baselines | personal HRV, supplement stack |
| 🧩 04_synthesis | patterns + life-quality | the cross-layer story |
| 🔬 05_decisions | n-of-1 experiments | one variable · criterion · verdict |
| 🎁 06_outputs | finished artefacts | specs, talks |

flowchart TB
    CAP["✍️ 🎙️ 🤳 capture · ⌚ Whoop"] --> R["📥 01 · raw"]
    R --> S["🔖 02 · sources<br/>FACT / INFERENCE"]
    S --> Y["🧩 04 · synthesis<br/>patterns · life-quality"]
    Y --> D["🔬 05 · decisions<br/>n-of-1 experiments"]
    D --> RU["✅ rules that stuck"]
    D -. "merge / revert" .-> R
    NS["⭐ 00 · north-star"] -. "is life better?" .-> Y

The retrieval, hooks, and agent all read this pyramid, never raw guesses. The full scientific rationale lives in METHODOLOGY.md; the layer contracts in AGENTS.md.

Technology

| Layer | Stack |
| --- | --- |
| Ingest | Whoop API v2, OAuth2 (serialized, race-safe token refresh), Node 22, zero-dep |
| Capture | free Telegram bot (local, long-polling) for text · voice · selfie, plus Markdown |
| Voice | whisper (ONNX, on-device) + prebuilt ffmpeg-static for transcription; audio never leaves the device |
| Vision | on-device CLIP (ONNX) auto-routes photos → selfie / food / stool diaries; deep review by agent on demand, non-diagnostic |
| Knowledge base | layered Markdown pyramid 00→06 with frontmatter contracts + evidence labels |
| Retrieval | hybrid RAG: multilingual-e5-small (ONNX) + sqlite-vec + FTS5 BM25, fused via RRF |
| Agent | MCP server (kb_search / kb_think / kb_backlinks) for any MCP client |
| Control plane | Claude Code hooks (evidence + frontmatter linters), permissions, working-memory invariant |
| Automation | macOS launchd for morning sync + brief, hands-off |
| Quality | kb-doctor health-check, weekly dream-cycle audit |
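The Retrieval row mentions fusing vector and BM25 results via RRF (Reciprocal Rank Fusion). For readers unfamiliar with it, here is a minimal sketch of the technique itself, not of this project's implementation: each ranked list contributes 1 / (k + rank) to a document's score. The function name `rrf` and the constant k = 60 (a commonly used default) are assumptions.

```javascript
// Reciprocal Rank Fusion: each ranked list adds 1 / (k + rank) to a document's score.
// k = 60 is a commonly used default, not a value taken from this project.
function rrf(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// Usage: fuse a vector-search ranking with a BM25 ranking of note ids.
const fused = rrf([
  ["a", "b", "c"], // e.g. embedding similarity order
  ["b", "d", "a"], // e.g. BM25 order
]);
console.log(fused.join(","));
```

RRF only needs ranks, so the embedding similarities and BM25 scores never have to be put on a common scale.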

Quickstart

📖 Standing up your own copy? Follow the full step-by-step in SETUP.md: Telegram bot, Whoop app, .env, auto-start, and an honest privacy breakdown.

corepack enable                 # make pnpm available via Corepack
pnpm run setup
cp .env.example .env            # WHOOP + Telegram bot tokens
pnpm whoop:auth                 # one-time OAuth → .whoop/ (gitignored)
pnpm whoop:sync                 # pull physiology + push a morning brief
pnpm kb:index                   # index the knowledge base for retrieval
pnpm kb:think "what actually moves my life quality?"
pnpm kb:doctor                  # knowledge-base health-check

Daily rhythm: morning sync + brief run themselves · evening text/voice/photo your diary to the bot · Sunday ask the agent to review the week.

Privacy & safety

Local-first, but honest about the edges. Your record lives on your disk; pre-processing (voice→text, photo-type, search) runs on-device; photos and sensitive diaries are git-ignored and never pushed; secrets (.env, .whoop/) are git-ignored and deny-read by the agent. What does leave the machine, by design: messages transit Telegram (the Bot API is not end-to-end encrypted), Claude/Anthropic processes what you send the agent to analyse, and Whoop returns your physiology. Nothing is sold or posted publicly. Full breakdown in SETUP.md.

Not a medical device. The agent never diagnoses; any worrying signal resolves to one recommendation: see a specialist.

Contributing

Built as one person's body-as-codebase, designed to generalise to any self-quantifier. PRs especially welcome on: wearable adapters (Oura, Garmin, Apple Health), a live Whoop MCP server, richer experiment designs, and synthesis evals. See CONTRIBUTING.md and the open issues. (Yes, Bryan, you too.)

License

MIT © 2026 Christina Vinter. See LICENSE.
