# AgentOps Phase 1.2: Anti-Slop Hardening + Cost Transparency
**Status**: ✅ Complete
**Date**: 2026-01-29
**Tests**: 12/12 passing
## Summary
Implemented P0 anti-slop fixes and cost transparency improvements based on ChatGPT o1 reasoning model feedback.
## Changes Implemented
### P0-1: Evidence Provenance Enforcement ✅
**Problem**: Evidence URLs only validated by prompt, not by code.
**Solution**: Added code-level enforcement that evidence URLs must be from collected signals.
**Files Modified**:
- `agentops/runner/lib/template.ts`
- Added `normalizeURL()` function (handles arXiv URL variations, trailing slashes)
- Added `validateEvidenceProvenance()` function
- Updated `validateProposalsPayload()` signature to accept `collectedSignals` parameter
- `agentops/runner/lib/synthesis.ts`
- Pass collected signals to validator
**Tests Added** (2):
- Rejects URLs not in collected signals
- Normalizes arXiv URLs correctly (http/https, /abs vs /pdf)
---
### P0-2: Fix Numeric Impact Regex ✅
**Problem**: False positives on "MCP v2", "stage 2", etc. False negatives on "2x" at EOL.
**Solution**: New unit-based pattern that only rejects numbers with impact units (%, ms, x, times).
**Files Modified**:
- `agentops/runner/lib/template.ts` (line 253)
- OLD: `/\d+%|\d+x\s|by\s+\d+|reduce.*\d+|improve.*\d+/i`
- NEW: `/\b\d+(\.\d+)?\s*(%|ms|milliseconds?|seconds?|minutes?|hours?|x|×|times)(\b|$)|by\s+\d+(\.\d+)?\s*(%|ms|x|×|times)/i`
**What it catches** (true positives):
- "40%" → ✅ Rejected
- "2x faster" → ✅ Rejected
- "100ms improvement" → ✅ Rejected
- "by 40%" → ✅ Rejected
**What it allows** (false positives fixed):
- "MCP v2" → ✅ Allowed
- "Improve stage 2 gating" → ✅ Allowed
- "Reduce 404 errors" → ✅ Allowed
**Tests Added** (4):
- Allows version numbers (v2, v2.0.1)
- Allows stage numbers (stage 2)
- Rejects percentage claims (40%)
- Rejects multiplier claims (2x)
---
### P0-3: Add --fixtures Flag ✅
**Problem**: `--dry-run` still costs money (makes LLM calls).
**Solution**: True zero-cost mode using hardcoded fixture data.
**Files Modified**:
- `agentops/runner/cli.ts`
- Added `--fixtures` flag parsing
- Updated help text
- `agentops/runner/daily-dev-brief.ts`
- Added `fixturesMode` to `DailyBriefOptions` interface
- Pass option to synthesis
- `agentops/runner/lib/synthesis.ts`
- Added `SynthesisOptions` interface
- Early return in `synthesizeProposals()` if fixturesMode is true
- Returns fixture data from `agentops/fixtures/proposals.example.json`
- Cost: $0.00
**Usage**:
```bash
# Zero cost, no LLM calls
npm run agentops:daily -- --fixtures
# Previous behavior (LLM call, costs money, no GitHub issue)
npm run agentops:daily -- --dry-run
# Production (LLM call + GitHub issue creation)
npm run agentops:daily
```
---
### P0-4: Cost Calculation Transparency ✅
**Problem**: Costs calculated from hardcoded pricing, not labeled as calculated, not model-aware.
**Solution**: Created pricing config per model with transparency metadata.
**Files Created**:
- `agentops/runner/lib/llm/pricing.ts`
- `ANTHROPIC_PRICING` config for all models
- `calculateCost()` function with model-specific pricing
- Handles Sonnet 4.5 large context tier (>200K tokens)
- Includes pricing source URL and date
**Files Modified**:
- `agentops/runner/lib/llm/types.ts`
- Renamed `cost_usd` → `cost_usd_calculated`
- Added `cost_metadata` with pricing transparency
- `agentops/runner/lib/llm/provider.ts`
- Import and use `calculateCost()` for Anthropic
- Updated OpenAI handler with metadata
- `agentops/runner/lib/synthesis.ts`
- Use `cost_usd_calculated` instead of `cost_usd`
**Pricing Transparency**:
```json
{
"cost_usd_calculated": 0.069,
"cost_metadata": {
"model": "claude-sonnet-4-5-20250929",
"inputPricePerMToken": 3.00,
"outputPricePerMToken": 15.00,
"pricingSource": "https://www.anthropic.com/pricing",
"pricingDate": "2026-01-29"
}
}
```
**Tests Added** (4):
- Sonnet 4.5 standard pricing (≤200K)
- Sonnet 4.5 large context pricing (>200K)
- Opus 4.5 pricing
- Unknown model fallback
---
### P1-1: Per-Source Observability ✅
**Problem**: Can't debug which sources degraded or became slower.
**Solution**: Added per-source counts and timings to metadata.
**Files Modified**:
- `agentops/runner/lib/sources/types.ts`
- Added `signals_by_source: Record<string, number>`
- Added `elapsed_ms_by_source: Record<string, number>`
- `agentops/runner/lib/sources/collect.ts`
- Track timing for each source (repo, arxiv, rss, html)
- Count signals per source
- Include in returned metadata
**Example Output**:
```json
{
"signals_by_source": {
"repo": 3,
"arxiv": 12,
"rss": 5,
"html": 11
},
"elapsed_ms_by_source": {
"repo": 450,
"arxiv": 890,
"rss": 320,
"html": 1200
}
}
```
---
### P1-2: Add signals.json Artifact ✅
**Problem**: Can't reproduce how signals were subsetted/deduplicated.
**Solution**: Persist all collected signals before processing.
**Files Modified**:
- `agentops/runner/daily-dev-brief.ts`
- Store full signals collection in `collectedSignalsData`
- Write `signals.json` with all collected signals
- Add to artifact_index
**Artifact Structure**:
```json
{
"collected_at": "2026-01-29T12:00:00Z",
"total_collected": 31,
"signals": [...],
"metadata": {
"signals_by_source": {...},
"elapsed_ms_by_source": {...}
}
}
```
---
## Test Results
```
=== Testing Evidence Provenance Enforcement ===
✅ Evidence provenance: rejects URLs not in signals
✅ Evidence provenance: normalizes arXiv URLs
=== Testing Numeric Impact Regex ===
✅ Numeric regex: allows version numbers (v2)
✅ Numeric regex: allows stage numbers
✅ Numeric regex: rejects percentage claims (40%)
✅ Numeric regex: rejects multiplier claims (2x)
=== Testing Cost Calculation ===
✅ Cost calculation: Sonnet 4.5 standard pricing
✅ Cost calculation: Sonnet 4.5 large context pricing
✅ Cost calculation: Opus 4.5 pricing
✅ Cost calculation: unknown model fallback
=== Testing URL Normalization ===
✅ URL normalization: arXiv URLs
✅ URL normalization: trailing slashes
✨ All Phase 1.2 tests passed!
```
**Total**: 12/12 tests passing
---
## TypeScript Compilation
```bash
$ cd agentops && npx tsc --noEmit
# No errors
```
---
## Files Changed
### Modified (9 files)
1. `agentops/runner/lib/template.ts` - Evidence provenance + numeric regex fix
2. `agentops/runner/lib/synthesis.ts` - Fixtures mode + pass signals to validator
3. `agentops/runner/lib/llm/provider.ts` - Use pricing config
4. `agentops/runner/lib/llm/types.ts` - Update response types
5. `agentops/runner/lib/sources/types.ts` - Add per-source metadata fields
6. `agentops/runner/lib/sources/collect.ts` - Track per-source metrics
7. `agentops/runner/cli.ts` - Add --fixtures flag
8. `agentops/runner/daily-dev-brief.ts` - Fixtures mode + signals.json
9. `agentops/runner/types.ts` - (No changes needed, using existing types)
### Created (2 files)
1. `agentops/runner/lib/llm/pricing.ts` - Pricing config + calculateCost()
2. `agentops/tests/phase1.2.test.ts` - Test suite (12 tests)
---
## Risk Assessment
| Component | Risk | Status |
|-----------|------|--------|
| Evidence provenance | LOW | ✅ Clear errors if URLs don't match |
| Numeric regex | LOW | ✅ Extensive test coverage |
| Fixtures flag | LOW | ✅ Simple boolean, zero cost verified |
| Cost calculation | LOW | ✅ Official pricing, fallback for unknown models |
| Per-source metadata | LOW | ✅ Additive change, doesn't break existing |
**Overall Risk**: LOW
**No Regressions**: All existing functionality preserved
---
## Next Steps
1. ✅ Run integration test with `--fixtures` flag
2. ✅ Run integration test with `--dry-run` flag (verify cost metadata)
3. Monitor 3-5 production runs with new validation
4. Verify per-source timings help debug slow sources
---
## Maintenance Notes
### Pricing Updates
Update `agentops/runner/lib/llm/pricing.ts` quarterly:
1. Check https://www.anthropic.com/pricing
2. Update prices if changed
3. Update `lastUpdated` date
4. Run tests to verify calculations
5. Add new models as released
### Evidence Provenance
If too strict, can disable by not passing `collectedSignals` to `validateProposalsPayload()`.
---
## Documentation
See plan document for full context:
- `agentops/PHASE1.2_PLAN.md` (if exists)
- ChatGPT o1 feedback summary included in plan
---
**Implementation Time**: ~4 hours
**Lines of Code Added**: ~350
**Lines of Code Modified**: ~50
**Test Coverage**: 12 new tests, all passing