Saiten – Agents League @ TechConnect Scoring Agent

Submission Track: 🎨 Creative Apps – GitHub Copilot

Overview

A multi-agent system that automatically scores all Agents League @ TechConnect hackathon submissions and generates ranking reports – just type @saiten-orchestrator score all in VS Code.

Built around the Orchestrator-Workers, Prompt Chaining, and Evaluator-Optimizer patterns, the system's six Copilot custom agents autonomously collect GitHub Issue submissions, evaluate them against track-specific rubrics, validate scoring consistency, and generate reports via an MCP (Model Context Protocol) server.


Agent Workflow

Design Patterns

  • Orchestrator-Workers: @saiten-orchestrator delegates to 5 specialized sub-agents

  • Prompt Chaining: Collect → Score → Review → Report, with a gate at each step (see the sketch after this list)

  • Evaluator-Optimizer: Reviewer validates scores, triggers re-scoring on FLAG

  • Handoff: Commenter posts feedback only after explicit user confirmation

  • SRP (Single Responsibility Principle): 1 agent = 1 responsibility
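
To make these patterns concrete, here is a minimal Python sketch of the control flow the orchestrator drives. The real agents are Copilot .agent.md prompts rather than Python functions, so every callable below is a hypothetical stand-in used only to illustrate the chaining, gating, and re-scoring loop.

# Minimal sketch of the gated prompt chain (Collect -> Score -> Review -> Report).
# The real agents are Copilot .agent.md prompts, not Python; every callable here
# is a hypothetical stand-in that only illustrates the control flow.
from typing import Any, Callable

def run_chain(
    issues: list[int],
    collect: Callable[[int], dict[str, Any]],
    score: Callable[..., dict[str, Any]],
    review: Callable[[dict[str, Any]], dict[str, Any]],
    report: Callable[[list[dict[str, Any]]], str],
) -> str:
    scored: list[dict[str, Any]] = []
    for issue in issues:
        submission = collect(issue)                      # @saiten-collector
        if not submission.get("body"):                   # Gate: Data Completeness
            print(f"⚠️ #{issue}: incomplete data, skipped")   # graceful degradation, not a hard stop
            continue
        result = score(submission)                       # @saiten-scorer
        for _ in range(2):                               # Evaluator-Optimizer: max 2 re-score cycles
            verdict = review(result)                     # @saiten-reviewer
            if verdict["status"] == "PASS":
                break
            result = score(submission, guidance=verdict["flags"])
        scored.append(result)
    return report(scored)                                # @saiten-reporter -> ranking report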

Reasoning Patterns

  • Chain-of-Thought (CoT): Scorer evaluates each criterion sequentially, building evidence chain before calculating weighted total

  • Evaluator-Optimizer Loop: Reviewer detects 5 bias types (central tendency, halo effect, leniency, range restriction, anchoring) → FLAGs → Scorer re-evaluates with specific guidance → max 2 cycles

  • Gate-based Error Recovery: Each workflow step has a validation gate; failures trigger graceful degradation (skip + warn) rather than hard stops

  • Evidence-Anchored Scoring: Rubrics define explicit evidence_signals (positive/negative) per criterion; scorers must cite signals from actual submission content
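
As a rough illustration of evidence-anchored, chain-of-thought scoring, the sketch below shows one plausible shape for a per-criterion score and the weighted total. The field names, weights, and the 1-10 to 0-100 scaling are assumptions for illustration, not the project's actual rubric schema.

# Hypothetical per-criterion score; field names and weights are illustrative only.
from dataclasses import dataclass

@dataclass
class CriterionScore:
    name: str
    weight: float      # fraction of the track total, e.g. 0.30 (assumed)
    score: int         # 1-10, anchored to evidence cited from the submission
    evidence: str      # quoted signal from the actual submission content

def weighted_total(criteria: list[CriterionScore]) -> float:
    """Scale 1-10 criterion scores to a 0-100 weighted total (assumed scaling)."""
    return round(sum(c.score * 10 * c.weight for c in criteria), 1)

partial = [
    CriterionScore("Accuracy & Relevance", 0.30, 8,
                   "5-agent Semantic Kernel pipeline with Azure Document Intelligence"),
    CriterionScore("Reasoning", 0.25, 7,
                   "Linear pipeline, no self-correction loop"),
]
print(weighted_total(partial))   # 8*10*0.30 + 7*10*0.25 = 41.5 (partial: two criteria only)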

Reliability Features

  • Exponential Backoff Retry: gh CLI calls retry up to 3 times on rate limits (429) and server errors (5xx) with exponential delay (sketched together with the rate limiter after this list)

  • Rate Limiting: Sliding-window rate limiter (30 calls/60s per tool) prevents GitHub API abuse

  • Input Validation: All MCP tool inputs are validated at the boundary (Fail Fast) – scores 1-10, weighted_total 0-100, required fields checked

  • Corrupted Data Recovery: scores.json auto-backed up on parse failure, server continues with empty store

  • Idempotent Operations: Re-scoring safely overwrites existing entries by issue_number key
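
The retry and rate-limiting behaviour described in this list could look roughly like the sketch below. It is a simplified illustration under the stated thresholds (30 calls per 60 s, 3 attempts), not the project's actual server.py implementation.

import subprocess
import time
from collections import deque

_WINDOW, _MAX_CALLS = 60.0, 30            # sliding window: 30 calls per 60 seconds per tool
_calls: deque[float] = deque()

def _throttle() -> None:
    """Block until the sliding-window budget allows another gh call."""
    while True:
        now = time.monotonic()
        while _calls and now - _calls[0] > _WINDOW:
            _calls.popleft()              # drop calls that have left the window
        if len(_calls) < _MAX_CALLS:
            _calls.append(now)
            return
        time.sleep(_WINDOW - (now - _calls[0]) + 0.01)

def run_gh(args: list[str], retries: int = 3) -> str:
    """Run a gh CLI command, retrying transient failures with exponential backoff."""
    for attempt in range(retries):
        _throttle()
        proc = subprocess.run(["gh", *args], capture_output=True, text=True)
        if proc.returncode == 0:
            return proc.stdout
        transient = "429" in proc.stderr or "HTTP 5" in proc.stderr   # crude 429/5xx check
        if not transient or attempt == retries - 1:
            raise RuntimeError(proc.stderr.strip())
        time.sleep(2 ** attempt)          # 1 s, then 2 s between attempts
    raise RuntimeError("unreachable")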

Workflow Diagram

flowchart TD
    User["👤 User\n@saiten-orchestrator score all"]
    subgraph Orchestrator["🏆 @saiten-orchestrator"]
        Route["Intent Routing\nUC-01~06"]
        Gate1{"Gate: MCP\nConnectivity"}
        Gate2{"Gate: Data\nCompleteness"}
        Gate3{"Gate: Score\nValidity"}
        Gate4{"Gate: Review\nPASS/FLAG?"}
        Integrate["Result Integration\n& User Report"]
        Handoff["[Handoff]\n💬 Post Feedback"]
    end
    subgraph Collector["📥 @saiten-collector"]
        C1["list_submissions()"]
        C2["get_submission_detail()"]
        C3["Data Validation"]
    end
    subgraph Scorer["📊 @saiten-scorer"]
        S1["get_scoring_rubric()"]
        S2["Rubric-based Evaluation\n1-10 score per criterion"]
        S3["Quality Self-Check"]
        S4["save_scores()"]
    end
    subgraph Reviewer["🔍 @saiten-reviewer"]
        V1["Load scores.json"]
        V2["Statistical Outlier\nDetection (2σ)"]
        V3["Rubric Consistency\nCheck"]
        V4["Bias Detection"]
    end
    subgraph Reporter["📋 @saiten-reporter"]
        R1["generate_ranking_report()"]
        R2["Trend Analysis"]
        R3["Report Validation"]
    end
    subgraph Commenter["💬 @saiten-commenter"]
        CM1["Generate Comment\nper Top N"]
        CM2["User Confirmation\n(Human-in-the-Loop)"]
        CM3["gh issue comment"]
    end
    subgraph MCP["⚡ saiten-mcp (FastMCP Server)"]
        T1["list_submissions"]
        T2["get_submission_detail"]
        T3["get_scoring_rubric"]
        T4["save_scores"]
        T5["generate_ranking_report"]
    end
    subgraph External["External"]
        GH["GitHub API\n(gh CLI)"]
        FS["Local Storage\ndata/ & reports/"]
    end
    User --> Route
    Route --> Gate1
    Gate1 -->|OK| Collector
    Gate1 -->|FAIL| User
    C1 --> C2 --> C3
    C3 --> Gate2
    Gate2 -->|OK| Scorer
    Gate2 -->|"⚠️ Skip"| Integrate
    S1 --> S2 --> S3
    S3 -->|PASS| S4
    S3 -->|"FAIL: Re-evaluate"| S2
    S4 --> Gate3
    Gate3 -->|OK| Reviewer
    V1 --> V2 --> V3 --> V4
    V4 --> Gate4
    Gate4 -->|PASS| Reporter
    Gate4 -->|"FLAG: Re-score"| Scorer
    R1 --> R2 --> R3
    R3 --> Integrate --> User
    Integrate --> Handoff
    Handoff -->|"User clicks"| Commenter
    CM1 --> CM2 --> CM3
    Collector -.->|MCP| T1 & T2
    Scorer -.->|MCP| T3 & T4
    Reporter -.->|MCP| T5
    T1 & T2 -.-> GH
    T4 & T5 -.-> FS
    CM3 -.-> GH
    style Orchestrator fill:#1a1a2e,stroke:#e94560,color:#fff
    style Collector fill:#16213e,stroke:#0f3460,color:#fff
    style Scorer fill:#16213e,stroke:#0f3460,color:#fff
    style Reviewer fill:#1a1a2e,stroke:#e94560,color:#fff
    style Reporter fill:#16213e,stroke:#0f3460,color:#fff
    style Commenter fill:#0f3460,stroke:#533483,color:#fff
    style MCP fill:#0f3460,stroke:#533483,color:#fff

Agent Roster

Agent | Role | SRP Responsibility | MCP Tools
🏆 @saiten-orchestrator | Orchestrator | Intent routing, delegation, result integration | – (delegates all)
📥 @saiten-collector | Worker | GitHub Issue data collection & validation | list_submissions, get_submission_detail
📊 @saiten-scorer | Worker | Rubric-based evaluation with quality gate | get_scoring_rubric, save_scores
🔍 @saiten-reviewer | Evaluator | Score consistency review & bias detection | get_scoring_rubric, read scores
📋 @saiten-reporter | Worker | Ranking report generation & trend analysis | generate_ranking_report
💬 @saiten-commenter | Handoff | GitHub Issue feedback comments (user-confirmed) | gh issue comment

Design Principles Applied

Principle | How Applied
SRP | Each agent handles exactly one responsibility (6 agents × 1 duty)
Fail Fast | Gates at every step; anomalies reported immediately
SSOT | All score data centralized in data/scores.json
Feedback Loop | Scorer → Reviewer → Re-score loop (Evaluator-Optimizer pattern)
Human-in-the-Loop | Commenter runs only after explicit user confirmation via Handoff
Transparency | Todo list shows progress; each Gate reports status
Idempotency | Re-scoring overwrites existing entries; safe to run multiple times (see the sketch after this table)
ISP | Each sub-agent receives only the tools and data it needs
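
A rough sketch of how the SSOT and Idempotency rows above can be honoured when persisting scores. The path follows the project layout, but the function body is an assumption, not the actual save_scores tool.

import json
from pathlib import Path

SCORES_PATH = Path("data/scores.json")    # single source of truth for all score data

def save_score(entry: dict) -> None:
    """Upsert one score entry keyed by issue_number, so re-scoring simply overwrites."""
    store: dict[str, dict] = {}
    if SCORES_PATH.exists():
        store = json.loads(SCORES_PATH.read_text(encoding="utf-8"))
    store[str(entry["issue_number"])] = entry          # idempotent: same key, same slot
    SCORES_PATH.parent.mkdir(parents=True, exist_ok=True)
    SCORES_PATH.write_text(json.dumps(store, indent=2, ensure_ascii=False), encoding="utf-8")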


System Architecture

┌─────────────────────────────────────────────────────────────┐
│ VS Code                                                      │
│                                                               │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ 🏆 @saiten-orchestrator                                  │ │
│  │   ├── 📥 @saiten-collector   (Worker)                    │ │
│  │   ├── 📊 @saiten-scorer      (Worker)                    │ │
│  │   ├── 🔍 @saiten-reviewer    (Evaluator)                 │ │
│  │   ├── 📋 @saiten-reporter    (Worker)                    │ │
│  │   └── 💬 @saiten-commenter   (Handoff)                   │ │
│  └──────────────┬──────────────────────────────────────────┘ │
│                 │ MCP (stdio)                                 │
│  ┌──────────────▼──────────────────────────────────────────┐ │
│  │ ⚡ saiten-mcp (FastMCP Server / Python)                  │ │
│  │   ├ list_submissions()         ← gh CLI → GitHub         │ │
│  │   ├ get_submission_detail()    ← gh CLI → GitHub         │ │
│  │   ├ get_scoring_rubric()       ← YAML files              │ │
│  │   ├ save_scores()              → data/scores.json        │ │
│  │   └ generate_ranking_report()  → reports/*.md            │ │
│  └─────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────┘

Setup

Prerequisites

  • Python 3.10+

  • uv (package manager)

  • gh CLI (GitHub CLI, authenticated)

  • VS Code + GitHub Copilot

Installation

# Clone the repository
git clone <repo-url>
cd FY26_techconnect_saiten

# Create Python virtual environment
uv venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# Install dependencies (production)
uv pip install -e .

# Install development dependencies (includes pytest + coverage)
uv pip install -e ".[dev]"

# Verify gh CLI authentication
gh auth status

Environment Variables

No secrets are required for normal operation.

# Copy the template (optional – only needed for CI or non-VS Code environments)
cp .env.example .env

Variable | Required | Description
GITHUB_TOKEN | No | gh CLI manages its own auth; only set for CI environments

Security: This project uses gh CLI authentication and VS Code Copilot's built-in Azure OpenAI credentials. No API keys are stored in code or config files.

VS Code Configuration

.vscode/mcp.json automatically configures the MCP server. No additional setup required.


Usage

Type the following in the VS Code chat panel:

Command | Description | Agents Used
@saiten-orchestrator score all | Score all submissions | collector → scorer → reviewer → reporter
@saiten-orchestrator score #48 | Score a single submission | collector → scorer → reviewer → reporter
@saiten-orchestrator ranking | Generate ranking report | reporter only
@saiten-orchestrator rescore #48 | Re-score a submission | collector → scorer → reviewer → reporter
@saiten-orchestrator show rubric for Creative | Display scoring rubric | Direct response (MCP)
@saiten-orchestrator review scores | Review score consistency | reviewer only


Project Structure

FY26_techconnect_saiten/
├── .github/agents/
│   ├── saiten-orchestrator.agent.md   # 🏆 Orchestrator
│   ├── saiten-collector.agent.md      # 📥 Data Collection Worker
│   ├── saiten-scorer.agent.md         # 📊 Scoring Worker
│   ├── saiten-reviewer.agent.md       # 🔍 Score Reviewer (Evaluator)
│   ├── saiten-reporter.agent.md       # 📋 Report Worker
│   └── saiten-commenter.agent.md      # 💬 Feedback Commenter (Handoff)
├── src/saiten_mcp/
│   ├── server.py                      # MCP Server + rate limiter + structured logging
│   ├── models.py                      # Pydantic data models with boundary validation
│   └── tools/
│       ├── submissions.py             # list_submissions, get_submission_detail
│       ├── rubrics.py                 # get_scoring_rubric
│       ├── scores.py                  # save_scores
│       └── reports.py                 # generate_ranking_report
├── data/
│   ├── rubrics/                       # Track-specific scoring rubrics (YAML)
│   └── scores.json                    # Scoring results (SSOT)
├── reports/
│   └── ranking.md                     # Auto-generated ranking report
├── scripts/
│   └── run_scoring.py                 # CLI scoring pipeline
├── tests/
│   ├── conftest.py                    # Shared test fixtures
│   ├── test_models.py                 # Pydantic model validation tests
│   ├── test_parsers.py                # Issue body parser tests
│   ├── test_rubrics.py                # Rubric YAML integrity tests
│   ├── test_scores.py                 # Score persistence & validation tests
│   ├── test_reports.py                # Report generation tests
│   ├── test_reliability.py            # Retry, rate limiting, error handling tests
│   └── test_e2e.py                    # E2E integration tests
├── .vscode/mcp.json                   # MCP server config
├── AGENTS.md                          # Agent registry
└── pyproject.toml

Testing

The project has a comprehensive test suite with 110 tests covering models, parsers, tools, reliability, and reports.

# Run all tests
python -m pytest tests/ -v

# Run with coverage report
python -m pytest tests/ --cov=saiten_mcp --cov-report=term-missing

# Run only unit tests (no network calls)
python -m pytest tests/ -m "not e2e" -v

# Run integration tests (requires gh CLI auth)
python -m pytest tests/ -m e2e -v
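
The unit/e2e split above relies on a pytest marker. A hypothetical e2e-marked test might look like the following, assuming the e2e marker is registered in pyproject.toml (which the -m filters imply):

import subprocess

import pytest

@pytest.mark.e2e        # excluded by -m "not e2e", selected by -m e2e
def test_gh_cli_is_authenticated():
    """Live sanity check standing in for the real e2e tests; needs gh auth."""
    proc = subprocess.run(["gh", "auth", "status"], capture_output=True, text=True)
    assert proc.returncode == 0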

Test Structure

Test File | Tests | What It Covers
test_models.py | 17 | Pydantic models, validation boundaries, evidence-anchored fields
test_parsers.py | 28 | Issue body parsing, track detection, URL extraction, checklists
test_rubrics.py | 20 | Rubric YAML integrity, weights, scoring policy, evidence signals
test_scores.py | 9 | Score persistence, idempotency, input validation, sorting
test_reports.py | 8 | Markdown report generation, empty/missing data edge cases
test_reliability.py | 10 | Retry logic, rate limiting, error handling, gh CLI resilience
test_e2e.py | 5 | End-to-end MCP tool calls with live GitHub data
Total | 110 | 88% code coverage


Scoring Tracks

Track | Criteria | Notes
🎨 Creative Apps | 5 criteria | Community Vote (10%) excluded; remaining 90% prorated to 100% (worked example after this table)
🧠 Reasoning Agents | 5 criteria | Uses common overall criteria
💼 Enterprise Agents | 3 criteria | Custom 3-axis evaluation
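
A worked example of the Creative Apps proration. Only the 10% Community Vote exclusion is stated above; the five criterion names and weights below are hypothetical placeholders.

# Hypothetical remaining weights that sum to 90% once the 10% Community Vote is dropped.
weights = {"c1": 0.25, "c2": 0.25, "c3": 0.20, "c4": 0.10, "c5": 0.10}
assert abs(sum(weights.values()) - 0.90) < 1e-9

# Prorate so the five remaining criteria total 100% again.
prorated = {name: w / 0.90 for name, w in weights.items()}
print(prorated)   # 0.25 -> ~0.278, 0.20 -> ~0.222, 0.10 -> ~0.111; new weights sum to 1.0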


Demo

The multi-agent workflow can be invoked directly from VS Code's chat panel:

Scoring a Single Submission

👤 User: @saiten-orchestrator score #49

🏆 @saiten-orchestrator → Routes to collector → scorer → reviewer → reporter

📥 @saiten-collector: Fetched Issue #49 (EasyExpenseAI)
  ├─ Track: Creative Apps
  ├─ Repo: github.com/chakras/Easy-Expense-AI
  ├─ README: 10,036 chars extracted
  └─ Gate: ✅ Data complete

📊 @saiten-scorer: Evidence-anchored evaluation
  ├─ Accuracy & Relevance: 8/10
  │    Evidence: "5-agent Semantic Kernel pipeline with Azure Document Intelligence"
  ├─ Reasoning: 7/10
  │    Evidence: "Linear pipeline, no self-correction loop"
  ├─ Total: 73.9/100
  └─ Gate: ✅ All criteria scored with evidence

🔍 @saiten-reviewer: Bias check passed
  ├─ Outlier check: PASS (within 2σ)
  ├─ Evidence quality: PASS (no generic phrases)
  └─ Gate: ✅ PASS

📋 @saiten-reporter: Report saved → reports/ranking.md

Scoring All Submissions

👤 User: @saiten-orchestrator score all

🏆 @saiten-orchestrator: Processing 43 submissions across 3 tracks...
  ├─ 📥 Collecting → 📊 Scoring → 🔍 Reviewing → 📋 Reporting
  ├─ Progress tracked via Todo list
  └─ Final report: reports/ranking.md

Key Differentiators

  • Evidence-anchored scoring: Each criterion requires specific evidence from the submission, not generic phrases

  • Self-correction loop: Reviewer FLAGs biased scores → Scorer re-evaluates → repeats until PASS

  • Real-time progress: Todo list updates visible in VS Code during multi-submission scoring

  • Human-in-the-loop: Feedback comments only posted after explicit user confirmation via Handoff


Troubleshooting

Issue | Cause | Solution
gh command failed | gh CLI not authenticated | Run gh auth login
scores.json corrupted | Interrupted write | Auto-restored from the .json.bak backup
ValueError: issue_number must be positive | Bad input to save_scores | Check the score data format matches the schema
Invalid track name | Typo in track parameter | Use creative-apps, reasoning-agents, or enterprise-agents
MCP server not starting | Python env mismatch | Run uv pip install -e . inside the .venv
No submissions returned | Network or auth issue | Test with gh api repos/microsoft/agentsleague-techconnect/issues --jq '.[0].number'

Corrupted Data Recovery

If data/scores.json becomes corrupted, the server automatically:

  1. Logs a warning with the parse error

  2. Creates a backup at data/scores.json.bak

  3. Continues with an empty score store (a rough sketch of this logic follows below)

To restore manually:

cp data/scores.json.bak data/scores.json
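
The auto-recovery steps above could be implemented roughly as follows; this is a simplified sketch, not the actual server code.

import json
import logging
import shutil
from pathlib import Path

SCORES_PATH = Path("data/scores.json")
log = logging.getLogger("saiten_mcp")

def load_scores() -> dict:
    """Load the score store; on parse failure, back up the corrupt file and continue empty."""
    if not SCORES_PATH.exists():
        return {}
    try:
        return json.loads(SCORES_PATH.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        log.warning("scores.json is corrupted (%s); backing up and starting empty", exc)
        shutil.copy2(SCORES_PATH, SCORES_PATH.with_name(SCORES_PATH.name + ".bak"))
        return {}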

Tech Stack

Layer | Technology
Agent Framework | VS Code Copilot Custom Agent (.agent.md) – Orchestrator-Workers pattern
MCP Server | Python 3.10+ / FastMCP (stdio transport)
Package Manager | uv
GitHub Integration | gh CLI / GitHub REST API with exponential backoff retry and rate limiting
Data Models | Pydantic v2 with boundary validation (scores 1-10, weighted_total 0-100; sketch after this table)
Data Storage | JSON (scores) / YAML (rubrics) / Markdown (reports) with backup & recovery
Testing | pytest + pytest-cov – 110 tests, 88% coverage
Error Handling | Retry with backoff, rate limiting, input validation, corrupted file recovery
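
A minimal sketch of the kind of Pydantic v2 boundary validation described above; the field names are illustrative and the real models.py may differ.

from pydantic import BaseModel, Field

class CriterionScore(BaseModel):
    name: str
    score: int = Field(ge=1, le=10)              # per-criterion scores must stay in 1-10
    evidence: str = Field(min_length=1)          # evidence-anchored: no empty citations

class SubmissionScore(BaseModel):
    issue_number: int = Field(gt=0)              # "issue_number must be positive"
    track: str
    criteria: list[CriterionScore]
    weighted_total: float = Field(ge=0, le=100)  # 0-100 after weighting

# Bad input fails fast at the boundary with a pydantic.ValidationError, e.g.:
# SubmissionScore(issue_number=-1, track="creative-apps", criteria=[], weighted_total=50)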


License

MIT
