# mcp-live-testing Specification
## Purpose
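This capability defines how the MCP server's lookup tools (lookup_spell, lookup_creature, lookup_equipment, lookup_character_option, and lookup_rule) are validated against the live Open5e and D&D 5e APIs. It covers test infrastructure, per-tool result checks, caching, performance, error handling, and documentation. It was created by archiving change implement-comprehensive-live-tests.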
## Requirements
### Requirement: Live Test Infrastructure
The testing framework SHALL provide infrastructure for executing tests against live Open5e and D&D 5e APIs with proper isolation, rate limiting, and performance tracking.
#### Scenario: Live tests are marked and skippable
- **GIVEN** the test suite contains live API tests
- **WHEN** a developer runs `pytest -m "not live"`
- **THEN** all tests marked `live` SHALL be deselected
- **AND** unit tests with mocks SHALL execute normally
#### Scenario: Live tests can be run selectively
- **GIVEN** the test suite contains both unit and live tests
- **WHEN** a developer runs `pytest -m live`
- **THEN** only tests marked `live` SHALL execute
- **AND** unit tests SHALL be deselected
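Here is a minimal `conftest.py` sketch for the two scenarios above. Registering the `live` marker lets `-m live` and `-m "not live"` select on it. It also prevents pytest's `PytestUnknownMarkWarning`, which becomes an error under `--strict-markers`. `pytest_configure` and `config.addinivalue_line` are standard pytest APIs. The marker description text is illustrative.

```python
# conftest.py (sketch): register the "live" marker so that
# `pytest -m live` and `pytest -m "not live"` can select on it.


def pytest_configure(config):
    """pytest hook: runs once at startup, before test collection."""
    config.addinivalue_line(
        "markers",
        "live: test calls the real Open5e / D&D 5e APIs (deselect with -m 'not live')",
    )
```

Individual tests would then carry `@pytest.mark.live`. A whole module can opt in with `pytestmark = pytest.mark.live`.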
#### Scenario: Rate limiting prevents API throttling
- **GIVEN** multiple live tests execute in sequence
- **WHEN** tests make API calls to the same endpoint
- **THEN** the framework SHALL enforce a minimum delay between consecutive calls
- **AND** no API rate limit errors (e.g., HTTP 429 Too Many Requests) SHALL occur
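The spec does not fix the delay or say how it is enforced. One possible approach is a shared limiter that sleeps just long enough to keep consecutive requests at least `min_interval` seconds apart. The sketch below assumes that approach. `MinIntervalRateLimiter` is a hypothetical name, and a suitable interval depends on each API's published limits.

```python
import threading
import time
from typing import Optional


class MinIntervalRateLimiter:
    """Hypothetical helper: wait() returns only once at least `min_interval`
    seconds have passed since the previous wait() returned."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last_call: Optional[float] = None  # monotonic time of previous call
        self._lock = threading.Lock()  # serializes callers sharing one limiter

    def wait(self) -> None:
        with self._lock:
            now = time.monotonic()
            if self._last_call is not None:
                remaining = self.min_interval - (now - self._last_call)
                if remaining > 0:
                    time.sleep(remaining)
            self._last_call = time.monotonic()
```

A session-scoped fixture could create one limiter per API host and call `wait()` before each request. That way the delay applies across tests, not only within a single test.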
#### Scenario: Test database is isolated from production
- **GIVEN** a live test is executing
- **WHEN** the test writes to the cache
- **THEN** a temporary test database SHALL be used
- **AND** production cache SHALL NOT be affected
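The following sketches one isolation approach. It assumes the cache location is read from an environment variable. `CACHE_DB_PATH` is a hypothetical name, and the real cache layer may take a path argument or config attribute instead. If the cache reads the setting at import time, the override has to happen before that import.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def isolated_cache_db(env_var: str = "CACHE_DB_PATH"):
    """Point the cache at a throwaway database file for the duration of a test.

    CACHE_DB_PATH is a hypothetical setting name; substitute whatever the
    cache layer actually reads.
    """
    previous = os.environ.get(env_var)
    with tempfile.TemporaryDirectory(prefix="live-test-cache-") as tmp:
        db_path = Path(tmp) / "cache.db"
        os.environ[env_var] = str(db_path)
        try:
            yield db_path
        finally:
            # Restore the original setting so the production path is untouched.
            if previous is None:
                os.environ.pop(env_var, None)
            else:
                os.environ[env_var] = previous
```

Inside pytest, the built-in `tmp_path` and `monkeypatch` fixtures (`monkeypatch.setenv`) give the same temporary directory and restore-on-teardown behavior without a hand-written context manager.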
### Requirement: Spell Lookup Live Validation
The live test suite SHALL validate lookup_spell functionality against real API data including successful queries, filtering, edge cases, cache behavior, and error handling.
#### Scenario: Basic spell lookup by name succeeds
- **GIVEN** a well-known spell exists in the API
- **WHEN** lookup_spell is called with name="Magic Missile"
- **THEN** at least one result SHALL be returned
- **AND** the first result SHALL have name matching "Magic Missile"
- **AND** required fields (level, school, description) SHALL be present
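The required-field check can be factored into a small helper. This sketch assumes results are dictionaries keyed by the field names used in the scenario. If the tools return model objects, `getattr` would replace `dict.get`. A cantrip's `level` is `0`, so the check treats only `None` and empty strings as missing instead of relying on truthiness.

```python
from typing import Any, Iterable, Mapping


def missing_fields(record: Mapping[str, Any], required: Iterable[str]) -> list[str]:
    """Return the required keys that are absent, None, or an empty string.

    0 and False are deliberately treated as present (e.g. a cantrip's level).
    """
    return [field for field in required if record.get(field) in (None, "")]
```

A test could then write `assert not missing_fields(results[0], ["level", "school", "description"])`. On failure, pytest's assertion rewriting shows which fields were missing.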
#### Scenario: Spell not found returns empty results
- **GIVEN** a non-existent spell name
- **WHEN** lookup_spell is called with name="NonexistentSpell12345"
- **THEN** an empty list SHALL be returned
- **AND** no exception SHALL be raised
#### Scenario: Spell filtering by level works correctly
- **GIVEN** the API contains cantrips
- **WHEN** lookup_spell is called with level=0
- **THEN** all returned spells SHALL have level equal to 0
- **AND** at least 5 results SHALL be available
#### Scenario: Spell filtering by school works correctly
- **GIVEN** the API contains evocation spells
- **WHEN** lookup_spell is called with school="evocation"
- **THEN** all returned spells SHALL have school equal to "Evocation" (case-insensitive comparison)
- **AND** at least 3 results SHALL be available
#### Scenario: Combined spell filters work correctly
- **GIVEN** the API contains wizard concentration spells
- **WHEN** lookup_spell is called with class_key="wizard" and concentration=True
- **THEN** all results SHALL be concentration spells
- **AND** all results SHALL be available to wizards
- **AND** at least 5 results SHALL be available
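The filter scenarios share one shape: every returned spell must satisfy each requested filter. The helper below collects all violations instead of stopping at the first, which keeps failures interpretable. The keys `level`, `school`, `concentration`, and `classes` are assumptions about the result schema. `school` is compared case-insensitively because the query uses `"evocation"` while the scenario expects `"Evocation"`.

```python
from typing import Any, Mapping, Optional, Sequence


def spell_filter_violations(
    spells: Sequence[Mapping[str, Any]],
    *,
    level: Optional[int] = None,
    school: Optional[str] = None,
    concentration: Optional[bool] = None,
    class_key: Optional[str] = None,
) -> list[str]:
    """Describe every spell that fails a requested filter (None = not filtered).

    Assumes illustrative keys "name", "level", "school", "concentration",
    and a "classes" list of class names; adjust to the real schema.
    """
    problems = []
    for spell in spells:
        name = spell.get("name", "<unnamed>")
        if level is not None and spell.get("level") != level:
            problems.append(f"{name}: level {spell.get('level')!r} != {level}")
        if school is not None and str(spell.get("school", "")).lower() != school.lower():
            problems.append(f"{name}: school {spell.get('school')!r} != {school!r}")
        if concentration is not None and bool(spell.get("concentration")) != concentration:
            problems.append(f"{name}: concentration is {spell.get('concentration')!r}")
        if class_key is not None:
            classes = [str(c).lower() for c in spell.get("classes", [])]
            if class_key.lower() not in classes:
                problems.append(f"{name}: not available to {class_key!r}")
    return problems
```

Paired with a count check, the combined-filter scenario becomes `assert not spell_filter_violations(results, class_key="wizard", concentration=True)` followed by `assert len(results) >= 5`.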
#### Scenario: Spell lookup caches results
- **GIVEN** lookup_spell has not been called for a specific query
- **WHEN** the same query is executed twice in succession
- **THEN** the second call SHALL use cached data
- **AND** the second call SHALL complete in under 50ms
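The caching scenarios here and in later sections all come down to timing two identical calls. A small timing helper is sketched below. `_fake_lookup` is a stand-in that uses `functools.lru_cache` to simulate a slow uncached call; it is not the project's cache. If the real tools are coroutines, the call inside `timed` would need to be awaited or wrapped in `asyncio.run`.

```python
import functools
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in for a tool function: the first call per argument sleeps to mimic
# API latency; repeat calls are answered from an in-memory cache.
@functools.lru_cache(maxsize=None)
def _fake_lookup(name: str) -> Tuple[str, ...]:
    time.sleep(0.2)
    return (name,)


first, uncached_s = timed(_fake_lookup, "Magic Missile")
second, cached_s = timed(_fake_lookup, "Magic Missile")
print(f"uncached: {uncached_s * 1000:.0f} ms, cached: {cached_s * 1000:.1f} ms")
```

Wall-clock thresholds such as 50ms can be flaky on loaded CI machines. A sturdier companion check is to assert that no API request was made on the second call.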
#### Scenario: Spell limit parameter is respected
- **GIVEN** the API contains many spells
- **WHEN** lookup_spell is called with limit=5
- **THEN** at most 5 results SHALL be returned
#### Scenario: Invalid spell parameters are handled gracefully
- **GIVEN** an invalid parameter value
- **WHEN** lookup_spell is called with an invalid school name
- **THEN** an empty list SHALL be returned OR an appropriate error SHALL be raised
- **AND** the system SHALL NOT crash
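One helper can accept both outcomes this scenario allows. The spec does not say which exception types count as "appropriate"; `ValueError` and `TypeError` below are assumptions. Any other exception propagates, so a genuine crash still fails the test.

```python
from typing import Any, Callable, Tuple, Type


def outcome_is_graceful(
    fn: Callable[..., Any],
    *args: Any,
    allowed_errors: Tuple[Type[BaseException], ...] = (ValueError, TypeError),
    **kwargs: Any,
) -> bool:
    """True if fn returns an empty list or raises one of allowed_errors.

    Exceptions outside allowed_errors are not caught, so they fail the test.
    """
    try:
        result = fn(*args, **kwargs)
    except allowed_errors:
        return True
    return result == []
```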
### Requirement: Creature Lookup Live Validation
The live test suite SHALL validate lookup_creature functionality against real API data including CR filtering, type filtering, and size filtering.
#### Scenario: Basic creature lookup by name succeeds
- **GIVEN** a well-known creature exists in the API
- **WHEN** lookup_creature is called with name="Goblin"
- **THEN** at least one result SHALL be returned
- **AND** the first result SHALL have name matching "Goblin"
- **AND** required fields (challenge_rating, type, hit_points) SHALL be present
#### Scenario: Creature filtering by CR works correctly
- **GIVEN** the API contains CR 1 creatures
- **WHEN** lookup_creature is called with cr=1
- **THEN** all returned creatures SHALL have challenge_rating equal to "1"
- **AND** at least 3 results SHALL be available
#### Scenario: Creature filtering by CR range works correctly
- **GIVEN** the API contains creatures of various CRs
- **WHEN** lookup_creature is called with cr_min=5 and cr_max=10
- **THEN** all returned creatures SHALL have CR between 5 and 10 inclusive
- **AND** at least 5 results SHALL be available
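Challenge ratings below 1 are fractional in 5e (1/8, 1/4, 1/2). Comparing CRs as strings gives wrong answers, because `"10" < "5"` lexicographically. The sketch below normalizes CRs to exact fractions before the inclusive range check. Whether the API returns CRs as strings or numbers is an assumption, so the parser accepts both.

```python
from fractions import Fraction
from typing import Union

CR = Union[str, int, float]


def parse_cr(cr: CR) -> Fraction:
    """Convert a challenge rating such as "1/4", "0.5", 2, or "10" to a Fraction."""
    return Fraction(str(cr).strip())


def cr_in_range(cr: CR, cr_min: CR, cr_max: CR) -> bool:
    """Inclusive range check, matching the cr_min/cr_max scenario."""
    return parse_cr(cr_min) <= parse_cr(cr) <= parse_cr(cr_max)
```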
#### Scenario: Creature filtering by type works correctly
- **GIVEN** the API contains beast creatures
- **WHEN** lookup_creature is called with type="beast"
- **THEN** all returned creatures SHALL have type equal to "beast"
- **AND** at least 5 results SHALL be available
#### Scenario: Creature filtering by size works correctly
- **GIVEN** the API contains Large creatures
- **WHEN** lookup_creature is called with size="Large"
- **THEN** all returned creatures SHALL have size equal to "Large"
- **AND** at least 3 results SHALL be available
#### Scenario: Creature lookup caches results
- **GIVEN** lookup_creature has not been called for a specific query
- **WHEN** the same query is executed twice in succession
- **THEN** the second call SHALL use cached data
- **AND** the second call SHALL complete in under 50ms
### Requirement: Equipment Lookup Live Validation
The live test suite SHALL validate lookup_equipment functionality for weapons, armor, magic items, and general equipment.
#### Scenario: Weapon lookup by name succeeds
- **GIVEN** a well-known weapon exists in the API
- **WHEN** lookup_equipment is called with type="weapon" and name="Longsword"
- **THEN** at least one result SHALL be returned
- **AND** the result SHALL contain weapon properties (damage_dice, damage_type)
#### Scenario: Armor lookup succeeds
- **GIVEN** the API contains armor items
- **WHEN** lookup_equipment is called with type="armor"
- **THEN** at least 5 armor items SHALL be returned
- **AND** each result SHALL contain armor_class information
#### Scenario: Magic item lookup with rarity filter works
- **GIVEN** the API contains rare magic items
- **WHEN** lookup_equipment is called with type="magic-item" and rarity="rare"
- **THEN** all returned items SHALL have rarity equal to "rare"
- **AND** at least 3 results SHALL be available
#### Scenario: Equipment type "all" searches across categories
- **GIVEN** the API contains various equipment types
- **WHEN** lookup_equipment is called with type="all" and name="sword"
- **THEN** results MAY include weapons, magic items, or other equipment
- **AND** all results SHALL have "sword" in the name (case-insensitive)
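The `type="all"` search needs two checks, sketched below. Names are matched case-insensitively, so both "Longsword" and "Sword of Sharpness" count. Results are also tallied by category for diagnostics, since the scenario permits a mix but does not require one. The `category` key is hypothetical.

```python
from collections import Counter
from typing import Any, Mapping, Sequence


def name_mismatches(results: Sequence[Mapping[str, Any]], needle: str) -> list[str]:
    """Names that do not contain `needle`, compared case-insensitively."""
    return [
        str(r.get("name") or "")
        for r in results
        if needle.lower() not in str(r.get("name") or "").lower()
    ]


def category_counts(results: Sequence[Mapping[str, Any]], key: str = "category") -> Counter:
    """Tally results per category (hypothetical key) for failure diagnostics."""
    return Counter(str(r.get(key, "<unknown>")) for r in results)
```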
#### Scenario: Equipment lookup caches results
- **GIVEN** lookup_equipment has not been called for a specific query
- **WHEN** the same query is executed twice in succession
- **THEN** the second call SHALL use cached data
- **AND** the second call SHALL complete in under 50ms
### Requirement: Character Option Lookup Live Validation
The live test suite SHALL validate lookup_character_option functionality for classes, races, backgrounds, and feats.
#### Scenario: Class lookup succeeds
- **GIVEN** the API contains D&D classes
- **WHEN** lookup_character_option is called with type="class"
- **THEN** at least 12 classes SHALL be returned
- **AND** results SHALL include "Wizard", "Fighter", and "Cleric"
#### Scenario: Race lookup succeeds
- **GIVEN** the API contains D&D races
- **WHEN** lookup_character_option is called with type="race"
- **THEN** at least 9 races SHALL be returned
- **AND** results SHALL include "Human", "Elf", and "Dwarf"
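A case-insensitive set difference can check that well-known names appear in a listing. This sketch assumes each result has a `name` key. It requires an exact (case-insensitive) name match. If the API lists subraces such as "High Elf" rather than "Elf", a substring comparison may suit the race scenario better.

```python
from typing import Any, Iterable, Mapping


def missing_names(results: Iterable[Mapping[str, Any]], expected: Iterable[str]) -> set[str]:
    """Expected names absent from results (case-insensitive, whitespace-trimmed)."""
    present = {str(r.get("name") or "").strip().lower() for r in results}
    return {name for name in expected if name.strip().lower() not in present}
```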
#### Scenario: Background lookup succeeds
- **GIVEN** the API contains D&D backgrounds
- **WHEN** lookup_character_option is called with type="background"
- **THEN** at least 10 backgrounds SHALL be returned
#### Scenario: Feat lookup succeeds
- **GIVEN** the API contains D&D feats
- **WHEN** lookup_character_option is called with type="feat"
- **THEN** at least 20 feats SHALL be returned
#### Scenario: Character option name filtering works
- **GIVEN** the API contains classes
- **WHEN** lookup_character_option is called with type="class" and name="wiz"
- **THEN** at least one result matching "Wizard" SHALL be returned
#### Scenario: Character option lookup caches results
- **GIVEN** lookup_character_option has not been called for a specific query
- **WHEN** the same query is executed twice in succession
- **THEN** the second call SHALL use cached data
- **AND** the second call SHALL complete in under 50ms
### Requirement: Rule Lookup Live Validation
The live test suite SHALL validate lookup_rule functionality for conditions, damage types, skills, ability scores, and magic schools.
#### Scenario: Condition lookup succeeds
- **GIVEN** the API contains D&D conditions
- **WHEN** lookup_rule is called with rule_type="condition"
- **THEN** at least 10 conditions SHALL be returned
- **AND** results SHALL include "Prone", "Grappled", and "Blinded"
#### Scenario: Damage type lookup succeeds
- **GIVEN** the API contains D&D damage types
- **WHEN** lookup_rule is called with rule_type="damage-type"
- **THEN** at least 10 damage types SHALL be returned
- **AND** results SHALL include "Fire", "Cold", and "Slashing"
#### Scenario: Skill lookup succeeds
- **GIVEN** the API contains D&D skills
- **WHEN** lookup_rule is called with rule_type="skill"
- **THEN** exactly 18 skills SHALL be returned
- **AND** results SHALL include "Perception", "Stealth", and "Athletics"
#### Scenario: Ability score lookup succeeds
- **GIVEN** the API contains D&D ability scores
- **WHEN** lookup_rule is called with rule_type="ability-score"
- **THEN** exactly 6 ability scores SHALL be returned
- **AND** results SHALL include "Strength", "Dexterity", "Constitution", "Intelligence", "Wisdom", "Charisma"
#### Scenario: Magic school lookup succeeds
- **GIVEN** the API contains D&D magic schools
- **WHEN** lookup_rule is called with rule_type="magic-school"
- **THEN** exactly 8 magic schools SHALL be returned
- **AND** results SHALL include "Evocation", "Abjuration", and "Conjuration"
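The counts above fall into two groups. Some are fixed by the 5e rules (18 skills, 6 ability scores, 8 schools of magic), and the spec asserts those exactly. The rest are stated as minimums. A table-driven sketch keeps both groups in one place. The dictionary and function names are illustrative.

```python
from typing import Optional

# Exact counts asserted by the scenarios (fixed by the 5e rules).
EXACT_RULE_COUNTS = {"skill": 18, "ability-score": 6, "magic-school": 8}

# Minimum counts for rule types stated as "at least N".
MIN_RULE_COUNTS = {"condition": 10, "damage-type": 10}


def count_problem(rule_type: str, n_results: int) -> Optional[str]:
    """Return a readable problem description, or None if the count is acceptable."""
    if rule_type in EXACT_RULE_COUNTS:
        expected = EXACT_RULE_COUNTS[rule_type]
        if n_results != expected:
            return f"{rule_type}: expected exactly {expected}, got {n_results}"
    elif rule_type in MIN_RULE_COUNTS:
        minimum = MIN_RULE_COUNTS[rule_type]
        if n_results < minimum:
            return f"{rule_type}: expected at least {minimum}, got {n_results}"
    else:
        raise KeyError(f"no count expectation for rule_type {rule_type!r}")
    return None
```

In pytest, `@pytest.mark.parametrize` over these keys would produce one test per rule type.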
#### Scenario: Rule name filtering works
- **GIVEN** the API contains conditions
- **WHEN** lookup_rule is called with rule_type="condition" and name="grappl"
- **THEN** at least one result matching "Grappled" SHALL be returned
#### Scenario: Rule lookup caches results
- **GIVEN** lookup_rule has not been called for a specific query
- **WHEN** the same query is executed twice in succession
- **THEN** the second call SHALL use cached data
- **AND** the second call SHALL complete in under 50ms
### Requirement: Cache Behavior Validation
The live test suite SHALL validate that caching works correctly with live API data including cache hits, misses, and TTL enforcement.
#### Scenario: Cache miss triggers API call
- **GIVEN** the cache does not contain data for a query
- **WHEN** a tool function is called
- **THEN** an API request SHALL be made
- **AND** the response SHALL be stored in the cache
- **AND** the response time SHALL be consistent with network latency (typically over 100ms)
#### Scenario: Cache hit avoids API call
- **GIVEN** the cache contains fresh data for a query
- **WHEN** the same tool function is called again
- **THEN** no API request SHALL be made
- **AND** data SHALL be retrieved from cache
- **AND** the response time SHALL be under 50ms
#### Scenario: Different queries use different cache entries
- **GIVEN** multiple different queries are executed
- **WHEN** each query is executed twice
- **THEN** each unique query SHALL have its own cache entry
- **AND** repeated queries SHALL hit their respective caches
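The three cache scenarios describe read-through caching keyed by the full query. The model below is self-contained and uses a request-counting stand-in client to show the behavior a live test checks for. It is not the project's cache, which also stores entries in a database and applies a TTL. In a live test, the same counting could be done by wrapping the real HTTP client with `unittest.mock.patch.object(..., wraps=...)`; the patched target is project-specific.

```python
from collections.abc import Hashable
from typing import Any, Callable, Dict, List


class CountingClient:
    """Stand-in API client that records every request it receives."""

    def __init__(self, fetch: Callable[[Hashable], Any]) -> None:
        self._fetch = fetch
        self.requests: List[Hashable] = []

    def get(self, query: Hashable) -> Any:
        self.requests.append(query)
        return self._fetch(query)


class CachedLookup:
    """Minimal read-through cache keyed by the full query."""

    def __init__(self, client: CountingClient) -> None:
        self.client = client
        self.store: Dict[Hashable, Any] = {}

    def __call__(self, query: Hashable) -> Any:
        if query not in self.store:  # miss: call the API and store the response
            self.store[query] = self.client.get(query)
        return self.store[query]  # hit: served without an API request
```

Because the key is the entire query tuple, two queries that differ only in one filter (for example `limit`) get separate entries. That is what the third scenario requires.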
### Requirement: Performance Validation
The live test suite SHALL validate that tool functions meet performance targets with live APIs and caching.
#### Scenario: Uncached API call completes within time limit
- **GIVEN** the cache is empty
- **WHEN** any tool function is called
- **THEN** the call SHALL complete within 3 seconds
- **AND** results SHALL be returned
#### Scenario: Cached lookup is fast
- **GIVEN** data is cached
- **WHEN** the same query is repeated
- **THEN** the call SHALL complete within 50ms
- **AND** correct results SHALL be returned
#### Scenario: Parallel requests are handled correctly
- **GIVEN** multiple different queries
- **WHEN** all queries are executed concurrently
- **THEN** all queries SHALL complete successfully
- **AND** total wall-clock time SHALL be less than the time taken to execute the same queries sequentially
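Below is a sketch of the sequential-versus-concurrent comparison. It assumes the tool functions are coroutines, which is common for MCP servers but not stated here. For synchronous tools, `concurrent.futures.ThreadPoolExecutor` would take the place of `asyncio.gather`. `fake_tool` simulates network latency with `asyncio.sleep`.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, List, Sequence, Tuple

ToolFn = Callable[[str], Awaitable[Any]]


async def fake_tool(query: str) -> List[str]:
    """Stand-in for an async tool call; sleeps to mimic network latency."""
    await asyncio.sleep(0.1)
    return [query]


async def run_sequential(tool: ToolFn, queries: Sequence[str]) -> Tuple[List[Any], float]:
    start = time.perf_counter()
    results = [await tool(q) for q in queries]  # one call at a time
    return results, time.perf_counter() - start


async def run_concurrent(tool: ToolFn, queries: Sequence[str]) -> Tuple[List[Any], float]:
    start = time.perf_counter()
    results = await asyncio.gather(*(tool(q) for q in queries))  # order preserved
    return list(results), time.perf_counter() - start
```

Against live APIs, a global minimum-delay rate limiter would serialize these calls and erase the speedup. The parallel scenario therefore likely needs distinct endpoints or a per-endpoint limiter.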
### Requirement: Error Handling Validation
The live test suite SHALL validate that tool functions handle errors gracefully including network issues, invalid parameters, and empty results.
#### Scenario: Invalid parameter is handled gracefully
- **GIVEN** a tool function receives a parameter of the wrong type or outside its valid range
- **WHEN** it is called with such a value (e.g., a very large negative number for limit)
- **THEN** the function SHALL either return empty results OR raise an appropriate exception
- **AND** the system SHALL NOT crash
#### Scenario: Empty results are handled gracefully
- **GIVEN** a query matches no data
- **WHEN** a tool function is called
- **THEN** an empty list SHALL be returned
- **AND** no exception SHALL be raised
#### Scenario: Network timeout is handled (optional)
- **GIVEN** a simulated network timeout condition
- **WHEN** a tool function is called
- **THEN** the function SHALL retry OR raise a network error
- **AND** the system SHALL recover gracefully
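This scenario is optional and uses a simulated timeout. One approach is to drive a retry wrapper with a stand-in function that raises the built-in `TimeoutError`. `call_with_retry` and its parameters are hypothetical. Real HTTP clients raise their own timeout types (for example `httpx.TimeoutException` or `requests.exceptions.Timeout`), and those types would go in `retry_on`.

```python
import time
from typing import Any, Callable, Tuple, Type


def call_with_retry(
    fn: Callable[[], Any],
    attempts: int = 3,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    backoff: float = 0.1,
) -> Any:
    """Call fn, retrying on retry_on errors with exponential backoff.

    Re-raises the last error once attempts are exhausted, so the caller sees
    a network error rather than a hang.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(backoff * 2 ** (attempt - 1))  # 1x, 2x, 4x, ... backoff
```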
### Requirement: Test Documentation
The project SHALL provide clear documentation for running and maintaining live tests.
#### Scenario: Developer can run live tests
- **GIVEN** a developer wants to validate against live APIs
- **WHEN** they read the testing documentation
- **THEN** instructions for running live tests SHALL be provided
- **AND** expected behavior SHALL be documented
- **AND** troubleshooting tips SHALL be included
#### Scenario: Developer can skip live tests
- **GIVEN** a developer wants to run only fast unit tests
- **WHEN** they read the testing documentation
- **THEN** instructions for skipping live tests SHALL be provided
- **AND** the pytest command SHALL be documented
#### Scenario: Test failures are interpretable
- **GIVEN** a live test fails
- **WHEN** a developer reviews the failure
- **THEN** the error message SHALL clearly indicate the failure reason
- **AND** guidance for resolution SHALL be provided (if applicable)