# Feature Specification: HackerNews MCP Server
**Feature Branch**: `001-hackernews-mcp-server`
**Created**: 2025-10-12
**Status**: Draft
**Input**: User description: "Create an HackerNews MCP server for interacting with HackerNews API at https://hn.algolia.com/api. Users should be able to search posts, find latest posts, find posts on the front page, and all other capabilities exposed by the api."
## User Scenarios & Testing *(mandatory)*
### User Story 1 - Search Stories and Comments (Priority: P1) 🎯 MVP
Users need to search Hacker News for specific topics, keywords, or discussions to find relevant content quickly.
**Why this priority**: Core search functionality is the primary use case for interacting with HN content programmatically.
**Independent Test**: Can be fully tested by searching for a keyword (e.g., "AI") and receiving relevant stories/comments, delivering immediate search value.
**Acceptance Scenarios**:
1. **Given** a user wants to find stories about a topic, **When** they search with query "machine learning" and tag "story", **Then** they receive a list of relevant stories sorted by relevance, including title, URL, author, points, and number of comments.
2. **Given** a user wants to find comments about a topic, **When** they search with query "React" and tag "comment", **Then** they receive a list of relevant comments with text, author, and parent story information.
3. **Given** a user wants to filter by URL, **When** they restrict the search to the URL field with query "github.com", **Then** they receive only stories whose URL points to GitHub.
4. **Given** a user wants recent content, **When** they search by date with query "JavaScript", **Then** they receive results sorted by date (most recent first).
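
The four scenarios above map onto a small set of request parameters. The following is a minimal sketch of a URL builder, not a definitive implementation. The `/api/v1/search` and `/api/v1/search_by_date` paths and the `query`, `tags`, and `restrictSearchableAttributes` parameter names are taken from the public HN Algolia API documentation and should be verified before use. `SearchOptions` and `buildSearchUrl` are hypothetical names introduced for illustration.

```typescript
// Sketch only: builds request URLs and performs no network I/O.
// Endpoint paths and parameter names are assumptions to verify against the API docs.
const SEARCH_API_BASE = "https://hn.algolia.com/api/v1";

interface SearchOptions {
  query: string;
  tags?: string; // e.g. "story" or "comment"
  sortBy?: "relevance" | "date"; // defaults to relevance
  restrictTo?: string; // e.g. "url" for URL-only search
}

function buildSearchUrl(options: SearchOptions): string {
  // Relevance and date ordering are served by two different endpoints.
  const endpoint = options.sortBy === "date" ? "search_by_date" : "search";
  const params = new URLSearchParams({ query: options.query });
  if (options.tags) params.set("tags", options.tags);
  if (options.restrictTo) params.set("restrictSearchableAttributes", options.restrictTo);
  return `${SEARCH_API_BASE}/${endpoint}?${params.toString()}`;
}

// Scenario 1: stories about "machine learning", sorted by relevance.
const storySearchUrl = buildSearchUrl({ query: "machine learning", tags: "story" });
// Scenario 4: most recent results for "JavaScript".
const recentSearchUrl = buildSearchUrl({ query: "JavaScript", sortBy: "date" });
```

Keeping URL construction separate from the HTTP call makes each scenario testable without touching the live API.
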
---
### User Story 2 - Get Front Page and Latest Stories (Priority: P2)
Users need to access current front page stories and latest submissions to stay updated with trending HN content.
**Why this priority**: Accessing trending and recent content is a common workflow for staying current with HN discussions.
**Independent Test**: Can be tested independently by retrieving front page stories and verifying they match current HN homepage.
**Acceptance Scenarios**:
1. **Given** a user wants to see trending stories, **When** they request front page stories, **Then** they receive the list of stories currently on the HN front page with all metadata (title, URL, author, points, comments).
2. **Given** a user wants latest submissions, **When** they request latest stories sorted by date, **Then** they receive the most recently posted stories.
3. **Given** a user wants latest Ask HN posts, **When** they filter for "ask_hn" tag sorted by date, **Then** they receive recent Ask HN posts.
4. **Given** a user wants latest Show HN posts, **When** they filter for "show_hn" tag sorted by date, **Then** they receive recent Show HN posts.
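
Each listing in these scenarios is a tag plus a choice of endpoint. A minimal sketch of that mapping follows. The `Listing` type, `LISTING_URLS`, and `listingUrl` are hypothetical names, and the endpoint paths are assumptions based on the public API documentation.

```typescript
// Sketch only: maps each listing in User Story 2 to a request URL.
type Listing = "front_page" | "latest" | "ask_hn" | "show_hn";

const LISTING_URLS: Record<Listing, string> = {
  // Front page membership is exposed as a tag on the relevance endpoint.
  front_page: "https://hn.algolia.com/api/v1/search?tags=front_page",
  // The remaining listings use the date-sorted endpoint, most recent first.
  latest: "https://hn.algolia.com/api/v1/search_by_date?tags=story",
  ask_hn: "https://hn.algolia.com/api/v1/search_by_date?tags=ask_hn",
  show_hn: "https://hn.algolia.com/api/v1/search_by_date?tags=show_hn",
};

function listingUrl(listing: Listing): string {
  return LISTING_URLS[listing];
}
```
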
---
### User Story 3 - Retrieve Specific Items and User Profiles (Priority: P3)
Users need to fetch complete information about specific stories, comments, polls, or user profiles for detailed analysis.
**Why this priority**: Detailed item and user data retrieval enables deeper analysis and follow-up on specific content.
**Independent Test**: Can be tested by fetching a specific story ID and user profile, verifying complete data structure.
**Acceptance Scenarios**:
1. **Given** a user has a story ID, **When** they request that item, **Then** they receive complete story details including all nested comments (children).
2. **Given** a user wants to see user information, **When** they request a username, **Then** they receive the user's profile including karma, about text, and account creation date.
3. **Given** a user wants to see a comment thread, **When** they request a comment ID, **Then** they receive the comment with all reply chains.
4. **Given** a user wants poll results, **When** they request a poll item, **Then** they receive the poll question and all poll options with scores.
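
Item and user lookups are plain resource URLs, and an item response nests its replies recursively. The sketch below assumes the `/api/v1/items/:id` and `/api/v1/users/:username` paths and a `children` array on each item, as described in the public API documentation. `ItemNode`, `itemUrl`, `userUrl`, and `countDescendants` are hypothetical names.

```typescript
// Sketch only: URL helpers plus a traversal over the nested comment tree.
const ITEM_API_BASE = "https://hn.algolia.com/api/v1";

// Minimal shape needed for traversal; real items carry many more fields.
interface ItemNode {
  id: number;
  children: ItemNode[];
}

function itemUrl(id: number): string {
  return `${ITEM_API_BASE}/items/${id}`;
}

function userUrl(username: string): string {
  return `${ITEM_API_BASE}/users/${encodeURIComponent(username)}`;
}

// Counts every nested reply beneath an item, at any depth.
function countDescendants(item: ItemNode): number {
  return item.children.reduce(
    (total, child) => total + 1 + countDescendants(child),
    0,
  );
}
```

A recursive count is the simplest form. For very deep threads an explicit stack would avoid relying on call-stack depth.
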
---
### User Story 4 - Advanced Filtering and Pagination (Priority: P4)
Users need to filter stories by time ranges, points, comment counts, and paginate through large result sets efficiently.
**Why this priority**: Advanced filtering enables power users to find precisely the content they need from HN's vast archive.
**Independent Test**: Can be tested by applying numeric filters (e.g., stories with at least 100 points from the last week) and verifying results match criteria.
**Acceptance Scenarios**:
1. **Given** a user wants popular stories, **When** they filter for stories with points >= 100, **Then** they receive only highly-voted stories.
2. **Given** a user wants recent discussion, **When** they filter for stories with num_comments >= 50 created in the last 7 days, **Then** they receive stories with active discussions from the past week.
3. **Given** a search returns many results, **When** they specify page number, **Then** they receive the requested page of results with pagination metadata (nbPages, hitsPerPage).
4. **Given** a user wants more results per page, **When** they specify hitsPerPage=50, **Then** they receive 50 results instead of the default 20.
5. **Given** a user wants stories by a specific author within a date range, **When** they combine the author tag with timestamp filters, **Then** they receive only results matching all criteria.
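
The filtering and pagination scenarios above can be expressed as a comma-separated `numericFilters` value plus `page` and `hitsPerPage` parameters. This is a minimal sketch under those assumptions. The parameter names and the zero-based `page` index follow the public API documentation and should be verified. `FilterOptions`, `buildNumericFilters`, `unixSecondsDaysAgo`, and `buildFilteredSearchUrl` are hypothetical names.

```typescript
// Sketch only: composes numeric filters and pagination parameters.
interface FilterOptions {
  minPoints?: number;
  minComments?: number;
  createdAfter?: number; // Unix timestamp in seconds
  createdBefore?: number; // Unix timestamp in seconds
  page?: number; // assumed zero-based
  hitsPerPage?: number;
}

// Joins conditions with commas, which the API is assumed to treat as a logical AND.
function buildNumericFilters(options: FilterOptions): string {
  const conditions: string[] = [];
  if (options.minPoints !== undefined) conditions.push(`points>=${options.minPoints}`);
  if (options.minComments !== undefined) conditions.push(`num_comments>=${options.minComments}`);
  if (options.createdAfter !== undefined) conditions.push(`created_at_i>${options.createdAfter}`);
  if (options.createdBefore !== undefined) conditions.push(`created_at_i<${options.createdBefore}`);
  return conditions.join(",");
}

// Converts "N days ago" into the Unix-seconds form used by created_at_i.
function unixSecondsDaysAgo(days: number, nowMs: number = Date.now()): number {
  return Math.floor(nowMs / 1000) - days * 24 * 60 * 60;
}

function buildFilteredSearchUrl(tags: string, options: FilterOptions): string {
  const params = new URLSearchParams({ tags });
  const numericFilters = buildNumericFilters(options);
  if (numericFilters) params.set("numericFilters", numericFilters);
  if (options.page !== undefined) params.set("page", String(options.page));
  if (options.hitsPerPage !== undefined) params.set("hitsPerPage", String(options.hitsPerPage));
  return `https://hn.algolia.com/api/v1/search?${params.toString()}`;
}
```

For scenario 2, a caller might pass `{ minComments: 50, createdAfter: unixSecondsDaysAgo(7) }` together with the `story` tag.
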
---
### Edge Cases
- What happens when a story/comment/user ID doesn't exist? → Return clear error message indicating item not found.
- How does the system handle rate limiting (10,000 requests/hour/IP)? → Track requests, provide clear rate limit exceeded messages with retry timing.
- What if search query returns zero results? → Return empty results array with helpful message suggesting query refinement.
- How to handle malformed search parameters? → Validate parameters, return clear error messages indicating which parameter is invalid and expected format.
- What if the API is temporarily unavailable? → Implement retry logic with exponential backoff, and return a timeout error after a bounded number of attempts.
- How to handle very old items with potentially missing fields? → Handle optional fields gracefully, never fail on missing non-critical data.
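
Two of the edge cases above, rate limit tracking and retry with exponential backoff, can be sketched without any network code. The following is one possible approach, not a prescribed design. `RateLimitTracker` and `backoffDelayMs` are hypothetical names, the sliding one-hour window is an assumption about how the limit is enforced, and the default base and maximum delays are illustrative values only.

```typescript
// Sketch only: a sliding-window request tracker and a capped backoff schedule.
class RateLimitTracker {
  private readonly limit: number;
  private readonly windowMs: number;
  private timestamps: number[] = [];

  constructor(limit: number = 10_000, windowMs: number = 60 * 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns 0 and records the request if it may proceed now;
  // otherwise returns how many milliseconds the caller should wait.
  tryAcquire(nowMs: number = Date.now()): number {
    const cutoff = nowMs - this.windowMs;
    // Drop requests that have aged out of the window.
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift();
    }
    if (this.timestamps.length < this.limit) {
      this.timestamps.push(nowMs);
      return 0;
    }
    return this.timestamps[0] + this.windowMs - nowMs;
  }
}

// Delay before retry number `attempt` (0-based): doubles each time, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number = 500, maxMs: number = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

The non-zero return value of `tryAcquire` can be surfaced directly in a "rate limit exceeded, retry in N ms" message.
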
## Requirements *(mandatory)*
### Functional Requirements
- **FR-001**: System MUST provide search capability for stories with query text and tag filtering
- **FR-002**: System MUST provide search capability for comments with query text and parent story context
- **FR-003**: System MUST support retrieving front page stories using the "front_page" tag
- **FR-004**: System MUST support retrieving latest stories sorted by date
- **FR-005**: System MUST support filtering by content type (story, comment, poll, pollopt, show_hn, ask_hn)
- **FR-006**: System MUST provide ability to retrieve specific items by ID with full nested comment trees
- **FR-007**: System MUST provide ability to retrieve user profiles by username
- **FR-008**: System MUST support numeric filtering on created_at_i (timestamp), points, and num_comments
- **FR-009**: System MUST support pagination with configurable page size (hitsPerPage) and page number
- **FR-010**: System MUST support combining multiple tags with AND/OR logic (e.g., author_pg,(story,poll))
- **FR-011**: System MUST support restricting search to specific attributes (e.g., URL-only search)
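- **FR-012**: System MUST support searching by author using the author_:USERNAME tag format, where :USERNAME is a placeholder for the username (e.g., author_pg)
- **FR-013**: System MUST support searching comments for a specific story using the story_:ID tag format, where :ID is a placeholder for the story's numeric ID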
- **FR-014**: System MUST support time range filtering using numeric filters on created_at_i
- **FR-015**: System MUST respect HN Algolia API rate limit (10,000 requests/hour per IP)
- **FR-016**: System MUST return result metadata including nbHits, nbPages, hitsPerPage, processingTimeMS
- **FR-017**: System MUST handle API errors gracefully with informative error messages
- **FR-018**: System MUST support both the search-by-relevance endpoint (default) and the search-by-date endpoint
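
FR-010, FR-012, and FR-013 all concern the `tags` expression, where commas mean AND and a parenthesized group means OR. A minimal sketch of a builder for that syntax follows. `TagExpr`, `buildTags`, `authorTag`, and `storyTag` are hypothetical names introduced for illustration.

```typescript
// Sketch only: composes a tags expression such as "author_pg,(story,poll)".
// A plain string is a single tag; an { anyOf } group is OR-ed together.
type TagExpr = string | { anyOf: string[] };

function authorTag(username: string): string {
  return `author_${username}`;
}

function storyTag(storyId: number): string {
  return `story_${storyId}`;
}

// Top-level entries are AND-ed (comma-separated); groups are wrapped in parentheses.
function buildTags(expressions: TagExpr[]): string {
  return expressions
    .map((expr) => (typeof expr === "string" ? expr : `(${expr.anyOf.join(",")})`))
    .join(",");
}

// Stories or polls submitted by "pg", matching the FR-010 example.
const pgStoriesOrPolls = buildTags([authorTag("pg"), { anyOf: ["story", "poll"] }]);
```
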
### Non-Functional Requirements (Constitution-Driven)
*These requirements ensure constitution compliance:*
- **NFR-001 (Code Quality)**: All code MUST pass Biome linting with zero warnings
- **NFR-002 (Code Quality)**: All code MUST have complete type annotations
- **NFR-003 (Testing)**: All features MUST achieve ≥80% code coverage
- **NFR-004 (Testing)**: All MCP tools MUST have integration tests against live HN API
- **NFR-005 (UX Consistency)**: Error messages MUST be human-readable and actionable
- **NFR-006 (Dependencies)**: All external libraries MUST be researched via Context7 MCP before use
- **NFR-007 (Dependencies)**: All dependencies MUST use latest stable versions
- **NFR-008 (Documentation)**: All MCP tools MUST have usage examples
- **NFR-009 (Performance)**: MCP tool responses MUST complete in <2s (95th percentile)
- **NFR-010 (Observability)**: All operations MUST use structured logging with correlation IDs
- **NFR-011 (Code Reuse)**: Existing HN API client packages MUST be evaluated before custom HTTP implementation
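
NFR-010 can be illustrated with a small formatter that emits one JSON object per log line. This is a sketch under assumptions, not a chosen logging design. `LogLevel` and `formatLogLine` are hypothetical names, and the field layout is illustrative.

```typescript
// Sketch only: one JSON object per line, always carrying a correlation ID.
type LogLevel = "debug" | "info" | "warn" | "error";

function formatLogLine(
  level: LogLevel,
  message: string,
  correlationId: string,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  // Core keys are written last so caller-supplied fields cannot overwrite them.
  return JSON.stringify({
    ...fields,
    timestamp: now.toISOString(),
    level,
    correlationId,
    message,
  });
}
```

Passing the same `correlationId` to every log call made while serving one tool invocation lets the lines be grouped afterwards.
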
### Key Entities
- **Story**: HN story/post with title, URL, author, points, text (for Ask HN), comment count, timestamps, and nested comments
- **Comment**: User comment with text, author, parent (story or comment), points, timestamps, and nested replies
- **Poll**: HN poll with question, options, and votes; structured like a story but with poll-specific data
- **PollOption**: Individual option within a poll with text and vote count
- **User**: HN user profile with username, karma, about text, creation timestamp
- **SearchResult**: Container for search hits with query metadata (pagination, hit counts, processing time)
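
The entities above could be modelled roughly as follows. This is a sketch under assumptions: the field names mirror what the public API documentation describes for search hits and user profiles and should be checked against live responses. Most hit fields are optional or nullable because older items may lack them. `hasMorePages` is a hypothetical helper.

```typescript
// Sketch only: field names are assumptions to verify against real API responses.
interface SearchHit {
  objectID: string;
  author: string;
  created_at_i: number; // Unix timestamp in seconds
  title?: string | null;
  url?: string | null;
  points?: number | null;
  num_comments?: number | null;
  story_text?: string | null; // body text, e.g. for Ask HN posts
  comment_text?: string | null;
  story_id?: number | null;
  parent_id?: number | null;
}

interface SearchResult {
  hits: SearchHit[];
  nbHits: number;
  page: number; // assumed zero-based
  nbPages: number;
  hitsPerPage: number;
  processingTimeMS: number;
}

interface UserProfile {
  username: string;
  karma: number;
  about?: string | null;
}

// True when at least one further page can be requested.
function hasMorePages(result: SearchResult): boolean {
  return result.page + 1 < result.nbPages;
}
```
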
## Success Criteria *(mandatory)*
### Measurable Outcomes
- **SC-001**: Users can find relevant stories for any topic in under 2 seconds (95th percentile)
- **SC-002**: Users can retrieve any story with full comment tree in under 3 seconds
- **SC-003**: System handles 1000+ requests per hour without degradation
- **SC-004**: Search results match HN web interface accuracy (same content returned)
- **SC-005**: 95% of queries return results on first attempt without errors
- **SC-006**: Users can paginate through 100+ pages of results without failures
- **SC-007**: Rate limit tracking prevents API blacklisting (100% compliance)
- **SC-008**: Error messages enable users to correct invalid queries without external documentation
- **SC-009**: Combined usage across all API endpoints stays within the rate limit (10,000 requests/hour per IP)
- **SC-010**: Users can retrieve front page stories that match current HN homepage within 1-minute freshness
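
The 95th-percentile targets in SC-001 and NFR-009 need a concrete definition to be checkable. One option, shown here as a sketch, is the nearest-rank method applied to recorded latencies. `percentile` and `meetsLatencyTarget` are hypothetical names, and the choice of nearest-rank over interpolation is an assumption.

```typescript
// Sketch only: nearest-rank percentile over recorded latency samples.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile requires at least one sample");
  if (p <= 0 || p > 100) throw new Error("p must be in the range (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest rank: the smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// SC-001 style check: is the 95th-percentile latency under the target?
function meetsLatencyTarget(samplesMs: number[], targetMs: number): boolean {
  return percentile(samplesMs, 95) < targetMs;
}
```
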
## Assumptions
- API endpoint at https://hn.algolia.com/api will remain stable and backwards compatible
- API rate limit of 10,000 requests/hour/IP is sufficient for typical MCP server usage patterns
- Users have basic familiarity with HN content types (stories, Ask HN, Show HN, comments)
- MCP server will be used primarily for read operations (search/retrieval), not write operations
- Default pagination of 20 items per page is reasonable for most use cases
- HN Algolia API search relevance algorithm meets user expectations
- Network latency to HN API is typical for public APIs (<500ms)
- Users can handle JSON response structures or will use MCP tools that format output appropriately
## Dependencies
- HN Algolia API at https://hn.algolia.com/api (external, public, free)
- Model Context Protocol (MCP) framework for tool definitions
- HTTP client library for API requests (to be selected based on implementation language)
- Rate limiting/tracking mechanism (implementation-specific)