# Data Model: HackerNews MCP Server
**Date**: October 30, 2025
**Purpose**: Define entities, relationships, validation rules, and state transitions
## Entity Definitions
### 1. HackerNewsPost
Represents any item from HackerNews (story, comment, poll, etc.)
**Fields**:
- `objectID`: string (unique identifier, e.g., "123456")
- `created_at`: string (ISO 8601 timestamp)
- `created_at_i`: number (Unix timestamp in seconds)
- `title`: string | null (story title, null for comments)
- `url`: string | null (story URL, null for Ask HN, comments, polls)
- `author`: string (username of post author)
- `points`: number | null (story points, null for comments)
- `story_id`: number | null (parent story ID for comments)
- `parent_id`: number | null (parent comment ID for nested comments)
- `comment_text`: string | null (comment content as HTML)
- `story_text`: string | null (Ask HN or Show HN text content)
- `num_comments`: number | null (comment count, null for comments themselves)
- `_tags`: string[] (content type tags: "story", "comment", "poll", etc.)
- `children`: HackerNewsPost[] (nested comments, for recursive tree structure)
**Validation Rules**:
```typescript
import { z } from "zod";
export const HackerNewsPostSchema = z.object({
objectID: z.string().min(1),
created_at: z.string().datetime(),
created_at_i: z.number().int().positive(),
title: z.string().nullable(),
url: z.string().url().nullable(),
author: z.string().min(1),
points: z.number().int().nonnegative().nullable(),
story_id: z.number().int().positive().nullable(),
parent_id: z.number().int().positive().nullable(),
comment_text: z.string().nullable(),
story_text: z.string().nullable(),
num_comments: z.number().int().nonnegative().nullable(),
_tags: z.array(z.string()),
children: z.array(z.lazy(() => HackerNewsPostSchema)).optional().default([])
});
export type HackerNewsPost = z.infer<typeof HackerNewsPostSchema>;
```
**Relationships**:
- Comment → Story: `story_id` references parent story's `objectID`
- Comment → Comment: `parent_id` references parent comment's `objectID`
- Recursive: `children` contains nested comment tree
**State Transitions**: N/A (read-only, no state changes)
---
### 2. HackerNewsUser
Represents a HackerNews user profile
**Fields**:
- `username`: string (unique username)
- `created_at`: string (ISO 8601 timestamp)
- `created_at_i`: number (Unix timestamp in seconds)
- `karma`: number (user's karma points)
- `about`: string | null (user bio/about text, may contain HTML)
**Validation Rules**:
```typescript
export const HackerNewsUserSchema = z.object({
username: z.string().min(1).max(15), // HN usernames max 15 chars
created_at: z.string().datetime(),
created_at_i: z.number().int().positive(),
karma: z.number().int().nonnegative(),
about: z.string().nullable()
});
export type HackerNewsUser = z.infer<typeof HackerNewsUserSchema>;
```
**Relationships**: N/A (standalone entity)
**State Transitions**: N/A (read-only, no state changes)
---
### 3. SearchQuery
Represents a search request with filters
**Fields**:
- `query`: string (search terms, can be empty for filter-only search)
- `tags`: string[] (content type filters)
- `numericFilters`: string[] (numeric condition filters)
- `page`: number (pagination page, 0-indexed)
- `hitsPerPage`: number (results per page, 1-1000)
**Validation Rules**:
```typescript
export const SearchQuerySchema = z.object({
query: z.string().max(1000).optional().default(""), // Reasonable search term limit
tags: z.array(z.enum([
"story",
"comment",
"poll",
"pollopt",
"show_hn",
"ask_hn",
"front_page",
"author_*", // Will be formatted as author_username
"story_*" // Will be formatted as story_id
])).optional().default([]),
numericFilters: z.array(z.string().regex(
/^(created_at_i|points|num_comments)(>|>=|<|<=|=)\d+$/,
"Must be format: field operator value (e.g., points>100)"
)).optional().default([]),
page: z.number().int().nonnegative().max(999).optional().default(0), // HN API max ~1000 pages
hitsPerPage: z.number().int().min(1).max(1000).optional().default(20)
});
export type SearchQuery = z.infer<typeof SearchQuerySchema>;
```
**Relationships**: N/A (request parameters)
**State Transitions**: N/A (immutable request)
---
### 4. SearchResult
Represents a paginated search response
**Fields**:
- `hits`: HackerNewsPost[] (matching posts)
- `nbHits`: number (total matching results across all pages)
- `page`: number (current page, 0-indexed)
- `nbPages`: number (total pages available)
- `hitsPerPage`: number (results per page)
- `processingTimeMS`: number (API processing time in milliseconds)
**Validation Rules**:
```typescript
export const SearchResultSchema = z.object({
hits: z.array(HackerNewsPostSchema),
nbHits: z.number().int().nonnegative(),
page: z.number().int().nonnegative(),
nbPages: z.number().int().nonnegative(),
hitsPerPage: z.number().int().positive(),
processingTimeMS: z.number().nonnegative()
});
export type SearchResult = z.infer<typeof SearchResultSchema>;
```
**Relationships**:
- Contains array of `HackerNewsPost` entities in `hits`
**State Transitions**: N/A (read-only response)
---
## MCP Tool Input/Output Schemas
### Tool: search_posts
**Input**:
```typescript
export const SearchPostsInputSchema = z.object({
query: z.string().optional().describe("Search terms (keywords to find in posts)"),
tags: z.array(z.enum([
"story", "comment", "poll", "pollopt", "show_hn", "ask_hn", "front_page"
])).optional().describe("Filter by content type (can specify multiple)"),
author: z.string().optional().describe("Filter by author username"),
storyId: z.number().int().positive().optional().describe("Filter by story ID (for comments)"),
minPoints: z.number().int().nonnegative().optional().describe("Minimum points threshold"),
maxPoints: z.number().int().nonnegative().optional().describe("Maximum points threshold"),
minComments: z.number().int().nonnegative().optional().describe("Minimum comment count"),
maxComments: z.number().int().nonnegative().optional().describe("Maximum comment count"),
dateAfter: z.string().datetime().optional().describe("Filter posts after this date (ISO 8601)"),
dateBefore: z.string().datetime().optional().describe("Filter posts before this date (ISO 8601)"),
sortByDate: z.boolean().optional().default(false).describe("Sort by date (newest first) vs relevance"),
page: z.number().int().nonnegative().optional().default(0).describe("Page number (0-indexed)"),
hitsPerPage: z.number().int().min(1).max(100).optional().default(20).describe("Results per page (max 100)")
});
export type SearchPostsInput = z.infer<typeof SearchPostsInputSchema>;
```
**Output**:
```typescript
{
content: [{
type: "text",
text: JSON.stringify({
results: HackerNewsPost[],
pagination: {
totalResults: number,
currentPage: number,
totalPages: number,
resultsPerPage: number
},
processingTimeMs: number
}, null, 2)
}]
}
```
---
### Tool: get_front_page
**Input**:
```typescript
export const GetFrontPageInputSchema = z.object({
page: z.number().int().nonnegative().optional().default(0).describe("Page number (0-indexed)"),
hitsPerPage: z.number().int().min(1).max(30).optional().default(30).describe("Results per page (front page typically shows 30)")
});
export type GetFrontPageInput = z.infer<typeof GetFrontPageInputSchema>;
```
**Output**: Same as search_posts (returns HackerNewsPost array with pagination)
---
### Tool: get_post
**Input**:
```typescript
export const GetPostInputSchema = z.object({
postId: z.string().min(1).describe("HackerNews post ID (objectID)")
});
export type GetPostInput = z.infer<typeof GetPostInputSchema>;
```
**Output**:
```typescript
{
content: [{
type: "text",
text: JSON.stringify({
post: HackerNewsPost, // Includes full comment tree in children
commentCount: number,
nestedLevels: number
}, null, 2)
}]
}
```
---
### Tool: get_user
**Input**:
```typescript
export const GetUserInputSchema = z.object({
username: z.string().min(1).max(15).describe("HackerNews username")
});
export type GetUserInput = z.infer<typeof GetUserInputSchema>;
```
**Output**:
```typescript
{
content: [{
type: "text",
text: JSON.stringify({
user: HackerNewsUser,
accountAgeYears: number,
avgKarmaPerYear: number
}, null, 2)
}]
}
```
---
## Error Response Schema
**All tools return this format on error**:
```typescript
export const ErrorResponseSchema = z.object({
error: z.string().describe("Human-readable error message"),
type: z.enum([
"validation_error", // Invalid input parameters
"not_found", // Post/user doesn't exist
"api_error", // HN API failure
"rate_limit", // Rate limit exceeded
"unknown" // Unexpected error
]).describe("Error category"),
details: z.record(z.any()).optional().describe("Additional context")
});
export type ErrorResponse = z.infer<typeof ErrorResponseSchema>;
```
**MCP Format**:
```typescript
{
content: [{
type: "text",
text: JSON.stringify(ErrorResponse, null, 2)
}],
isError: true
}
```
---
## API Client Interface
**Service layer abstraction**:
```typescript
export interface HNApiClient {
/**
* Search for posts with filters
* @throws ApiError on HTTP failures
* @throws RateLimitError when limit exceeded
*/
search(query: SearchQuery): Promise<SearchResult>;
/**
* Get front page posts (special search with front_page tag)
* @throws ApiError on HTTP failures
* @throws RateLimitError when limit exceeded
*/
getFrontPage(page: number, hitsPerPage: number): Promise<SearchResult>;
/**
* Get specific post by ID
* @throws ApiError on HTTP failures
* @throws NotFoundError if post doesn't exist
* @throws RateLimitError when limit exceeded
*/
getPost(postId: string): Promise<HackerNewsPost>;
/**
* Get user profile by username
* @throws ApiError on HTTP failures
* @throws NotFoundError if user doesn't exist
* @throws RateLimitError when limit exceeded
*/
getUser(username: string): Promise<HackerNewsUser>;
}
```
---
## Rate Limiter Interface
```typescript
export interface RateLimiter {
/**
* Check if request can proceed
* @throws RateLimitError if limit exceeded
*/
checkLimit(): Promise<void>;
/**
* Get current usage statistics
*/
getStats(): {
remainingTokens: number;
maxTokens: number;
percentUsed: number;
resetsAt: Date;
};
/**
* Get warning if approaching limit
*/
getWarning(): string | null; // Returns warning message if >80% used
}
```
---
## Validation Rules Summary
| Field | Constraints | Rationale |
|-------|------------|-----------|
| `objectID` | Non-empty string | HN IDs are always present and unique |
| `username` | 1-15 chars | HN username length limits |
| `created_at` | Valid ISO 8601 datetime | API returns standard format |
| `created_at_i` | Positive integer | Unix timestamps are positive |
| `karma` | Non-negative integer | Karma can't be negative |
| `points` | Non-negative integer (nullable) | Points can't be negative, null for comments |
| `num_comments` | Non-negative integer (nullable) | Comment count can't be negative, null for comments |
| `query` | Max 1000 chars | Reasonable search term limit |
| `page` | 0-999 | HN API pagination limits |
| `hitsPerPage` | 1-1000 (tool: 1-100) | API max 1000, tools limited to 100 for UX |
| `tags` | Enum of valid types | Prevent invalid tag values |
| `numericFilters` | Regex pattern | Ensure valid filter syntax |
---
## Derived Fields / Computed Values
**In get_user response**:
- `accountAgeYears`: Calculated from `created_at_i` to current time
- `avgKarmaPerYear`: `karma / accountAgeYears` (handle division by zero)
**In get_post response**:
- `commentCount`: Total comments in tree (recursive count of children)
- `nestedLevels`: Maximum depth of comment tree
**These are computed at response time, not stored**
---
## Next Steps
Proceed to contracts/ and quickstart.md generation.