We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/okets/folder-mcp'
If you have feedback or need assistance with the MCP directory API, please join our Discord server
remove-topics.md•5.37 kB
# Topics Feature Removal - Complete Documentation
## Overview
**Date**: 2025-01-21
**Objective**: Complete removal of the topics feature from the folder-mcp codebase
**Rationale**: User analysis of the database showed topics were "useless and probably just add to the complexity"
## Summary of Changes
The topics feature has been completely removed from the entire codebase, including:
- Type definitions and interfaces
- Database schema and SQL queries
- REST API endpoints
- Python dependencies (BERTopic)
- Stale Sprint 0 document aggregation system
- Migration files
## Detailed Changes by Category
### 1. Core Type System Changes
#### `src/types/index.ts`
- **Removed**: `topics` field from `SemanticMetadata` interface
- **Kept**: `keyPhrases: SemanticScore[]` and `readabilityScore: number | null`
#### `src/domain/semantic/interfaces.ts`
- **Removed**: `topics` from `SemanticData` interface
- **Removed**: `extractTopics()` method from `ISemanticExtractionService`
- **Removed**: `maxTopics` from `SemanticExtractionOptions`
- **Removed**: `topicSpecificity` from `qualityMetrics`
### 2. Database Schema Updates
#### `src/infrastructure/embeddings/sqlite-vec/schema.ts`
- **Removed**: `topics` column from `CHUNKS_TABLE` definition
- **Updated**: All SQL queries to exclude topic references
- **Modified**: JSON extraction queries to remove topic fields
### 3. Service Layer Cleanup
#### `src/domain/semantic/extraction-service.ts`
- **Deleted**: Entire `extractTopics()` method implementation
- **Removed**: Topic clustering service imports
- **Updated**: `calculateQualityMetrics()` to exclude `topicSpecificity`
#### `src/daemon/services/semantic-metadata-service.ts`
- **Removed**: `topTopics` from `FolderSemanticMetadata`
- **Removed**: `topics` from `DocumentSemanticSummary` and `SectionSemantics`
- **Updated**: SQL queries and aggregation logic
### 4. REST API Updates
#### `src/daemon/rest/types.ts`
- **Removed**: `topics` field from:
- `FolderInfo` interface
- `DocumentData` interface
- `DocumentOutline` interface
### 5. MCP Protocol Updates
#### `src/interfaces/mcp/types.ts`
- **Removed**: `matched_topics` from `semantic_context`
- **Removed**: `topics` from `semantic_metadata`
- **Removed**: `topTopics` from `SubfolderInfo` and `ListFoldersResponse`
#### `src/interfaces/mcp/endpoints.ts`
- **Removed**: `matched_topics` from search result construction
### 6. Python Dependencies
#### `src/infrastructure/embeddings/python/main.py`
- **Removed**: BERTopic import and availability check
- **Cleaned**: Dependency checking code
### 7. Deleted Files (Stale Code)
The following files were completely removed as they were part of the unused Sprint 0 document aggregation system or specifically for topic extraction:
- `src/domain/semantic/algorithms/topic-clustering.ts` - Topic clustering implementation
- `src/domain/content/topic-clustering.ts` - Content-level topic clustering
- `src/domain/semantic/document-aggregator.ts` - Document semantic aggregation
- `src/domain/semantic/onnx-document-enhancer.ts` - ONNX document enhancement
- `src/domain/semantic/document-semantic-service.ts` - Document semantic service
- `src/domain/semantic/python-document-enhancer.ts` - Python document enhancement
- `src/domain/search/semantic-aggregation.ts` - Search aggregation with topics
- `src/infrastructure/storage/semantic-data-provider.ts` - Semantic data provider
- `src/types/document-semantic.ts` - Document semantic types with topics
- `schema/migrations/001-add-document-semantic-fields.sql` - Obsolete migration
### 8. Minor Refactoring
#### Relationship Type Renaming
- **Changed**: `'topic'` to `'thematic'` in relationship types to avoid confusion
- **Files affected**:
- `src/application/serving/index.ts` - Updated `RelatedFileItem` interface
- `src/application/serving/knowledge.ts` - Already had `'thematic'` in implementation
## Technical Impact
### What Remains
- **Key Phrases**: Extraction via KeyBERT (GPU models) or N-gram (ONNX models)
- **Readability Scores**: Coleman-Liau formula calculation
- **Quality Metrics**: Multiword ratio and average words per phrase
### Performance Improvements
- Reduced database storage requirements
- Faster indexing without topic extraction
- Simpler codebase maintenance
- Cleaner API surface
### Breaking Changes
- REST API responses no longer include topic fields
- MCP protocol responses no longer include topic metadata
- Database schema changed (topics column removed)
## Migration Notes
For existing deployments:
1. Existing databases will still function (unused columns are ignored)
2. No data migration required - topics data simply won't be used
3. API clients should be updated to not expect topic fields
## Validation
- ✅ Build successful: `npm run build` completes without errors
- ✅ TypeScript compilation: No type errors related to topics
- ✅ No stale imports: All topic-related imports removed
- ✅ Clean codebase: All unused Sprint 0 code deleted
## Conclusion
The topics feature has been completely removed from the folder-mcp codebase. The system now focuses on:
- **Key phrase extraction** for semantic understanding
- **Readability scoring** for content complexity assessment
- **Vector similarity** for semantic search
This simplification reduces complexity while maintaining all essential semantic search capabilities.