memory_bank•13.8 kB
# BIP MCP Server Project - Memory Bank
## I. Project Mission & Core Goal
- **Mission**: To create an AI-powered assistant that integrates with the BIT Information Portal (BIP) to provide users (students, faculty) with intuitive, natural language access to their academic and administrative data.
- **Core Goal**: Enable users to ask questions in plain English and receive accurate, relevant answers sourced directly from BIP APIs, processed and synthesized by LLMs.
## II. Current System Architecture & Components
1. **`packages/mcp_server` (FastAPI Backend)**:
* **Role**: Main application logic, BIP API interaction, LLM orchestration.
* **Key Endpoints**:
* `/bip/session/bip`: Receives and stores BIP session cookies from the Chrome extension. Uses Starlette server-side sessions.
* `/assistant/ask`: Primary endpoint for user queries.
* **Core Services**:
* `core/bip_api_registry.py`: Defines known BIP API endpoints, their descriptions, and data schema hints. Crucial for LLM path selection.
* `core/bip_service.py`: Handles actual HTTP calls to BIP APIs, manages session cookies, pagination, and department data caching. Includes `get_student_details_from_session` for user context.
* `core/llm_service.py`: Contains functions for LLM interactions:
* `determine_api_path_from_query`: Selects the best BIP API path based on user question and API registry.
* `extract_query_type_and_value`: Classifies query intent (specific entity, list by category, general listing, user context dependent) and extracts relevant values/keywords for API parameterization. Includes typo tolerance.
* `get_answer_from_llm`: Generates a natural language answer from fetched BIP data, handles data chunking for large contexts, and incorporates a helpful/witty tone.
* `routes/assistant_routes.py`: Orchestrates the query processing flow for `/assistant/ask`, including rule-based overrides and iterative logic for context-dependent queries (e.g., "Faculties teaching me").
* **Models (`models/bip_models.py`)**: Pydantic models for request/response validation (e.g., `BipSessionData`).
* **Configuration (`config.py`, `.env`)**: Manages settings like `GOOGLE_API_KEY`.
2. **`packages/bip-extension` (Chrome Extension)**:
* **Role**: Captures BIP session cookies (`bip_session`, `XSRF-TOKEN`, `wiki_wiki_UserID`, `wiki_wiki_UserName`, `app_forward_auth`) and sends them to the MCP server.
3. **`apps/web` (React/Vite Frontend)**:
* **Role**: Intended user interface for the assistant.
* **Status**: Placeholder, future development.
4. **`apps/api`**:
* **Role**: Placeholder for potential future standalone API.
* **Status**: Not developed.
## III. Current State of API Integration & Logic
- **Integrated & Handled APIs (in `bip_api_registry.py` and `assistant_routes.py` logic):**
1. `/nova-api/academic-feedbacks`
2. `/nova-api/students`
3. `/nova-api/departments` (with caching)
4. `/nova-api/student-activity-masters`
5. `/nova-api/student-achievement-loggers` (with rule-based override for "my achievements")
6. `/nova-api/academic-course-faculty-mappings` (with iterative logic for "Faculties teaching me")
7. `/nova-api/periodical-statuses`
- **Data Parsing**: `parse_nova_api_response_data` in `assistant_routes.py` is the primary parser, assuming a common Nova resource structure.
- **Query Parameterization**: Logic exists to use `search` and `filters` parameters based on `query_details` from `extract_query_type_and_value`. Default filters are applied for some endpoints.
- **User Context**: `get_student_details_from_session` fetches student's department and semester for contextual queries.
## IV. Strategic Plan for Expanding API Endpoint Integration
This section outlines the strategy for integrating the comprehensive list of additional BIP Nova API endpoints.
### A. Centralized API Registry (`core/bip_api_registry.py`)
- **Action**: For each new API endpoint integrated, a new entry will be added to `BIP_API_ENDPOINTS`.
- **`path`**: The exact API path.
- **`description`**: A clear, concise description detailing the API's purpose, the type of data it provides, and example use cases or question types it can answer. This is **critical** for the Path Selection LLM. Keywords relevant to user queries should be included.
- **`data_schema_hint`**: An optional but highly recommended string summarizing the key fields available in the (parsed) response. This aids both path selection and answer generation LLMs.
- **Design Consideration (Future)**: If the list of APIs becomes excessively long (e.g., 50+), consider:
- Categorizing endpoints within the registry (e.g., "Academics", "People", "Projects").
- A multi-stage path selection process (LLM first selects a category, then an API within that category).
- Using embeddings for semantic search over API descriptions if direct LLM prompting becomes inefficient.
- *Current approach*: For the current list size (~30-40 endpoints), direct listing in the prompt for `determine_api_path_from_query` is expected to be feasible.
### B. Path Selection LLM (`core/llm_service.py -> determine_api_path_from_query`)
- **Prompt Engineering**: The system prompt for this LLM will be dynamically updated to include all registered APIs and their descriptions.
- The prompt emphasizes selecting the *single best path* or "NO_PATH_FOUND".
- It includes instructions to prioritize "my" queries for user-centric data endpoints.
- **Scalability**: Monitor performance and context window limits as more APIs are added.
- **Disambiguation**: Rely on high-quality, distinct API descriptions to minimize ambiguity.
### C. Data Parsing (`routes/assistant_routes.py -> parse_nova_api_response_data` & Potential New Parsers)
- **Action**: For each new API:
1. Obtain a sample JSON response.
2. **Verify Structure**: Determine if it follows the standard Nova resource structure handled by `parse_nova_api_response_data` (top-level `resources` list, items with `id` and `fields` or `attributes`).
3. **Adapt/Extend**:
- If similar: `parse_nova_api_response_data` might be reusable. Ensure it correctly extracts primary `id` and relevant data from `fields` or `attributes`.
- If different:
- Option A (Preferred for distinct structures): Create a new, specific parser function (e.g., `parse_periodical_status_data`).
- Option B (If variations are minor): Enhance `parse_nova_api_response_data` to be more adaptable, possibly by accepting hints.
- **Strategy (Parsing Dispatch)**: Implement or refine a dispatcher mechanism (e.g., a dictionary mapping API paths to parser functions) in `assistant_routes.py` or a dedicated `parsing_service.py`.
```python
# Example:
# API_PARSERS = { "/nova-api/students": parse_nova_api_response_data, ... }
# parser = API_PARSERS.get(selected_path, default_parser_or_error_handler)
# parsed_data = await parser(raw_response)
```
The current `parse_nova_api_response_data` in `assistant_routes.py` is generally robust for Nova's typical "list of resources" structure where each resource has an `id` (either as a direct value or a dict `{"value": ...}`) and data within a `fields` array or an `attributes` object. It also handles cases where a single resource might be returned directly under a `data` key (e.g., when fetching a student by ID).
### D. Answering LLM (`core/llm_service.py -> get_answer_from_llm`)
- **Prompting**:
- The general answering prompt is designed to be somewhat adaptable.
- Specific prompt variations exist for `list_by_department_query` and `specific_entity_details`.
- **Action**: As new data types are introduced, review if the generic prompt is sufficient or if new conditional prompt variations are needed for optimal summarization/answering for that data type.
- Consider passing the `data_schema_hint` from the API registry to the Answering LLM to give it more context about the structure of the `context_data` it's receiving.
- **Data Quality**: The quality of parsed data is paramount.
### E. User Context (Student Details)
- **Strategy**: Continue to use `get_student_details_from_session` to fetch the logged-in student's department ID, department name, and current semester.
- **Application**:
1. **Path Selection**: This context can be appended to the user's query for `determine_api_path_from_query` to improve accuracy for context-sensitive questions. (Not yet implemented, but a planned enhancement).
2. **Data Filtering**: Used in `handle_faculty_for_my_courses_query` to filter course mappings client-side after a broader fetch.
3. **Response Personalization**: The Answering LLM can use this context if provided.
- **Action**: Ensure `get_student_details_from_session` correctly parses the student record structure from `/nova-api/students/{id}` or `/nova-api/students?search=...`. (Recent fixes addressed this).
### F. Modular Design
- Maintain separation of concerns:
- `bip_api_registry.py`: API definitions.
- `bip_service.py`: Raw BIP API interaction, session, caching.
- `llm_service.py`: All direct LLM calls and core prompt logic.
- `assistant_routes.py`: Orchestration, request/response handling, path-specific parameter logic, calling services.
- Potential `parsing_service.py` if parsing logic becomes very complex.
### G. Immediate Focus for Integrating Next API
1. **Select Next API** from the provided list.
2. **Add to `BIP_API_ENDPOINTS`**: Define `path`, `description`, and `data_schema_hint`.
3. **Test Path Selection**: Formulate sample user queries and verify `determine_api_path_from_query` selects the new path. Refine description if needed.
4. **Obtain Sample JSON Response** for the new API.
5. **Implement/Verify Parser**:
- Check if `parse_nova_api_response_data` works.
- If not, implement a new parser or adapt the existing one. Update parsing dispatcher if using one.
6. **Update `assistant_routes.py`**:
- Add the path to the list of general `perPage=150` fetch paths.
- Add logic to apply default filters if necessary (like the `created_at` or periodical status filters).
- Determine if any specific `query_details` (from `extract_query_type_and_value`) can be used to populate `search` or specific `filters` for this new API. Implement this parameterization.
7. **Test Full Flow**: User query -> Path Selection -> Parameterization -> Data Fetch -> Data Parse -> LLM Answer.
8. **Refine Answering LLM Prompt**: If needed for the new data type.
## V. List of BIP Nova API Endpoints for Future Integration (from User)
**Academics:**
- `/nova-api/academic-course-faculty-mappings` (Integrated)
- `/nova-api/academic-feedbacks` (Integrated)
- `/nova-api/periodical-statuses` (Integrated)
**Master Entries (Reference Data):**
- `/nova-api/academic-years`
- `/nova-api/departments` (Integrated)
- `/nova-api/designations`
- `/nova-api/student-statuses`
- `/nova-api/faculty-statuses`
**Mentors:**
- `/nova-api/mentors`
**People:**
- `/nova-api/students` (Integrated)
- `/nova-api/faculties`
**Projects:**
- `/nova-api/student-project-details`
- `/nova-api/student-project-registrations`
- `/nova-api/student-project-implementation-details`
**SSIG (Student Special Interest Groups):**
- `/nova-api/ssigs`
**Special Labs:**
- `/nova-api/special-labs`
- `/nova-api/special-labs-details`
**Student Achievements (Broader Category):**
- `/nova-api/student-activity-masters` (Integrated - General event master list)
- `/nova-api/student-achievement-loggers` (Integrated - Student's logged participations)
- `/nova-api/student-paper-presentation-reports`
- `/nova-api/student-project-presentation-reports`
- `/nova-api/student-project-outcomes`
- `/nova-api/student-technical-competition-reports`
- `/nova-api/mba-student-technical-competitions`
- `/nova-api/student-patent-trackers`
- `/nova-api/student-patent-reports`
- `/nova-api/industries` (Likely related to internships/placements)
- `/nova-api/student-internships`
- `/nova-api/internships-trackers`
- `/nova-api/internships-reports`
**Student Action Plans:**
- `/nova-api/student-action-plan-internships`
- `/nova-api/student-action-plan-online-courses`
- `/nova-api/student-action-plan-paper-presentations`
- `/nova-api/student-action-plan-patents`
- `/nova-api/student-action-plan-products`
- `/nova-api/student-action-plan-project-presentations`
- `/nova-api/student-action-plan-competitions`
**Student Declarations:**
- `/nova-api/student-declarations`
**Technical Approval Committee (TAC):**
- `/nova-api/student-tacs`
- `/nova-api/student-tac-review-appoinments`
- `/nova-api/tac-internship-projects`
**Users:**
- `/nova-api/users` (General user information, potentially including roles)
## VI. Current Known Issues / Areas for Immediate Refinement (Paused)
- The iterative reasoning for "Faculties teaching me" is implemented but relies on accurate student detail parsing and potentially complex filter construction for `/nova-api/academic-course-faculty-mappings`. The Python-side filtering is a fallback.
- Effectiveness of the `search` parameter versus structured `filters` for some endpoints (e.g., `/nova-api/student-activity-masters`) needs ongoing evaluation.
- Robustness of `extract_query_type_and_value` for diverse phrasings and typos, especially for non-entity-specific keyword extraction.
This memory bank update reflects the new strategic direction and the comprehensive list of APIs.