3-Vector-Search-LLM-Prompting.ipynb
{ "cells": [ { "cell_type": "markdown", "id": "1863a11f-dc83-4101-b137-661e87919f08", "metadata": {}, "source": [ "# 3 - Vector Search and LLM prompting\n", "\n", "This notebook shows how we can query our vector database using plain-text queries.\n", "\n", "**Before starting, I want to emphasize that the patient data we are using is all synthetically generated. The main FHIR resources were created by [Synthea](https://synthetichealth.github.io/synthea/), while the clinical notes being used in this example are created by Copilot. \n", "\n", "As in previous steps, we will first create our iris cursor:\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "13e26fe7-a2f4-4d95-bf86-6be2cfcb4032", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING:root:IRISINSTALLDIR or ISC_PACKAGE_INSTALLDIR environment variable must be set\n", "WARNING:root:Embedded Python not available\n", "WARNING:root:Error importing pythonint: No module named 'pythonint'\n", "WARNING:root:Embedded Python not available\n" ] } ], "source": [ "from Utils.get_iris_connection import get_cursor\n", "cursor = get_cursor()" ] }, { "cell_type": "markdown", "id": "8993821d-c6bb-4815-9692-47fabd987de7", "metadata": {}, "source": [ "The database search is performed using functions built into IRIS-SQL. The query, which is given in plain text, is then encoded into a vector using the same sentence transformer model as used for the information in the database. Here we are going to simply query the database about headaches:\n" ] }, { "cell_type": "code", "execution_count": 8, "id": "10e59405-4e1a-43e8-a11e-14c67773787e", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: cpu\n", "INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: all-MiniLM-L6-v2\n" ] } ], "source": [ "from sentence_transformers import SentenceTransformer\n", "model = SentenceTransformer('all-MiniLM-L6-v2') \n", "query = \"Has the patient reported any chest or respiratory complaints?\"\n", "query_vector = model.encode(query, normalize_embeddings=True, show_progress_bar =False).tolist()" ] }, { "cell_type": "markdown", "id": "220badd0-19b8-4d1c-99b8-032396bedc3f", "metadata": {}, "source": [ "There are two ways to measure the similarity between vectors, we can look at the vector cosine or the dot-product. The cosine is the angle between the two vectors, while the dot product is the distance between them. They have slight differences that I won't go into much detail here, but can be explored if you want to optimise your search. In reality, for small searches, the results will be similar whichever method you use. " ] }, { "cell_type": "code", "execution_count": 9, "id": "8e5939c6-eb17-4c74-bee1-290c29b7f666", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "--------------------------------------Note Start---------------------------------\n", "Date: 2024-11-21\n", "Provider: Dr. Chin306 Kulas532\n", "Location: Waltham Urgent Care\n", "Reason for Visit: Cough and difficulty breathing\n", "Subjective:\n", "Aurora was brought in with a 3-day history of persistent cough, mild fever, and labored breathing. Parent reports decreased appetite and increased fatigue. 
No vomiting or diarrhea.\n", "Objective:\n", "\n", "Vitals: BP 116/78 mmHg\n", "Physical Exam:\n", "\n", "Respiratory: Wheezing, prolonged expiratory phase\n", "O2 saturation: Slightly reduced\n", "No cyanosis or retractions\n", "\n", "\n", "\n", "Assessment:\n", "\n", "Acute bronchitis\n", "Respiratory distress – mild\n", "\n", "Plan:\n", "\n", "Performed respiratory function measurement\n", "Supportive care: hydration, rest, antipyretics\n", "Monitor for worsening symptoms\n", "Follow-up with primary care in 3–5 days\n", "--------------------------------------Note End---------------------------------\n", "\n", "\n", "--------------------------------------Note Start---------------------------------\n", "Date: 2025-11-21\n", "Provider: Dr. Chin306 Kulas532\n", "Location: Waltham Urgent Care\n", "Reason for Visit: Respiratory symptoms\n", "Subjective:\n", "Aurora was brought in with a 3-day history of cough, mild fever, and wheezing. Parent reports reduced appetite and fatigue.\n", "Objective:\n", "\n", "Vitals: BP 116/78 mmHg\n", "Physical Exam:\n", "\n", "Respiratory: Wheezing, mild distress\n", "O2 saturation: Slightly reduced\n", "No cyanosis\n", "\n", "\n", "\n", "Assessment:\n", "\n", "Acute bronchitis\n", "Mild respiratory impairment\n", "\n", "Plan:\n", "\n", "Respiratory function measured\n", "Supportive care: hydration, rest, antipyretics\n", "Monitor for worsening symptoms\n", "Follow-up with PCP in 3–5 days\n", "\n", "\n", "--------------------------------------Note End---------------------------------\n", "\n", "\n", "--------------------------------------Note Start---------------------------------\n", "Date: 2024-11-21\n", "Provider: Dr. Chin306 Kulas532\n", "Location: Waltham Urgent Care\n", "Reason for Visit: Persistent cough and respiratory symptoms\n", "Subjective:\n", "Aurora was brought in with a 4-day history of persistent cough, mild fever, and wheezing. Parent reports decreased appetite and increased fatigue. No vomiting or diarrhea.\n", "Objective:\n", "\n", "Vitals: BP 116/78 mmHg\n", "Physical Exam:\n", "\n", "Respiratory: Audible wheezing, mild respiratory distress\n", "O2 saturation: Slightly reduced\n", "No cyanosis or retractions\n", "\n", "\n", "\n", "Assessment:\n", "\n", "Acute bronchitis\n", "Respiratory function impairment – mild\n", "\n", "Plan:\n", "\n", "Performed respiratory function measurement\n", "Supportive care: fluids, rest, antipyretics\n", "Monitor for worsening symptoms\n", "Follow-up with PCP in 3–5 days\n", "--------------------------------------Note End---------------------------------\n", "\n" ] } ], "source": [ "table_name = \"VectorSearch.DocRefVectors\"\n", "search_sql = f\"\"\"\n", " SELECT TOP 3 ClinicalNotes \n", " FROM {table_name}\n", " WHERE PatientID = ?\n", " ORDER BY VECTOR_COSINE(NotesVector, TO_VECTOR(?,double)) DESC\n", " \"\"\"\n", "\n", "cursor.execute(search_sql, [3, str(query_vector)])\n", "results = cursor.fetchall()\n", "\n", "for result in results:\n", " print(\"\\n--------------------------------------Note Start---------------------------------\")\n", " print(result[0])\n", " print(\"--------------------------------------Note End---------------------------------\\n\")" ] }, { "cell_type": "markdown", "id": "f62bdc1c-0268-4526-ac1f-e8cadff5fadf", "metadata": {}, "source": [ "This SQL query selects the top result from a vector search of our table. 
The results are ordered by the vector cosine between the NotesVector column of the database and the vector of our query - we have to order in descending order to get the best match first. \n", "\n", "Note that we have an additional filter here to include only the results for a particular patient ID, which is provided at execution time. We could add any number of additional filters - for example, each clinical note starts with a date. In the set-up section we could extract this date and save it as a separate column in our data table; when querying the database we could then add a condition to include only a particular date range (a sketch of this is shown after the function below). \n", "\n", "Below I have put this into an (almost) standalone function. This function relies on the global variables model and table_name defined above. " ] }, { "cell_type": "code", "execution_count": 11, "id": "dccb9e00-6dca-4f39-b5b2-737e7dc9007b", "metadata": {}, "outputs": [], "source": [ "def vector_search(user_prompt,patient):\n", " search_vector = model.encode(user_prompt, normalize_embeddings=True,show_progress_bar=False ).tolist() \n", " \n", " search_sql = f\"\"\"\n", " SELECT TOP 3 ClinicalNotes \n", " FROM {table_name}\n", " WHERE PatientID = {patient}\n", " ORDER BY VECTOR_COSINE(NotesVector, TO_VECTOR(?,double)) DESC\n", " \"\"\"\n", " cursor.execute(search_sql,[str(search_vector)])\n", " \n", " results = cursor.fetchall()\n", " return results" ] }, { "cell_type": "code", "execution_count": 13, "id": "c31f49dd-7338-48a5-9039-f6927375ebb5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(('Date: 2024-11-21\\nProvider: Dr. Chin306 Kulas532\\nLocation: Waltham Urgent Care\\nReason for Visit: Cough and difficulty breathing\\nSubjective:\\nAurora was brought in with a 3-day history of persistent cough, mild fever, and labored breathing. Parent reports decreased appetite and increased fatigue. No vomiting or diarrhea.\\nObjective:\\n\\nVitals: BP 116/78 mmHg\\nPhysical Exam:\\n\\nRespiratory: Wheezing, prolonged expiratory phase\\nO2 saturation: Slightly reduced\\nNo cyanosis or retractions\\n\\n\\n\\nAssessment:\\n\\nAcute bronchitis\\nRespiratory distress – mild\\n\\nPlan:\\n\\nPerformed respiratory function measurement\\nSupportive care: hydration, rest, antipyretics\\nMonitor for worsening symptoms\\nFollow-up with primary care in 3–5 days',), ('Date: 2025-11-21\\nProvider: Dr. Chin306 Kulas532\\nLocation: Waltham Urgent Care\\nReason for Visit: Respiratory symptoms\\nSubjective:\\nAurora was brought in with a 3-day history of cough, mild fever, and wheezing. Parent reports reduced appetite and fatigue.\\nObjective:\\n\\nVitals: BP 116/78 mmHg\\nPhysical Exam:\\n\\nRespiratory: Wheezing, mild distress\\nO2 saturation: Slightly reduced\\nNo cyanosis\\n\\n\\n\\nAssessment:\\n\\nAcute bronchitis\\nMild respiratory impairment\\n\\nPlan:\\n\\nRespiratory function measured\\nSupportive care: hydration, rest, antipyretics\\nMonitor for worsening symptoms\\nFollow-up with PCP in 3–5 days\\n\\n',), ('Date: 2024-11-21\\nProvider: Dr. Chin306 Kulas532\\nLocation: Waltham Urgent Care\\nReason for Visit: Persistent cough and respiratory symptoms\\nSubjective:\\nAurora was brought in with a 4-day history of persistent cough, mild fever, and wheezing. Parent reports decreased appetite and increased fatigue. No vomiting or diarrhea.\\nObjective:\\n\\nVitals: BP 116/78 mmHg\\nPhysical Exam:\\n\\nRespiratory: Audible wheezing, mild respiratory distress\\nO2 saturation: Slightly reduced\\nNo cyanosis or retractions\\n\\n\\n\\nAssessment:\\n\\nAcute bronchitis\\nRespiratory function impairment – mild\\n\\nPlan:\\n\\nPerformed respiratory function measurement\\nSupportive care: fluids, rest, antipyretics\\nMonitor for worsening symptoms\\nFollow-up with PCP in 3–5 days',))\n" ] } ], "source": [ "results = vector_search(query, 3)\n", "print(results)" ] },
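{ "cell_type": "markdown", "id": "added-date-filter-sketch", "metadata": {}, "source": [ "To make the date-filtering idea above concrete, here is a rough sketch (not run in this notebook) of how the standalone function could be extended, assuming the set-up step had also saved each note's date into a hypothetical `NoteDate` column of the same table:\n", "\n", "```python\n", "# Sketch only: assumes a NoteDate column (DATE type) was added during set-up.\n", "def vector_search_since(user_prompt, patient, earliest_date):\n", "    search_vector = model.encode(user_prompt, normalize_embeddings=True,\n", "                                 show_progress_bar=False).tolist()\n", "    search_sql = f\"\"\"\n", "        SELECT TOP 3 ClinicalNotes\n", "        FROM {table_name}\n", "        WHERE PatientID = ? AND NoteDate >= ?\n", "        ORDER BY VECTOR_COSINE(NotesVector, TO_VECTOR(?,double)) DESC\n", "        \"\"\"\n", "    cursor.execute(search_sql, [patient, earliest_date, str(search_vector)])\n", "    return cursor.fetchall()\n", "\n", "# e.g. only notes from 2024 onwards:\n", "# vector_search_since(query, 3, \"2024-01-01\")\n", "```\n" ] },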
{ "cell_type": "markdown", "id": "a3d0da42-0b1a-45bb-bbfd-52ce3602f8c5", "metadata": {}, "source": [ "### Prompting a Local LLM \n", "\n", "There are a few options when it comes to prompting an LLM. Most LLMs are accessible through APIs; for example, you can use GPT-5 through [OpenAI's API](https://openai.com/api/). For sensitive data, like patient FHIR data, data protection rules or even laws may prohibit sending it to an external API, so this might not be possible. In these cases it is possible to run a model locally. Local models can be downloaded with Ollama, or very simply with Hugging Face. \n", "\n", "#### Using Hugging Face Transformers\n", "\n", "This is the easiest way (I have found) to get started with a local LLM. The transformers library can be installed using pip and then simply used as shown below.\n", "\n", "I'm using a very small model to make it lighter-weight for use on my laptop, but don't expect very good results using this method unless you are willing to download larger models. For better results using local models, feel free to skip to the Ollama section below, which requires slightly more set-up but gives me much better results. " ] }, { "cell_type": "code", "execution_count": null, "id": "deaa305c-24b8-49b6-bfa3-87d641299af5", "metadata": {}, "outputs": [], "source": [ "pip install transformers" ] }, { "cell_type": "code", "execution_count": 15, "id": "56f46ce0-4c92-4ed8-8232-5531d09072f7", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Device set to use cpu\n" ] } ], "source": [ "from transformers import pipeline\n", "# Create our chatbot using the flan-t5-base model\n", "generator = pipeline(\"text2text-generation\", model=\"google/flan-t5-base\", max_length=512, min_length=100)" ] }, { "cell_type": "code", "execution_count": 17, "id": "60bd9062-5a40-471d-b7a5-1d9821a961ec", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Token indices sequence length is longer than the specified maximum sequence length for this model (687 > 512). Running this sequence through the model will result in indexing errors\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Vector search complete\n", "No fever, ear pain, or irritability noted in past month. No fever, ear pain, or irritability noted in past month. No vomiting or diarrhea. 
No prior history of ear infections.nObjective:nVitals: Temp 37.8°C, BP 107/80 mmHg, Weight 10.3 kg, Height 75.3 cmnPhysical Exam:nnTympanic membrane: Erythematous and bulging on the right sidenNo discharge notednMild tenderness on palpation of the mastoid processnLungs clear, no respiratory distressnnnAssessment:nnAcute Otitis Media – Right earnPlan:nnPrescribed Amoxicillin 250 mg oral capsule, 1 capsule twice daily for 7 daysnSupportive care: fluids, rest, acetaminophen for fevernFollow-up in 10 days or sooner if symptoms worsennEducated parent on signs of complications\n" ] } ], "source": [ "## Perform Vector Search again\n", "query = \"Has the patient reported having bad headaches?\"\n", "results = vector_search(query, 3)\n", "print(\"Vector search complete\") \n", "\n", "prompt = (\n", " f\"\"\"SYSTEM: You are a helpful and knowledgeable assistant designed to help a doctor interpret a patient's medical history \n", " using retrieved information from a database. Please provide a detailed and medically relevant explanation and include relevant\n", " dates in your response and ensure your response is coherent \\n\\n\\n\"\"\"\n", " f\"CONTEXT:\\n{results[:-1]}\\n\\n\"\n", " f\"USER QUESTION:\\n{query}\\n\\n\"\n", " f\"ASSISTANT ANSWER:\"\n", ")\n", "response = generator(prompt)\n", "print(response[0][\"generated_text\"])" ] }, { "cell_type": "markdown", "id": "5fc9968a-24ac-47a3-a654-477a3bb5839f", "metadata": {}, "source": [ "So the model gave an answer which was relevant to the prompt, but it ignored some parts of it (notably, the system prompt asked it to include dates in its response). This is a limitation of using such a small model. \n", "\n", "### Prompting a better local model with Ollama\n", "\n", "Here I am going to use [Ollama](https://ollama.com/) and in particular the [gemma3:1b](https://ollama.com/library/gemma3) model. To use the code below, first download Ollama, then download the relevant model - this can be done from the Ollama UI or from the command line using `ollama pull gemma3:1b`.\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 18, "id": "dca85072-a026-4621-8b3b-12e20f929e50", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:httpx:HTTP Request: POST http://127.0.0.1:11434/api/chat \"HTTP/1.1 200 OK\"\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "==================================Response Start===============================\n", "\n", "Okay, let's analyze the provided information to determine if the patient has reported experiencing headaches.\n", "\n", "**Analysis:**\n", "\n", "The provided data does not explicitly state whether Aurora has reported experiencing headaches. The information focuses on her symptoms – ear pain, fever, irritability – and the physician’s assessment. There’s no mention of headaches in the patient’s subjective or objective findings or treatment plan.\n", "\n", "**Conclusion:**\n", "\n", "Based on the available information, **no, the patient has not reported experiencing headaches.**\n", "\n", "**Disclaimer:** *As a helpful AI assistant, I strive to provide accurate and relevant information. However, this analysis is based solely on the data provided and does not constitute medical advice. 
A full medical evaluation would be necessary to determine if the patient is experiencing headaches.*\n", "\n", "==================================Response End===============================\n", "\n" ] } ], "source": [ "from ollama import chat\n", "from ollama import ChatResponse\n", "\n", "response: ChatResponse = chat(model='gemma3:1b', messages=[{'role': 'system',\n", " 'content': (\n", " \"You are a helpful and knowledgeable assistant designed to help a doctor interpret a patient's medical history using retrieved information from a database.\"\n", " \"Please provide a detailed and medically relevant explanation, include relevant dates, and ensure your response is coherent.\"\n", " )},\n", " {\n", " 'role': 'user',\n", " 'content': f\"CONTEXT:\\n{results}\\n\\nUSER QUESTION:\\n{query}\"\n", " }])\n", "print(\"\\n==================================Response Start===============================\\n\")\n", "print(response['message']['content'])\n", "print(\"\\n==================================Response End===============================\\n\")" ] }, { "cell_type": "markdown", "id": "16dd6f8d-9d50-4a83-abb0-c3b2a52d1a87", "metadata": {}, "source": [ "That's already a better response than the basic Hugging Face one. This could still get much better, because `gemma3:1b` is a very small model (<1 GB in size, 1 billion parameters). Almost all the other models are much bigger, giving better results, whilst also allowing longer prompts with more search results. You might like to try `deepseek-r1:671b` if you have 404 GB of free space and an extremely powerful computer. There are also several versions of `gemma3` with increasing numbers of parameters that could give improved results. You can see a list of them on the [Ollama GitHub](https://github.com/ollama/ollama#:~:text=ollama%20run%20gemma3-,Model%20library,-Ollama%20supports%20a)." ] }, { "cell_type": "markdown", "id": "3eb5cd8c-5cbd-4d5d-abda-74337455e4c9", "metadata": {}, "source": [ "#### Adding Memory\n", "\n", "When we chat with a chatbot, it's ideal for the model to remember the conversation that has come before. Memory can be implemented by passing the previous messages back to the chatbot with our next queries. Instead though, I am going to use [LangChain](https://python.langchain.com/docs/introduction/) to automate this. LangChain can also summarise the conversation rather than passing every previous message back to the model, reducing the length of the prompt. This reduction can speed up the response, and reduce the number of tokens used (and with API chatbot access, you pay per token).\n",
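"\n", "For reference, the manual approach is just a growing list of messages that gets passed back on every call - a minimal, untested sketch using the same `ollama` client as above:\n", "\n", "```python\n", "# Sketch only: resend the accumulated history with every request.\n", "from ollama import chat\n", "\n", "history = [{'role': 'system', 'content': 'You are a helpful assistant.'}]\n", "\n", "def ask(user_message):\n", "    history.append({'role': 'user', 'content': user_message})\n", "    reply = chat(model='gemma3:1b', messages=history)\n", "    history.append({'role': 'assistant', 'content': reply['message']['content']})\n", "    return reply['message']['content']\n", "\n", "# ask(\"Hi, I'm Gabriel.\")\n", "# ask(\"What did I just tell you?\")  # the model now sees the earlier exchange\n", "```\n",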
"\n", "The first step is to install langchain and langchain-ollama with pip:" ] }, { "cell_type": "code", "execution_count": null, "id": "1d4d7f67-42b4-48ae-b94d-c2a06dc83907", "metadata": {}, "outputs": [], "source": [ "pip install langchain langchain-ollama" ] }, { "cell_type": "code", "execution_count": 25, "id": "8ba35eb0-f781-41e3-8c23-8ca23dd5f6c7", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:httpx:HTTP Request: POST http://127.0.0.1:11434/api/generate \"HTTP/1.1 200 OK\"\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Hi Gabriel! It’s lovely to meet you. My name is Codex, and I’m a large language model. I was just finishing up processing a massive dataset of historical weather patterns – fascinating stuff! It’s really interesting to see how climate change is impacting long-term trends. Are you interested in learning a little about that, or perhaps something else entirely?\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "INFO:httpx:HTTP Request: POST http://127.0.0.1:11434/api/generate \"HTTP/1.1 200 OK\"\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "You just told me your name is Gabriel! I’m really good at remembering details, I’ve been trained on a truly enormous amount of text and code. I’ve been tracking our conversation since the very beginning – I have a record of every turn. It’s quite impressive, really. Do you want me to repeat it back to you again, just to be absolutely certain?\n" ] } ], "source": [ "from langchain_ollama import OllamaLLM \n", "from langchain.chains import ConversationChain\n", "from langchain.memory import ConversationBufferMemory\n", "\n", "## Load the gemma3:4b model using LangChain's OllamaLLM class\n", "llm=OllamaLLM(model=\"gemma3:4b\")\n", "\n", "# Create a conversation with memory\n", "memory = ConversationBufferMemory()\n", "conversation = ConversationChain(llm=llm, memory=memory)\n", "\n", "# Interact\n", "response = conversation.predict(input=\"Hi, I'm Gabriel.\")\n", "print(response)\n", "\n", "## Test it remembered what I just told it \n", "response = conversation.predict(input=\"What did I just tell you?\")\n", "print(response)" ] }, { "cell_type": "markdown", "id": "63a7b37c-d596-4d5c-8175-ded890356160", "metadata": {}, "source": [ "It remembers me! \n", "\n", "### Putting this together\n", "\n", "Now let's put the whole chat architecture into a single class to make it easy to run. As in the memory example above, I am using the larger `gemma3:4b` model - this is about 4 GB in size (4 billion parameters). It gives much better responses, but it does come at the cost of increased runtime, so depending on your processing power and/or patience level, you may wish to drop back to `gemma3:1b`. 
" ] }, { "cell_type": "code", "execution_count": 26, "id": "6f11056d-e0f4-4bab-8842-e5b863810ca6", "metadata": {}, "outputs": [], "source": [ "class RAGChatbot:\n", " def __init__(self):\n", " self.message_count = 0\n", " self.cursor = get_cursor()\n", " self.conversation = self.create_conversation()\n", " self.embedding_model = self.get_embedding_model()\n", " \n", "\n", " \n", " def get_embedding_model(self):\n", " return SentenceTransformer('all-MiniLM-L6-v2') \n", " \n", " def create_conversation(self):\n", " system_prompt = \"You are a helpful and knowledgeable assistant designed to help a doctor interpret a patient's medical history using retrieved information from a database.\\\n", " Please provide a detailed and medically relevant explanation, \\\n", " include the dates of the information you are given.\"\n", " ## instanciate the conversation: \n", " llm=OllamaLLM(model=\"gemma3:4b\", system=system_prompt) \n", " memory = ConversationBufferMemory()\n", " conversation = ConversationChain(llm=llm, memory=memory)\n", " return conversation\n", " \n", " def vector_search(self, user_prompt,patient):\n", " search_vector = self.embedding_model.encode(user_prompt, normalize_embeddings=True, show_progress_bar=False).tolist() \n", " \n", " search_sql = f\"\"\"\n", " SELECT TOP 3 ClinicalNotes \n", " FROM VectorSearch.DocRefVectors\n", " WHERE PatientID = {patient}\n", " ORDER BY VECTOR_COSINE(NotesVector, TO_VECTOR(?,double)) DESC\n", " \"\"\"\n", " self.cursor.execute(search_sql,[str(search_vector)])\n", " \n", " results = self.cursor.fetchall()\n", " return results\n", "\n", " def run(self):\n", " if self.message_count==0:\n", " query = input(\"\\n\\nHi, I'm a chatbot used for searching a patient's medical history. How can I help you today? \\n\\n - User: \")\n", " else:\n", " query = input(\"\\n - User:\")\n", " search = True\n", " if self.message_count != 0:\n", " search_ans = input(\"Search the database? [Y/N - default N]\")\n", " if search_ans.lower() != \"y\":\n", " search = False\n", "\n", " if search:\n", " try:\n", " patient_id = int(input(\"What is the patient ID?\"))\n", " except:\n", " print(\"The patient ID should be an integer\")\n", " return\n", "\n", " results = self.vector_search(query, patient_id)\n", " if results == []:\n", " print(\"No results found, check patient ID\")\n", " return\n", "\n", " prompt = f\"CONTEXT:\\n{results}\\n\\nUSER QUESTION:\\n{query}\"\n", " else:\n", " prompt = f\"USER QUESTION:\\n{query}\"\n", "\n", " ##print(prompt)\n", " response = self.conversation.predict(input=prompt)\n", " \n", " print(\"- Chatbot: \"+ response)\n", " self.message_count += 1\n", "\n" ] }, { "cell_type": "code", "execution_count": 27, "id": "4b7d85c5-9fd5-47e9-8990-e66d3d64883e", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: cpu\n", "INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: all-MiniLM-L6-v2\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "Hi, I'm a chatbot used for searching a patient's medical history. How can I help you today? \n", "\n", " - User: Tell me about the patients history of respiratory or chest problems\n", "What is the patient ID? 
3\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "INFO:httpx:HTTP Request: POST http://127.0.0.1:11434/api/generate \"HTTP/1.1 200 OK\"\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "- Chatbot: Okay, let’s delve into Aurora’s history of respiratory or chest problems, based on the records I have.\n", "\n", "Looking at the first visit on 2024-11-21, Aurora presented with a 3-day history of a persistent cough, mild fever, and labored breathing. The notes state that this was a new presentation for her. The physical exam revealed wheezing, a prolonged expiratory phase, and slightly reduced oxygen saturation. The assessment noted acute bronchitis and respiratory distress – mild.\n", "\n", "Then, on 2025-11-21, Aurora returned with a 3-day history of cough, wheezing, and mild fever. Her oxygen saturation was again slightly reduced. The assessment was Acute bronchitis and mild respiratory impairment.\n", "\n", "Finally, on 2024-11-21, she presented with a 4-day history of a persistent cough, mild fever, wheezing, decreased appetite, and increased fatigue. The physical exam showed audible wheezing, mild respiratory distress, slightly reduced oxygen saturation, and no cyanosis or retractions. The assessment was Acute bronchitis and respiratory function impairment – mild.\n", "\n", "So, it appears Aurora has experienced recurrent episodes of respiratory symptoms, primarily involving cough, wheezing, and reduced oxygen saturation over the course of about a year – first starting on 2024-11-21 and then again on 2024-11-21 and 2025-11-21. It’s important to note that these are all related to what has been diagnosed as acute bronchitis.\n", "\n", "Do you want me to elaborate on any specific aspect of these visits, such as the vital signs, the physical exam findings, or the treatment plan?\n" ] } ], "source": [ "# Usage\n", "bot = RAGChatbot()\n", "bot.run()" ] }, { "cell_type": "code", "execution_count": 28, "id": "6f413e98-1227-4a22-a319-5b8239ade745", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " - User: No can you tell me about the scar on her right hand? Is there any records of this leasion? \n", "Search the database? [Y/N - default N] Y\n", "What is the patient ID? 3\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "INFO:httpx:HTTP Request: POST http://127.0.0.1:11434/api/generate \"HTTP/1.1 200 OK\"\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "- Chatbot: Okay, let’s investigate whether there’s any record of a scar on Aurora’s right hand within these medical records.\n", "\n", "I’ve reviewed the provided records – the 2024-08-06 visit, the 2025-08-06 visit, and the 2025-05-07 visit – and I have no information regarding a scar or any lesion on her right hand. All of the physical examinations described normal findings, with no mention of any abnormalities on her skin or appendages. \n", "\n", "The notes consistently describe a healthy appearance, normal breath sounds, and no rashes or lesions. \n", "\n", "I apologize that I cannot provide you with the information you’re seeking based on these records.\n" ] } ], "source": [ "bot.run()" ] }, { "cell_type": "markdown", "id": "849f7d8e-4dbb-4cdb-b88f-b7a9f68ff9da", "metadata": {}, "source": [ "Ok so the results from this approach are not exactly realible or high quality, but at least it works. 
I want to take a second here to show the output when I gave a very similar prompt (including search results) to Copilot: \n", "\n", "------------------------ Copilot Response ----------------------------------\n", "\n", "Yes, the patient Aurora has reported respiratory issues on multiple occasions:\n", "1. November 21, 2024\n", " - Symptoms: Persistent cough, mild fever, labored breathing, wheezing, fatigue, and decreased appetite.\n", " - Findings: Mild respiratory distress, prolonged expiratory phase, slightly reduced O₂ saturation.\n", " - Diagnosis: Acute bronchitis and mild respiratory distress.\n", "\n", "2. November 21, 2025\n", " - Symptoms: Cough, mild fever, wheezing, fatigue, and reduced appetite.\n", " - Findings: Wheezing, mild respiratory distress, slightly reduced O₂ saturation.\n", " - Diagnosis: Acute bronchitis and mild respiratory impairment.\n", "\n", "3. Duplicate entry for November 21, 2024 (with slightly varied wording)\n", " - Confirms the same clinical picture: persistent cough, wheezing, mild respiratory distress, and reduced O₂ saturation.\n", "\n", "These visits consistently show acute bronchitis with mild respiratory compromise, suggesting a recurring or seasonal pattern of respiratory symptoms.\n", "\n", "------------------------ Copilot Response ----------------------------------\n", "\n", "So good results are possible - just use a better model than I have chosen here! \n", "\n", "# Conclusions\n", "\n", "In this example, we are using synthetically generated clinical notes created by a large language model. We have used a very small dataset, with only three patients and 10 clinical notes each. However, it does show the potential of the process. You can hopefully see how this could be scaled up into a useful resource. \n", "\n", "#### Where to continue from here:\n", "\n", "If this were a real project, rather than a quick demo, there are a number of improvements I could make: \n", "\n", "- #### Vector Search\n", " - **Use a Medical-Specific Embedding Model.** \n", " - **Create a Score requirement**\n", " - The relevance of a search result to the prompt is given by the vector cosine, a number between -1 and 1. You can use this to set a minimum score required to classify a result as a hit (a sketch of this appears after this list).\n", " - **Filter by Date**\n", " - We are querying a patient's entire medical history, which means we could get results from 50 years ago which obscure results from 6 months ago due to a slightly better semantic match. In reality we would probably be more interested in recent results, even if they weren't quite as good a match as the older ones.\n", " - We could approach this in different ways - we could just add a date restriction (e.g. search only the last 5 years) or we could take the top N results when ordered by score and then order them by date.\n", "\n", "- #### LLM prompting\n", " - Use a better LLM\n", " - Refine the system prompt\n", "\n", "- #### General set-up or design:\n", "\n", " - **Link source information**\n", " - It is key that the medical practitioner sees the information that the LLM can see, because LLMs do have a habit of making things up... \n", " - **Add a Front-end User Interface**\n", " - **Create more detailed medical history using a range of resource types (e.g. Conditions, Immunizations, Observations, Medications and more)**\n", " - **Improve Patient ID collection** - At the moment the method of giving the Patient ID is very clunky and relies on knowing the patient's internal ID rather than any more natural identifier. This could be improved by adding an initial form or interaction that asks for full name, address or DOB to identify the patient without needing to know the ID.\n", " - **Use an Agentic method to add functionality to the chatbot** - You could look at the LangChain or LangGraph documentation to get a better idea of this. \n", "\n", "\n",
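"A rough, untested sketch of the score requirement idea, reusing the `query_vector`, `table_name` and `cursor` objects from earlier in the notebook (the 0.5 threshold is arbitrary and would need tuning):\n", "\n", "```python\n", "# Sketch only: keep just the results above a minimum cosine score.\n", "MIN_SCORE = 0.5  # arbitrary threshold - tune this against your own data\n", "\n", "score_sql = f\"\"\"\n", "    SELECT TOP 3 ClinicalNotes,\n", "           VECTOR_COSINE(NotesVector, TO_VECTOR(?,double)) AS Score\n", "    FROM {table_name}\n", "    WHERE PatientID = ?\n", "    ORDER BY Score DESC\n", "    \"\"\"\n", "cursor.execute(score_sql, [str(query_vector), 3])\n", "hits = [(note, score) for note, score in cursor.fetchall() if float(score) >= MIN_SCORE]\n", "```\n", "\n",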
"On the final point, here I have used DocumentReference resources because this is one of the few plain-text resources with clinically relevant notes. Even in this example, I had to generate the clinical notes myself using an LLM. A more complete example of this may involve creating plain-text strings of medical notes using the other resources available. For example, you could group FHIR resources by year, take the clinically relevant resources [Condition, Observation, AllergyIntolerance, Procedure, Immunization, CarePlan], create strings representing the clinical information per year, and use these with the vector search method we have implemented above. \n", "\n", "A nice example of a complete project with a similar set-up is available on the [Open Exchange](https://openexchange.intersystems.com/package/FHIR-Data-Explorer-with-Hybrid-Search-and-AI-Summaries-1) by Pietro Di Leo. There are some key differences from this demo; for example, the tabular data is created within Python and loaded directly into a SQL database. \n", "\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.11" } }, "nbformat": 4, "nbformat_minor": 5 }
