1-Using-FHIR-SQL-Builder.ipynbā¢18.9 kB
{
"cells": [
{
"cell_type": "markdown",
"id": "761acc89-4fc4-45f7-ba24-5c18256a5b28",
"metadata": {},
"source": [
"# Querying FHIR with SQL \n",
"\n",
"FHIR data can be queried using SQL to create tabular views of the information stored on FHIR servers. However, these tables need to be manually curated to fill the tables with the information you required. For more information, see the relevant pages of the [documentation](https://docs.intersystems.com/irisforhealth20251/csp/docbook/DocBook.UI.Page.cls?KEY=HXFHIRFSB_installation).\n",
"\n",
"Start by going to the management portal at http://localhost:32783/csp/sys/UtilHome.csp \n",
"\n",
"On the left-hand side, Click `Health`, then click `DEMO`, then if its not selected, select `FHIR Server Management` from the dropdown, and click the `Go` button\n",
"\n",
"\n",
"\n",
"\n",
"From here you can see the URL of the FHIR server (red), the number of Resources (pink), and if you click on resources, you can see the number of each resource type. \n",
"\n",
"\n",
"\n",
"For now though, click the link on the FHIRSQL builder on the left hand side (orange in the image above).\n",
"\n",
"\n",
"\n",
"If you are lost at this point, you can always follow this link: http://localhost:32783/csp/fhirsql/index.html#/home . \n",
"\n",
"There are three sections to this - first we create an analysis, this connects to our FHIR server, then we create a transformation sepecification - this defines which bits of data get added to the SQL table projection, then we create a projection based on this transformation specification. \n",
"\n",
"### Creating SQL Analyses\n",
"\n",
"\n",
"Next to Analyses, click the blue `+ New` button and a form will open. \n",
"\n",
"Next to the `FHIR Respository` dropdown click `+ New` and fill in the following details:\n",
"\n",
"- Name: SQLBuilderConfig (Or choose your own name)\n",
"- Host: localhost\n",
"- Port: 52773 // This refers to the docker container port, not your local port, so don't change this\n",
"\n",
"- URL: Leave Blank\n",
"- SSL: Leave Blank\n",
"- Credentials:\n",
" - Click `+ New`, create new credentials profile\n",
" - Username: '_SYSTEM'\n",
" - Password: 'SYS'\n",
"\n",
"\n",
"\n",
"If you've filled this out correctly, when you click the 'FHIR Repository URL' dropdown, an option should appear. Select this option and click save. \n",
"\n",
"When you return to the `New FHIR Analysis` Page enter \"100\" into the `selectivity percentage` (this just allows you analyse a subset of your data, as we have a small dataset, we will leave it as 100%)\n"
]
},
{
"cell_type": "markdown",
"id": "de4d7e55-10ec-42fb-a35d-67e3c604a802",
"metadata": {},
"source": [
"## Creating Transformation Specifications\n",
"\n",
"Click New to create a new transformation specification. Give it a name (If you are making multiple specifications its useful to name them something informative, as I am only using 1 I'm just calling it \"demo\"), and choose to base it from the analysis created in the previous step.\n",
"\n",
"A transformation specification controls how your FHIR resources will be put into a table, i.e. which pieces of information within the FHIR data will be mapped into relational tables. \n",
"\n",
"On the left hand side, there is a list of FHIR resources that can be used. Each resource will include a number of pieces of data/information. \n",
"\n",
"\n",
"\n",
"We can select the information we need from the resources. In this example, we are going to use DocumentReference data, so we want the following datapoints. Note, for each one you can change the column name for the output table. I am not going to bother doing this now, but you may consider doing so. \n",
"For each of the following, make the change given and click 'Update' and/or 'Add to projection' at the bottom right of the right hand panel. \n",
"\n",
" // This is the reference to the patient\n",
" - DocumentReference -> Subject -> Reference\n",
" \n",
" // This is the encoded clinical note\n",
" - DocumentReference -> Content -> Attachment -> Data \n",
"\n",
" // Encounter date\n",
" - DocumentReference -> Date\n",
"\n",
"Also, we might want patient information later, so lets also add some patient details: \n",
" \n",
" - Patient -> names -> family\n",
" - Patient -> names -> given\n",
" - Patient -> identifier -> value\n",
"\n",
"When you are finished, click `Done` at the top right of the screen. \n",
"\n",
"\n",
"### Create Projection \n",
"\n",
"Finally, you have to create a projection based on the specification you have made. This is as simple as picking the FHIR server and transformation specification we just created from the dropdown menus, then give it a name, this will be the base name for all of your tables. I have given the project the name: `VectorSearchApp`. \n",
"\n",
"\n",
"\n",
"Click `Launch Projection` \n"
]
},
{
"cell_type": "markdown",
"id": "469b9620-76c9-47e1-88ca-609e5686d28e",
"metadata": {},
"source": [
"## Querying the database\n",
"\n",
"Now that we have created projections of our data, we can query it through SQL. The easiest method for this is to use the [Management Portal SQL Builder](http://localhost:32783/csp/sys/exp/%25CSP.UI.Portal.SQL.Home.zen). You may need to change the namespace to the DEMO namespace, this can be done from the top of the page: \n",
"\n",
"You can find the tables created in the list on the left hand side - if you've also called it VectorSearchApp it will be right at the bottom because its alphabetically ordered. You can also search for it. Drag the table into the `Execute Query` box to select all the comments.\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "d4889872-92a3-41d6-b7e4-9ec9e434979b",
"metadata": {},
"source": [
"## Querying the database with Python \n",
"\n",
"Now we know that the SQL works, we can query the database with Python. There are several different methods to execute SQL queries from Python, we are going to use the [Python DB-API method](https://docs.intersystems.com/irisforhealth20251/csp/docbook/Doc.View.cls?KEY=BPYNAT_pyapi), as it is the prefered method for using relational querying from a Python Application. This uses the `intersystems-irispython`, which can be installed with pip:\n",
"\n",
" pip install intersystems-irispython==5.1.0 \n",
"\n",
"(Note, I've had some issues with the most recent version of this driver, so I am specifying version 5.1.0 to ensure consistent results.) \n",
"\n",
"To use this method we take the following steps: \n",
"\n",
" 1. Create a connection\n",
" 2. Create a Cursor object\n",
" 3. Create the SQL query (and parameters) \n",
" 4. Execute the query from the cursor object\n",
" 5. Iterate through the results from the cursor object. \n",
" 6. Extract the results into a pandas DataFrame\n",
"\n",
"Here, its important to note that the cursor execution command does not directly return the results, instead the results are stored within the cursor object."
]
},
{
"cell_type": "markdown",
"id": "7b2dc54b-b7e3-4723-943a-e4b1ab7f6189",
"metadata": {},
"source": [
"### Create Connection and Cursor"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "eb45ce9c-e61f-4593-a0ff-a93f1389118c",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:root:IRISINSTALLDIR or ISC_PACKAGE_INSTALLDIR environment variable must be set\n",
"WARNING:root:Embedded Python not available\n",
"WARNING:root:Error importing pythonint: No module named 'pythonint'\n",
"WARNING:root:Embedded Python not available\n"
]
}
],
"source": [
"## Import IRIS python DB-API Driver\n",
"import iris"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "d5ed3d45-b4b4-429d-acb2-8ce5823d03fe",
"metadata": {},
"outputs": [],
"source": [
"## Credentials: \n",
"server_location = \"localhost\"\n",
"port_number = 32782\n",
"namespace = \"DEMO\"\n",
"user_name = \"_SYSTEM\"\n",
"password = \"ISCDEMO\"\n",
"\n",
"## Create a connection\n",
"conn = iris.connect(server_location, port_number, namespace, user_name, password)\n",
"\n",
"## Create a cursor object\n",
"cursor = conn.cursor()"
]
},
{
"cell_type": "markdown",
"id": "840d6ab0-fa0e-4d20-b3c5-86dcf3219cac",
"metadata": {},
"source": [
"### Create SQL query\n",
"\n",
"There are two options here, we can either create a single query string containing all the search parameters, or we can leave out some parameters to be defined at the point of execution, leaving a '?' character as a placeholder. \n",
"\n",
"For example, we could run:\n",
"\n",
" cursor.execute(\"SELECT col1, col2 FROM exampleTable\") \n",
"\n",
"or we could run:\n",
"\n",
" cursor.execute(\"SELECT ?, ? FROM exampleTable\", [\"col1\", \"col2\"])\n",
"\n",
"These options have the same result, but the second option allows different parameters to be passed into the same query. You can even execute the query with a list of parameter lists using the `cursor.executemany()` command. This functionality is particularly useful when inserting new data into tables. \n",
"\n",
"For now though, we are going to execute the query I gave above using the syntax from the first example:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b565757d-95fa-4613-8d94-d9b89d053740",
"metadata": {},
"outputs": [],
"source": [
"sql = \"\"\"SELECT \n",
"DocumentReferenceContentAttachmentData, DocumentReferenceSubjectReference\n",
"FROM VectorSearchApp.DocumentReference\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "a6529324-87b5-40de-ac30-ec5a1355851f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-1"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cursor.execute(sql)"
]
},
{
"cell_type": "markdown",
"id": "7459ee1d-a94f-485e-bd8e-cfef30be6289",
"metadata": {},
"source": [
"### Extracting Data\n",
"\n",
"While the result of the query above (-1) may suggest failure, here it actually means the query has been executed successfully. The results can be collected from the cursor object using one of the following commands: \n",
"\n",
"- `cursor.fetchone()` returns the next row of data from the query.\n",
"- `cursor.fetchmany(n)` returns the next n rows of data (where n is an integer). \n",
"- `cursor.fetchall()` returns all the results. \n",
"\n",
"You can find more specific and useful methods in the [DB-API documentation](https://docs.intersystems.com/irisforhealth20251/csp/docbook/Doc.View.cls?KEY=BPYNAT_pyapi)\n",
"\n",
"We are going to use `cursor.fetchall()` and collect it in a pandas DataFrame. \n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "ea27464c-a2f8-48e1-a1c9-ec84803bcb21",
"metadata": {},
"outputs": [],
"source": [
"result_set = cursor.fetchall()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "99126825-3a5e-4d1b-b373-a64570e1bc5b",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"## The result_set doesn't include column names so we will add them ourselvs\n",
"cols = [\"ClinicalNotes\", \"Patient\"] \n",
"\n",
"df = pd.DataFrame(result_set, columns=cols)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "18f819b0-66f7-4938-80c2-7b90d43df9b5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ClinicalNotes</th>\n",
" <th>Patient</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>4f7469746973204d65646961204576616c756174696f6e...</td>\n",
" <td>Patient/3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>446174653a20323032352d30382d30360a50726f766964...</td>\n",
" <td>Patient/3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>466f6c6c6f772d557020666f72204f7469746973204d65...</td>\n",
" <td>Patient/3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>446174653a20323032342d31312d32310a50726f766964...</td>\n",
" <td>Patient/3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>446174653a20323032342d30382d30360a50726f766964...</td>\n",
" <td>Patient/3</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ClinicalNotes Patient\n",
"0 4f7469746973204d65646961204576616c756174696f6e... Patient/3\n",
"1 446174653a20323032352d30382d30360a50726f766964... Patient/3\n",
"2 466f6c6c6f772d557020666f72204f7469746973204d65... Patient/3\n",
"3 446174653a20323032342d31312d32310a50726f766964... Patient/3\n",
"4 446174653a20323032342d30382d30360a50726f766964... Patient/3"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "9d36140a-d4f4-4750-b633-1f4e92eddd24",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"51\n"
]
}
],
"source": [
"print(len(df))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "49c0446d-edde-4891-b31d-e41885de94a3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ClinicalNotes</th>\n",
" <th>Patient</th>\n",
" <th>PatientID</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>4f7469746973204d65646961204576616c756174696f6e...</td>\n",
" <td>Patient/3</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>446174653a20323032352d30382d30360a50726f766964...</td>\n",
" <td>Patient/3</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>466f6c6c6f772d557020666f72204f7469746973204d65...</td>\n",
" <td>Patient/3</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>446174653a20323032342d31312d32310a50726f766964...</td>\n",
" <td>Patient/3</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>446174653a20323032342d30382d30360a50726f766964...</td>\n",
" <td>Patient/3</td>\n",
" <td>3</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ClinicalNotes Patient PatientID\n",
"0 4f7469746973204d65646961204576616c756174696f6e... Patient/3 3\n",
"1 446174653a20323032352d30382d30360a50726f766964... Patient/3 3\n",
"2 466f6c6c6f772d557020666f72204f7469746973204d65... Patient/3 3\n",
"3 446174653a20323032342d31312d32310a50726f766964... Patient/3 3\n",
"4 446174653a20323032342d30382d30360a50726f766964... Patient/3 3"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[\"PatientID\"] = pd.to_numeric(df[\"Patient\"].astype(str).str.strip(\"Patient/\"))\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "07e76ea5",
"metadata": {},
"source": [
"### Next steps\n",
"\n",
"Now we can access the FHIR resources with SQL calls, we can continue with the main tutorial by [Creating a Vector Database](./2-Creating-Vector-DB.ipynb), or you can create SQL datasets for your own project. "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}