Building Private Healthcare AI Assistant for Clinics Using Qdrant Hybrid Cloud (JWT-RBAC), DSPy and…


Introduction: Security in AI Healthcare

With the latest advancements in AI, we often overlook the security aspect when designing solutions. For example, suppose you find a new tool that can simplify your chatbot pipeline, but the catch is that it isn't secure, so your organization does not approve its usage.

In my experience working with a US-based healthcare company, I have observed how seriously we take data security because of the Health Insurance Portability and Accountability Act (HIPAA), which aims to protect Protected Health Information (PHI) like account numbers, addresses, phone numbers, social security numbers, and so on. Generally speaking, any business that serves users should adhere to privacy rules and take measures to prevent data leaks.

And with the increase in AI solutions in healthcare, especially in the domain of NLP, AI security is bound to grow as well.


As an experiment, we will build a Private AI Assistant for clinics and hospitals which fetches patient data and answers questions on top of that data.

But, before proceeding, let’s take a look at the flow diagram:

(Flow diagram: see the project repo at https://github.com/sachink1729/AI-Assistant-Clinics-Medical-Data-Qdrant-Dspy-Groq)

Briefly:

  1. Dataset: We will be working on a healthcare dataset that contains patient data, including details about name, illness, medication, bills, hospital name, etc. Note that datasets like these are rarely available online; this one is not real either and was generated synthetically. Originally it is a multi-label classification dataset, and it can be downloaded from Kaggle here: 🩺Healthcare Dataset 🧪
  2. DSPy: DSPy (Declarative Self-improving Language Programs, pythonically) is a game-changing framework for algorithmically optimizing LM prompts instead of prompting manually. I have covered it in detail in one of my blogs; take a look: Prompt Like a Pro Using DSPy: A guide to build a better local RAG model using DSPy, Qdrant and Ollama | by Sachin Khandewal | Mar, 2024
  3. Qdrant Managed Cloud: Qdrant is a lightweight vector database that recently started their managed cloud services, which let you use a free cluster for trial and the option to upgrade as you use more features. We will use it to store our dataset in the form of vectors.
  4. Groq: Groq builds an AI accelerator application-specific integrated circuit (ASIC), which they call the Language Processing Unit (LPU), along with related hardware to accelerate the inference performance of AI workloads. They provide access to the latest models like Llama 3 free of cost (with rate limits), which is enough for our use case.

Now let’s set up Qdrant cloud.

  1. Go to https://cloud.qdrant.io/login, create a new account, and proceed.
  2. After signing in, you'll see Clusters on the left-hand side; click on that and select the free or the paid version based on your needs.
  3. After successful creation, you'll see your cluster on the dashboard.
  4. Next, create an API key; click on the API key below Cluster0 and copy it for future use.
  5. You will also see a Python code snippet to access this cluster and create collections, but we will explore that later.

Before environment setup, let us make sure you can access Groq:

  1. Go to https://groq.com/ to sign up.
  2. After that, go to https://console.groq.com/keys to create or manage API keys; copy the API key and keep it with you.

Environment Setup:

Install the required packages using:

pip install qdrant-client groq sentence-transformers dspy-ai fastembed gradio --upgrade

Before coding, make sure you download the Kaggle dataset from this link: https://www.kaggle.com/datasets/prasad22/healthcare-dataset

Your setup is done!

  1. Let’s explore the dataset and preprocess it according to our use case. You will see that the dataset is in tabular format and so we need to format it accordingly. Let’s follow the steps below:
import pandas as pd
df = pd.read_csv("healthcare_dataset.csv")
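
If you want a quick look at what's inside, a standard pandas preview works:

df.head()  # shows columns such as Name, Age, Medical Condition, Billing Amount, and more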

If you take a close look, you will see that this contains a lot of information which we can use!

Let’s format these rows into snippets of text.

def format_row(row):
    # concatenate all fields into one lowercase text snippet per patient record
    return (
        f"Name: {row['Name']}, Age: {row['Age']}, Gender: {row['Gender']}, "
        f"Blood Type: {row['Blood Type']}, Medical Condition: {row['Medical Condition']}, "
        f"Date of Admission: {row['Date of Admission']}, Doctor: {row['Doctor']}, "
        f"Hospital: {row['Hospital']}, Insurance Provider: {row['Insurance Provider']}, "
        f"Billing Amount: {row['Billing Amount']}, Room Number: {row['Room Number']}, "
        f"Admission Type: {row['Admission Type']}, Discharge Date: {row['Discharge Date']}, "
        f"Medication: {row['Medication']}, Test Results: {row['Test Results']}"
        "\n\n"
    ).lower()

df['formatted_text'] = df.apply(format_row, axis=1)

text_data = df['formatted_text'].tolist()

For the sake of the experiment, I will use only 128 data rows, since we are using the free-tier Qdrant Cloud cluster.

from random import shuffle
sampled_dataset = text_data[:128]
shuffle(sampled_dataset)

See how these rows look now:
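
Slicing the list is enough for a quick look:

sampled_dataset[:5]  # first five formatted records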

which gives:

['name: heather miller, age: 76, gender: male, blood type: a+, medical condition: diabetes, date of admission: 2021-04-17, doctor: scott grant, hospital: powell ward, and mercado, insurance provider: aetna, billing amount: 3908.9465679463137, room number: 428, admission type: elective, discharge date: 2021-05-10, medication: lipitor, test results: inconclusive\n\n',
 'name: connor hansen, age: 75, gender: female, blood type: a+, medical condition: diabetes, date of admission: 2019-12-12, doctor: kenneth fletcher, hospital: powers miller, and flores, insurance provider: cigna, billing amount: 43282.28335770435, room number: 134, admission type: emergency, discharge date: 2019-12-28, medication: penicillin, test results: abnormal\n\n',
 'name: daniel schmidt, age: 63, gender: male, blood type: b+, medical condition: asthma, date of admission: 2022-11-15, doctor: denise galloway, hospital: hammond ltd, insurance provider: cigna, billing amount: 23762.203579059587, room number: 465, admission type: elective, discharge date: 2022-11-22, medication: penicillin, test results: normal\n\n',
 'name: david higgins, age: 49, gender: female, blood type: b-, medical condition: arthritis, date of admission: 2021-03-05, doctor: erin henderson md, hospital: evans and hall schneider,, insurance provider: medicare, billing amount: 24948.47782402692, room number: 361, admission type: emergency, discharge date: 2021-03-20, medication: penicillin, test results: abnormal\n\n',
 'name: lindsey lambert, age: 82, gender: female, blood type: a+, medical condition: hypertension, date of admission: 2021-11-19, doctor: christopher guerra, hospital: and brown oneal, shah, insurance provider: medicare, billing amount: 23067.672165245425, room number: 307, admission type: elective, discharge date: 2021-12-12, medication: ibuprofen, test results: normal\n\n']

Now that we have our sampled_dataset, the next step is to generate embeddings for these sentences in order to store them in a vector DB.

from sentence_transformers import SentenceTransformer

# BAAI/bge-large-en-v1.5 produces 1024-dimensional embeddings
model = SentenceTransformer("BAAI/bge-large-en-v1.5", device='cuda')
vectors = model.encode(sampled_dataset)

Remember the size (columns or features) of this vector; we need it while creating the vector DB collection.
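
A quick way to check is to inspect a single embedding:

vectors[0].shape  # dimensionality of one embedding vector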

Which gives:

(1024,)

Now grab Qdrant's API key and your own cluster's URL; we need them now:

import os
os.environ['QDRANT__SERVICE__API_KEY'] = "<your api key>"

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(
    url="<your cluster's url>",
    api_key=os.environ['QDRANT__SERVICE__API_KEY'],
)

Note: Typically the API key should come from a config server that you have to set up, which is internal to your organization or services only. In that way, only people who have access to that config server will get access to the API key, so not everyone can access your cluster and your data is safe!

But since we’re primarily showing the functionality, we will proceed like this.
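
For reference, a minimal sketch of pulling the key from such a config service; the endpoint and response shape here are hypothetical placeholders:

import os
import requests

# hypothetical internal secrets endpoint; replace with your organization's config service
resp = requests.get("https://config.internal.example/secrets/qdrant-api-key", timeout=5)
resp.raise_for_status()
os.environ['QDRANT__SERVICE__API_KEY'] = resp.json()["value"]  # assumed response field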

Create a collection named phi_data:

client.recreate_collection(
    collection_name="phi_data",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)

Now comes the main part — upload this collection to the cloud cluster:

client.upload_collection(
    collection_name="phi_data",
    ids=list(range(len(sampled_dataset))),
    vectors=vectors,
    parallel=4,
    max_retries=3,
)
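
As a sanity check (using qdrant-client's count API), the collection should now report 128 points:

print(client.count(collection_name="phi_data", exact=True))  # expect count=128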

That’s about it!

Qdrant also provides access control via JWT. To replicate it here, we have to run Qdrant in local mode: simply install Docker and run these commands:

docker pull qdrant/qdrant
docker run -p 6333:6333 -p 6334:6334 \
  -e QDRANT__SERVICE__API_KEY=eXaMplE12345Key67890Api \
  -e QDRANT__SERVICE__JWT_RBAC=true \
  qdrant/qdrant

Note: the trailing backslashes are shell line continuations; you can also put the docker run command on a single line.

After that, let's start up the root_client and create a dummy collection named demo_collection:

root_client = QdrantClient(
    url="http://localhost:6333",
    api_key="eXaMplE12345Key67890Api",
)

root_client.recreate_collection(
    collection_name="demo_collection",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)

root_client.upload_collection(
    collection_name="demo_collection",
    ids=list(range(len(sampled_dataset))),
    vectors=vectors,
    parallel=4,
    max_retries=3,
)

After that, let's create a user and limit their access to read-only mode using JWT. This creates a temporary token that is signed with (and therefore linked to) the original API key.

import jwt  # from the PyJWT package
import time

api_key = 'eXaMplE12345Key67890Api'

current_time = int(time.time())

payload = {
    'exp': current_time + 3600,  # token expires in one hour
    'value_exists': {
        'collection': 'demo_collection',
        'matches': [
            {'key': 'user', 'value': 'John'}
        ]
    },
    "access": [
        {
            "collection": "demo_collection",
            "access": "r",  # read-only access
            "payload": {
                "user": "John"
            }
        }
    ]
}

# sign the claims with the root API key
encoded_jwt = jwt.encode(payload, api_key, algorithm='HS256')

print(encoded_jwt)

Which gives:

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE3MTYzMTA2MDgsInZhbHVlX2V4aXN0cyI6eyJjb2xsZWN0aW9uIjoiZGVtb19jb2xsZWN0aW9uIiwibWF0Y2hlcyI6W3sia2V5IjoidXNlciIsInZhbHVlIjoiSm9obiJ9XX0sImFjY2VzcyI6W3siY29sbGVjdGlvbiI6ImRlbW9fY29sbGVjdGlvbiIsImFjY2VzcyI6InIiLCJwYXlsb2FkIjp7InVzZXIiOiJKb2huIn19XX0.7ald90UIk7hI5d57S0vDfo_bdatNsi20XURlhUee_Nw

This key will only give "read" access to the collection we created, demo_collection.
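
As a quick positive check, reads with this token should still work; a minimal sketch using the local setup above:

from qdrant_client import QdrantClient

# client authenticated with the read-only JWT instead of the root API key
jwt_client = QdrantClient(
    url="http://localhost:6333",
    api_key=encoded_jwt,
)
print(jwt_client.get_collections())  # read operations like listing collections succeed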

Even if you try to upload a new data point into this collection, you won't be allowed to. To test it, use this:

from qdrant_client import QdrantClient, models
import numpy as np

client = QdrantClient(
    url="http://localhost:6333",
    api_key=encoded_jwt,  # the read-only JWT generated above
)

data = np.array([0.1] * 1024)
print(data.shape)

client.upload_points(
    collection_name="demo_collection",
    points=[
        models.PointStruct(
            id=128,  # any new id; the write will be rejected anyway
            vector=data.tolist(),
        )
    ],
)

Which will give:

UnexpectedResponse: Unexpected Response: 403 (Forbidden)

Now let’s set up our prompting (programming, rather) tool, DSPy!

from dspy.retrieve.qdrant_rm import QdrantRM

# point the retriever at the phi_data collection; make sure `client` is the
# Qdrant Cloud client created earlier, not the read-only local JWT client
qdrant_retriever_model = QdrantRM("phi_data", client, k=3)

Let's initialize DSPy's Groq integration using Groq's API key:

import dspy
llama3 = dspy.GROQ(model='llama3-8b-8192', api_key="<your groq api key>")

The next step tells the system to use Qdrant as the retriever model and Groq as the LLM.

dspy.settings.configure(rm=qdrant_retriever_model, lm=llama3)
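
As a quick smoke test (assuming the standard DSPy LM call interface, which returns a list of completions):

print(llama3("Say hello in five words."))  # should print a one-element list with the completion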

Let’s set up our CoT (chain of thought) modules and signatures using DSPy:

class GenerateAnswer(dspy.Signature):
    """Answer questions with logical factoid answers."""

    context = dspy.InputField(desc="will contain phi medical data of patients matched with the query")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="an answer between 10 to 20 words")

We define a function get_context to fetch the 3 best-matching data points for a query:

def get_context(text):
    # embed the query with the same model used for the collection
    query_vector = model.encode(text)

    hits = client.search(
        collection_name="phi_data",
        query_vector=query_vector,
        limit=3,
    )
    # concatenate the matched records into a single context string
    s = ''
    for x in [sampled_dataset[i.id] for i in hits]:
        s = s + x
    return s

After that, define the main class that handles the RAG pipeline.

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        # a DSPy retriever is configured here, but forward() below queries Qdrant directly via get_context
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        # fetch the top matching patient records and let the CoT module answer from them
        context = get_context(question)
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

These may not make sense if you are encountering these concepts for the first time. Want to understand how everything works? Hop on to my article that covers it beautifully! — Prompt Like a Pro Using DSPy: A guide to build a better local RAG model using DSPy, Qdrant and Ollama | by Sachin Khandewal | Mar, 2024

To respond to the queries, use:

rag = RAG()
def respond(query):
    response = rag(query)
    return response.answer
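
For a quick sanity check before adding a UI, call it directly; the name below is from my shuffled sample, so substitute one that exists in your own sampled_dataset:

print(respond("kayla padilla's details?"))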

This is not visually pleasing, so let’s build a very simple chatbot using my favorite framework Gradio!

import gradio as gr

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    clear = gr.ClearButton([msg, chatbot])

    def respond(query, chat_history):
        response = rag(query)
        chat_history.append((query, response.answer))
        return "", chat_history

    msg.submit(respond, [msg, chatbot], [msg, chatbot])

Finally, let’s start the app!
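
With the Blocks app above (named demo), launching is a single call:

demo.launch()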

If you are running this on Colab or on any other cloud service, consider using:
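
Gradio's share flag is the usual choice there:

demo.launch(share=True)  # creates a temporary public URL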

After starting the app, let’s query and see how this works.

Depending on the random shuffle, you may or may not see these exact results; pick a data point from sampled_dataset and use a name from it to get started. In my case, a person named kayla padilla showed up, so I will query for her information.

  1. kayla padilla's details?
  2. kayla padilla's doctor's name?

Amazing right?!

  3. Now let's query something more complex: Highest bill by Blue Cross Insurance?
  4. You can run a series of questions like this as well; you don't necessarily have to specify "doctor".
  5. You can even ask questions according to the illness concerned: Young people with hypertension

And a lot more!

With just 100 or so lines of code, you can build such amazing assistants while keeping your data safe!

Go ahead and play around with it.

This article covered:

  • The rise of AI security in healthcare and why it is needed.
  • A secure Private AI Healthcare Assistant for Clinics, which utilizes:
      • Qdrant Managed Cloud: a lightweight and secure vector database.
      • DSPy: an amazing tool for programming your prompts.
      • Groq: for quick LLM responses (Llama 3).
  • How to manage role-based access using JWT in Qdrant, and how it prevents unauthorized entities from accessing your private data.

For full code reference, you can check out my GitHub repo: https://github.com/sachink1729/AI-Assistant-Clinics-Medical-Data-Qdrant-Dspy-Groq
