How to build an AI Agent for Health Advice

Learn how to build an AI agent based on the NHS Health A-Z Data that makes it easy to find answers to health and medical queries.

Neha Sharma

Feb 14, 2025

Whenever we seek guidance on health matters online, we instinctively turn to trusted national health agencies, such as the NHS in the UK, for reliable information. However, scouring its vast library of health content takes time and effort. What if we could build a ChatGPT-like chatbot that answers all your health queries?

Many AI systems generate impressive results, but they are prone to hallucinations that sometimes lead to inaccurate responses, which is risky in healthcare. By integrating trusted NHS data into our agentic Retrieval-Augmented Generation (RAG) pipeline, we ground the agent's answers in authoritative content and make them more accurate and reliable.

This blog walks you through the entire process: scraping NHS Health A–Z data, storing it on Ori’s Object Storage, and finally building an agentic RAG system with two retrieval methods (BM25 and a vector database) using Hugging Face’s smolagents library. At the end, we’ll compare the responses from both methods to help you decide which one best fits your specific use case.

Background

Retrieval-Augmented Generation (RAG) combines a retrieval module with a generative language model. The retrieval component fetches the most relevant documents or passages from a corpus, and the generative model uses these documents to produce informed, contextually accurate answers.

Agentic RAG goes a step further: the agent not only retrieves relevant information but also uses that data to guide the generation process in a more informed and interactive manner. This approach aims to bridge the gap between trusted, authoritative content and the dynamic, conversational abilities of modern generative models.
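To make the retrieve-then-generate flow concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the corpus is a toy list of strings, `retrieve` ranks passages by simple word overlap, and `llm` is any callable you supply. None of these names come from the pipeline built later in this post.

```python
from typing import Callable, List


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank passages by how many query words they share."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(p.lower().split())), p) for p in corpus]
    # sorted() is stable, so passages with equal scores keep their corpus order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for score, passage in scored[:k] if score > 0]


def build_prompt(query: str, passages: List[str]) -> str:
    """Place the retrieved passages in front of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}"


def answer(query: str, corpus: List[str], llm: Callable[[str], str]) -> str:
    """Retrieve first, then let the (hypothetical) LLM callable generate."""
    passages = retrieve(query, corpus)
    return llm(build_prompt(query, passages))
```

In a real system the word-overlap retriever is replaced by BM25 or a vector store, and `llm` by a call to an inference endpoint.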

Let’s explore how to build an intelligent, agentic RAG system designed specifically for NHS health conditions data.

Scraping the NHS Health A–Z Website

We start by extracting the links for each health condition listed on the NHS Health A–Z website. Below is a simplified example of how you might accomplish this in Python:

```python
import csv
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.nhs.uk/conditions/"

response = requests.get(BASE_URL)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# Find all anchor tags
links = soup.find_all("a")

# Open a CSV file to write the results
with open("nhs_conditions_links.csv", mode="w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    # Write header row
    writer.writerow(["Link Text", "URL"])

    for link in links:
        link_text = link.get_text(strip=True)
        link_href = link.get("href")
        # Filter out empty or "#" links
        if link_href and link_href not in ["#", ""]:
            full_url = urljoin(BASE_URL, link_href)
            writer.writerow([link_text, full_url])
```
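The script above writes every anchor on the page, which can include navigation and footer links. If you want to keep only condition pages, a filter along the following lines could help. This is a sketch that assumes condition pages live below `/conditions/` on the same host, as `BASE_URL` suggests; `is_condition_link` is a hypothetical helper, not part of the original script.

```python
from urllib.parse import urljoin, urlparse

BASE_URL = "https://www.nhs.uk/conditions/"


def is_condition_link(href: str, base_url: str = BASE_URL) -> bool:
    """Return True if href resolves to a page below base_url on the same host."""
    if not href or href.startswith("#"):
        return False
    full = urlparse(urljoin(base_url, href))
    base = urlparse(base_url)
    # Normalize the path so "/conditions/acne" and "/conditions/acne/" match
    path = full.path if full.path.endswith("/") else full.path + "/"
    return (
        full.netloc == base.netloc
        and path.startswith(base.path)
        and path != base.path  # skip the index page itself
    )
```

You could call it inside the loop, for example `if is_condition_link(link_href): ...`, before writing the row.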

We’ll then scrape each of the extracted URLs, convert the page content to Markdown, and save the results to a CSV file.

```python
import re

import pandas as pd
import requests
from markdownify import markdownify
from requests.exceptions import RequestException


def markdown_from_urls(nhs_conditions_links, nhs_conditions_dataset):
    # Read the input CSV into a pandas DataFrame
    df = pd.read_csv(nhs_conditions_links)

    # Collect one Markdown string (or error message) per row
    md_contents = []

    # Iterate through each row
    for idx, row in df.iterrows():
        url = row["URL"]
        print(f"[{idx + 1}/{len(df)}] Fetching {url}")
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            # Convert HTML to Markdown
            markdown_content = markdownify(response.text)
            # Collapse runs of three or more line breaks
            markdown_content = re.sub(r"\n{3,}", "\n\n", markdown_content)
        except RequestException as e:
            # Record the failure and keep going instead of aborting the run
            markdown_content = f"Error fetching the webpage: {url} ({e})"
        except Exception as e:
            markdown_content = f"An unexpected error occurred: {url} ({e})"

        # Add the Markdown content to our list
        md_contents.append(markdown_content)

    # Create a new DataFrame column with the Markdown content
    df["Markdown"] = md_contents

    # Save to a new CSV
    df.to_csv(nhs_conditions_dataset, index=False, encoding="utf-8")


def main():
    markdown_from_urls("nhs_conditions_links.csv", "nhs_conditions_dataset.csv")


if __name__ == "__main__":
    main()
```
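A long scrape can hit transient network failures. As a hedged sketch, and not part of the original script, a small retry wrapper with an increasing delay might look like the following. The `fetch` callable, the attempt count, and the delay values are all assumptions you would tune for your own run.

```python
import time
from typing import Callable, Optional


def fetch_with_retries(
    fetch: Callable[[str], str],
    url: str,
    attempts: int = 3,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch(url) up to `attempts` times, waiting longer after each failure.

    Returns the fetched text, or None if every attempt raised an exception.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:  # broad on purpose: this is only a sketch
            print(f"Attempt {attempt}/{attempts} failed for {url}: {exc}")
            if attempt < attempts:
                # Linear backoff: 1x, 2x, 3x ... the base delay
                sleep(delay_seconds * attempt)
    return None
```

With `requests`, the `fetch` argument could be a small function that calls `requests.get(url, timeout=10)`, then `raise_for_status()`, and returns `response.text`.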

Storing Data on S3

Once you have your CSV file, the next step is to store it on OGC object storage (S3) or any other S3-compatible storage you prefer. Refer to our docs to get started on OGC S3.

Once you create and set up your bucket, you can use the following command to copy the file to S3.

```shell
aws s3 cp /path/to/file/filename --endpoint-url=https://s3.<bucket_region>.oriobjects.cloud s3://bucket_name
```

The S3-hosted CSV will serve as our data source for the retrieval modules.

Before we dive into the implementation, note that we’re using the s3fs library, one of several S3-compatible tools that simplify reading and managing objects stored in your S3 bucket. Alternatively, you could use boto3 to interact with your S3 storage.

Note: For optimal compatibility, ensure that you have the botocore package version 1.35.99 installed to avoid enabling the checksum header by default.
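For readers who prefer boto3, the upload step could be sketched as follows. This is an assumption-laden example rather than the method used in this post: the endpoint pattern is taken from the `aws s3 cp` command above, the environment variable names mirror the ones used in the retrieval code later on, and `ori_endpoint_url` and `upload_csv` are hypothetical helper names.

```python
import os


def ori_endpoint_url(region: str) -> str:
    """Build the object storage endpoint from the bucket region."""
    return f"https://s3.{region}.oriobjects.cloud"


def upload_csv(local_path: str, bucket: str, key: str, region: str) -> None:
    """Upload a local file to an S3-compatible bucket with boto3."""
    # Imported here so the helper above can be used without boto3 installed
    import boto3

    client = boto3.client(
        "s3",
        region_name=region,
        endpoint_url=ori_endpoint_url(region),
        aws_access_key_id=os.environ["ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["SECRET_ACCESS_KEY"],
    )
    client.upload_file(local_path, bucket, key)
```

A call such as `upload_csv("nhs_conditions_dataset.csv", "nhs-dataset", "nhs_conditions_dataset.csv", "eu-central-003")` would place the file where the retrieval code below expects it, assuming your bucket and region use those names.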

Implementing the Retrieval Methods

We use two approaches to retrieve relevant responses from the NHS data:

BM25 Retriever
Vector Database

To power the agent, we need an LLM inference API. We use the OGC Inference Endpoints API; alternatively, you could use Hugging Face’s default HfApiModel.

Approach #1: BM25 Ranking Function

BM25 is a classical ranking function used in information retrieval. It scores each document by how often the query terms appear in it, giving more weight to rarer terms and normalizing for document length. Here is an overview of BM25 and how it can be used.
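To illustrate how such a score is put together, here is a small, self-contained sketch of one common BM25 variant. It is not the implementation used by `rank_bm25` or LangChain, which differ in details such as the IDF formula; the function name and the default values `k1=1.5` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str], docs: List[List[str]], k1: float = 1.5, b: float = 0.75
) -> List[float]:
    """Score each tokenized document against the tokenized query."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rarer terms get a larger inverse document frequency
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (k1) and is length-normalized (b)
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

Documents that contain none of the query terms score zero, and repeating a term raises the score with diminishing returns.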

Below is our agentic RAG system built with the BM25 retrieval method, using a custom RetrieverTool based on the Tool class from the smolagents library.

Start by installing the required dependencies:

```shell
pip install smolagents pandas langchain langchain-community sentence-transformers datasets python-dotenv rank_bm25 s3fs --upgrade -q
```

```python
import csv
import io
import os

import s3fs
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.retrievers import BM25Retriever
from smolagents import CodeAgent, OpenAIServerModel, Tool

key = os.environ["ACCESS_KEY_ID"]
secret = os.environ["SECRET_ACCESS_KEY"]
endpoint_url = os.environ["ENDPOINT_URL"]

fs = s3fs.S3FileSystem(
    key=key,
    secret=secret,
    endpoint_url=endpoint_url,
    config_kwargs={
        "region_name": "eu-central-003",
        "signature_version": "s3v4",
    },
)

# Load the scraped dataset from the bucket
with fs.open("nhs-dataset/nhs_conditions_dataset.csv", "rb") as f:
    decoded_content = f.read().decode("utf-8")

# Parse from a StringIO rather than splitlines() so that line breaks
# inside the quoted Markdown fields are preserved
reader = csv.DictReader(io.StringIO(decoded_content, newline=""))
docs = list(reader)

source_docs = [
    Document(page_content=doc["Markdown"], metadata={"URL": doc["URL"].split("/")[-2]})
    for doc in docs
]

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    add_start_index=True,
    strip_whitespace=True,
    separators=["\n\n", "\n", ".", " ", ""],
)
docs_processed = text_splitter.split_documents(source_docs)


class RetrieverTool(Tool):
    name = "retriever"
    description = "Uses BM25 keyword search to retrieve the parts of the NHS Health A-Z content that could be most relevant to answer your query."
    inputs = {
        "query": {
            "type": "string",
            "description": "The query to perform. This should be close in wording to your target documents. Use the affirmative form rather than a question.",
        }
    }
    output_type = "string"

    def __init__(self, docs, **kwargs):
        super().__init__(**kwargs)
        self.retriever = BM25Retriever.from_documents(docs, k=10)

    def forward(self, query: str) -> str:
        assert isinstance(query, str), "Your search query must be a string"
        docs = self.retriever.invoke(query)
        return "\nRetrieved documents:\n" + "".join(
            [
                f"\n\n===== Document {str(i)} =====\n" + doc.page_content
                for i, doc in enumerate(docs)
            ]
        )


retriever_tool = RetrieverTool(docs_processed)

# Using an Ori endpoint. MODEL_ID is a placeholder environment variable
# for the name of the model served by your endpoint.
model = OpenAIServerModel(
    model_id=os.environ["MODEL_ID"],
    api_base="https://for-smol-agent.inference.ogc.ori.co/openai/v1/",
    api_key=os.environ["ACCESS_TOKEN"],
)

# If using an HfApiModel instead:
# from smolagents import HfApiModel
# model = HfApiModel()

agent = CodeAgent(
    tools=[retriever_tool], model=model, max_steps=4, verbosity_level=2
)

agent_output = agent.run("What to do if your throat is paining?")

print("Final output:")
print(agent_output)
```

On running with the question “What to do if your throat is paining?”, we get the following:

```text
Final output: Okay, so I'm trying to figure out what to do if my throat is hurting. The user wants a concise answer, so I need to be efficient but still helpful.

First, I recall that when your throat gets bad, it's important to assess the situation. Maybe there's an infection or something else causing it. But without symptoms like fever or a runny nose, it's hard to say for sure.

Hydration seems like a good first step. Staying hydrated can help ease discomfort and prevent dehydration. I'll mention drinking fluids, maybe water or herbal teas.

If the pain persists, I should advise consulting a healthcare professional. They can give a proper diagnosis and suggest treatments like antibiotics or over-the-counter remedies if necessary.

Using a saline solution might help clear the throat, so that's another step to include. Maybe just a quick gargle with some saltwater.

Keeping the environment clean is also important. Wiping down surfaces could reduce any irritation or allergens that might be causing the pain.

I should keep each step brief and clear, without being too technical. The user wants a straightforward guide, so each tip should stand alone. I'll structure it in a numbered list for easy reading.

I need to make sure not to repeat any previous errors, like using functions that aren't available, so I'll stick to what's possible with the tools at hand.

Putting it all together: assess the situation, stay hydrated, consult a professional, gargle with saline, and keep the area clean. That should cover the main actions needed.
</think>

To address the throbbing pain in your throat and provide a structured response based on typical health advice, here are the key steps to take:

1. **Assess the Situation**: Evaluate if the pain is due to an infection, allergies, or something else. Consider any additional symptoms like fever or runny nose.
2. **Hydration**: Drink fluids to stay hydrated. Simple options include water or herbal teas to alleviate discomfort.
3. **Consult a Healthcare Professional**: If the pain persists or worsens, seek medical advice. They may perform a proper diagnosis and recommend appropriate treatments, such as antibiotics or over-the-counter remedies.
4. **Remove Irritants**: To help alleviate the throat pain, you can gargle with a saline solution or mouthwash. Avoid irritants like alcohol, smoking, and certain foods or drinks.
5. **Maintain Hygiene**: Clean frequently touched surfaces to reduce irritation and avoid potential irritants that might be contributing to the symptom.

By following these steps, you can better manage the throbbing pain and address the underlying cause if necessary.
```
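The output above includes the model's reasoning trace, which ends with a `</think>` marker, before the actual answer. If you only want to show users the answer, a small post-processing step could remove the trace. The sketch below assumes the model always closes its reasoning with `</think>`, as in the output shown; `strip_reasoning` is a hypothetical helper name.

```python
def strip_reasoning(text: str, end_tag: str = "</think>") -> str:
    """Return only the text that follows the last reasoning end tag.

    If the tag is absent, the text is returned unchanged (apart from trimming).
    """
    _, found, answer = text.rpartition(end_tag)
    return answer.strip() if found else text.strip()
```

For example, `print(strip_reasoning(str(agent_output)))` would print only the numbered advice from the run above.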

Approach #2: Using a Vector Database

For a more semantic approach, we’ll create embeddings using all-MiniLM-L6-v2, a sentence-transformers model from Hugging Face, and then store them in a Chroma vector database for the RAG pipeline. This method captures contextual meaning better than simple keyword matching.

Note: In production, consider using more advanced vector databases such as Pinecone or Weaviate for scalability and additional features.
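As a rough illustration of what similarity search over embeddings does, the sketch below ranks a few hand-written vectors by cosine similarity to a query vector. The three-dimensional vectors and the names are made up for the example, and a real vector store may use a different distance metric and an approximate index rather than this exhaustive comparison.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]], k: int = 2
) -> List[str]:
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
        reverse=True,
    )
    return ranked[:k]
```

In the real pipeline, the embedding model produces the vectors and the vector store performs the ranking.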

```python
import csv
import io
import os

import s3fs
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings
from smolagents import CodeAgent, OpenAIServerModel, Tool
from tqdm import tqdm
from transformers import AutoTokenizer

key = os.environ["ACCESS_KEY_ID"]
secret = os.environ["SECRET_ACCESS_KEY"]
endpoint_url = os.environ["ENDPOINT_URL"]

fs = s3fs.S3FileSystem(
    key=key,
    secret=secret,
    endpoint_url=endpoint_url,
    config_kwargs={
        "region_name": "eu-central-003",
        "signature_version": "s3v4",
    },
)

# Load the scraped dataset from the bucket
with fs.open("nhs-dataset/nhs_conditions_dataset.csv", "rb") as f:
    decoded_content = f.read().decode("utf-8")

# Parse from a StringIO rather than splitlines() so that line breaks
# inside the quoted Markdown fields are preserved
reader = csv.DictReader(io.StringIO(decoded_content, newline=""))
docs = list(reader)

source_docs = [
    Document(page_content=doc["Markdown"], metadata={"URL": doc["URL"].split("/")[-2]})
    for doc in docs
]

# Chunk sizes are measured in tokens of this tokenizer
text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
    AutoTokenizer.from_pretrained("thenlper/gte-small"),
    chunk_size=200,
    chunk_overlap=20,
    add_start_index=True,
    strip_whitespace=True,
    separators=["\n\n", "\n", ".", " ", ""],
)

# Split docs and keep only unique chunks
print("Splitting documents...")
docs_processed = []
unique_texts = set()
for doc in tqdm(source_docs):
    new_docs = text_splitter.split_documents([doc])
    for new_doc in new_docs:
        if new_doc.page_content not in unique_texts:
            unique_texts.add(new_doc.page_content)
            docs_processed.append(new_doc)

print("Embedding documents...")
# Initialize embeddings and the Chroma vector store
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = Chroma.from_documents(docs_processed, embeddings, persist_directory="./chroma_db")


class RetrieverTool(Tool):
    name = "retriever"
    description = (
        "Uses semantic search to retrieve the parts of the NHS Health A-Z content that could be most relevant to answer your query."
    )
    inputs = {
        "query": {
            "type": "string",
            "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
        }
    }
    output_type = "string"

    def __init__(self, vector_store, **kwargs):
        super().__init__(**kwargs)
        self.vector_store = vector_store

    def forward(self, query: str) -> str:
        assert isinstance(query, str), "Your search query must be a string"
        docs = self.vector_store.similarity_search(query, k=10)
        return "\nRetrieved documents:\n" + "".join(
            [f"\n\n===== Document {str(i)} =====\n" + doc.page_content for i, doc in enumerate(docs)]
        )


retriever_tool = RetrieverTool(vector_store)

# MODEL_ID is a placeholder environment variable for the name of the
# model served by your endpoint.
model = OpenAIServerModel(
    model_id=os.environ["MODEL_ID"],
    api_base="https://for-smol-agent.inference.ogc.ori.co/openai/v1/",
    api_key=os.environ["ACCESS_TOKEN"],
)

# If using an HfApiModel instead:
# from smolagents import HfApiModel
# model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")

agent = CodeAgent(
    tools=[retriever_tool],
    model=model,
    max_steps=4,
    verbosity_level=2,
)

agent_output = agent.run("What to do if your throat is paining?")

print("Final output:")
print(agent_output)
```

This generates the following output:

```text
Final output: ['Get plenty of rest', 'Drink plenty of fluids', 'Take painkillers like paracetamol or ibuprofen', 'Try adding honey to a warm drink to soothe your throat', 'Gargle with warm salty water (not for children)', 'Cover your mouth and nose when coughing or sneezing', 'Wash your hands regularly']
```

Comparing the responses

BM25-based retrieval: The output generated with the BM25 retriever is more closely aligned with the original NHS content. Because BM25 retrieves exact or near-exact text matches from your source documents, it surfaces text that follows the official NHS wording and structure, including formal “red flag” advice.
Vector database: The embeddings-based approach provides broader coverage of symptomatic care. It captures a wider spread of advice across multiple documents (e.g., mentions of humidifiers and more pediatric nuances) because vector-based similarity groups semantically related content. However, it may occasionally miss important keywords or phrases if those items are not “close” in the embedding space.
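Beyond reading the final answers, one simple way to compare the two retrievers is to check how much their retrieved chunks overlap for the same query. The sketch below computes a Jaccard overlap between two lists of chunk texts. It is a hypothetical helper for exploration, not a measurement used in this post.

```python
from typing import Iterable


def jaccard_overlap(results_a: Iterable[str], results_b: Iterable[str]) -> float:
    """Share of unique chunks that both retrievers returned (0.0 to 1.0)."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0  # two empty result sets are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)
```

A low overlap for a given question suggests the two methods are drawing on different parts of the dataset, which is worth inspecting manually.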

How can we enhance model output further?

Improve prompt engineering: For the LLM part of your RAG pipeline, refine your prompt to specifically request both:
- Self-care advice (hydration, rest, etc.), and
- When to seek professional help (urgent symptoms, red flags).
Also instruct the model to include disclaimers or relevant official guidance.
Use more domain-specific models:
- Embeddings from domain-specific models (e.g., BioClinicalBERT, PubMedBERT) might capture medical nuances better.
- Similarly, a more capable LLM with domain fine-tuning might produce more coherent, comprehensive answers.
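The prompt refinements suggested above could be sketched as a small template function. The wording of the instructions and the name `build_health_prompt` are assumptions for illustration; you would adapt them to your model and review the results.

```python
def build_health_prompt(question: str) -> str:
    """Wrap a user question with instructions on how to structure the answer."""
    return (
        "Answer the health question below using only the retrieved NHS content.\n"
        "Structure the answer in two sections:\n"
        "1. Self-care advice (for example hydration and rest).\n"
        "2. When to seek professional help (urgent symptoms and red flags).\n"
        "End with a short note that this is general information and not a "
        "substitute for professional medical advice.\n\n"
        f"Question: {question}"
    )
```

The agent from the earlier examples could then be called with `agent.run(build_health_prompt("What to do if your throat is paining?"))`.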

In this blog, we’ve built an agentic RAG system that scrapes NHS Health A–Z data, stores it securely on S3, and uses two retrieval methods, BM25 for keyword matching and a vector database for semantic search, integrated with Hugging Face’s smolagents library. This framework combines retrieval and generation to build domain-specific intelligent agents and can be adapted for various applications, with possible enhancements including fine-tuning, user feedback integration, and more scalable vector databases.

Chart your own AI reality with Ori

Ori Global Cloud provides flexible infrastructure for any team, model, and scale. Backed by top-tier GPUs, performant storage, and AI-ready networking, Ori enables growing AI businesses and enterprises to deploy their AI models and applications in a variety of ways:

Deploy Private Clouds to build secure Enterprise AI, faster.
Operate Inference Endpoints effortlessly at any scale.
Leverage GPU Instances as on-demand virtual machines.
Scale GPU Clusters for training and inference.
Manage AI workloads on Serverless Kubernetes without infrastructure overhead.
