This repository contains the full implementation of an AI-powered pharmacovigilance system that automates the extraction of Adverse Event Reports (AERs) and generates detailed narrative case reports from unstructured pharmaceutical literature.
The system combines traditional NLP, biomedical entity extraction, LLM-based summarization, and full-stack deployment components, providing a complete solution for regulatory safety reporting.
- Literature Ingestion: Handles pharmaceutical documents (PDF/HTML) and extracts clean text using OCR and parsing.
- AER Entity Extraction: Extracts structured data like drug name, dosage, reaction, etc., using BioBERT/SciSpacy + rule-based pipelines.
- Vault-compliant JSON Generation: Formats the extracted data into a standard regulatory JSON schema for downstream use.
- Narrative Generation: Uses Claude (via AWS Bedrock) to generate fluent case narratives from structured AER data.
- AER Insights Analyzer: Lets users upload multiple AER JSON case files and receive dynamic visual and textual insights into the data.
- REST API Backend: Exposes core functionalities through a FastAPI server with endpoints for file upload, JSON output, and feedback submission.
- Streamlit Frontend: Interactive interface for uploading literature and viewing extracted reports in real time.
- Containerized Deployment: Deployed with Docker, NGINX (HTTPS), and AWS EC2.
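
To make the rule-based half of the entity extraction step more concrete, here is a minimal sketch of a regex dosage extractor. The function name `extract_dosages`, the pattern, and the unit list are illustrative assumptions and do not reflect the actual contents of rule_extractors.py.

```python
import re

# Hypothetical pattern: a number followed by a common dosage unit.
DOSAGE_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s*(mg|mcg|g|ml|iu)\b", re.IGNORECASE)


def extract_dosages(text: str) -> list[dict]:
    """Return each dosage mention as a dict with a numeric value and a lowercase unit."""
    return [
        {"value": float(m.group(1)), "unit": m.group(2).lower()}
        for m in DOSAGE_RE.finditer(text)
    ]


print(extract_dosages("The patient received 500 mg amoxicillin, then 2.5 mL of syrup."))
```

In a pipeline like the one described, rules of this kind would likely complement the model-based NER by catching well-formatted values that follow a predictable pattern.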
pharmacovigilance/
├── aer_entity_extraction/
│ ├── __pycache__/
│ ├── __init__.py
│ ├── ner_pipeline.py
│ ├── rule_extractors.py
│ └── testrun.py
│
├── case_data_construction/
│ ├── __pycache__/
│ ├── __init__.py
│ ├── json_generator.py
│ └── testrun2.py
│
├── literature_ingestion/
│ ├── __pycache__/
│ ├── __init__.py
│ └── text_extraction.py
│
├── narrative_generation/
│ ├── __pycache__/
│ ├── __init__.py
│ ├── narrative_generator.py
│ └── prompt_builder.py
│
├── case_insights_analysis/
│ ├── __pycache__/
│ ├── __init__.py
│ ├── insights_api.py
│ └── sample_cases/
│
├── nginx/
│ ├── certs/
│ ├── Dockerfile
│ └── nginx.conf
│
├── rest_api/
│ ├── __pycache__/
│ ├── __init__.py
│ ├── .dockerignore
│ ├── Dockerfile
│ ├── main.py
│ └── requirements.txt
│
├── streamlit/
│ ├── .dockerignore
│ ├── app.py
│ ├── Dockerfile
│ └── requirements.txt
│
├── docker-compose.yml
└── requirements.txt
git clone https://github.com/Nidhish-Balasubramanya/AI-Powered-Pharmacovigilance-via-Literature-Monitoring
cd AI-Powered-Pharmacovigilance-via-Literature-Monitoring
- Python 3.10+
- Create virtual env:
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
docker-compose up --build
This starts:
- REST API backend on http://localhost:8000
- Streamlit frontend on http://localhost:8501
- HTTPS via NGINX reverse proxy (certificates must be configured)
Description: Upload a pharmaceutical document (.pdf, .txt, or image) to extract AER entities and construct a Vault-compatible JSON.
- Request: multipart/form-data
  - file: The literature document to upload
- Response:
{
"case_id": "a1b2c3d4...",
"message": "Case data extracted successfully.",
"case_json": { }
}
- Errors: 400 (Unsupported file type), 500 (Processing failed)
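
The 400 error above implies that the file type is checked before any processing starts. A minimal sketch of such a check, assuming an extension allow-list: `.pdf` and `.txt` come from the description, while the image extensions and the function name `is_supported` are hypothetical, since the source only says "image".

```python
from pathlib import Path

# .pdf and .txt come from the endpoint description; the image extensions
# are assumed for illustration.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".png", ".jpg", ".jpeg"}


def is_supported(filename: str) -> bool:
    """Check the file extension case-insensitively against the allow-list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported("study.PDF"), is_supported("notes.docx"))
```

An extension check is cheap but only a first filter; a real service might also inspect the uploaded content itself.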
Description: Fetch the JSON-structured AER case generated from the uploaded literature.
- Path Param: case_id: Unique ID of the case
- Response: JSON content of the AER case
- Errors: 404 if not found
Description: Generate a narrative from a previously extracted case.
- Query Param: case_id
- Response:
{
"case_id": "a1b2c3d4...",
"narrative": "Patient experienced..."
}
- Errors: 404 if case not found
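
Narrative generation starts from the structured case, so some step has to turn that JSON into a prompt. The following is a rough sketch of what such a step could look like. The function name and the field names (`drug_name`, `dosage`, `reaction`) are assumptions based on the entities listed earlier, not the actual prompt_builder.py or the Vault schema.

```python
def build_narrative_prompt(case: dict) -> str:
    """Turn a structured AER case into a plain-text prompt for an LLM."""
    # Field names are assumed; missing fields are reported as "unknown"
    # so the model is not invited to guess them.
    fields = ["drug_name", "dosage", "reaction"]
    facts = "\n".join(f"- {name}: {case.get(name, 'unknown')}" for name in fields)
    return (
        "Write a concise adverse event case narrative using only these facts.\n"
        f"{facts}\n"
        "Do not add details that are not listed."
    )


print(build_narrative_prompt({"drug_name": "amoxicillin", "reaction": "rash"}))
```

Listing the facts explicitly and marking gaps as unknown is one way to keep a generated narrative tied to the extracted data.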
Description: Download the structured AER JSON file.
- Path Param: case_id
- Response: Attachment (.json) as application/json
- Errors: 404 if case not found
Description: Download the generated narrative as a .txt file.
- Path Param: case_id
- Response: Attachment (.txt) as text/plain
- Errors: 404 if narrative not found
Description: Submit validation feedback for a specific case.
- Form Params:
  - case_id: ID of the case being reviewed
  - feedback: Free-text feedback message
- Response:
{
"message": "Feedback received. Thank you!"
}
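
As an illustration of the form parameters above, this sketch encodes a feedback submission as a URL-encoded form body using only the standard library. It assumes the endpoint accepts `application/x-www-form-urlencoded` data; the helper name `encode_feedback` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def encode_feedback(case_id: str, feedback: str) -> bytes:
    """Encode the two form fields as a URL-encoded request body."""
    return urlencode({"case_id": case_id, "feedback": feedback}).encode("ascii")


body = encode_feedback("a1b2c3d4", "Dosage unit is wrong & onset date is missing")
# Round-trip the body to confirm that special characters survive encoding.
print(parse_qs(body.decode("ascii")))
```

The resulting bytes could be passed as the `data` argument of `urllib.request.Request`, together with the matching `Content-Type` header.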
Description: Simple health check for uptime and monitoring.
- Response:
{
"status": "ok"
}
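
A monitoring script can treat the service as up only when the response body matches the shape shown above. A minimal sketch of that check follows; the function name `is_healthy` is an assumption, and how the body is fetched is left out because the endpoint path is not given here.

```python
import json


def is_healthy(body: bytes) -> bool:
    """Return True only if the body is JSON of the form {"status": "ok"}."""
    try:
        payload = json.loads(body)
    except (ValueError, UnicodeDecodeError):
        # Malformed or non-UTF-8 bodies count as unhealthy.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


print(is_healthy(b'{"status": "ok"}'))
```

Treating malformed bodies as unhealthy means a proxy error page returned with a 200 status is not mistaken for a live backend.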
The Streamlit UI allows:
- Uploading documents
- Viewing extracted JSON
- Triggering narrative generation
- Displaying full case report
Accessible via: https://pharmacovigilence.com/
- Python 3.10, FastAPI, Streamlit
- SciSpacy, BioBERT/ClinicalBERT, AWS Bedrock (Claude)
- Docker, NGINX, AWS EC2
- Vault JSON Schema, OCR, Regex/rule-based NER
This work is licensed under CC BY-NC-ND 4.0.
- Nidhish Balasubramanya - nidhishbalasubramanya@gmail.com
- For queries or feedback, feel free to open an issue or get in touch by email.