Lightweight, private, and customizable retrieval-augmented chatbot running entirely on your Mac.
Based on the excellent work by pruthvirajcyn and his Medium article.
This is my personal implementation of a local RAG (Retrieval-Augmented Generation) chatbot using:
- Ollama for running open-source LLMs and embedding models locally.
- Streamlit for a clean and interactive chat UI.
- ChromaDB for storing and querying vector embeddings.
As of 2025-07-17, I'm using:
- 🔍 Embedding model: `nomic-embed-text-v2-moe`
- 🧠 LLM: `gemma3n`
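To make the split concrete, here's a minimal sketch of how the two models divide the work, assuming the `ollama` Python package is installed and both models have been pulled (see Setup below):

```python
# Minimal sketch of how the two models divide the work.
# Assumes the `ollama` Python package and both models pulled via `ollama pull`.
import ollama

# The embedding model turns text into a vector for similarity search.
emb = ollama.embeddings(
    model="toshk0/nomic-embed-text-v2-moe:Q6_K",
    prompt="What does the contract say about renewal?",
)
print(len(emb["embedding"]))  # embedding dimensionality

# The LLM generates the final answer from whatever context you give it.
reply = ollama.chat(
    model="gemma3n",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(reply["message"]["content"])
```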
- 🔒 Privacy: No data is sent to the cloud. Upload and query your documents entirely offline.
- 💸 Cost-effective: No API tokens or cloud GPU costs. You only pay for electricity.
- 📚 Better than summarizing: With long PDFs or multiple documents, even summaries may not contain the context you need. A RAG chatbot can drill deeper and provide contextual answers.
✅ Recommended: At least 16GB of RAM on your Mac, preferably 24GB+ for a smoother experience.
```bash
git clone https://github.com/eplt/RAG_Ollama_Mac.git
cd RAG_Ollama_Mac
python3 -m venv venv
source venv/bin/activate
pip install -r ./src/requirements.txt
```

```bash
ollama serve
ollama pull gemma3n
ollama pull toshk0/nomic-embed-text-v2-moe:Q6_K
```
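Before moving on, you can sanity-check from Python that the server is reachable and both models are available (a quick sketch using the `ollama` package; `ollama.show()` raises if a model is missing):

```python
# Quick sanity check: is the Ollama server reachable, and are the
# required models pulled? ollama.show() raises if a model is missing.
import ollama

for model in ("gemma3n", "toshk0/nomic-embed-text-v2-moe:Q6_K"):
    try:
        ollama.show(model)
        print(f"{model}: OK")
    except Exception as exc:  # connection refused => `ollama serve` not running
        print(f"{model}: {exc}")
```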
Place your `.pdf` files in the `data/` directory.
```bash
python ./src/load_docs.py
```
To reset and reload the vector database:
```bash
python ./src/load_docs.py --reset
```
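If you're curious what the indexing step involves, here's an illustrative sketch, not the repo's actual `load_docs.py`: it assumes `pypdf`, `chromadb`, and `ollama` are installed, and the chunk size/overlap values are placeholders.

```python
# Illustrative indexing pipeline: read PDFs, chunk, embed, store in ChromaDB.
# A sketch only — not the repo's load_docs.py.
from pathlib import Path
import chromadb
import ollama
from pypdf import PdfReader

CHUNK_SIZE, OVERLAP = 800, 100  # placeholder tuning knobs (see Customization)
EMBED_MODEL = "toshk0/nomic-embed-text-v2-moe:Q6_K"

client = chromadb.PersistentClient(path="chroma")
collection = client.get_or_create_collection("docs")

for pdf in Path("data").glob("*.pdf"):
    text = "".join(page.extract_text() or "" for page in PdfReader(pdf).pages)
    # Fixed-size chunks with overlap so context isn't cut mid-thought.
    chunks = [text[i:i + CHUNK_SIZE]
              for i in range(0, len(text), CHUNK_SIZE - OVERLAP)]
    for i, chunk in enumerate(chunks):
        emb = ollama.embeddings(model=EMBED_MODEL, prompt=chunk)["embedding"]
        collection.add(ids=[f"{pdf.name}:{i}"],
                       documents=[chunk],
                       embeddings=[emb])
```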
```bash
streamlit run ./src/UI.py
```
Ask questions and the chatbot will respond using relevant context retrieved from your documents.
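For orientation, the chat flow boils down to: embed the question, pull the top-K chunks from ChromaDB, and hand them to the LLM. A stripped-down sketch (not the actual `UI.py`; the `n_results` value is an assumption):

```python
# Stripped-down chat loop: embed the question, retrieve the most similar
# chunks, and answer with the LLM. A sketch, not the repo's actual UI.py.
import chromadb
import ollama
import streamlit as st

collection = chromadb.PersistentClient(path="chroma").get_or_create_collection("docs")

if question := st.chat_input("Ask about your documents"):
    st.chat_message("user").write(question)
    # Retrieve the top-K most similar chunks from the vector store.
    q_emb = ollama.embeddings(model="toshk0/nomic-embed-text-v2-moe:Q6_K",
                              prompt=question)["embedding"]
    hits = collection.query(query_embeddings=[q_emb], n_results=5)
    context = "\n\n".join(hits["documents"][0])
    # Ask the LLM to answer strictly from the retrieved context.
    reply = ollama.chat(model="gemma3n", messages=[{
        "role": "user",
        "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
    }])
    st.chat_message("assistant").write(reply["message"]["content"])
```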
- ✏️ Modify Prompts: Update the prompt templates in `UI.py` to guide the chatbot's tone or behavior (see the template sketch after this list).
- 🔄 Try Different Models: Ollama supports various LLMs and embedding models. Run `ollama list` to see what's available, or try pulling new ones.
- ⚙️ Tune Retrieval Parameters: Adjust chunk size, overlap, or top-K retrieval values in `load_docs.py` for better answers (the indexing sketch above shows how these fit together).
- 🚀 Extend the Interface: Add features like file upload, chat history, user authentication, or export options using Streamlit's powerful features.
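As a starting point for prompt edits, a template along these lines works well (illustrative only; the variable names are hypothetical, not necessarily those in `UI.py`):

```python
# Illustrative prompt template — variable names are hypothetical,
# not necessarily those used in UI.py.
PROMPT_TEMPLATE = """You are a careful assistant. Answer the question
using ONLY the context below. If the answer is not in the context,
say you don't know.

Context:
{context}

Question: {question}
"""

prompt = PROMPT_TEMPLATE.format(
    context="(retrieved chunks go here)",
    question="What is the renewal period?",
)
print(prompt)
```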
- Ollama not running? Make sure `ollama serve` is active in a terminal tab.
- Missing models? Run `ollama list` to verify the models downloaded correctly.
- Dependency issues? Double-check your Python version (3.7+) and re-create the virtual environment.
- Streamlit errors? Make sure you run the app from the repo root and that your virtual environment is activated.
- Planning to support non-PDF formats (Markdown, .txt, maybe HTML).
- Will experiment with additional LLMs like `phi-3`, `mistral`, and `llama3`.
- Might integrate chat history persistence and better document management.
Local RAG is now more accessible than ever. With powerful small models and tools like Ollama, anyone can build a private, intelligent assistant — no cloud needed.
If you found this useful or have ideas to improve it, feel free to open a PR or drop a star ⭐️