A sophisticated book recommendation system that combines the power of AI, vector similarity search, and natural language processing to provide personalized book recommendations.
- AI-Powered Recommendations: Uses OpenAI's language models to provide intelligent, contextual book recommendations
- Semantic Search: Leverages HuggingFace embeddings and ChromaDB for similarity-based book discovery
- Book Comparison: Compare two books with AI-generated insights
- Fast API: Built with FastAPI for high-performance API endpoints
- Vector Database: ChromaDB for efficient similarity search and retrieval
The system consists of several key components:
- Data Layer: SQLite database storing the book dataset
- Vector Database: ChromaDB for semantic similarity search
- AI Layer: OpenAI LLM for intelligent recommendations
- API Layer: FastAPI serving REST endpoints
- Frontend: Static HTML/CSS/JS interface
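To make the layering concrete, here is a minimal sketch of how the data and vector layers could fit together, using the paths and model names from the Configuration section below (a sketch only; the project's actual logic lives in `main.py`):

```python
# Minimal sketch of the ingestion path (illustrative; the real logic is in main.py).
import sqlite3

import pandas as pd
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document

# Data layer: load the (cleaned) CSV into SQLite, capped like ROWS_LIMIT=100.
df = pd.read_csv("dataset/Best_books_ever[Cleaned].csv").head(100)
with sqlite3.connect("books.db") as conn:
    df.to_sql("books", conn, if_exists="replace", index=False)

# Vector layer: embed each description and persist the index in ChromaDB.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
docs = [
    Document(page_content=row["description"], metadata={"bookId": row["bookId"]})
    for _, row in df.iterrows()
    if isinstance(row["description"], str)  # skip rows with missing descriptions
]
vectordb = Chroma.from_documents(docs, embeddings, persist_directory="chroma_books_index")
```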
Tools Used:
- OpenAI for providing the language models
- HuggingFace for embedding models
- ChromaDB for vector database functionality
- FastAPI for the web framework
- LangChain for AI/ML orchestration
- Python 3.8 or higher
- OpenAI API key
- Sufficient disk space for book embeddings
```bash
git clone https://github.com/mehrdad-dev/BookLM.git
cd BookLM
pip install -r requirements.txt
```
Create a `.env` file in the project root with the following variables:
```env
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_API_BASE=your_openai_base_url_here  # only needed for a non-default endpoint
LLM_MODEL=gemma-3-1b-it

# Embedding Model
EMBEDDING_MODEL_NAME=sentence-transformers/all-MiniLM-L6-v2

# Database Configuration
CSV_PATH=dataset/Best_books_ever[Cleaned].csv
DB_PATH=books.db
ROWS_LIMIT=100

# Vector Database
INDEX_PATH=chroma_books_index
```
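These variables are read from the environment at startup. A sketch of loading them, assuming the app uses python-dotenv (check `requirements.txt`):

```python
# Sketch: reading the .env file, assuming python-dotenv is installed.
import os

from dotenv import load_dotenv

load_dotenv()  # loads .env from the project root into the environment

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]        # required; KeyError if missing
LLM_MODEL = os.getenv("LLM_MODEL", "gemma-3-1b-it")  # optional, with a default
ROWS_LIMIT = int(os.getenv("ROWS_LIMIT", "100"))     # numeric settings need casting
```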
The original dataset I used for this project: https://github.com/scostap/goodreads_bbe_dataset. You can find a cleaned version of this dataset in the `dataset/` folder.
Ensure you have the book dataset in the `dataset/` folder. The system expects a CSV file with the following columns:
- `bookId`: Unique book identifier
- `title`: Book title
- `author`: Book author
- `rating`: Book rating
- `description`: Book description
- `genres`: Book genres
- `characters`: Book characters
- `coverImg`: Book cover image URL
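A quick, optional sanity check (a sketch, assuming pandas is available) that your CSV has these columns before the first run:

```python
# Sketch: validating the dataset columns before first run (assumes pandas).
import pandas as pd

REQUIRED_COLUMNS = {
    "bookId", "title", "author", "rating",
    "description", "genres", "characters", "coverImg",
}

df = pd.read_csv("dataset/Best_books_ever[Cleaned].csv")
missing = REQUIRED_COLUMNS - set(df.columns)
if missing:
    raise ValueError(f"Dataset is missing columns: {sorted(missing)}")
print(f"OK: {len(df)} rows, all expected columns present")
```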
```bash
uvicorn main:app --reload
```
On the first run, the system will:
- Load book data from CSV into SQLite database
- Create embeddings for book descriptions using HuggingFace
- Store embeddings in ChromaDB for similarity search
- Start the web server
This process may take a few minutes depending on the dataset size.
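One way such a first-run bootstrap can be wired into FastAPI is sketched below; the helper names `build_database` and `build_index` are hypothetical stand-ins for the project's own setup code:

```python
# Sketch: one way the first-run bootstrap can be wired into FastAPI.
# build_database/build_index are hypothetical stand-ins for the project's setup code.
import os

from fastapi import FastAPI

app = FastAPI()

def build_database() -> None:
    """Load the CSV into SQLite (see the ingestion sketch above)."""

def build_index() -> None:
    """Embed descriptions with HuggingFace and persist them in ChromaDB."""

@app.on_event("startup")
def bootstrap() -> None:
    # books.db and chroma_books_index/ are auto-generated, so they are
    # only rebuilt when missing; subsequent starts skip the slow steps.
    if not os.path.exists("books.db"):
        build_database()
    if not os.path.exists("chroma_books_index"):
        build_index()
```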
1. Book Recommendations:
   - Navigate to the "Recommendation" tab
   - Enter your book preferences (e.g., "I want a fantasy book about magical worlds")
   - Get AI-powered recommendations with reasoning
2. Book Comparison:
   - Navigate to the "Compare" tab
   - Search for book titles
   - Select two books
   - Get AI-generated comparison insights
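If you prefer scripting over the web UI, the API can be called directly. The `/recommend` route and payload shape below are illustrative assumptions; check `main.py` for the actual endpoints:

```python
# Sketch: calling the API from a script. The /recommend route and payload
# shape are illustrative assumptions; see main.py for the actual endpoints.
import requests

BASE_URL = "http://127.0.0.1:8000"

resp = requests.post(
    f"{BASE_URL}/recommend",
    json={"query": "I want a fantasy book about magical worlds"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```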
```
BookLM/
├── main.py
├── requirements.txt
├── README.md
├── books.db                    # SQLite database (auto-generated)
├── chroma_books_index/         # ChromaDB vector database (auto-generated)
├── books_1.Best_Books_Ever.csv
├── dataset/
│   ├── Best_books_ever[Cleaned].csv
│   └── dataset.ipynb
└── static/
    └── index.html
```
- `OPENAI_API_KEY`: Your OpenAI API key
- `OPENAI_API_BASE`: Your OpenAI base URL
- `LLM_MODEL`: Chat model to use via the OpenAI-compatible API (I used: gemma-3-1b-it)
- `EMBEDDING_MODEL_NAME`: HuggingFace embedding model
- `CSV_PATH`: Path to your book dataset CSV
- `DB_PATH`: SQLite database file path
- `ROWS_LIMIT`: Number of books to process (useful for testing)
- `INDEX_PATH`: ChromaDB index directory
- `ROWS_LIMIT`: Reduce for faster initial setup, increase for more comprehensive recommendations
- Chunk Size: Modify `chunk_size` in `prepare_documents()` for different embedding granularity
- Similarity Search: Adjust the `k` parameter in `query_db()` for more/fewer recommendations (see the sketch after this list)
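As a sketch of the similarity-search knob, here is how `k` behaves against a persisted Chroma index; this mirrors what `query_db()` presumably does, but the actual implementation is in `main.py`:

```python
# Sketch: loading the persisted index and tuning k, the number of
# nearest-neighbour candidates returned per query.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectordb = Chroma(persist_directory="chroma_books_index", embedding_function=embeddings)

# k=5 returns the five most similar descriptions; raise it for broader,
# noisier candidate sets, lower it for tighter matches.
for doc in vectordb.similarity_search("magical worlds and dragons", k=5):
    print(doc.metadata.get("bookId"), doc.page_content[:80])
```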
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
This project is licensed under the MIT License.
Happy Reading! 📚✨