This project implements sentiment analysis on movie reviews using both traditional Machine Learning models and a BERT-based model. Each review is classified as either positive (1) or negative (0), and the accuracy of every model is reported in the `ML models accuracy/` and `BERT accuracy/` folders.
Project structure:

```
Movie-Reviews-Sentiment-Analysis/
├── BERT accuracy/           # BERT model performance visualizations
├── ML models accuracy/      # ML models performance visualizations
├── NLP_Data/                # Dataset files
│   └── all_reviews.csv      # Combined dataset
├── NLP project.py           # Main project implementation
├── NLP project.ipynb        # Main project implementation as a Jupyter Notebook
├── Documentation.pdf        # Detailed documentation
└── requirements.txt         # Project dependencies
```
- Clone the repository:

```bash
git clone https://github.com/Mohammed2372/Movie-Reviews-Sentiment-Analysis.git
cd Movie-Reviews-Sentiment-Analysis
```
- Create and activate a virtual environment (recommended):

```bash
python -m venv venv
# Windows
.\venv\Scripts\Activate
# Linux/macOS
source venv/bin/activate
```
- Install dependencies:

```bash
pip install -r requirements.txt
```
- Download NLTK resources: the script downloads the required NLTK resources automatically on first run, or you can download them manually from Python:

```python
import nltk

nltk.download(['punkt', 'wordnet', 'stopwords', 'averaged_perceptron_tagger'])
```
Run the main script to train both the ML models and the BERT model:

```bash
python "NLP project.py"
```
Machine Learning models:
- Logistic Regression
- Linear SVC
- Random Forest

All models use TF-IDF vectorization with unigrams and bigrams.
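The three classifiers share the same feature extraction step. Below is a minimal scikit-learn sketch of that setup. It assumes the project uses scikit-learn's `TfidfVectorizer`, with `ngram_range=(1, 2)` giving unigrams plus bigrams. The helper `build_models`, the hyperparameters (`max_iter=1000`, `n_estimators=100`, `random_state=42`) and the four-sentence corpus are illustrative choices, not values taken from `NLP project.py`.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_models():
    """Hypothetical helper: one TF-IDF (unigram + bigram) pipeline per classifier."""
    classifiers = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Linear SVC": LinearSVC(),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    }
    # Each pipeline gets its own vectorizer instance, fitted only on training text.
    return {
        name: make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
        for name, clf in classifiers.items()
    }


# Tiny illustrative corpus (not the project's data); 1 = positive, 0 = negative.
train_texts = [
    "a wonderful and moving film",
    "great acting and a great story",
    "not good at all",
    "boring plot and terrible acting",
]
train_labels = [1, 1, 0, 0]

models = build_models()
for name, model in models.items():
    model.fit(train_texts, train_labels)
    print(name, model.predict(["a great film", "terrible and boring"]))
```

Keeping the vectorizer and classifier in one pipeline means the TF-IDF vocabulary is learned from the training split only, so test reviews cannot leak into the features. Bigrams such as "not good" let a linear model pick up short negations that single words miss.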
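BERT model:
- Base model: `textattack/bert-base-uncased-SST-2` (BERT-base, uncased, already fine-tuned on the SST-2 sentiment dataset)
- Fine-tuned further on the movie reviews for binary sentiment classification
- Trained with early stopping and model checkpointing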
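Early stopping with checkpointing comes down to a small loop. It tracks the best validation loss seen so far, saves a checkpoint whenever that loss improves, and stops once the loss has failed to improve for `patience` consecutive evaluations. The framework-agnostic sketch below shows that logic. `run_with_early_stopping`, `evaluate` and `save_checkpoint` are hypothetical names, and the patience value in the example is an assumption. If the project trains with the Hugging Face `Trainer` (not confirmed here), the library's `EarlyStoppingCallback` together with `load_best_model_at_end=True` provides the same behavior.

```python
from typing import Callable, Tuple


def run_with_early_stopping(
    evaluate: Callable[[int], float],
    save_checkpoint: Callable[[int], None],
    max_epochs: int,
    patience: int,
    min_delta: float = 0.0,
) -> Tuple[int, float, int]:
    """Hypothetical training driver. Returns (best_epoch, best_loss, epochs_run)."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch in range(max_epochs):
        # In a real run, one epoch of training would happen here before evaluating.
        val_loss = evaluate(epoch)
        epochs_run = epoch + 1
        if val_loss < best_loss - min_delta:
            best_loss, best_epoch = val_loss, epoch
            epochs_without_improvement = 0
            save_checkpoint(epoch)  # checkpoint only when validation loss improves
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` evaluations in a row
    return best_epoch, best_loss, epochs_run


# Made-up validation losses: training stops after epoch 3 and the epoch-1 checkpoint is kept.
saved_epochs = []
losses = [0.52, 0.41, 0.43, 0.44, 0.30]
print(run_with_early_stopping(lambda e: losses[e], saved_epochs.append, max_epochs=5, patience=2))
```

The later improvement to 0.30 is never seen. That is the trade-off `patience` controls: a larger value tolerates longer plateaus at the cost of extra training time.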
ML model results:
- Results available in `ML models accuracy/`
- Classification reports for each model
- Comparative performance analysis
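A classification report lists precision, recall, F1 score and support (number of true examples) for each class. Comparing models then usually comes down to a summary metric such as accuracy. The dependency-free sketch below computes those numbers to show what they mean. In practice scikit-learn's `classification_report` and `accuracy_score` produce the same figures. The helper names and the toy labels and predictions (`model_a`, `model_b`) are made up for illustration and are not the project's results.

```python
from typing import Dict, Sequence


def per_class_report(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[int, Dict[str, float]]:
    """Hypothetical helper: precision, recall, F1 and support for every label."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}
    return report


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy comparison: pick the model with the highest accuracy on the same test labels.
y_true = [1, 1, 1, 0, 0, 0]
predictions = {"model_a": [1, 1, 0, 0, 0, 1], "model_b": [1, 1, 1, 0, 0, 1]}
scores = {name: accuracy(y_true, preds) for name, preds in predictions.items()}
best_model = max(scores, key=scores.get)
print(best_model, scores[best_model])
print(per_class_report(y_true, predictions["model_a"]))
```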
BERT results:
- Results available in `BERT accuracy/`
- Training loss curves
- Evaluation metrics
- Final test results
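Training loss curves are usually drawn from the losses logged during training. Suppose the BERT model is trained with the Hugging Face `Trainer`, which is an assumption and not something stated here. In that case the values live in `trainer.state.log_history`, a list of dicts in which training entries carry a `loss` key and evaluation entries carry an `eval_loss` key. The hypothetical helper below separates the two series so they can be plotted. The sample log is made up.

```python
from typing import Dict, List, Tuple

Series = List[Tuple[int, float]]


def split_log_history(log_history: List[Dict[str, float]]) -> Tuple[Series, Series]:
    """Hypothetical helper: return (step, loss) pairs for training and evaluation."""
    train = [(int(e["step"]), e["loss"]) for e in log_history if "loss" in e]
    evals = [(int(e["step"]), e["eval_loss"]) for e in log_history if "eval_loss" in e]
    return train, evals


# Made-up log in the shape of trainer.state.log_history; the final summary
# entry uses "train_loss" rather than "loss", so it is not treated as a point.
log = [
    {"loss": 0.69, "learning_rate": 4e-05, "epoch": 0.5, "step": 50},
    {"eval_loss": 0.45, "epoch": 1.0, "step": 100},
    {"loss": 0.40, "learning_rate": 3e-05, "epoch": 1.5, "step": 150},
    {"train_runtime": 120.0, "train_loss": 0.55, "epoch": 2.0, "step": 200},
]
train, evals = split_log_history(log)
print(train, evals)
```

Each series can then be passed to a plotting library, for example `matplotlib.pyplot.plot(*zip(*train))`.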
For details on the following topics, see Documentation.pdf:
- Data preprocessing steps
- Model architectures
- Training configurations
- Performance metrics
- Implementation details
To use the trained model for predictions (after training and saving the model):
```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Load the fine-tuned model and tokenizer from the directory they were saved to
model_path = "./bert_model"
model = BertForSequenceClassification.from_pretrained(model_path)
tokenizer = BertTokenizer.from_pretrained(model_path)
model.eval()  # make sure dropout is disabled for inference

# Prepare text
text = "Your movie review here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Get prediction (no gradients are needed at inference time)
with torch.no_grad():
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=-1).item()
sentiment = "positive" if prediction == 1 else "negative"
print(sentiment)
```
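The prediction snippet returns only the arg-max label. If a confidence score is also wanted, a softmax turns the two logits into probabilities. Below is a dependency-free sketch of that step. `logits_to_sentiment` is a hypothetical helper, and it assumes index 0 is negative and index 1 is positive, matching the 0/1 labels. With PyTorch you would normally call `torch.softmax(outputs.logits, dim=-1)` instead.

```python
import math
from typing import Sequence, Tuple


def logits_to_sentiment(logits: Sequence[float]) -> Tuple[str, float]:
    """Map [negative_logit, positive_logit] to (label, probability of that label)."""
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    label = "positive" if best == 1 else "negative"
    return label, probs[best]


label, confidence = logits_to_sentiment([-1.2, 2.3])
print(label, round(confidence, 3))  # positive 0.971
```

With the model loaded as above, this would be called as `logits_to_sentiment(outputs.logits[0].tolist())`.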