This repository takes the 'reviews' data exported from your Letterboxd account and enriches it with details such as runtime, box office, budget, directors, actors, and producers. It does this with a Python web scraping script that retrieves the information from each film's Wikipedia page. Downstream analysis to follow.
-
Export your data- You must have a Letterboxd account to complete this step. In Letterboxd, go to 'Account Settings' and open the 'Data' tab. Select 'Export your data', then find the downloaded zip file in your downloads folder.
-
Clone this repository-
git clone https://github.com/yourusername/LetterBodxdReviews2024.git
cd LetterBodxdReviews2024
-
Add your exported data- Extract the reviews.csv file from the 'letterboxd-yourusername-date of export' zip file and place it in the data/ folder of your local copy of this repository (create the folder if it does not already exist).
-
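If you prefer to script this step, here is a minimal sketch using only the Python standard library that copies reviews.csv out of the export archive into data/. It assumes reviews.csv sits at the top level of the zip. The function name extract_reviews and the example path are hypothetical and are not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_reviews(zip_path, data_dir="data"):
    """Copy reviews.csv from a Letterboxd export zip into data_dir.

    Assumes reviews.csv is at the top level of the archive (an assumption
    about the export layout). Returns the path of the written file.
    """
    target_dir = Path(data_dir)
    # data/ is git-ignored, so a fresh clone may not contain it yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Read just this one member instead of extracting the whole archive.
        contents = archive.read("reviews.csv")
    target = target_dir / "reviews.csv"
    target.write_bytes(contents)
    return target


# Hypothetical usage (replace the file name with your actual export):
# extract_reviews(Path.home() / "Downloads" / "letterboxd-yourusername-<date>.zip")
```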
Install required packages-
pip install -r requirements.txt
-
Collect extra metadata for your watched films- Fetch additional details for each film you have watched (producer, director, box office, budget, accolades, stars, cinematography, runtime, etc.) by running the web scraping script, which extracts this data from the film's Wikipedia page.
python scripts/scrape_metadata.py
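The contents of scripts/scrape_metadata.py are not reproduced here. As a rough, hypothetical illustration of the kind of parsing involved, the sketch below pulls label/value pairs (for example 'Directed by' or 'Running time') out of a film page's infobox using only Python's built-in html.parser. It assumes the details live in a table whose class includes 'infobox', with labels in th cells and values in td cells. Downloading the page is left out. The names InfoboxParser and parse_infobox are illustrative only.

```python
import re
from html.parser import HTMLParser


def _clean(parts):
    """Join collected text, drop footnote markers like [1], normalize whitespace."""
    text = re.sub(r"\[\d+\]", "", "".join(parts))
    lines = [" ".join(line.split()) for line in text.split("\n")]
    return "; ".join(line for line in lines if line)  # several names -> "A; B"


class InfoboxParser(HTMLParser):
    """Collect {label: value} pairs from the first <table class="infobox ...">."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._depth = 0     # table nesting depth inside the infobox (0 = outside)
        self._done = False  # stop once the first infobox has closed
        self._cell = None   # "th" or "td" while inside a top-level cell
        self._label = ""
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self._depth or "infobox" in classes:
                self._depth += 1
            return
        if self._depth != 1:
            return
        if tag == "tr":
            self._label = ""  # each row starts without a label
        elif tag in ("th", "td"):
            self._cell, self._text = tag, []
        elif tag in ("br", "li") and self._cell:
            self._text.append("\n")  # line breaks and list items separate values

    def handle_endtag(self, tag):
        if self._done or not self._depth:
            return
        if tag == "table":
            self._depth -= 1
            self._done = self._depth == 0
            return
        if self._depth == 1 and tag == self._cell:
            value = _clean(self._text)
            if tag == "th":
                self._label = value
            elif self._label:
                self.fields[self._label] = value
                self._label = ""
            self._cell = None

    def handle_data(self, data):
        if self._cell:
            self._text.append(data)


def parse_infobox(html):
    parser = InfoboxParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Using the standard-library parser keeps the sketch dependency-free; a real scraper might instead use whatever HTML library requirements.txt installs.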
-
Combine datasets- Merge your exported 'reviews' data with the metadata collected by scrape_metadata.py into a single, richer dataset of your reviews, ratings, films, and their associated details.
python scripts/combine_data.py
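The contents of scripts/combine_data.py are not shown here either. One plausible approach, sketched below under stated assumptions, is a left join keyed on film title and release year. It assumes reviews.csv has 'Name' and 'Year' columns and that the scraped metadata file uses the same two columns; the function names and file paths are hypothetical. Every review is kept even when no Wikipedia metadata was found for it.

```python
import csv


def combine(reviews, metadata, key=("Name", "Year")):
    """Left-join metadata rows onto review rows by (title, year).

    Both arguments are lists of dicts, e.g. from csv.DictReader. Reviews with
    no matching metadata get blank values for the metadata columns.
    """
    def make_key(row):
        return tuple(row[k].strip() for k in key)

    lookup = {make_key(row): row for row in metadata}
    # Metadata columns to add, in first-seen order, excluding the join key.
    extra = [c for c in dict.fromkeys(c for row in metadata for c in row) if c not in key]

    combined = []
    for review in reviews:
        match = lookup.get(make_key(review), {})
        combined.append({**review, **{c: match.get(c, "") for c in extra}})
    return combined


def combine_files(reviews_path, metadata_path, out_path):
    """Read two CSV files, combine them, and write the result to out_path."""
    with open(reviews_path, newline="", encoding="utf-8") as f:
        reviews = list(csv.DictReader(f))
    with open(metadata_path, newline="", encoding="utf-8") as f:
        metadata = list(csv.DictReader(f))
    rows = combine(reviews, metadata)
    fieldnames = list(rows[0]) if rows else []
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

For example, combine_files("data/reviews.csv", "data/metadata.csv", "data/combined.csv") would write the joined table, where the metadata and output file names are placeholders. A left join is used so that films missing from Wikipedia still appear in your analysis.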
-
Analyze your watched films! Run the analysis script and view the results.
python analysis_scripts/year_data_analysis.py
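The contents of analysis_scripts/year_data_analysis.py are not shown here. As a hypothetical example of the kind of summary the combined data makes possible, the sketch below counts films, averages ratings, totals runtime, and finds the most frequent director. It assumes the combined rows have 'Rating', 'Running time' (e.g. '170 minutes'), and 'Directed by' columns, with multiple directors separated by semicolons. Those column names and that convention are assumptions, not the repository's actual schema.

```python
import re
from collections import Counter
from statistics import mean


def runtime_minutes(text):
    """Return the first 'N min...' value in a scraped runtime string, or None."""
    match = re.search(r"(\d+)\s*min", text or "")
    return int(match.group(1)) if match else None


def summarize(rows):
    """Summarize combined rows (dicts with the assumed column names above)."""
    ratings = [float(r["Rating"]) for r in rows if r.get("Rating")]
    runtimes = [
        m for m in (runtime_minutes(r.get("Running time")) for r in rows) if m is not None
    ]
    # Count each director separately when a film lists several.
    directors = Counter(
        name.strip()
        for r in rows
        for name in (r.get("Directed by") or "").split(";")
        if name.strip()
    )
    return {
        "films": len(rows),
        "average_rating": round(mean(ratings), 2) if ratings else None,
        "total_hours": round(sum(runtimes) / 60, 1),
        "top_director": directors.most_common(1)[0][0] if directors else None,
    }
```

Rows with a missing rating or runtime are skipped for those statistics rather than counted as zero.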
Repository structure:
data/: Houses your added data (reviews.csv) and the datasets generated by the scrape_metadata.py and combine_data.py scripts.
scripts/: Contains the scripts that scrape each film's metadata from its Wikipedia entry and combine this data with your personal reviews.csv data for easier downstream analysis.
analysis_scripts/: Contains the script(s) for analyzing the combined Letterboxd watched-films data produced in the previous steps.
The data/ directory is listed in .gitignore so that your personal review data is not committed to your public repository.