CORONA: A Coarse-to-Fine Framework for Graph-based Recommendation with Large Language Models

News

Our paper has been accepted to SIGIR 2025 🎉

Introduction

CORONA is a coarse-to-fine recommendation framework that retrieves user neighborhoods on the user–item bipartite graph and uses LLM-augmented user profiles as side information. The coarse stage retrieves candidate users via graph-aware similarity with distance priors; the fine stage constructs compact subgraphs from the retrieved users for downstream training and evaluation.
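A minimal sketch of the coarse stage, assuming the train.json layout (uid -> item list) and an in-memory dict of user embeddings. The distance prior here is a hypothetical linear penalty (`lam` times the user-to-user hop count from a BFS over the bipartite graph) subtracted from cosine similarity; the learned scoring in model.py may differ. `user_hop_distances`, `coarse_retrieve`, `lam`, and `max_hops` are illustrative names, not the repository's API.

```python
from collections import deque
import heapq
import math


def user_hop_distances(train, query_uid, max_hops=3):
    """BFS over the user-item bipartite graph; returns {uid: user-to-user hops}.

    Two users are one hop apart when they share at least one training item.
    """
    item_to_users = {}
    for uid, items in train.items():
        for it in items:
            item_to_users.setdefault(it, []).append(uid)
    dist = {query_uid: 0}
    queue = deque([query_uid])
    while queue:
        u = queue.popleft()
        if dist[u] >= max_hops:
            continue  # do not expand beyond the hop budget
        for it in train.get(u, []):
            for v in item_to_users[it]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return dist


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def coarse_retrieve(train, emb, query_uid, top_k=500, lam=0.1, max_hops=3):
    """Rank other users by cosine similarity minus a hypothetical hop-distance penalty."""
    dist = user_hop_distances(train, query_uid, max_hops)
    q = emb[query_uid]
    scores = []
    for uid, vec in emb.items():
        if uid == query_uid:
            continue
        d = dist.get(uid, max_hops + 1)  # unreachable users get the largest penalty
        scores.append((cosine(q, vec) - lam * d, uid))
    return [uid for _, uid in heapq.nlargest(top_k, scores)]
```

Setting `lam=0` reduces this to plain embedding similarity; a positive `lam` favors users that are close in the interaction graph.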

Repository Structure

main.py: training/validation/testing for user retriever
model.py: retriever model with distance-aware embedding transformation
construct_graph.py: build per-user subgraphs from retrieved neighbors
chat_api_query.py: LLM-based user profiling and embedding generation
load.py: utilities for dataset loading and diagnostics
netflix_data/: example dataset placeholder (train/val/test splits and sparse matrices)

Quick Start

1) Environment

Python 3.9+
Install minimal deps:

pip install -r requirements-min.txt

For CUDA-enabled PyTorch and PyTorch Geometric wheels, follow the official installation guides.

2) Configure

Create .env (or copy .env.example) to specify paths/devices:

cp .env.example .env

Key variables:

DATA_DIR: project root for data and outputs (default .)
DATASET_DIR: dataset subdirectory, appended directly to DATA_DIR (default /netflix_data, so keep the leading slash)
CUDA_VISIBLE_DEVICES: GPU id (default 0)
TOP_K: retrieved users per query (default 500)
OPENAI_*: LLM credentials for profiling
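The variables above can be resolved with a small stdlib-only loader such as the one below. This is a hypothetical sketch: `load_config` and `parse_env_file` are not part of the repository, the parser handles only plain KEY=VALUE lines, and it assumes process environment variables take precedence over .env values (the convention python-dotenv follows by default). OPENAI_* credentials are left out.

```python
import os

# Defaults as documented in this README.
DEFAULTS = {
    "DATA_DIR": ".",
    "DATASET_DIR": "/netflix_data",
    "CUDA_VISIBLE_DEVICES": "0",
    "TOP_K": "500",
}


def parse_env_file(text):
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_config(env_text="", environ=None):
    """Hypothetical resolver: process environment > .env file > README default."""
    environ = os.environ if environ is None else environ
    file_values = parse_env_file(env_text)
    cfg = {}
    for key, default in DEFAULTS.items():
        cfg[key] = environ.get(key, file_values.get(key, default))
    cfg["TOP_K"] = int(cfg["TOP_K"])
    # Paths are composed as ${DATA_DIR}${DATASET_DIR}, i.e. plain concatenation.
    cfg["DATASET_PATH"] = cfg["DATA_DIR"] + cfg["DATASET_DIR"]
    return cfg
```

For example, `load_config(open(".env").read())` would yield `DATASET_PATH == "./netflix_data"` when only the defaults apply.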

3) Prepare Dataset

We experiment on Netflix, MovieLens, and Amazon-Book. For all methods, only textual side information is provided.

Place processed files under ${DATA_DIR}${DATASET_DIR}:
- train.json, val.json, test.json (uid -> item list)
- train_mat (scipy sparse user–item CSR, pickled)
- augmented_user_init_embedding_final (pickled NumPy array of LLM-augmented user embeddings; its dimension must match the user embedding dimension)
- Optional: netflix_image_text/item_attribute.csv for profiling
For Netflix node features, we recommend following the LLMRec instructions.
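Before training, it can help to sanity-check the split files. The sketch below assumes user and item ids are contiguous integers starting at 0, so that train_mat would have shape (n_users, n_items); that assumption and the helper names `load_split` and `summarize_splits` are ours, not the repository's. It also flags users whose test items reappear in their training list.

```python
import json


def load_split(text):
    """Parse a split file in the uid -> item-list layout; keys and items become ints."""
    raw = json.loads(text)
    return {int(uid): [int(i) for i in items] for uid, items in raw.items()}


def summarize_splits(train, val, test):
    users = set(train) | set(val) | set(test)
    items = {i for split in (train, val, test) for lst in split.values() for i in lst}
    # Assumes 0-based contiguous ids, so the largest id determines the matrix size.
    n_users = max(users) + 1
    n_items = max(items) + 1
    # Users whose held-out items also appear in their training list (possible leakage).
    leaked = sorted(u for u in test if set(test[u]) & set(train.get(u, [])))
    return {
        "n_users": n_users,
        "n_items": n_items,
        "n_train_interactions": sum(len(v) for v in train.values()),
        "leaked_test_users": leaked,
    }
```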

4) LLM-based User Profiling (Optional)

If you need to generate augmented_user_init_embedding_final:

make augment

This reads train_mat and test.json and writes ${AUGMENT_FILE_PATH}/augmented_user_init_embedding_final.
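For intuition about the profiling step, here is a hypothetical sketch of how a per-user prompt could be assembled from the training history and item_attribute.csv. It assumes rows of id,year,title (a non-numeric header row is skipped) and does not call any LLM; the actual prompt in chat_api_query.py, and the embedding call that follows it, are not reproduced here.

```python
import csv
import io


def read_item_attributes(csv_text):
    """Map item id -> (year, title); assumes rows of id,year,title."""
    attrs = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3 or not row[0].strip().isdigit():
            continue  # skip malformed rows and a possible header
        attrs[int(row[0])] = (row[1], row[2])
    return attrs


def build_profile_prompt(uid, train, attrs, max_items=20):
    """Hypothetical prompt template listing up to max_items watched titles."""
    lines = []
    for item in train.get(uid, [])[:max_items]:
        year, title = attrs.get(item, ("unknown", f"item {item}"))
        lines.append(f"- {title} ({year})")
    history = "\n".join(lines) if lines else "- (no interactions)"
    return (
        "The user has watched the following movies:\n"
        f"{history}\n"
        "Summarize this user's preferences (e.g. favorite genres, eras, themes) "
        "in a short profile."
    )
```

The LLM's answer to such a prompt would then be embedded to form the augmented user features.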

5) Train and Evaluate

make train

After training, the best model and retrieved nodes are saved to ${DATA_DIR}/Graph_RA_Rec/model_states/. To run testing on its own:

make test
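This README does not list the metrics reported by make test. As a reference point, here is a sketch of two common top-K metrics, Recall@K and NDCG@K with binary relevance, for a single user; whether main.py uses these exact definitions (for example, recall normalized by the number of relevant items rather than min(relevant, K)) is an assumption.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & set(relevant))
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    rel = set(relevant)
    dcg = sum(1.0 / math.log2(pos + 2) for pos, item in enumerate(ranked[:k]) if item in rel)
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(min(len(rel), k)))
    return dcg / ideal if ideal else 0.0
```

Dataset-level scores are usually the mean of these per-user values over all test users.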

6) Construct Subgraphs

make graphs

Produces user/item subgraphs under ${DATA_DIR}/Graph_RA_Rec/${basename(DATASET_DIR)}/.
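A minimal sketch of the fine stage, assuming a subgraph consists of the query user, its retrieved neighbors, and every training item those users interacted with. Nodes are relabeled to contiguous local indices and edges are returned as a two-row list, the layout PyTorch Geometric uses for `edge_index`; `build_user_subgraph` is a hypothetical name, and construct_graph.py may add other nodes or features.

```python
def build_user_subgraph(query_uid, neighbors, train):
    """Return (nodes, edge_index) for the query user plus its retrieved neighbors.

    Nodes are tagged ("u", uid) / ("i", item) so user and item ids never collide.
    Edges are the training interactions of the selected users.
    """
    users = list(dict.fromkeys([query_uid, *neighbors]))  # dedupe, keep order
    edges = []
    items = []
    seen_items = set()
    for u in users:
        for it in train.get(u, []):
            edges.append((("u", u), ("i", it)))
            if it not in seen_items:
                seen_items.add(it)
                items.append(it)
    nodes = [("u", u) for u in users] + [("i", it) for it in items]
    # Relabel to local indices 0..len(nodes)-1 for a compact edge list.
    index = {node: i for i, node in enumerate(nodes)}
    edge_index = [[index[a] for a, _ in edges], [index[b] for _, b in edges]]
    return nodes, edge_index
```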

Datasets Details

Netflix (KDD Cup 2007)
MovieLens-10M (ACM TiiS 2015)
Amazon-Book (EMNLP 2019)

We follow LLMRec for the Netflix/MovieLens splits and RLMRec for Amazon-Book. Textual information is encoded with Sentence-BERT.

Reproducibility Notes

Determinism: set_seed(3) in main.py
GPU selection via CUDA_VISIBLE_DEVICES
Cached tensors: *_for_RA.pkl are stored in ${DATA_DIR}${DATASET_DIR}
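A typical seeding helper and pickle cache in the spirit of these notes. This is a sketch: the actual set_seed in main.py may seed a different set of libraries, and `cached` is a hypothetical helper showing how *_for_RA.pkl-style caches are commonly reused. NumPy and PyTorch are seeded only if they are installed.

```python
import os
import pickle
import random


def set_seed(seed):
    """Seed Python's RNG, plus NumPy and PyTorch when available."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


def cached(path, compute):
    """Load a pickled object from `path`, or compute it once and pickle it there."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    value = compute()
    with open(path, "wb") as f:
        pickle.dump(value, f)
    return value
```

Because a cache like this never checks whether the inputs changed, delete the cached files after modifying the dataset.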

Citation

If you find this repository helpful, please cite:

@inproceedings{corona2025,
  title={CORONA: A Coarse-to-Fine Framework for Graph-based Recommendation with Large Language Models},
  booktitle={Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2025}
}

References

He et al., LightGCN, SIGIR 2020
Wei et al., LLMRec, arXiv 2024
Ren et al., RLMRec, WWW 2024
Bennett and Lanning, The Netflix Prize, KDD Cup 2007
Harper and Konstan, MovieLens, ACM TiiS 2015
Reimers and Gurevych, Sentence-BERT, EMNLP/IJCNLP 2019

License

This code is released for research purposes. See the repository license, if provided.
