Ruiyang Hao1, Bowen Jing2, Haibao Yu1,3, Zaiqing Nie1,*
1 AIR, Tsinghua University, 2 The University of Manchester,
3 The University of Hong Kong
- Aug. 5th, 2025: We updated the arXiv paper with more experimental results and added more demos to the Project Homepage.
- Jul. 1st, 2025: We released the initial version of the code and weights (except for the WoTE-Style model), along with documentation and training/evaluation scripts.
- Jun. 30th, 2025: We released our paper on arXiv. Code/Models are coming soon. Please stay tuned! ☕️
- Introduction
- StyleDrive Dataset Construction
- Getting Started
- Benchmark Results
- Qualitative Results on StyleDrive Benchmark
- Contact
- Acknowledgement
- Citation
We introduce the first large-scale real-world dataset with rich annotations of diverse driving preferences, addressing a key gap in personalized end-to-end autonomous driving (E2EAD). Using static road topology and a fine-tuned visual language model (VLM), we extract contextual features to construct fine-grained scenarios. Objective and subjective preference labels are derived through behavior analysis, VLM-based modeling, and human-in-the-loop verification. Building on this, we propose the first benchmark for evaluating personalized E2EAD models. Experiments show that conditioning on preferences leads to behavior better aligned with human driving. Our work establishes a foundation for human-centric, personalized E2EAD.

We propose a unified framework for modeling and labeling personalized driving preferences, as shown in the figure below.
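To make the labeling pipeline concrete, below is a minimal, hypothetical sketch of how objective behavior statistics and a VLM-based subjective vote could be fused into a per-scenario style label, with disagreements deferred to human verification. All names (`ObjectiveFeatures`, `fuse_style_label`) and thresholds are illustrative assumptions, not the repository's actual API.

```python
from dataclasses import dataclass

# Hypothetical sketch of preference-label fusion; names and thresholds are
# illustrative only and do not mirror the StyleDrive codebase.

@dataclass
class ObjectiveFeatures:
    mean_speed_ratio: float   # ego speed relative to the local speed limit
    min_time_headway: float   # smallest time gap to the lead vehicle (s)
    max_lateral_accel: float  # peak lateral acceleration (m/s^2)

STYLES = ("conservative", "normal", "aggressive")

def objective_vote(f: ObjectiveFeatures) -> str:
    """Map raw behavior statistics to a coarse style vote."""
    if f.mean_speed_ratio > 1.05 or f.min_time_headway < 1.0:
        return "aggressive"
    if f.mean_speed_ratio < 0.85 and f.max_lateral_accel < 1.5:
        return "conservative"
    return "normal"

def fuse_style_label(f: ObjectiveFeatures, vlm_vote: str) -> str:
    """Combine the objective vote with a VLM-based subjective vote.

    When the two sources disagree, fall back to 'normal'; in practice such
    samples would be flagged for human-in-the-loop verification (omitted here).
    """
    obj = objective_vote(f)
    return obj if obj == vlm_vote else "normal"

if __name__ == "__main__":
    feats = ObjectiveFeatures(mean_speed_ratio=1.12,
                              min_time_headway=0.8,
                              max_lateral_accel=2.3)
    print(fuse_style_label(feats, vlm_vote="aggressive"))  # -> "aggressive"
```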
Main results are shown in the table below:
| Models | NC | DAC | TTC | Comf. | EP | SM-PDMS |
|---|---|---|---|---|---|---|
| AD-MLP | 92.63 | 77.68 | 83.83 | 99.75 | 78.01 | 63.72 |
| TransFuser | 96.74 | 88.43 | 91.08 | 99.65 | 84.39 | 78.12 |
| WoTE | 97.29 | 92.39 | 92.53 | 99.13 | 76.31 | 79.56 |
| DiffusionDrive | 96.66 | 91.45 | 90.63 | 99.73 | 80.39 | 79.33 |
| AD-MLP-Style | 92.38 | 73.23 | 83.14 | 99.90 | 78.55 | 60.02 |
| TransFuser-Style | 97.23 | 90.36 | 92.61 | 99.73 | 84.95 | 81.09 |
| WoTE-Style | 97.58 | 93.44 | 93.70 | 99.26 | 77.38 | 81.38 |
| DiffusionDrive-Style | 97.81 | 93.45 | 92.81 | 99.85 | 84.84 | 84.10 |

All metrics are percentages (higher is better). NC, DAC, TTC, Comf., and EP follow the NAVSIM sub-scores (no at-fault collisions, drivable area compliance, time-to-collision, comfort, and ego progress); SM-PDMS is the benchmark's aggregate score.
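For reference, NAVSIM's vanilla PDM score aggregates these sub-scores roughly as sketched below. The SM-PDMS reported here is the style-aware variant described in the paper, so treat this snippet only as an illustration of how the sub-metrics typically combine, not as the exact SM-PDMS computation.

```python
def pdm_score(nc: float, dac: float, ttc: float, comf: float, ep: float) -> float:
    """Vanilla NAVSIM-style PDM score (all inputs in [0, 1]).

    Hard penalties (collisions, drivable-area violations) multiply the score,
    while the softer terms (TTC, comfort, ego progress) enter as a weighted
    average. The 5:2:5 weighting below follows the commonly used NAVSIM setup;
    the style-aware SM-PDMS from the paper extends this and is not reproduced.
    """
    weighted = (5.0 * ttc + 2.0 * comf + 5.0 * ep) / 12.0
    return nc * dac * weighted

# Example: perfect hard-penalty terms with moderate soft scores.
print(round(100 * pdm_score(1.0, 1.0, 0.93, 0.99, 0.84), 2))
```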
All checkpoints are open-sourced at this link.
More discussion and analysis are provided in the paper.
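The "-Style" variants differ from their base planners in that they are additionally conditioned on a driving-style label. The sketch below shows one simple way such conditioning can be wired in (a learned style embedding concatenated to the planner's scene features); layer sizes and the concatenation scheme are assumptions for illustration, not the released models' architecture.

```python
import torch
import torch.nn as nn

# Illustrative sketch of style conditioning; sizes and the concatenation
# scheme are assumptions, not the released models' architecture.

class StyleConditionedPlanner(nn.Module):
    def __init__(self, feat_dim: int = 256, num_styles: int = 3, horizon: int = 8):
        super().__init__()
        self.style_embed = nn.Embedding(num_styles, 32)  # conservative / normal / aggressive
        self.head = nn.Sequential(
            nn.Linear(feat_dim + 32, 256),
            nn.ReLU(),
            nn.Linear(256, horizon * 3),                 # (x, y, heading) per future step
        )
        self.horizon = horizon

    def forward(self, scene_feat: torch.Tensor, style_id: torch.Tensor) -> torch.Tensor:
        cond = torch.cat([scene_feat, self.style_embed(style_id)], dim=-1)
        return self.head(cond).view(-1, self.horizon, 3)

# Same scene, two different style conditions -> two different trajectories.
planner = StyleConditionedPlanner()
scene = torch.randn(1, 256)
traj_aggressive = planner(scene, torch.tensor([2]))
traj_conservative = planner(scene, torch.tensor([0]))
```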

Qualitative illustration of DiffusionDrive-Style predictions under different style conditions across identical scenarios. Left: Aggressive vs. Normal; Right: Conservative vs. Normal. Red lines indicate the model’s predicted trajectory under the given style condition; green lines denote the ground-truth human trajectory. Clear behavioral differences emerge with style variation, reflecting the model’s ability to adapt its outputs to driving preferences.
If you have any questions, please contact Ruiyang Hao via email (haory369@gmail.com).
This work is partly built upon NAVSIM, Transfuser, DiffusionDrive, WoTE, and nuplan-devkit. Thanks to them for their great work!
If you find StyleDrive useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.
@article{hao2025styledrive,
title={StyleDrive: Towards Driving-Style Aware Benchmarking of End-To-End Autonomous Driving},
author={Hao, Ruiyang and Jing, Bowen and Yu, Haibao and Nie, Zaiqing},
journal={arXiv preprint arXiv:2506.23982},
year={2025}
}