> <sup>1</sup>University of Tübingen, <sup>2</sup>Tübingen AI Center, <sup>3</sup>OpenDriveLab at Shanghai AI Lab, <sup>4</sup>NVIDIA Research\
> <sup>5</sup>Robert Bosch GmbH, <sup>6</sup>Nanyang Technological University, <sup>7</sup>University of Toronto, <sup>8</sup>Vector Institute, <sup>9</sup>Stanford University
>
> Advances in Neural Information Processing Systems (NeurIPS), 2024 \
> Track on Datasets and Benchmarks

<br/>

## Highlights <a name="highlight"></a>

🔥 NAVSIM gathers simulation-based metrics (such as progress and time to collision) for end-to-end driving by unrolling simplified bird's eye view abstractions of scenes for a short simulation horizon. It operates under the condition that the policy has limited influence on the environment, which enables **efficient, open-loop metric computation** while being **better aligned with closed-loop** evaluations than traditional displacement errors.

<p align="center">
<img src="assets/navsim_cameras.gif" width="800">

## Getting started <a name="gettingstarted"></a>

- [Download and installation](docs/install.md)
- [Understanding and creating agents](docs/agents.md)
- [Understanding the data format and classes](docs/cache.md)
- [Dataset splits vs. filtered training / test splits](docs/splits.md)
- [Understanding the Extended PDM Score](docs/metrics.md)
- [Understanding the traffic simulation](docs/traffic_agents.md)
- [Submitting to the Leaderboard](docs/submission.md)

<p align="right">(<a href="#top">back to top</a>)</p>

All assets and code in this repository are under the [Apache 2.0 license](./LICENSE) unless specified otherwise. The datasets (including nuPlan and OpenScene) inherit their own distribution licenses. Please consider citing our paper and project if they help your research.

```BibTeX
@inproceedings{Dauner2024NEURIPS,
  author = {Daniel Dauner and Marcel Hallgarten and Tianyu Li and Xinshuo Weng and Zhiyu Huang and Zetong Yang and Hongyang Li and Igor Gilitschenski and Boris Ivanovic and Marco Pavone and Andreas Geiger and Kashyap Chitta},
  title = {NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year = {2024},
}
```

---

# docs/agents.md

Defining an agent starts by creating a new class that inherits from `navsim.agents.abstract_agent.AbstractAgent`.

Let's dig deeper into this class. It has to implement the following methods:

- `__init__()`:
  The constructor of the agent.
- `name()`:
  This has to return the name of the agent.
  The name is used as the filename of the evaluation CSV; you can set it to an arbitrary value.
- `initialize()`:
  This will be called before the agent is inferred for the first time.
- `get_sensor_config()`:
  Has to return a `SensorConfig` (see `navsim.common.dataclasses.SensorConfig`) that defines which sensor modalities should be loaded for the agent in each frame.
  The `SensorConfig` is a dataclass that stores, for each sensor, a list of indices of the history frames for which that sensor should be loaded. Alternatively, a boolean can be used per sensor if all available frames should be loaded.
  Moreover, you can return `SensorConfig.build_all_sensors()` if you want access to all available sensors.
  Details on the available sensors can be found below.

  **Loading the sensors has a big impact on runtime. If you don't need a sensor, consider setting it to `False`.**
- `compute_trajectory()`:
  This is the main function of the agent. Given the `AgentInput`, which contains the ego state as well as the sensor modalities, it has to compute and return a future trajectory for the agent.
  Details on the output format can be found below.

  **The future trajectory has to be returned as an object of type `navsim.common.dataclasses.Trajectory`. For examples, see the constant velocity agent or the human agent.**
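
To make this concrete, a minimal blind agent could look like the sketch below. This is only an illustration: the class name, the `ego_velocity` field of the latest ego status, and the assumption that a `Trajectory` can be built from an array of future poses with its default sampling are all assumptions, so check the constant velocity agent in the repository for the exact signatures.

```python
import numpy as np

from navsim.agents.abstract_agent import AbstractAgent
from navsim.common.dataclasses import AgentInput, SensorConfig, Trajectory


class GoStraightAgent(AbstractAgent):
    """Hypothetical minimal agent that drives straight at the current ego speed."""

    def name(self) -> str:
        # Used as the filename of the evaluation CSV; any string works.
        return "go_straight_agent"

    def initialize(self) -> None:
        # Called once before the first inference, e.g. to load model weights.
        pass  # nothing to load for this agent

    def get_sensor_config(self) -> SensorConfig:
        # Requests all cameras and the LiDAR for every history frame. A blind
        # agent like this one should disable unused sensors for faster evaluation.
        return SensorConfig.build_all_sensors()

    def compute_trajectory(self, agent_input: AgentInput) -> Trajectory:
        # Extrapolate the current speed along the ego x-axis (field name assumed).
        speed = float(np.linalg.norm(agent_input.ego_statuses[-1].ego_velocity))
        timesteps = 0.5 * np.arange(1, 9)  # 8 future poses, 0.5 s apart (assumed sampling)
        poses = np.zeros((len(timesteps), 3), dtype=np.float32)  # x, y, heading
        poses[:, 0] = speed * timesteps  # straight ahead, heading stays 0
        return Trajectory(poses)  # assumes the default trajectory sampling matches
```
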
# Learning-based Agents

In addition to the methods mentioned above, you have to implement the methods below.
Have a look at `navsim.agents.ego_status_mlp_agent.EgoStatusMLPAgent` for an example.

- `get_feature_builders()`:
  Has to return a list of feature builders (of type `navsim.planning.training.abstract_feature_target_builder.AbstractFeatureBuilder`).
  Feature builders take the `AgentInput` object and compute the feature tensors used for agent training and inference. One feature builder can compute multiple feature tensors. They have to be returned in a dictionary, which is then provided to the model in the forward pass.
  Currently, we provide the following feature builders:
  - [EgoStatusFeatureBuilder](https://github.com/autonomousvision/navsim/blob/main/navsim/agents/ego_status_mlp_agent.py#L18) (returns a tensor containing the current velocity, acceleration, and driving command)
- `get_target_builders()`:
  Similar to `get_feature_builders()`, returns the target builders (of type `navsim.planning.training.abstract_feature_target_builder.AbstractTargetBuilder`) used in training. In contrast to feature builders, they have access to the `Scene` object, which contains ground-truth information (instead of just the `AgentInput`).
- `forward()`:
  The forward pass through the model. Features are provided as a dictionary that contains all the features generated by the feature builders. All tensors are already batched and on the same device as the model. The forward pass has to output a dictionary in which one entry has to be "trajectory" and contain a tensor representing the future trajectory, i.e., of shape [B, T, 3], where B is the batch size, T is the number of future timesteps, and 3 refers to x, y, and heading.
- `compute_loss()`:
  Given the features, the targets, and the model predictions, this function computes the loss used for training. The loss has to be returned as a single tensor.
- `get_optimizers()`:
  Use this function to define the optimizers used for training.
  Depending on whether you want to use a learning-rate scheduler or not, this function needs to either return just an optimizer (of type `torch.optim.Optimizer`) or a dictionary that contains the optimizer (key: "optimizer") and the learning-rate scheduler of type `torch.optim.lr_scheduler.LRScheduler` (key: "lr_scheduler").
- `get_training_callbacks()`

In inference, the trajectory will automatically be computed using the feature builders and the `forward()` pass.
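
As a rough sketch of how these training hooks fit together, consider the example below. The feature key `"ego_status"`, the layer sizes, the 8-pose horizon, and the target key `"trajectory"` are illustrative assumptions rather than the exact interface, and the hooks are shown as free functions although they are methods on the agent; see `EgoStatusMLPAgent` for the real implementation.

```python
from typing import Dict

import torch
import torch.nn.functional as F
from torch import nn


class TinyTrajectoryModel(nn.Module):
    """Illustrative model: maps an ego-status feature vector to 8 future (x, y, heading) poses."""

    def __init__(self, feature_dim: int = 8, num_poses: int = 8):
        super().__init__()
        self.num_poses = num_poses
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_poses * 3),
        )

    def forward(self, features: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
        # "ego_status" is a hypothetical key produced by a feature builder.
        ego_status = features["ego_status"]                       # [B, feature_dim]
        poses = self.mlp(ego_status).view(-1, self.num_poses, 3)  # [B, T, 3]: x, y, heading
        # The agent's forward() must return a dict with a "trajectory" entry.
        return {"trajectory": poses}


def compute_loss(
    features: Dict[str, torch.Tensor],
    targets: Dict[str, torch.Tensor],
    predictions: Dict[str, torch.Tensor],
) -> torch.Tensor:
    # Simple imitation objective: L1 distance to the ground-truth future poses
    # (assuming a target builder provides a "trajectory" target of shape [B, T, 3]).
    return F.l1_loss(predictions["trajectory"], targets["trajectory"])


def get_optimizers(model: nn.Module):
    # Either return just the optimizer, or a dict pairing it with an LR scheduler.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
    return {"optimizer": optimizer, "lr_scheduler": scheduler}
```
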
## Inputs

`get_sensor_config()` can be overridden to determine which sensors are accessible to the agent.

The available sensors depend on the dataset. For OpenScene, this includes 9 sensor modalities: 8 cameras and a merged point cloud (from 5 LiDARs). Each modality is available for a duration of 2 seconds into the past, at a frequency of 2 Hz (i.e., 4 frames). Only this data will be released for the test frames (no maps, tracks, occupancy, etc., which you may use during training but will not have access to for leaderboard submissions).

You can configure the set of sensor modalities to use, and how much history you need for each frame, with the `navsim.common.dataclasses.SensorConfig` dataclass.
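
For illustration, a configuration might look like the sketch below. `SensorConfig.build_all_sensors()` is the factory mentioned above; the per-sensor field names in the hand-built example are assumptions, so check `navsim.common.dataclasses.SensorConfig` for the actual ones.

```python
from navsim.common.dataclasses import SensorConfig

# Load every camera and the LiDAR for all available history frames.
all_sensors = SensorConfig.build_all_sensors()

# Hypothetical hand-built config: assuming the dataclass has one field per sensor,
# where a boolean enables/disables it and a list selects history-frame indices.
# The field names below are guesses; check SensorConfig for the real ones.
front_cam_only = SensorConfig(
    cam_f0=[3],      # front camera, most recent history frame only
    cam_l0=False, cam_l1=False, cam_l2=False,
    cam_r0=False, cam_r1=False, cam_r2=False,
    cam_b0=False,    # all other cameras disabled
    lidar_pc=False,  # no LiDAR point cloud
)
```
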

**Why LiDAR?** Recent literature on open-loop planning has moved away from LiDAR in favor of surround-view high-resolution cameras. This has significantly increased the compute requirements for training and testing SoTA planners. We hope that the availability of the LiDAR modality enables more computationally efficient submissions that use fewer (or lower-resolution) camera inputs.

**Ego Status.** Besides the sensor data, an agent also receives the ego pose, velocity, and acceleration in local coordinates. Finally, to disambiguate driver intention, we provide a discrete driving command indicating whether the intended route goes left, straight, or right. Importantly, the driving command in NAVSIM is based solely on the desired route and does not entangle information about obstacles and traffic signs (as was prevalent in prior benchmarks such as nuScenes). Note that the left and right driving commands cover turns, lane changes, and sharp curves.

### `ConstantVelocityAgent`:
The `ConstantVelocityAgent` is a naive baseline and follows the simplest possible driving behavior: it extrapolates the current speed of the ego vehicle along a straight line.

Link to the [implementation](https://github.com/autonomousvision/navsim/blob/main/navsim/agents/constant_velocity_agent.py).

### `EgoStatusMLPAgent`:
The `EgoStatusMLPAgent` is a blind baseline that ignores all sensors perceiving the environment. The agent applies a multilayer perceptron to the state of the ego vehicle (i.e., the velocity, acceleration, and driving command). The EgoStatusMLP thereby serves as an upper bound on the performance achievable by merely extrapolating the kinematic state of the ego vehicle. It is a lightweight learned example, showcasing the procedure of creating feature caches and training an agent in NAVSIM.

Link to the [implementation](https://github.com/autonomousvision/navsim/blob/main/navsim/agents/ego_status_mlp_agent.py).

### `TransfuserAgent`:
[Transfuser](https://arxiv.org/abs/2205.15997) is an example of a sensor agent that utilizes both camera and LiDAR inputs. The Transfuser backbone applies CNNs to a front-view camera image and a discretized LiDAR BEV grid, and fuses the features of the camera and LiDAR branches over several convolution stages with Transformers into a combined representation. The Transfuser architecture combines several auxiliary tasks with imitation learning and achieves strong closed-loop performance in end-to-end driving with the CARLA simulator.

In NAVSIM, we implement the Transfuser backbone from [CARLA Garage](https://github.com/autonomousvision/carla_garage) and use BEV semantic segmentation and DETR-style bounding-box detection as auxiliary tasks. To provide the wide-angle camera view required by Transfuser, we stitch patches of the three front-facing cameras. Transfuser is a good starting point for sensor agents and provides pre-processing for image and LiDAR sensors, training visualizations with callbacks, and more advanced loss functions (i.e., Hungarian matching for detection).

Link to the [implementation](https://github.com/autonomousvision/navsim/blob/main/navsim/agents/transfuser).