FAQ

I re-ran the evaluation with the provided checkpoints but got different results

CARLA evaluation can be volatile, and results may differ between runs. In our experience, typical variations are around 1-2 DS on Bench2Drive, 5-7 DS on Longest6 v2, and 1.0 DS on Town13. These are rough estimates, not strict guarantees; the actual variation depends on many factors, including randomness in the simulator.
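
If you need a tighter estimate, a common practice is to repeat the evaluation several times and report the mean and standard deviation of the Driving Score (DS). Below is a minimal sketch of that aggregation; the file names and the driving_score key are placeholders, not the actual output format of any of the benchmarks:

```python
import json
import statistics

# Hypothetical paths: one results JSON per repeated evaluation run.
result_files = ["results_run0.json", "results_run1.json", "results_run2.json"]

scores = []
for path in result_files:
    with open(path) as f:
        data = json.load(f)
    # Assumed key; each benchmark stores DS under its own field name.
    scores.append(data["driving_score"])

mean_ds = statistics.mean(scores)
std_ds = statistics.stdev(scores) if len(scores) > 1 else 0.0
print(f"DS: {mean_ds:.1f} +/- {std_ds:.1f} over {len(scores)} runs")
```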

Why do we have so many versions of leaderboard and scenario_runner?

Each benchmark has its own evaluation protocol and therefore needs its own fork of each of the two repositories. The expert data collector also needs its own fork.

How do I create more routes?

See carla_route_generator. Also, see Section 5 of LEAD’s supplemental.

Can I see a list of modifications applied to leaderboard and scenario_runner?

We maintain custom forks of the CARLA evaluation tools with our modifications:

Which TransFuser versions are there?

See this list.

How often does CARLA crash or fail to start?

In our experience, roughly 10% of CARLA launch attempts may fail, though this varies by system. Common issues include startup hangs, port conflicts, or GPU initialization problems. This is normal behavior with CARLA.

What to do (a retry sketch follows this list):

  • Use bash scripts/clean_carla.sh to clean up zombie processes

  • Restart CARLA with bash scripts/start_carla.sh

  • Check that ports 2000-2002 aren’t in use

  • For Docker: docker compose restart carla
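
Because launches fail occasionally, it can help to wrap start-up in a small retry loop that cleans up leftover processes, relaunches the simulator, and probes the RPC port before the evaluation begins. The sketch below assumes scripts/start_carla.sh backgrounds CARLA and returns; the 60-second wait and the three attempts are arbitrary choices, not values from this repository:

```python
import socket
import subprocess
import time

CARLA_PORT = 2000      # default RPC port; the evaluation also uses 2001-2002
MAX_ATTEMPTS = 3

def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def start_carla_with_retries() -> bool:
    for attempt in range(1, MAX_ATTEMPTS + 1):
        # Kill zombie CARLA processes left over from a previous run.
        subprocess.run(["bash", "scripts/clean_carla.sh"], check=False)
        # Launch the simulator; assumed to background CARLA and return.
        subprocess.Popen(["bash", "scripts/start_carla.sh"])
        # Give the simulator time to initialize, then probe the RPC port.
        time.sleep(60)
        if port_is_open("localhost", CARLA_PORT):
            print(f"CARLA is up (attempt {attempt}/{MAX_ATTEMPTS})")
            return True
        print(f"CARLA did not come up, retrying ({attempt}/{MAX_ATTEMPTS})")
    return False

if __name__ == "__main__":
    if not start_carla_with_retries():
        raise SystemExit("CARLA failed to start after retries")
```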

How do I add custom scenarios to CARLA?

See this.

How does the expert access scenario-specific data?

See this.