Introduction
Assuring AI
Artificial intelligence is becoming an increasingly large part of society. AI systems perform well across many tasks, including search engine ranking, content recommendation, virtual assistants, autonomous vehicles, and machine translation. Despite this track record, AI often fails on edge cases, so it cannot be trusted in mission-critical applications where failures are unacceptable. To work around this flaw, the AI can be observed with white-box or black-box monitors: when the AI is uncertain or near failure, a safer, deterministic algorithm takes over until the AI regains confidence.
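The monitor-and-fallback pattern above can be sketched as follows. This is a minimal illustration, not the project's actual API: the class names, the `(action, confidence)` interface, and the threshold value are all assumptions for the sake of the example.

```python
# Illustrative sketch of the assurance pattern: run the AI policy, but
# fall back to a deterministic safe controller whenever the AI's
# confidence drops too low. All names here are hypothetical.

class AssuredController:
    def __init__(self, ai, safe, threshold=0.8):
        self.ai = ai            # learned policy: state -> (action, confidence)
        self.safe = safe        # deterministic policy: state -> action
        self.threshold = threshold

    def act(self, state):
        action, confidence = self.ai.act(state)
        if confidence < self.threshold:   # monitor detects low confidence
            return self.safe.act(state)   # hand control to the safe algorithm
        return action
```

A white-box monitor could inspect the policy's internals instead of a reported confidence score, but the switching logic stays the same.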
Previous Work
Previous work demonstrated the advantages and success of an assured AI traffic light controller. However, the AI was trained as a monolithic model, requiring a long, costly training process for every traffic grid topology to which it is applied. Previous attempts include:
- Spring 2020 Advanced Distributed Project: first monolithic model trained
- Sumo Flow: monolithic model based on speed
- Gym CityFlow: a more lightweight simulation environment

All credit goes to Jerry Chen and Brian Wheatman for their previous work on this project.
Generalizing to Traffic Grid Topologies
Our goal was to train a generalized model that can be applied to any N x M traffic grid topology after being trained once.
Methodology and Contributions
Note: In all our measurements, we use average speed of all observed cars as our metric.
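The metric can be stated precisely as a small function: the mean speed over every vehicle observation collected during the run. The data layout (a list of per-step speed lists) is an illustrative assumption.

```python
# Sketch of the evaluation metric: mean speed (m/s) over all observed
# vehicles across all simulation steps. The input format is assumed.

def average_speed(speed_samples):
    """speed_samples: list of per-step lists of vehicle speeds in m/s."""
    all_speeds = [s for step in speed_samples for s in step]
    return sum(all_speeds) / len(all_speeds) if all_speeds else 0.0
```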
Establishing a Baseline with all Safe Controllers
To establish a baseline, we first ran the simulation on a 3x3 grid where every light is a pre-programmed safe controller.
The Training Environment
We train at least one AI controller in an environment where the AI-controlled light is kept off the edge of the grid, so that it sees even traffic from all directions during training. Approaches include:
- AI Safe Model: Train one AI controller in the center of a 3x3 environment, surrounded by safe controllers
- AI Look Ahead Model: Train one AI controller in the center of a 5x5 environment, where the actions of its adjacent neighbors are fed into the model as features during the training process
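The Look Ahead feature construction can be sketched as concatenating the center light's observation with the most recent actions of its adjacent neighbors. The shapes and helper names below are assumptions for illustration, not the project's actual feature pipeline.

```python
import numpy as np

# Hypothetical sketch of the Look Ahead features: augment the center
# intersection's observation with the last action taken by each of the
# four adjacent controllers (safe or AI).

def look_ahead_features(local_obs, neighbor_actions):
    """local_obs: 1-D array of lane features at the center intersection.
    neighbor_actions: last action of each adjacent controller (N, S, E, W)."""
    neighbors = np.asarray(neighbor_actions, dtype=np.float32)
    return np.concatenate([np.asarray(local_obs, dtype=np.float32), neighbors])
```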
The Generalized Environment
A model trained in the previous environment is convolved over each intersection of an N x M topology, and that single model makes predictions across the generalized grid. Work here includes:
- Applying the above models to each traffic light during an evaluation step
- Padding the exterior of the environment with safe controllers and applying the above models to the interior traffic lights
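The two evaluation schemes above can be sketched as one assignment function: place the trained model at every intersection, or pad the exterior ring with safe controllers and place the model only on the interior. The controller objects and grid representation are illustrative assumptions.

```python
# Sketch of "convolving" one trained model over an N x M topology,
# optionally padding the exterior with safe controllers. The grid is a
# dict from (row, col) to a controller; names are hypothetical.

def build_grid_controllers(n, m, ai_model, safe_controller, pad_exterior=True):
    grid = {}
    for i in range(n):
        for j in range(m):
            on_edge = i in (0, n - 1) or j in (0, m - 1)
            if pad_exterior and on_edge:
                grid[(i, j)] = safe_controller  # safe ring on the boundary
            else:
                grid[(i, j)] = ai_model         # same trained model everywhere else
    return grid
```

During an evaluation step, each intersection simply queries its assigned controller for an action.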
Other Contributions to the Project:
- Logging and plotting of metrics during training
- Script for evenly distributing training jobs across multiple machines, limiting to four jobs per machine
- Scripts for managing jobs across all machines
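The job-distribution logic described above can be sketched as a round-robin assignment capped at four jobs per machine. This is a simplified illustration of the idea, not the actual script.

```python
# Hypothetical sketch of distributing training jobs evenly across
# machines, limited to `cap` jobs per machine (four in our setup).

def distribute_jobs(jobs, machines, cap=4):
    assignment = {m: [] for m in machines}
    queue = list(jobs)
    for _ in range(cap):            # fill one slot per machine per pass
        for m in machines:
            if not queue:
                return assignment
            assignment[m].append(queue.pop(0))
    if queue:
        raise RuntimeError(
            f"{len(queue)} jobs exceed capacity of {cap * len(machines)}")
    return assignment
```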
Results
Safe Controller Baseline
Baselines:
- 3x3 Topology: 5.57 m/s average speed
- 5x5 Topology: 5.39 m/s average speed
The Training Environment
Best Results:
- AI Safe Model: after 49,700,000 steps, our best model reaches an average speed of 6.429 m/s
- AI Look Ahead Model: after 44,350,000 steps, our best model reaches an average speed of 5.51 m/s

The N x M Evaluation Environment
Best Results:
- Any of our AI models on a 3x3 topology: average speed of 4.3 m/s, regardless of which model we apply
- Any of our AI models on a 5x5 topology: average speed of 4.22 m/s, regardless of which model we apply
Best Model in a 5x5 Generalized Environment Video
Conclusion and Future Work
While we were able to successfully train powerful models that outperform the safe controller in our training environment, we were unable to get our models to generalize. No matter which model we run in the generalized environment, the average speed in the system remains unchanged, and well below the safe controller baseline.
Research has shown that generalizing RL models is very challenging. The best approach going forward is to train a model on multiple environments in parallel.
Given that each model takes about a week, or 50,000,000 timesteps, to train with Sumo, the biggest challenges will be computational. One suggestion is to move away from Sumo as the simulator and instead use either Gym CityFlow or a custom, fast environment optimized to take advantage of GPU resources. Then, train many models on different combinations of environments to obtain a generalized model.
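The multi-environment idea can be sketched as sampling a different grid topology for each training episode, so that a single policy is exposed to many layouts. The `make_env` and `trainer` interfaces below are illustrative assumptions about how such a loop might be wired up.

```python
import random

# Hedged sketch of the proposed future direction: train one policy over
# many topologies by sampling a new (N, M) grid each episode. The
# environment factory and trainer interfaces are hypothetical.

def multi_env_training(make_env, trainer, topologies, episodes):
    """make_env(topology) builds a simulator for an (N, M) grid;
    trainer.run_episode(env) performs one episode of policy updates."""
    for _ in range(episodes):
        topology = random.choice(topologies)  # randomize layout per episode
        env = make_env(topology)
        trainer.run_episode(env)
```

In practice these episodes could run in parallel across machines, which is where the fast GPU-friendly simulator becomes essential.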