Generalizing Assured AI for Traffic Control

Daniel Stambler, Evan Leung

Spring 2022

Presentation (.pdf)
Source (.zip)

Introduction

Assuring AI

Artificial intelligence is becoming an increasingly large part of society. AI systems perform well at many different tasks, including ranking search results, recommending content, powering virtual assistants, controlling autonomous vehicles, and translating between languages. Despite this impressive track record, AI often fails on edge cases. Because of this, AI on its own cannot be applied to mission-critical applications where failures cannot be tolerated. To work around this flaw, the AI can be observed using white-box or black-box monitors. When the AI is uncertain or near failure, a safer, deterministic algorithm takes over until the AI regains confidence.
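The hand-off between the AI and the deterministic fallback can be sketched as follows. This is a minimal illustration, not the project's implementation: the `AssuredController` class, the confidence score returned by the AI policy, and the two thresholds (with hysteresis so the controller does not flip back and forth) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class AssuredController:
    """Hypothetical wrapper: use the AI while a monitor trusts it, else fall back."""

    # Assumed interface: the AI returns (action, confidence in [0, 1]).
    ai_policy: Callable[[Sequence[float]], Tuple[int, float]]
    # Assumed interface: the deterministic safe controller returns an action.
    safe_policy: Callable[[Sequence[float]], int]
    fallback_below: float = 0.6  # assumed threshold for handing over to the safe controller
    resume_above: float = 0.8    # assumed threshold for handing control back to the AI
    using_safe: bool = False

    def act(self, observation: Sequence[float]) -> int:
        action, confidence = self.ai_policy(observation)
        if self.using_safe:
            # Stay on the safe controller until the AI is clearly confident again.
            if confidence >= self.resume_above:
                self.using_safe = False
        elif confidence < self.fallback_below:
            self.using_safe = True
        return self.safe_policy(observation) if self.using_safe else action
```

With a toy AI whose confidence is simply the first observation value, the controller switches to the safe action when confidence drops below 0.6 and only returns to the AI once confidence reaches 0.8.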

Previous Work

Previous work has demonstrated the advantages and success of an Assured AI traffic light controller. However, the AI was trained as a monolithic model, requiring a long, costly training process for every different traffic grid topology to which the AI is applied. Previous attempts include:

Generalizing to Traffic Grid Topologies

Our goal was to train a generalized model that can be applied to any N x M traffic grid topology after being trained once.


Methodology and Contributions

Note: In all our measurements, we use the average speed of all observed cars as our metric.
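As a rough illustration of this metric, the sketch below averages every speed sample collected during a run. How samples are pooled over time is an assumption here (each car contributes one sample per simulation step in which it is observed), and the function name `average_speed` is hypothetical.

```python
from statistics import fmean
from typing import Iterable, Sequence


def average_speed(speeds_per_step: Iterable[Sequence[float]]) -> float:
    """Mean of all speed samples, pooled over every step of a run.

    speeds_per_step holds, for each simulation step, the speeds of the
    cars observed at that step (assumed layout, for illustration only).
    """
    all_speeds = [speed for step in speeds_per_step for speed in step]
    # An empty run has no observed cars; report 0.0 rather than raising.
    return fmean(all_speeds) if all_speeds else 0.0
```

With SUMO, the per-step speeds would presumably be read through TraCI (for example with `traci.vehicle.getSpeed` for each ID in `traci.vehicle.getIDList()`), but that wiring is not shown in the source and is left out here.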

Establishing a Baseline with all Safe Controllers
To get a baseline, we first ran the simulation on a 3x3 grid in which every light is driven by a pre-programmed safe controller.

The Training Environment
This environment is used to train at least one AI controller. The idea is to avoid placing a traffic light under training on the edge of the grid, so that training is even. Some approaches include:

The Generalized Environment
A model trained in the previous environment is convolved over an N x M topology: the same model is applied at every intersection, and its predictions drive the lights of this generalized topology. Work here includes:
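The idea of sliding one trained model over every intersection can be sketched as below. This is an illustration under stated assumptions, not the project's code: each intersection is summarized by a single number (for example a queue length), the local observation is that value plus its four grid neighbors, and positions outside the grid are zero-padded. The names `local_observation` and `act_on_grid` are hypothetical.

```python
from typing import Callable, List, Sequence

Grid = Sequence[Sequence[float]]


def local_observation(grid: Grid, row: int, col: int, pad: float = 0.0) -> List[float]:
    """Own value followed by the north, south, west, and east neighbors."""
    n_rows, n_cols = len(grid), len(grid[0])
    offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    observation = []
    for d_row, d_col in offsets:
        r, c = row + d_row, col + d_col
        inside = 0 <= r < n_rows and 0 <= c < n_cols
        # Assumed edge handling: neighbors outside the grid read as `pad`.
        observation.append(grid[r][c] if inside else pad)
    return observation


def act_on_grid(policy: Callable[[List[float]], int], grid: Grid) -> List[List[int]]:
    """Apply the same policy at every intersection of an N x M grid."""
    return [
        [policy(local_observation(grid, r, c)) for c in range(len(grid[0]))]
        for r in range(len(grid))
    ]
```

Because the policy only ever sees a fixed-size local observation, the same function works unchanged for a 3x3 grid, a 5x5 grid, or any other N x M layout.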

Other Contributions to the Project:


Results

Safe Controller Baseline

Baselines:

3x3 Safe Controller Video

The Training Environment

Best Results:

AI Safe Model Training Plot
AI with safe controller best training plot
AI Safe Model Video

The N x M Evaluation Environment

Best Results:

Best Model in a 3x3 Generalized Environment Video


Best Model in a 5x5 Generalized Environment Video


Conclusion and Future Work

While we were able to successfully train powerful models that outperform the safe controller in our training environment, we were unable to get our model to generalize. No matter which model we run in our generalized environment, the average speed in the system remains unchanged and well below the safe controller baseline.

Research has shown that generalizing RL models is very challenging. The best approach going forward is to train a model on multiple environments in parallel.

Given that each model takes about a week (50,000,000 timesteps) to train with SUMO, the biggest challenges will be computational. One suggestion is to move away from SUMO as the simulator and instead use either Gym CityFlow or a custom, fast environment optimized to take advantage of GPU resources. Many models could then be trained on different combinations of environments to obtain a generalized model.
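To make the computational cost concrete, the figures above (about a week for 50,000,000 timesteps) imply a throughput of roughly 83 timesteps per second. The sketch below turns that into a back-of-the-envelope estimate of how long several training runs would take for a given simulator speedup. The speedup factor and the number of models are hypothetical inputs, not measurements, and the runs are assumed to execute one after another.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def steps_per_second(total_steps: int, days: float) -> float:
    """Throughput implied by a run of `total_steps` that took `days` days."""
    return total_steps / (days * SECONDS_PER_DAY)


def training_days(total_steps: int, rate: float, n_models: int = 1) -> float:
    """Days needed to train `n_models` sequentially at `rate` steps per second."""
    return n_models * total_steps / rate / SECONDS_PER_DAY


# Figures taken from the text: 50,000,000 timesteps in about one week.
baseline_rate = steps_per_second(50_000_000, 7)

# Hypothetical scenario: a simulator 10x faster, training 5 models in sequence.
days_needed = training_days(50_000_000, baseline_rate * 10, n_models=5)
```

Under these assumptions, five sequential runs on a simulator ten times faster would take about half the time of a single run today, which is why simulator throughput dominates the feasibility of multi-environment training.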