Introduction
Assuring AI
Artificial intelligence is becoming an increasingly large part of society. AI systems perform well across many tasks, including search engine ranking, content recommendation, virtual assistants, autonomous vehicles, and machine translation. Despite this track record, AI often fails on edge cases, so it cannot be trusted in mission-critical applications where failures are unacceptable. To work around this flaw, the AI can be observed with white-box or black-box monitors: when the AI is uncertain or near failure, a safer, deterministic algorithm takes over until the AI regains confidence.
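The monitor-and-fallback pattern above can be sketched as follows. This is a minimal illustration, not the project's actual API: the class names, the `(action, confidence)` interface, and the threshold value are all assumptions for the sake of the example.

```python
# Illustrative sketch of the assurance pattern: run the AI policy, but
# fall back to a deterministic safe controller whenever the AI's
# confidence drops too low. All names here are hypothetical.

class AssuredController:
    def __init__(self, ai, safe, threshold=0.8):
        self.ai = ai            # learned policy: state -> (action, confidence)
        self.safe = safe        # deterministic policy: state -> action
        self.threshold = threshold

    def act(self, state):
        action, confidence = self.ai.act(state)
        if confidence < self.threshold:   # monitor detects low confidence
            return self.safe.act(state)   # hand control to the safe algorithm
        return action
```

A white-box monitor could inspect the policy's internals instead of a reported confidence score, but the switching logic stays the same.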
Previous Work
Previous work demonstrated the advantages and success of an assured AI traffic light controller. However, the AI was trained as a monolithic model, requiring a long, costly training process for every traffic grid topology to which it is applied. Previous attempts include:
- Spring 2020 Advanced Distributed Project: first monolithic model trained
- Sumo Flow: monolithic model based on speed
- Gym CityFlow: a more lightweight simulation environment

All credit goes to Jerry Chen and Brian Wheatman for their previous work on this project.
Generalizing to Traffic Grid Topologies
Our goal was to train a generalized model that can be applied to any N x M traffic grid topology after being trained once.
Methodology and Contributions
Note: In all our measurements, we use average speed of all observed cars as our metric.
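The metric can be stated precisely as a small function: the mean speed over every vehicle observation collected during the run. The data layout (a list of per-step speed lists) is an illustrative assumption.

```python
# Sketch of the evaluation metric: mean speed (m/s) over all observed
# vehicles across all simulation steps. The input format is assumed.

def average_speed(speed_samples):
    """speed_samples: list of per-step lists of vehicle speeds in m/s."""
    all_speeds = [s for step in speed_samples for s in step]
    return sum(all_speeds) / len(all_speeds) if all_speeds else 0.0
```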
Establishing a Baseline with all Safe Controllers
To establish a baseline, we first ran the simulation on a 3x3 grid where every light is a pre-programmed safe controller.
The Training Environment
We train at least one AI controller in an environment where the AI-controlled light is kept off the edge of the grid, so that it sees even traffic from all directions during training. Approaches include:
- AI Safe Model: Train one AI controller in the center of a 3x3 environment, surrounded by safe controllers
- AI Look Ahead Model: Train one AI controller in the center of a 5x5 environment, where the actions of its adjacent neighbors are fed into the model as features during the training process
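The Look Ahead feature construction can be sketched as concatenating the center light's observation with the most recent actions of its adjacent neighbors. The shapes and helper names below are assumptions for illustration, not the project's actual feature pipeline.

```python
import numpy as np

# Hypothetical sketch of the Look Ahead features: augment the center
# intersection's observation with the last action taken by each of the
# four adjacent controllers (safe or AI).

def look_ahead_features(local_obs, neighbor_actions):
    """local_obs: 1-D array of lane features at the center intersection.
    neighbor_actions: last action of each adjacent controller (N, S, E, W)."""
    neighbors = np.asarray(neighbor_actions, dtype=np.float32)
    return np.concatenate([np.asarray(local_obs, dtype=np.float32), neighbors])
```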
The Generalized Environment
A model trained in the previous environment is convolved over each intersection of an N x M topology, and that single model makes predictions across the generalized grid. Work here includes:
- Applying the above models to each traffic light during an evaluation step
- Padding the exterior of the environment with safe controllers and applying the above models to the interior traffic lights
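The two evaluation schemes above can be sketched as one assignment function: place the trained model at every intersection, or pad the exterior ring with safe controllers and place the model only on the interior. The controller objects and grid representation are illustrative assumptions.

```python
# Sketch of "convolving" one trained model over an N x M topology,
# optionally padding the exterior with safe controllers. The grid is a
# dict from (row, col) to a controller; names are hypothetical.

def build_grid_controllers(n, m, ai_model, safe_controller, pad_exterior=True):
    grid = {}
    for i in range(n):
        for j in range(m):
            on_edge = i in (0, n - 1) or j in (0, m - 1)
            if pad_exterior and on_edge:
                grid[(i, j)] = safe_controller  # safe ring on the boundary
            else:
                grid[(i, j)] = ai_model         # same trained model everywhere else
    return grid
```

During an evaluation step, each intersection simply queries its assigned controller for an action.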
Other Contributions to the Project:
- Logging and plotting of metrics during training
- Script for evenly distributing training jobs across multiple machines, limiting to four jobs per machine
- Scripts for managing jobs across all machines
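The job-distribution logic described above can be sketched as a round-robin assignment capped at four jobs per machine. This is a simplified illustration of the idea, not the actual script.

```python
# Hypothetical sketch of distributing training jobs evenly across
# machines, limited to `cap` jobs per machine (four in our setup).

def distribute_jobs(jobs, machines, cap=4):
    assignment = {m: [] for m in machines}
    queue = list(jobs)
    for _ in range(cap):            # fill one slot per machine per pass
        for m in machines:
            if not queue:
                return assignment
            assignment[m].append(queue.pop(0))
    if queue:
        raise RuntimeError(
            f"{len(queue)} jobs exceed capacity of {cap * len(machines)}")
    return assignment
```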
Results
Safe Controller Baseline
Baselines:
- 3x3 Topology: 5.57 m/s average speed
- 5x5 Topology: 5.39 m/s average speed
The Training Environment
Best Results:
- AI Safe Model: after 49,700,000 steps, our best model reaches an average speed of 6.429 m/s
- AI Look Ahead Model: after 44,350,000 steps, our best model reaches an average speed of 5.51 m/s

The N x M Evaluation Environment
Best Results:
- Any of our AI models on a 3x3 topology: average speed of 4.3 m/s, regardless of which model we apply
- Any of our AI models on a 5x5 topology: average speed of 4.22 m/s, regardless of which model we apply
Best Model in a 5x5 Generalized Environment Video
Conclusion and Future Work
While we were able to successfully train powerful models that outperform the safe controller in our training environment, we were unable to get our models to generalize. No matter which model we run in the generalized environment, the average speed in the system remains unchanged, and well below the safe controller baseline.
Research has shown that generalizing RL models is very challenging. The best approach going forward is to train a model on multiple environments in parallel.
Given that each model takes about a week, or 50,000,000 timesteps, to train with Sumo, the biggest challenges will be computational. One suggestion is to move away from Sumo as the simulator and instead use either Gym CityFlow or a custom, fast environment optimized to take advantage of GPU resources. Then, train many models on different combinations of environments to obtain a generalized model.
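The multi-environment idea can be sketched as sampling a different grid topology for each training episode, so that a single policy is exposed to many layouts. The `make_env` and `trainer` interfaces below are illustrative assumptions about how such a loop might be wired up.

```python
import random

# Hedged sketch of the proposed future direction: train one policy over
# many topologies by sampling a new (N, M) grid each episode. The
# environment factory and trainer interfaces are hypothetical.

def multi_env_training(make_env, trainer, topologies, episodes):
    """make_env(topology) builds a simulator for an (N, M) grid;
    trainer.run_episode(env) performs one episode of policy updates."""
    for _ in range(episodes):
        topology = random.choice(topologies)  # randomize layout per episode
        env = make_env(topology)
        trainer.run_episode(env)
```

In practice these episodes could run in parallel across machines, which is where the fast GPU-friendly simulator becomes essential.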