Master of Science
Martínez, Juan Camilo
0000-0002-2881-3076
2023-04-03
Abstract
In the United States alone, people take more than 9 billion trips on public transportation every year. Such modes of transport play an important role in access to healthcare, education, and economic opportunities. As a result, cities strive to optimize transit to meet the needs of their citizens. A fundamental requirement for optimizing transit lines is predicting demand in terms of expected board counts and occupancy, which in turn can inform decision-making. Predicting occupancy is particularly important as we deal with a global pandemic, since crowding in transit must be avoided to maintain social distancing protocols. We develop data-driven modeling strategies to predict board counts at individual bus stops as well as the maximum occupancy on a trip. We show how off-the-shelf statistical and algorithmic approaches fail in this scenario because of high sparsity in the data (excess zero counts) and its multi-modal characteristics. We propose a hierarchical zero-inflated random forest model that can handle excess zero counts and learn from a rich variety of features to forecast board counts. Finally, we validate our approach on real-world transportation data from the city of Chattanooga, TN.
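The zero-inflated idea in the abstract can be illustrated with a minimal two-stage sketch. It is not the thesis's model. It assumes the expected board count can be split into the probability that anyone boards times the mean count when someone does, and it replaces the random forests with simple per-group frequency tables so that it runs on the standard library alone. The function names, the grouping key (stop and time window), and the toy records are all hypothetical.

```python
from collections import defaultdict


def fit_two_stage(records):
    """Fit a toy zero-inflated estimator.

    records: iterable of (group_key, board_count) pairs.
    Returns a dict mapping group_key -> (p_nonzero, mean_positive).
    Per-group averages stand in for the two learned models
    (assumption: a real pipeline would train a classifier and a regressor).
    """
    totals = defaultdict(int)
    positives = defaultdict(list)
    for key, count in records:
        totals[key] += 1
        if count > 0:
            positives[key].append(count)

    model = {}
    for key, n in totals.items():
        pos = positives[key]
        p_nonzero = len(pos) / n                       # stage 1: is there any boarding?
        mean_positive = sum(pos) / len(pos) if pos else 0.0  # stage 2: how many, given some
        model[key] = (p_nonzero, mean_positive)
    return model


def predict_expected(model, key):
    """Expected board count = P(count > 0) * E[count | count > 0]."""
    p_nonzero, mean_positive = model[key]
    return p_nonzero * mean_positive


# Hypothetical toy data: (stop, time window) -> observed board counts.
records = [
    (("stop_A", "am_peak"), 0),
    (("stop_A", "am_peak"), 4),
    (("stop_A", "am_peak"), 6),
    (("stop_A", "am_peak"), 0),
    (("stop_B", "midday"), 0),
    (("stop_B", "midday"), 0),
    (("stop_B", "midday"), 2),
    (("stop_B", "midday"), 0),
]

model = fit_two_stage(records)
```

Separating the zero/non-zero decision from the count estimate keeps the many zero observations from dragging down the estimate of how many riders board when boarding does occur.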