A Data-Driven Analysis of Environmental Migration in Coastal Bangladesh
The decision to migrate is complex and is often shaped by a combination of economic, social, political, and environmental pressures. Seasonal internal migration is a common livelihood diversification strategy in Bangladesh, but it is unclear how existing patterns of mobility will respond to future environmental variability, which is expected to place increasing pressure on coastal communities. Household surveys can capture detailed information about migration histories and their contexts, but identifying the most important predictors among a large number of covariates is challenging with standard statistical methods. Machine learning techniques are well suited to this kind of pattern identification in large datasets. This thesis reports on the application of machine learning approaches to two large surveys collected from more than 2,800 households in total in southwestern Bangladesh. I applied random forest classification and regression models to identify the covariates with the greatest predictive power for household migration decisions. The results show that random forest models can identify drivers of migration, but that their high predictive ability comes at the cost of low interpretability. To address this tradeoff, random forests and other more complex machine learning algorithms may be most useful in combination with simpler, more traditional methods. Accordingly, I conducted a survival analysis of household time to first migration using the important variables identified by the random forest models, which provides deeper insight into how these variables affect mobility. Future work should continue to explore the potential of machine learning techniques for questions of environmental migration.
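The survival analysis of time to first migration can be illustrated with a minimal sketch. The abstract does not name the estimator, so this assumes a nonparametric Kaplan-Meier curve, in which households that had not migrated by the survey date are treated as right-censored. The function name `kaplan_meier`, its parameters `durations` and `migrated`, and the toy data below are hypothetical and are not drawn from the surveys.

```python
from collections import Counter


def kaplan_meier(durations, migrated):
    """Kaplan-Meier estimate of the probability of not yet having migrated.

    durations: time from a (hypothetical) origin to first migration, or to
               the survey date for households that never migrated.
    migrated:  True if a first migration was observed, False if censored.
    Returns [(t, S(t))] at each distinct time with at least one migration.
    """
    if len(durations) != len(migrated):
        raise ValueError("durations and migrated must have the same length")

    # Number of observed first migrations at each time.
    events = Counter(t for t, m in zip(durations, migrated) if m)
    # Number of households leaving the risk set (migrated or censored) at each time.
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step; censored households at time t are still
            # counted as at risk, following the usual tie convention.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Toy example: five households, two of them censored at times 3 and 8.
print(kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False]))
```

In practice, the covariates flagged by the random forest would enter a regression-style survival model (for example a Cox proportional hazards model) or be used to stratify curves like this one; that pairing is an assumption about how the two steps connect, not a description of the thesis's exact specification.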