Data Quality-Aware Graph Machine Learning
Wang, Yu
0000-0001-6908-508X
2024-07-22
Abstract
Graph-structured data is ubiquitous in real-world applications (e.g., social networks, infrastructure, and biomedicine), and Graph Machine Learning (GML) has become a prominent method for handling such data. However, despite GML’s remarkable achievements, its reliance on node features makes it susceptible to conventional data quality issues, and the additional consideration of graph topology adds novel complexity. Motivated by these challenges, my research in this dissertation focuses on data quality-aware graph machine learning, systematically examining issues of topology, imbalance, bias, and limited data in graph data. This includes proposing data-centric and model-centric solutions to these issues and applying them to real-world applications, such as de-noising unrelated interactions in personalization learning for recommender systems, overcoming the inherent class imbalance in computer-aided drug discovery, mitigating hallucination bias in multi-document question answering, and boosting limited training data for graph generation. Lastly, to further advance the field, I outline multiple future directions for making data quality-aware GML more inclusive, intelligent, and trustworthy.