Secure learning in adversarial environments
Machine learning has become ubiquitous in the modern world, ranging from enterprise applications to personal use cases and from image annotation and text recognition to speech captioning and machine translation. Its ability to infer patterns from data has found great success in prediction and decision making, including in security-sensitive applications such as intrusion detection, virus detection, biometric identity recognition, and spam filtering. However, the strengths of traditional learning systems rest on the distributional stationarity assumption, and these strengths can become vulnerabilities when an adversary manipulates the training process (poisoning attack) or the testing process (evasion attack). Because traditional learning strategies are potentially vulnerable to such security faults, there is a need for machine learning techniques that are secure against sophisticated adversaries, in order to close the gap between the distributional stationarity assumption and deliberate adversarial manipulation. These techniques are referred to as secure learning throughout this thesis. To study the secure learning problem systematically, my research is built on three components. First, I model different kinds of attacks against learning systems by evaluating the adversaries’ capabilities, goals, and cost models. Second, I study, from a theoretical standpoint, secure learning algorithms that counter targeted malicious attacks, taking into account the specific goals of the learners and their resource and capability limitations. Concretely, I model the interactions between the defender (the learning system) and the attackers as different forms of games. Based on this game-theoretic analysis, I evaluate the utilities and constraints of both participants and optimize the secure learning system with respect to adversarial responses.
Third, I design and implement practical algorithms that efficiently defend against multi-adversarial attack strategies. My thesis focuses on examining and answering theoretical questions about the limits of classifier evasion (evasion attack), adversarial contamination (poisoning attack), and privacy preservation in adversarial environments, as well as on how to design practical, resilient learning algorithms for a wide range of applications, including spam filtering, malware detection, network intrusion detection, and recommendation systems. Throughout, I tailor my approaches to building scalable machine learning systems, as demanded by modern big data applications.