Prediction of VRC01 neutralization sensitivity by HIV-1 gp160 sequence features
Georgiev, Ivelin S.
The broadly neutralizing antibody (bnAb) VRC01 is being evaluated for its efficacy in preventing HIV-1 infection in the Antibody Mediated Prevention (AMP) trials. A secondary objective of AMP utilizes sieve analysis to investigate how VRC01 prevention efficacy (PE) varies with HIV-1 envelope (Env) amino acid (AA) sequence features. An exhaustive analysis that tests how PE depends on every AA feature with sufficient variation would have low statistical power. To design an adequately powered primary sieve analysis for AMP, we modeled VRC01 neutralization as a function of Env AA sequence features of 611 HIV-1 gp160 pseudoviruses from the CATNAP database, with two objectives: (1) to develop models that best predict the neutralization readouts; and (2) to rank AA features by their predictive importance using classification and regression methods. The dataset was split in half, and machine learning algorithms were applied to each half separately, using cross-validation and hold-out validation. We selected Super Learner, a nonparametric ensemble-based cross-validated learning method, for advancement to the primary sieve analysis. This method predicted the dichotomous resistance outcome of whether the IC50 neutralization titer of VRC01 for a given Env pseudovirus is right-censored (indicating resistance) with an average validated AUC of 0.868 across the two hold-out datasets. Quantitative log IC50 was predicted with an average validated R² of 0.355. Features predicting neutralization sensitivity or resistance included 26 surface-accessible residues in the VRC01 and CD4 binding footprints, the length of gp120, the length of Env, the number of cysteines in gp120, the number of cysteines in Env, and 4 potential N-linked glycosylation sites; the top features will be advanced to the primary sieve analysis. This modeling framework may also inform the study of VRC01 in the treatment of HIV-infected persons.
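The two validation metrics named in the abstract, AUC for the dichotomous resistance outcome and R² for quantitative log IC50, can be illustrated with a minimal Python sketch. This is not the authors' code and does not implement Super Learner. The function names (`auc`, `r_squared`) and the toy values are hypothetical, chosen only to show how each metric is computed from hold-out predictions.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: the probability that a randomly chosen resistant
    virus (label 1) receives a higher predicted score than a randomly
    chosen sensitive virus (label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data (NOT values from CATNAP): 1 = right-censored IC50
# (resistant), 0 = sensitive, with made-up predicted resistance scores.
toy_resistant = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_resistant, toy_scores)

# Hypothetical log IC50 values and predictions for the regression metric.
toy_log_ic50 = [1.0, 2.0, 3.0]
toy_predicted = [2.0, 2.0, 2.0]  # predicting the mean everywhere
toy_r2 = r_squared(toy_log_ic50, toy_predicted)
```

Under the design described above, each metric would be computed once per hold-out half and the two values averaged.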
Author summary
The two Antibody Mediated Prevention (AMP) clinical trials are testing whether intravenous infusion of VRC01 (an antibody that can neutralize most HIV-1 viruses) can prevent HIV-1 infection. Since the outer envelope (Env) protein of HIV-1 is highly genetically diverse, the AMP trials will use an amino acid sequence sieve analysis to evaluate whether VRC01 prevents infection differentially depending on Env amino acid features of exposing viruses. To maximize the power of the sieve analysis, the number of amino acid features tested should be limited to those most likely to be associated with whether the virus is sensitive to neutralization by VRC01. We used machine learning to analyze a database of 611 HIV-1 Env pseudoviruses, with data on how well VRC01 neutralizes each pseudovirus, to identify models that best predict neutralization sensitivity from the amino acid features and to identify the most predictive features. We identified models that could predict HIV-1 sensitivity (as opposed to resistance) to VRC01 very well, and found that several amino acid residues in Env locations where both VRC01 and the CD4 receptor bind were important for making correct predictions. Our modeling approach will enable a focused AMP sieve analysis and may be useful for studying the use of VRC01 in the treatment of HIV-infected persons.