Running Ahead of Evolution - AI based Simulation for Predicting Future High-risk SARS-CoV-2 Variants

Jun 20, 2023·
Jie Chen
,
Zhiwei Nie
,
Yu Wang
,
Kai Wang
,
Fan Xu
,
Zhiheng Hu
,
Bin Zheng
,
Zhennan Wang
,
Guoli Song
,
Jingyi Zhang
,
Jie Fu
,
Xiansong Huang
,
Zhongqi Wang
,
Zhixiang Ren
,
Qiankun Wang
,
Daixi Li
,
Dongqing Wei
,
Chao Yang
,
Yonghong Tian
· 1 min read
Abstract
The never-ending emergence of SARS-CoV-2 variations of concern (VOCs) has challenged the whole world for pandemic control. In order to develop effective drugs and vaccines, one needs to efficiently simulate SARS-CoV-2 spike receptor binding domain (RBD) mutations and identify high-risk variants. We pretrain a large protein language model with approximately 408 million protein sequences and construct a high-throughput screening for the prediction of binding affinity and antibody escape. As the first work on SARS-CoV-2 RBD mutation simulation, we successfully identify mutations in the RBD regions of 5 VOCs and can screen millions of potential variants in seconds. Our workflow scales to 4096 NPUs with 96.5% scalability and 493.9× speedup in mixed precision computing, while achieving a peak performance of 366.8 PFLOPS (reaching 34.9% theoretical peak) on Pengcheng Cloudbrain-II. Our method paves the way for simulating coronavirus evolution in order to prepare for a future pandemic that will inevitably take place. Our models are released at https://github.com/ZhiweiNiepku/SARS-CoV-2_mutation_simulation to facilitate future related work.
Publication
In International Journal of High Performance Computing Applications
The paper received the ACM Gordon Bell COVID Finalist 2022.