HMC Students Examine COVID-19 Data in New Course

Harvey Mudd College students are digging into COVID-19 data in real time in a new five-week spring semester course led by Weiqing Gu, McAlister Professor of Mathematics.

COVID-19: Data Analytics/Machine Learning is a project-based, online course that is challenging a multidisciplinary group of 36 upper-level students to use materials collected from an international dataset to study the novel coronavirus. They are employing big data analytics and machine learning techniques to process the data, identify its key features and infer, predict, integrate, classify and extract unique insights from the COVID-19 Open Research Dataset, a free resource. The dataset contains scholarly articles about SARS-CoV-2 and related viruses in the broader coronavirus group and is the most extensive collection of machine-readable coronavirus literature.

One of the primary goals of the course is to contribute to existing research and data analysis to help the science community understand the genetics, incubation and symptoms of the virus, as well as to fill gaps related to the novel coronavirus as scientists pursue knowledge around prevention, treatment and a vaccine.

Gu developed the online course after hearing students were disappointed to lose in-person, on-campus research opportunities for the remainder of the spring semester due to the COVID-19 pandemic. In addition to specializing in differential geometry and topology as applied to big data analysis, computer-aided design and robotics, Gu also researches applications to mathematical biology. She has created mathematical models to illustrate treatment-resistant strains of HIV and has used numerical calculations to analyze the behavior of the mutated HIV strains and to examine the effects of various treatment regimens on those strains.

Based on her work modeling strains of HIV, Gu has been building a model of COVID-19 using geometric techniques to determine patterns in its structure that are unique to the virus. She and her students are discussing this structure and studying its impact.

“During our weekly meetings, we go through the data and calculate the virus’ spread,” Gu says. “Students can see that it is getting slower.”

Students are also investigating a connection between the virus and global warming and climate change, and are discussing ways they think the virus might be controlled.

Gu encourages the use of web applications and resources. One weekly assignment involves examining data for trends. Students collect COVID-19-related posts from Twitter and convert them into vectors based on the topic of the tweet. They then analyze models to look for structure in the discourse on Twitter. Jupyter Notebook, an open-source, web-based application, is used for analysis assignments, allowing students to create and share documents with live code. Students also use PyTorch, an open-source machine learning library, to construct feedforward neural networks and recurrent neural networks, models that allow Gu and her students to examine trends or predicted outcomes in data sets.
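For readers curious what turning posts into vectors can look like, here is a minimal sketch in plain Python. It is not the course's actual method, which the article does not describe. It assumes a simple word-count (bag-of-words) representation and uses cosine similarity as one possible way to look for structure. The sample posts and the helper names (`tokenize`, `build_vocabulary`, `vectorize`, `cosine_similarity`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical sample posts for illustration; not taken from the course data.
posts = [
    "vaccine trial results look promising",
    "new vaccine trial begins this week",
    "city lockdown extended two weeks",
]


def tokenize(text):
    """Lowercase a post and split it on whitespace."""
    return text.lower().split()


def build_vocabulary(texts):
    """Map every distinct word across all posts to a vector index."""
    words = sorted({word for text in texts for word in tokenize(text)})
    return {word: index for index, word in enumerate(words)}


def vectorize(text, vocab):
    """Turn one post into a word-count vector over the shared vocabulary."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocab)
    for word, count in counts.items():
        if word in vocab:
            vector[vocab[word]] = count
    return vector


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vocab = build_vocabulary(posts)
vectors = [vectorize(post, vocab) for post in posts]

# Posts that share words score higher than posts with no words in common.
sim_vaccine_posts = cosine_similarity(vectors[0], vectors[1])
sim_unrelated_posts = cosine_similarity(vectors[0], vectors[2])
print(sim_vaccine_posts > sim_unrelated_posts)
```

In this toy example the two vaccine posts share words and so score as more similar than the vaccine and lockdown posts, which share none. Vectors of this kind could in principle also be fed to a neural network, though the article does not say how the students' models are set up.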

“I saw Prof. Gu’s Big Data class as a good way to become familiar with the math side of machine learning,” Josh Cordova ’22 says. “Being focused on such a relevant topic made it even more appealing.”

Because she wants students to become comfortable using GitHub, a popular tool used to develop and deploy software, Gu has students host their code, reading summaries and final projects on the platform. Once the course is complete, Gu and her students will compile their research on a GitHub page to make it accessible to the research community.