Artificial intelligence transforms how UF EPI researchers forecast and respond to disease

Dr. Blackburn, wearing full protective gear, tests the environment for Bacillus anthracis.
The University of Florida is home to the fastest supercomputer in higher education and 300 faculty members like Jason Blackburn, shown here, are teaching AI courses or using AI in their research. (Photo courtesy of Jason Blackburn)

Highlights:

  • As part of its AI Initiative, the University of Florida has expanded its AI offerings in the curriculum, increased the number of faculty members with AI expertise and made significant investments in computing infrastructure.
  • UF is home to the HiPerGator, the nation’s most powerful supercomputer in higher education.
  • Jason Blackburn, a member of the UF Emerging Pathogens Institute, uses AI to track anthrax and forecast disease risk.
  • Other EPI members Marco Salemi, Simone Marini and Mattia Prosperi use AI to predict new coronavirus variants.

In southwest Texas, the soil-dwelling bacterium Bacillus anthracis can persist in the environment. Every year, it puts livestock, wildlife and humans at risk of contracting the often-fatal disease anthrax.

But some years are more severe than others. Is there a way to know the likelihood of an outbreak beforehand? Jason Blackburn, Ph.D., a member of the University of Florida Emerging Pathogens Institute, is searching for patterns to help predict risk.

Jason Blackburn stands on a dirt path in front of an off-road vehicle.
Blackburn, seen here in Texas, is the founder of the Spatial Epidemiology & Ecology Research Laboratory (SEER Lab). (Photo courtesy of Jason Blackburn)

And like many other researchers at UF, Blackburn, a professor at the UF College of Liberal Arts and Sciences Department of Geography, has transformed his work with artificial intelligence.

Blackburn’s extensive experience tracking wildlife has perfectly prepared him for studying anthrax in Texas, where disease transmission is intertwined with how animals interact with the landscape.

“We’re kind of looking at it from the bookends. So, from the cellular level out, and from the landscape level in,” Blackburn said. On one end, the researchers characterize the pathogen by sequencing its genome and growing the bacteria to learn about key genes.

Then, on the other end, Blackburn’s lab studies the environments where hosts come in contact with pathogens. Here, machine learning makes a world of difference. This type of computer model mimics the human brain’s ability to learn and is trained on examples, so it can essentially teach itself general patterns. When a researcher presents the model with real data, the algorithm finds an answer based on what it learned in training.

When there’s a massive amount of data, artificial intelligence helps

Overhead view of ten blue-capped vials laid out on a rocky ground.
A network of ranchers and local veterinarians in West Texas calls Blackburn’s lab when they have sick animals. The researchers visit to study the situation, provide expertise and collect samples to use in bioinformatics and sequencing efforts. (Photo courtesy of Jason Blackburn)

Blackburn’s lab monitors the disease in collaboration with veterinarians and ranchers in Texas, who receive diagnostic services for their animals in return. The state’s three biggest anthrax outbreaks of the century all occurred within the timeframe that Blackburn and his collaborators have been working in the region: 2001, 2005 and 2019.

To survey the landscape, Blackburn uses remotely sensed data from Google Earth Engine, a massive catalog of information that is easy to access thanks to its powerful cloud-based server. This resource lets the researchers partially automate the process of gathering environmental data to work with.

Blackburn’s lab still runs relatively small studies on desktop computers. But larger projects that need more computing power get pushed to the HiPerGator, UF’s supercomputer. In 2021, the HiPerGator was ranked the nation’s most powerful university-owned supercomputer and has made it possible for labs like Blackburn’s to process vast amounts of data in a reasonably short time.

Armed with information about animal movement, vegetation density and where a pathogen has been found, Blackburn’s team proceeds to conduct space-time analyses. Are there peaks? Is there any seasonality to the disease? If so, does it thrive in the wet or dry seasons?

In Texas, anthrax is most common between May and August. “So, it’s got seasonality, but it’s also episodic,” Blackburn said. “Some years you just get a few cases here and there, and some years you get an explosion of 10,000 animal cases, and then it fades out again.”
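The seasonality question can be illustrated with a very small sketch: count cases by calendar month and see where they cluster. This is not the lab's space-time analysis, only a minimal example of the idea. The function names and the case dates below are hypothetical and exist purely for illustration.

```python
from collections import Counter
from datetime import date


def cases_by_month(case_dates):
    """Return a 12-item list of case counts, January first."""
    counts = Counter(d.month for d in case_dates)
    return [counts.get(month, 0) for month in range(1, 13)]


def share_in_window(monthly, first_month, last_month):
    """Fraction of all cases that fall between two months, inclusive."""
    total = sum(monthly)
    if total == 0:
        return 0.0
    return sum(monthly[first_month - 1:last_month]) / total


# Hypothetical case dates, invented for this example (not real surveillance data).
case_dates = [
    date(2019, 1, 12),
    date(2019, 6, 3),
    date(2019, 6, 20),
    date(2019, 7, 1),
    date(2019, 7, 15),
    date(2019, 7, 28),
    date(2019, 8, 9),
]

monthly = cases_by_month(case_dates)
peak_month = monthly.index(max(monthly)) + 1      # 1 = January
summer_share = share_in_window(monthly, 5, 8)     # May through August
```

A high share of cases inside one window points to seasonality. Comparing the same counts across years would show the episodic pattern Blackburn describes.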

To pinpoint an element that can predict that explosion, Blackburn is developing an AI-based model that can forecast the likelihood of an outbreak based on what the first few months looked like.

Two people in protective gear stand beside a utility trailer loaded with a deer carcass and other materials destined for a burn pit.
Working an outbreak includes controlling infection by burning animal carcasses. The burning is done in a pit to ensure the flames don’t get out of control in a Texas summer. (Photo courtesy of Jason Blackburn)

“So we’re starting to take some of our descriptive studies where we identified some patterns — like, hey, the first hundred days of the year tell us something about the next hundred days — and now, we’re developing some AI-based models to try and forecast phenology,” Blackburn said, referring to the study of cyclical patterns like how a landscape’s vegetation grows and dies back with each year’s changing of the seasons.

These are known as the green-up and brown-down phases, respectively, and can be tracked in satellite images. Pixels with varying levels of greenness correspond to the amount of vegetation in an area.
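The article does not say which greenness measure the lab uses. One widely used measure is the Normalized Difference Vegetation Index (NDVI), which compares near-infrared and red reflectance for each pixel. The sketch below assumes NDVI purely for illustration. The reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for a single pixel.

    nir and red are reflectance values in the near-infrared and red bands.
    Healthy vegetation reflects strongly in near-infrared, so denser
    vegetation gives a value closer to 1.
    """
    total = nir + red
    if total == 0:
        return 0.0  # no signal in either band; avoid dividing by zero
    return (nir - red) / total


# Hypothetical (nir, red) reflectance pairs, not real satellite data.
pixels = {
    "dense_grass": (0.50, 0.10),
    "bare_soil": (0.30, 0.25),
}
greenness = {name: ndvi(nir, red) for name, (nir, red) in pixels.items()}
```

Tracking a value like this for the same pixels over successive satellite passes gives the green-up and brown-down curve for a season.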

Blackburn feeds an AI model over 20 years of such data, training it to become familiar with the cycle. Then, he gives the algorithm partial green-up data and asks it to find matches in previous years. Does the green-up resemble that of a high-outbreak year? Or is it more comparable to a low-activity year?
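As a rough illustration of matching a partial season against past years, the sketch below uses a plain nearest-neighbor comparison. It is a stand-in for the idea, not the AI model Blackburn's lab is developing. The year names, greenness values, and outbreak labels are all hypothetical.

```python
from math import sqrt


def partial_distance(partial, full_year):
    """Root-mean-square gap between a partial curve and the start of a full year."""
    n = len(partial)
    if n == 0 or n > len(full_year):
        raise ValueError("partial curve must be non-empty and no longer than the year")
    return sqrt(sum((p - f) ** 2 for p, f in zip(partial, full_year)) / n)


def closest_years(partial, history, k=2):
    """Return the k past years whose early-season curve best matches the partial one."""
    ranked = sorted(history, key=lambda year: partial_distance(partial, history[year]))
    return ranked[:k]


# Hypothetical greenness values per observation period (invented, not real data).
history = {
    "year_A": [0.20, 0.25, 0.40, 0.60, 0.70, 0.65],
    "year_B": [0.20, 0.22, 0.25, 0.30, 0.35, 0.33],
    "year_C": [0.21, 0.27, 0.42, 0.58, 0.69, 0.60],
}
# Hypothetical outbreak severity recorded for each past year.
outbreak_label = {"year_A": "high", "year_B": "low", "year_C": "high"}

partial = [0.20, 0.26, 0.41]  # only the first three periods of the current season
matches = closest_years(partial, history, k=2)
labels = [outbreak_label[year] for year in matches]
```

Only the observed part of the season enters the comparison, which is why incomplete data is enough to produce a match. A trained model would replace the simple distance with patterns learned from the full record.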

The research aims to develop a way to determine the likelihood of an outbreak with two months’ notice. A major strength of AI, Blackburn noted, is its ability to find patterns in green-up trajectories even with incomplete data.

As part of UF’s growing AI initiatives, Blackburn’s home Department of Geography has developed a new certificate program and a growing set of courses for undergraduates and graduates interested in AI. He also observed that many UF researchers are retooling their machine learning algorithms to run more efficiently on GPU-enabled cluster computers like the HiPerGator.

“You’ll see lots of us collaborating to pull together AI for remote sensing and disease prediction,” Blackburn said. “And also, EPI is so supportive of our labs and the costs associated with this program.”

Using AI to fight disease in Florida

Because Florida’s population includes many different ethnic groups with varied lifestyles, it is nearly impossible for public health officials to develop a single set of interventions that simultaneously reaches everyone who needs them.

Thankfully, AI can help experts find ways to tailor and optimize their strategies.

In treating HIV, for example, public health officials have a few different options. These include pre-exposure prophylaxis, a preventative medicine known as PrEP, and campaigns that promote safe sex and testing.

EPI Associate Director of Research Initiatives Marco Salemi, Ph.D., aims to understand what specific geographic areas or conditions correlate with high-risk populations.

Salemi is currently studying Miami and Fort Lauderdale, which both have a high incidence of HIV. This project is in collaboration with EPI members Simone Marini, Ph.D., an assistant professor at the UF College of Public Health and Health Professions, and Mattia Prosperi, the Associate Dean for AI and Innovation at the UF College of Public Health and Health Professions. With grants from the National Institutes of Health, the team is testing what public health measures are most likely to succeed.

“AI can give us a rational way to make public health decisions,” explained Salemi, also a professor at the UF College of Medicine. “When we run possible future scenarios, the algorithm can show us which are most likely to happen and what happens if we put in different measures.”

Artificial intelligence helps not only because it can analyze large amounts of data, but also because it can incorporate heterogeneous datasets. These include different types of information all at once, such as clinical stages, environmental conditions and viral genetic diversity.

“So, all these things can be analyzed simultaneously by AI,” Salemi said. “We can also look at financial parameters, for example, because clearly any kind of public health intervention costs money and requires resources.”

This information helps Salemi, Marini and Prosperi conduct a cost-benefit analysis and ultimately devise practical solutions that make the most of public health resources.

Collecting data faster than it can be analyzed

Artificial intelligence has also proven to be a major boon in processing the overwhelming amount of data biomedical researchers have to work with. In the past, simply collecting data was a significant hurdle. Obtaining the first complete human genome cost billions of dollars. Now, it can be done with only a few thousand dollars.

“Our ability to produce data has increased beyond our wildest dreams,” Salemi said. “We have essentially too much information to be analyzed with the statistical tools we have used in the past.”

The scientific community generated nearly 20 million viral sequences over four years of the COVID-19 pandemic. For comparison, it took 15 years to sequence 50,000 genomes of the Human Immunodeficiency Virus.

Throughout the COVID-19 pandemic, SARS-CoV-2 demonstrated its adaptability. Even as various public health measures were implemented, this virus, which causes COVID-19, mutated and spread worldwide.

Fortunately, Salemi, Marini and Prosperi have developed algorithms that track how the virus evolves. If a mutation in the genetic code forms a new strain that is more pathogenic or transmissible, the model flags it.
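The team's algorithms are not described in detail here. A much simpler step that any such pipeline needs is listing where a new sequence differs from a reference and checking those changes against mutations of concern. The sketch below shows only that lookup, not a predictive model. The sequences, mutation names, and watchlist are hypothetical.

```python
def substitutions(reference, sample):
    """List point substitutions as strings like 'A4S' (reference, position, sample)."""
    if len(reference) != len(sample):
        raise ValueError("this sketch expects pre-aligned sequences of equal length")
    return [
        f"{ref}{position}{alt}"
        for position, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


def flag_mutations(found, watchlist):
    """Return the observed mutations that also appear on the watchlist, sorted."""
    return sorted(set(found) & set(watchlist))


# Hypothetical aligned protein fragments, invented for illustration.
reference = "MKTAYIAK"
sample = "MKTSYIAR"

# Hypothetical list of substitutions previously tied to higher transmissibility.
watchlist = {"K8R", "T3I"}

found = substitutions(reference, sample)
flagged = flag_mutations(found, watchlist)
```

A fixed watchlist can only flag changes that are already known. The appeal of a learned model, as Salemi describes it, is predicting which new variants are likely to be more aggressive.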

“What really excites me right now is having an algorithm that can very quickly predict new and more aggressive variants,” Salemi remarked. “That’s obviously a good safeguard for the future.”

Salemi said the strengths of AI — its ability to analyze large amounts of data and be taught by example — make it perfect for this application. Classic statistical analysis techniques are tedious and require the analyst to know what they’re looking for and what kinds of questions to ask of the data.

In other words, they need to already have a theory or idea. But “you don’t always know what you don’t know,” Salemi remarked. When there are millions and millions of data points, the human brain might not ask the right questions to find any interesting patterns lurking in that sea of information.

“And if we use statistical methods to try all the possible combinations, even with our supercomputers we would be lost forever,” he said.

HiPerGator AI supercomputer. Photo taken 03-17-22.
Faculty at UF and other Florida academic institutions can use the HiPerGator for teaching and research. (UF/IFAS Photo by Tyler Jones)

But the team’s machine learning models don’t stop at identifying dangerous new strains; the researchers also have algorithms that can find weak spots, helping scientists develop specific drugs that target the latest variations.

“When we understand exactly how these mutations impact the pathogen structure, for example, we can find ways to develop drugs that will specifically target that particular mutated protein,” Salemi explained. The researchers can even build dynamic models that simulate what happens when a virus encounters different drugs.

This, Salemi said, highlights the importance of working across disciplines. After they’ve finished modeling, the task of developing the drug belongs to the chemist.

“Everything that we do in terms of facing and tackling the challenges of infectious diseases needs to be comprehensive and multidisciplinary. Because it’s not just about finding a cure or vaccine,” Salemi said. “It’s about making sure that whatever we find can be implemented at the societal level.”

Salemi remarked that the EPI is especially well-positioned for this kind of work because it can leverage the enormous number of resources UF has invested in building a strong AI community. He added that putting together a multidisciplinary team including molecular biologists, epidemiologists and computer scientists through the EPI has been the key to his, Marini’s and Prosperi’s success.

“This integration of disciplines is necessary to face the challenges of the twenty-first century,” he said.


Written by: Jiayu Liang