Understanding salmonella infections in Florida

A medical illustration of nontyphoidal Salmonella bacteria, the causative agent of salmonellosis, or Salmonella food poisoning. Here, the bacteria’s flagella protrude in all directions from the cell wall, and numerous fimbriae give their exteriors a furry look. (CDC PHIL)

If you’ve ever fallen ill with foodborne salmonellosis, you won’t soon forget the days of gastrointestinal distress (abdominal pain, diarrhea and vomiting), plus fever and chills. Unfortunately, Floridians bear a disproportionate share of this illness, experiencing twice the national per capita rate, new research says.

Investigators at the University of Florida Institute of Food and Agricultural Sciences (UF/IFAS) Food Systems Institute and the Emerging Pathogens Institute recently published two papers that offer the most detailed analysis yet of salmonellosis in Florida. The work was a collaboration with the Florida Department of Health (FDOH) as part of the Centers of Excellence in Food Safety, which is funded by the US Centers for Disease Control and Prevention.

In one paper, the researchers analyzed FDOH data from 2009 to 2018 for geographic, demographic and temporal patterns in cases and outbreaks. In the second, they analyzed the molecular epidemiology of cases in 2017 and 2018, comparing the diversity and abundance of the Salmonella types found in Florida with those found nationally. They also developed new methods to quickly detect outbreaks, identify clusters of cases that span longer periods of time, and even trace common exposures to food, animals or other reservoirs.

Salmonellosis is caused by Salmonella, a genus of bacteria that includes around 2,600 serotypes. A serotype is a group of organisms that share a distinctive set of surface antigens; serotyping is one way of classifying pathogens below the species level. Because particular serotypes are associated with specific foods or other sources, they can be used to link individual cases, detect outbreaks and even trace outbreaks to their sources.

“The state records so many cases that they can only perform whole-genome sequencing for some of the samples,” says study author Nitya Singh, a research scientist with both IFAS and EPI. “What we’ve done is analyze their data to better understand how these data might inform public health action.”

Age, seasons, and regions

The study found that children under the age of five are infected more often than any other age group, accounting for 40.9% of the nearly 63,000 Salmonella infections reported in the state between 2009 and 2018. Although the researchers did not investigate the reasons for this age difference, Singh notes that it is a common finding worldwide.

They also found that salmonellosis cases in Florida peak each year between August and October. Singh says this could be related to climate, since these months tend to have high average temperatures and precipitation. The timing also coincides with the peak of hurricane season, and other research has tied spikes in foodborne illness to extreme weather events.

The researchers also found that the rate of salmonellosis fell by 23% from 2009 to 2016 before ticking back up. The northeast and northwest regions of Florida had higher rates of salmonellosis during the years examined, as the figure below shows, though the reasons remain unclear.

The incidence rate of salmonellosis by Florida county in 2017 and 2018. (Figure courtesy of the study authors.)

Molecular epidemiology 

The researchers analyzed the state’s data and found little difference between the serotypes infecting young children and all other age groups. But they did find large differences between serotypes that are common nationally and those common to the Sunshine State, as the graphic below shows.

The major difference in Florida is a high prevalence of serotype Sandiego, which is almost nonexistent nationally. Serotype Braenderup is also more prominent in Florida than in the US as a whole. But the top two serotypes in Florida, Enteritidis and Newport, match the top two nationally.

Because Florida records so many Salmonella cases each year, it is not resource-efficient for the Department of Health to sequence the isolate from every case. But the study authors determined that the isolates the state does sequence are likely highly representative of all Salmonella cases in the general population.

A new way to detect outbreaks

Various molecular tools help researchers investigate an organism’s genetics. One of them, multi-locus sequence typing, or MLST, condenses a Salmonella genome of about 5 million DNA base pairs into just seven numbers, one for the allele found at each of seven housekeeping genes. But while MLST is useful for identifying Salmonella serotypes, its resolution is too coarse to tell whether several isolates are closely related genetically, as they would be in an outbreak.
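To make the idea of "seven numbers" concrete, here is a minimal Python sketch of how an MLST profile works. The seven locus names are those of the widely used Salmonella enterica scheme; the allele numbers and the sequence-type table are hypothetical placeholders, not entries from any real database, and the sketch assumes allele calls have already been made from the genome.

```python
from typing import Dict, Optional, Tuple

# The seven housekeeping loci of the classic Salmonella enterica MLST scheme.
LOCI: Tuple[str, ...] = ("aroC", "dnaN", "hemD", "hisD", "purE", "sucA", "thrA")

# HYPOTHETICAL lookup table mapping a seven-number allele profile to a
# sequence type (ST). These numbers are placeholders, not real entries.
ST_TABLE: Dict[Tuple[int, ...], int] = {
    (3, 4, 1, 8, 2, 6, 5): 9001,
    (3, 4, 1, 8, 2, 6, 7): 9002,
}


def mlst_profile(allele_calls: Dict[str, int]) -> Tuple[int, ...]:
    """Reduce a genome's allele calls to the ordered seven-number MLST profile."""
    return tuple(allele_calls[locus] for locus in LOCI)


def sequence_type(allele_calls: Dict[str, int]) -> Optional[int]:
    """Look up the ST for a profile; return None if the profile is unknown."""
    return ST_TABLE.get(mlst_profile(allele_calls))


# Two isolates that may differ anywhere else in their ~5 million base pairs,
# but carry the same seven alleles, receive the same ST.
isolate_a = dict(zip(LOCI, (3, 4, 1, 8, 2, 6, 5)))
isolate_b = dict(zip(LOCI, (3, 4, 1, 8, 2, 6, 5)))
print(sequence_type(isolate_a), sequence_type(isolate_b))  # 9001 9001
```

Because the whole genome collapses to seven numbers, any two isolates with the same seven allele calls get the same sequence type, however much the rest of their genomes differ. That is the coarseness the researchers describe.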

A tanglegram from the paper shows the prevalence of Salmonella serotypes found in Florida and nationally. (Figure courtesy of the study authors.)

In other words, MLST can tell whether a sample belongs to the Sandiego serotype, but not whether two separate Sandiego isolates are genetically close enough to have come from the same source. For that level of detail, many investigators turn to single nucleotide polymorphism (SNP) analyses, which identify differences at individual DNA base pairs. But SNP-based work is too time- and resource-intensive to use on the thousands of Salmonella isolates sequenced in Florida each year.
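For readers who want to see what a SNP comparison counts, here is a minimal Python sketch. It assumes the two sequences have already been aligned to the same length; real SNP pipelines first map sequencing reads to a reference genome and call variants, which this toy example skips. The isolate names and sequences are hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences carry different bases.

    Gaps ('-') and ambiguous bases ('N') are skipped, a common
    simplification when counting SNP differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a not in skip and b not in skip and a != b
    )


# Hypothetical aligned fragments: 8 bases instead of ~5 million.
isolates = {"iso1": "ACGTACGT", "iso2": "ACGTTCGA", "iso3": "ACGNACGT"}
for x, y in combinations(isolates, 2):
    print(x, y, snp_distance(isolates[x], isolates[y]))
```

Every pair of isolates has to be compared, so the number of comparisons grows roughly with the square of the number of isolates: 1,000 isolates already make 499,500 pairs, each spanning millions of aligned positions rather than eight. That hints at why the researchers found SNP-based methods too slow at Florida's scale.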

“The whole point of detecting an outbreak is to determine which cases share a close genetic relationship, and to figure this out quickly,” Singh says. “We had to rethink how to do this. We needed a fine-resolution tool to quickly search for genetic relatedness between cases and detect outbreaks. But SNP-based methods are too slow; our computers would be running for months.”

The challenge: How to parse terabytes of data and unearth genetic connections in drastically less time?

“We needed to narrow our fishing nets,” Singh says. “Finding genetic links is the ultimate proof that cases are linked, and with our method, you can’t miss it.”

First, the team used an advanced form of MLST that looks only at genes conserved across the core genome (core genome MLST, or cgMLST) to quickly compare and type thousands of isolates. Second, using a machine learning method called hierarchical clustering, the team analyzed the state’s sequencing data to group Salmonella isolates that differed at no more than five alleles. (Alleles are variant forms of the same gene at a given location on a chromosome.) Together, these two steps form the core of the researchers’ proposed new method.
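The grouping step can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors’ code: it assumes single-linkage hierarchical clustering on core genome MLST (cgMLST) allele profiles, treats a missing allele call (None) as uninformative, and uses hypothetical 10-locus profiles instead of about 3,000 loci. Cutting a single-linkage tree at five allele differences is equivalent to linking every pair of isolates that differ at five or fewer loci and taking the connected groups, which is what the code does.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence

Profile = Sequence[Optional[int]]  # allele number per locus; None = missing call


def allele_distance(a: Profile, b: Profile) -> int:
    """Loci where both isolates have an allele call and the calls differ."""
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


def single_linkage_clusters(profiles: Dict[str, Profile], threshold: int = 5) -> List[List[str]]:
    """Cut a single-linkage tree at `threshold` allele differences.

    Implemented with union-find: any two isolates within the threshold are
    merged, so clusters are the connected components of that graph.
    """
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if allele_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy profiles over 10 loci.
profiles = {
    "p1": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "p2": [2, 2, 2, 1, 1, 1, 1, 1, 1, 1],  # 3 differences from p1
    "p3": [2, 2, 2, 2, 2, 2, 2, 2, 1, 1],  # 5 from p2, 8 from p1
    "p4": [9, 9, 9, 9, 9, 9, 9, 9, 9, 9],  # unrelated to the others
}
print(single_linkage_clusters(profiles, threshold=5))  # [['p1', 'p2', 'p3'], ['p4']]
```

In the toy data, p1 and p3 differ at eight loci yet land in the same cluster because each is within five differences of p2. This "chaining" is a known property of single linkage and one reason a finer check, such as SNP analysis, is useful afterward.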

This diagram of multi-locus sequence typing profiles shows the relatedness of various Salmonella isolates. (Figure courtesy of the study authors.)
This hierarchical clustering diagram also shows the relatedness of Salmonella isolates, but it is based on core genome MLST, which yields finer resolution, and uses machine learning to assign isolates to hierarchical clusters.

“Whole-genome sequencing has brought the ability to obtain and use sequence data from the entire genome,” says coauthor Arie Havelaar. “This of course provides a much higher level of resolution, but it also adds to the complexity.” Havelaar, a UF professor of microbial risk assessment and epidemiology of foodborne disease, was hired under UF’s preeminence initiative and is internationally recognized as an expert on food safety.

Combining core genome MLST with hierarchical clustering lets the researchers reduce the data they search within each isolate’s genome from about 5 million base pairs to about 3,000 core-genome loci, compare these condensed profiles, and use machine learning to identify genetic relationships. Finally, the researchers used SNP-based phylogeny to explore genetic relatedness at the level of individual base pairs and to validate the clustering approach.
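The validation idea can be illustrated with a simplified Python sketch. The researchers used SNP-based phylogeny; this sketch swaps in a simpler check, asking whether every pair of isolates in a core genome MLST cluster is within a maximum number of SNPs. The pairwise SNP distances are assumed to be precomputed, the isolate names are hypothetical, and the 10-SNP cutoff is an illustrative value, not one taken from the study.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

SnpTable = Dict[FrozenSet[str], int]  # unordered isolate pair -> SNP distance


def max_within_cluster_snps(members: List[str], snps: SnpTable) -> int:
    """Largest pairwise SNP distance among the isolates in one cluster."""
    return max(
        (snps[frozenset(pair)] for pair in combinations(members, 2)), default=0
    )


def cluster_supported(members: List[str], snps: SnpTable, max_snps: int = 10) -> bool:
    """True if no two cluster members differ by more than `max_snps` SNPs.

    `max_snps=10` is a hypothetical cutoff used only for illustration.
    """
    return max_within_cluster_snps(members, snps) <= max_snps


# Hypothetical precomputed SNP distances for a three-isolate cluster.
snps = {
    frozenset({"a", "b"}): 2,
    frozenset({"a", "c"}): 4,
    frozenset({"b", "c"}): 3,
}
print(cluster_supported(["a", "b", "c"], snps))  # True
```

A cluster that fails the check would be a candidate for closer inspection, since its members agree at the core-genome level but diverge at individual base pairs.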

Linking cases to sources

The study authors say their new two-step approach has several uses.

“It can identify possible outbreaks within the traditional time window of 60 days. But we also used the method for detecting clustered series of cases occurring over much longer periods of time, up to 18 months. And we aimed to identify possible sources of these outbreaks and case clusters,” Singh says.
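The two time scales Singh describes can be illustrated with a short Python sketch that, for a single genetic cluster, counts the most cases falling within any 60-day window and within any window of about 18 months. The minimum cluster size of three cases and the 548-day approximation of 18 months are assumptions made for illustration; the source does not say how the study set these thresholds, and the case dates are hypothetical.

```python
from datetime import date
from typing import List

OUTBREAK_WINDOW_DAYS = 60  # traditional outbreak window named in the text
LONG_WINDOW_DAYS = 548     # ~18 months (assumption: 18 x ~30.4 days)
MIN_CASES = 3              # hypothetical minimum number of linked cases


def max_cases_in_window(dates: List[date], window_days: int) -> int:
    """Largest number of cases whose dates fit within any `window_days` span."""
    ordered = sorted(dates)
    best, start = 0, 0
    for end in range(len(ordered)):
        # Advance the left edge until the span from start to end fits the window.
        while (ordered[end] - ordered[start]).days > window_days:
            start += 1
        best = max(best, end - start + 1)
    return best


def classify_cluster(dates: List[date]) -> str:
    """Label a genetic cluster by the time scale over which its cases accumulate."""
    if max_cases_in_window(dates, OUTBREAK_WINDOW_DAYS) >= MIN_CASES:
        return "possible outbreak"
    if max_cases_in_window(dates, LONG_WINDOW_DAYS) >= MIN_CASES:
        return "long-term cluster"
    return "below threshold"


# Hypothetical case dates for two genetic clusters.
print(classify_cluster([date(2018, 1, 1), date(2018, 1, 20), date(2018, 2, 15)]))
print(classify_cluster([date(2017, 1, 1), date(2017, 6, 1), date(2018, 3, 1)]))
```

The first cluster's three cases fall within 45 days, so it is flagged as a possible outbreak; the second's spread over 424 days, which only the longer window captures.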

To identify possible sources, the team took the extra step of retrospectively linking clusters of Salmonella Enteritidis cases in Florida to isolates from food and environmental samples, using the same two-step approach.

“Most of the time, cases linked back to chicken meat,” Singh says. These chicken meat isolates came not only from Florida but also from many other states, which suggests persistent problems in the poultry meat supply chain that repeatedly cause human illness.
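The source-linking step can be sketched in the same spirit: compare each food or environmental isolate’s core-genome allele profile with the cases in a human cluster, and keep those within a small number of allele differences. This Python sketch is hypothetical throughout: the isolate names, source labels and toy six-locus profiles are invented, and reusing a five-allele threshold for case-to-source links is an assumption, not a detail reported for the study.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Profile = Sequence[Optional[int]]  # allele number per locus; None = missing call


def allele_distance(a: Profile, b: Profile) -> int:
    """Loci where both isolates have an allele call and the calls differ."""
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


def candidate_sources(
    cluster: Dict[str, Profile],
    nonclinical: Dict[str, Tuple[str, Profile]],
    threshold: int = 5,
) -> List[Tuple[str, str, int]]:
    """Return (isolate, source type, closest distance) for non-clinical isolates
    within `threshold` allele differences of any case in the cluster,
    sorted so the closest candidates come first."""
    if not cluster:
        return []
    hits = []
    for isolate_id, (source_type, profile) in nonclinical.items():
        closest = min(allele_distance(profile, case) for case in cluster.values())
        if closest <= threshold:
            hits.append((isolate_id, source_type, closest))
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical human cluster and non-clinical isolates.
cases = {"case1": [1, 1, 1, 1, 1, 1], "case2": [1, 1, 1, 1, 1, 2]}
others = {
    "food1": ("chicken", [1, 1, 1, 1, 2, 2]),
    "env1": ("environment", [3, 3, 3, 3, 3, 3]),
}
print(candidate_sources(cases, others))  # [('food1', 'chicken', 1)]
```

Tallying the source types among the hits across many clusters is one simple way a pattern such as repeated links to chicken meat could surface.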

Havelaar says the new work underlines the efficiency of the combined approach, which any lab can carry out using publicly available genome sequencing data.

“One challenge in current source tracking efforts is that there is limited genetic data on Salmonella isolates from different sources,” he says. “Data from chicken and other meats are routinely generated by the USDA Food Safety and Inspection Service, but far fewer data are available on other foods. If we were to systematically survey for Salmonella in more possible sources, we would greatly enhance our ability to quickly link human cases to sources.”

Written by: DeLene Beeland