
pnas.202411894

This research article discusses the use of whole-genome sequencing (WGS) to analyze foodborne illness cases caused by pathogens like Salmonella and E. coli. It highlights that many foodborne illnesses are sporadic and linked to long-duration contamination episodes, particularly affecting younger age groups. The study emphasizes the importance of genomic data in identifying contamination sources and understanding the epidemiology of foodborne diseases.


RESEARCH ARTICLE | MICROBIOLOGY OPEN ACCESS

Genomic perspectives on foodborne illness


David J. Lipman (a,1), Joshua L. Cherry (b,c), Errol Strain (a), Richa Agarwala (b), and Steven M. Musser (a)

Affiliations are included on p. 8.

Contributed by David J. Lipman; received June 14, 2024; accepted September 16, 2024; reviewed by Edward G. Dudley, Lance B. Price, and Abigail Snyder

Whole-genome sequencing of bacterial pathogens is used by public health agencies to link cases of food poisoning caused by the same source of contamination. The vast majority of these appear to be sporadic cases associated with small contamination episodes and do not trigger investigations. A “contamination episode” refers to one or more contamination events from a single source over a period of time. We examine clusters of sequenced clinical isolates of Salmonella, Escherichia coli, Campylobacter, and Listeria that differ by only a small number of mutations (SNPs) to identify features of the underlying contamination episodes. These analyses provide additional evidence that the youngest age groups have greater susceptibility to infection by Salmonella, E. coli, and Campylobacter than older age groups. This age bias is weaker for the common Salmonella serovar Enteritidis than Salmonella in general. A large fraction of the contamination episodes causing sickness appear to have a long duration. For example, 50% of the Salmonella cases are in clusters that persist for almost 3 y. For all four pathogen species, the majority of the cases were part of genetic clusters with illnesses in multiple states and likely to be caused by contaminated commercially distributed foods. Salmonella infections in infants under 3 mo are predominantly acquired from the same contaminated food, pet food, or environmental sources as older individuals, rather than infant formula contaminated during production.

Significance

While outbreaks of foodborne illness receive much attention, they include only a small fraction of cases. To learn more about nonoutbreak cases, we use bacterial genomes generated for outbreak detection and identify clusters of closely related disease-causing isolates. Small clusters account for most cases. A high fraction have cases from multiple states, suggesting contamination at central food distribution sites. As previously observed, illness is especially common in the very young, at least partially due to greater susceptibility. Most cases in infants, who consume only breast milk and infant formula, cluster with cases from older people, implicating cross-contamination from noninfant food or early feeding of complementary foods. Analysis of such genetic clusters is a valuable tool for studying sporadic food poisoning.

bacterial pathogens | food safety | whole genome sequencing

Preprint server: https://www.medrxiv.org/content/10.1101/2024.05.16.24307425v1.
Author contributions: D.J.L. designed research; D.J.L., J.L.C., E.S., and R.A. performed research; D.J.L., J.L.C., E.S., and R.A. analyzed data; and D.J.L., J.L.C., and S.M.M. wrote the paper.
Reviewers: E.G.D., The Pennsylvania State University, University Park Campus; L.B.P., George Washington University; and A.S., Cornell University.
The authors declare no competing interest.
Copyright © 2024 the Author(s). Published by PNAS. This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).
1 To whom correspondence may be addressed. Email: [email protected].
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2411894121/-/DCSupplemental.
Published November 5, 2024.

Since 2013, the Centers for Disease Control (CDC), Food and Drug Administration (FDA), and the United States Department of Agriculture (USDA) have been using whole genome sequencing (WGS) of pathogens to detect foodborne outbreaks and to trace the source of contamination (1). Because isolation of the pathogen and subsequent WGS has become routine for reportable illnesses, we now have extensive sequence databases of Salmonella, Escherichia coli, Campylobacter, and Listeria genomes. While the use of WGS has allowed CDC and FDA to detect foodborne outbreaks earlier and to improve the success rate for identification of the contamination source (2), the associated sequence databases also provide a way to study sporadic cases of foodborne illnesses that are far more prevalent than outbreak cases (3).

If the genome sequences of a set of clinical isolates differ by only a small number of mutations (SNPs) (e.g. by ≤4 single nucleotide polymorphisms, SNPs, for the entire genome sequence), they are very likely to share the same source of contamination (4–8). This is the central basis for using WGS in food safety. The contamination can occur at any stage of the food production, distribution, or preparation process, including within the consumer’s household. For the purposes of this analysis, we will define a contamination episode as the set of contamination events caused by a single source. The term “episode” is used because the contamination events from this source may occur over a period of time.

The pathogen genomes in CDC’s PulseNet database (9) isolated from clinical cases of foodborne illness represent a large subset of the servings of food impacted by contamination episodes (4). These are the cases where:

• An individual consumed a serving of food with a high enough level of contamination to cause sufficient illness to seek medical care. (Note that the level of contamination needed to cause illness can vary related to patient and pathogen factors.)
• The laboratory tests were sufficiently sensitive to detect the reportable pathogen.
• The patient sample was submitted to the state public health lab for sequencing.

A number of estimates have been made for the actual number of foodborne illness cases from major pathogens, i.e., the multiplier for the reported cases. As an example, CDC estimates that the actual number of cases due to Salmonella is approximately 29 times the number of reported cases (10, 11). In larger contamination episodes, many food servings

PNAS 2024 Vol. 121 No. 46 e2411894121 https://doi.org/10.1073/pnas.2411894121 1 of 8


[Figure 1: "Timeline of Selected Salmonella Clusters". x-axis: Date (tick labels 2017 to 2023). Timeline labels: Typhi, n=82, states=22, max/month=5; Newport, n=5, states=5, max/month=1; Javiana, n=179, states=14, max/month=87.]
Fig. 1. Timelines for three Salmonella clusters. Vertical tick marks indicate pathogen isolation dates. Within each timeline, cases occurring within the same state have the same color. The timeline labels indicate the serovar, the number of isolates in the cluster, the total number of states with cases from that cluster, and the maximum number of cases per month.

may be contaminated at levels sufficient to cause illness in most consumers. But other affected servings may have relatively small levels that have a low probability of causing symptoms in healthy adults. However, in more susceptible individuals (e.g., young children and the elderly), these levels may have a higher probability of causing disease. In countries with modern food production systems and active safety programs, such as the US, one might expect most contamination episodes to be small, impacting only a small number of servings and with infectious doses that have only a low probability of causing illness in most individuals. These smaller contamination episodes are called sporadic food poisoning and are known to be responsible for the vast majority of foodborne illness (3).

Sporadic food poisoning, because it does not trigger an epidemiological investigation, has largely been studied by case–control studies. Because the genome sequences now allow us to identify clusters of cases associated with a contamination episode, we can analyze these contamination episodes more directly. In other words, the WGS data help us to connect the genetic clusters of clinical cases to the underlying contamination episodes. Note that the approach described here does not fully account for contamination episodes that are polyclonal since each of the strains in polyclonal contamination episodes will appear as a single cluster (12, 13).

By examining the composition of these clusters, we can obtain estimates of the fraction of contamination episodes occurring upstream of the distribution from a central source of production because clinical cases from the same cluster are found in multiple states. Likewise, by using the isolation dates of the clinical cases in a cluster, we can observe the persistence of the contamination episode. Finally, by using the ages of the members of a contamination cluster, we can get important clues as to whether the serving of food was contaminated prior to entering the household or whether there was cross-contamination of the serving from other foods or environmental sources in the household.

Results

The Materials and Methods section describes how clusters of clinical food poisoning cases are generated. Briefly, we use a threshold genetic distance for pairs of genomes that is low enough to infer a high likelihood that the associated pathogen isolates are derived from the same source of contamination, e.g., four SNPs out of ~2 to 5 million aligned bases. We continue to add cases to the cluster as long as they are within this threshold distance from at least one case already in the cluster, i.e., we perform single linkage clustering. The results presented here are based on a four SNP threshold but using smaller or larger thresholds did not qualitatively change the results (Materials and Methods and SI Appendix).

Timelines for a few Salmonella clusters are depicted in Fig. 1. Each case/isolate is represented by a short vertical line at a position corresponding to its isolation date. Within each cluster, cases from the same state are given the same color. Though it contains only five isolates, the Newport cluster is spread over approximately 4 y and each case is from a different state. The 82 Typhi cluster isolates are spread over 8 y and 22 states. Below we analyze these and other characteristics of the clusters across the entire dataset.

Fig. 2 is a cumulative frequency plot of the fraction of all cases in clusters of size 1 on up to size 30 for Salmonella, E. coli, Campylobacter, and Listeria. For Salmonella, we separate out Enteritidis isolates because the assumption that clinical isolates differing by a small number of SNPs generally share the same proximal source of contamination may be less valid for Enteritidis than the other foodborne pathogens (Discussion). Other than Enteritidis, the four pathogen species have over 50% of all cases in singleton clusters, i.e., these cases are more than four SNPs

[Figure 2: x-axis: Cluster Size (3 to 30); y-axis: Cumulative Frequency (%); series: Salmonella (Ent.), Salmonella (non-Ent.), Escherichia coli, Listeria, Campylobacter.]
Fig. 2. Cumulative frequency of cases by cluster size. Cumulative frequency plot showing the fraction of all cases in clusters of size 1 up to size 30 for Salmonella, E. coli, Campylobacter, and Listeria. Enteritidis isolates are separated from other Salmonella serovars due to their distinct clustering behavior.
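The clustering rule described in Results (join a case to a cluster if it is within the SNP threshold of at least one existing member) and the case-weighted cumulative frequency shown in Fig. 2 can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the union-find approach, and the toy sequences are assumptions made for illustration, and real analyses compare millions of aligned bases rather than short strings.

```python
from collections import Counter
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Number of aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(seq_a, seq_b))


def single_linkage_clusters(genomes, threshold=4):
    """Single-linkage clustering: two isolates end up in the same cluster if a
    chain of pairwise distances <= threshold connects them (union-find)."""
    parent = {name: name for name in genomes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in combinations(genomes, 2):
        if snp_distance(genomes[a], genomes[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in genomes:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


def cumulative_case_fraction(clusters, max_size):
    """Fraction of all cases that sit in clusters of size <= s, for s = 1..max_size."""
    total_cases = sum(len(c) for c in clusters)
    cases_by_size = Counter()
    for c in clusters:
        cases_by_size[len(c)] += len(c)
    running, fractions = 0, []
    for size in range(1, max_size + 1):
        running += cases_by_size[size]
        fractions.append(running / total_cases)
    return fractions


# Invented toy alignment (hypothetical isolates, 12 aligned bases each).
toy_genomes = {
    "iso_A": "A" * 12,
    "iso_B": "A" * 11 + "T",     # 1 SNP from iso_A
    "iso_C": "A" * 10 + "TT",    # 1 SNP from iso_B, 2 SNPs from iso_A
    "iso_D": "C" * 6 + "A" * 6,  # far from all others, so a singleton
}

# A threshold of 1 is used only so the chaining effect shows on tiny sequences.
toy_clusters = single_linkage_clusters(toy_genomes, threshold=1)
print(sorted(sorted(c) for c in toy_clusters))
print(cumulative_case_fraction(toy_clusters, max_size=3))
```

Note that iso_A and iso_C are two SNPs apart, above the toy threshold, yet share a cluster because iso_B links them; this chaining is the defining property of single linkage.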

[Figure 3: four panels (Salmonella, Campylobacter, Escherichia coli, Listeria). x-axis: Age (0 to 90); y-axis: Adjusted % of Total Counts. Salmonella legend: Large Clusters (>10, excluding Enteritidis), Enteritidis (all clusters), All Clusters (excluding Enteritidis). Legend for the other panels: Large Clusters (>10), All Clusters.]
Fig. 3. Age distribution for all cases versus those in large clusters (>10). Age distribution of all cases versus large clusters (cluster size >10, red lines) across all four pathogens. The case counts have been normalized based on the US population for different age groups. For Salmonella, the vertical bars and the red solid line are excluding Enteritidis and the dashed black line is for Enteritidis (all cluster sizes). The curves for large clusters (red line) and for Enteritidis (black dashed) were smoothed for legibility with a Savitzky–Golay filter (a window of seven for red and five for black and a third-order polynomial).
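The Savitzky–Golay smoothing mentioned in the Fig. 3 caption fits a low-order polynomial to each sliding window and keeps the fitted value at the window center. A minimal pure-Python sketch for the two settings named in the caption (windows of five and seven, third-order polynomial) follows. The function name and the decision to leave edge points unsmoothed are assumptions for illustration; library implementations typically treat the edges differently, and the software the authors used is not stated in this section.

```python
# Standard Savitzky-Golay convolution weights for a cubic fit, given as
# (integer weights, normalizing divisor) for each window length.
SG_CUBIC_WEIGHTS = {
    5: ([-3, 12, 17, 12, -3], 35),
    7: ([-2, 3, 6, 7, 6, 3, -2], 21),
}


def savgol_cubic(values, window):
    """Smooth `values` with a cubic Savitzky-Golay filter of length `window`.

    Assumption of this sketch: points closer than window // 2 to either end
    are copied through unchanged.
    """
    weights, divisor = SG_CUBIC_WEIGHTS[window]
    half = window // 2
    smoothed = [float(v) for v in values]
    for i in range(half, len(values) - half):
        acc = sum(w * values[i - half + k] for k, w in enumerate(weights))
        smoothed[i] = acc / divisor
    return smoothed


# Invented example: a single spike is spread out and reduced in height.
spike = [0, 0, 0, 21, 0, 0, 0]
print(savgol_cubic(spike, window=7))

# A cubic curve passes through the filter unchanged at interior points,
# which is the defining property of a third-order Savitzky-Golay filter.
cubic = [x**3 - 2 * x for x in range(10)]
print(savgol_cubic(cubic, window=7) == [float(v) for v in cubic])
```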

away from all other cases, and over 75% of the cases are in clusters of size 10 or less. Among the major Salmonella serovars, Enteritidis stands out in having a far higher fraction of its cases in large clusters (Fig. 3, black dashed line) and accounts for 40% of the Salmonella cases in clusters larger than size 10.

We will examine some of the properties of clusters of different sizes to determine whether there are epidemiological signals associated with the computed clusters. Fig. 3 shows the age distribution of the cases for all four pathogens in all cluster sizes (blue bars) compared with larger clusters (size >10, red line). For Salmonella, we separate Enteritidis in all cluster sizes (dashed black line) from the rest of the serovars. The case counts have been normalized for the age population structure (14). E. coli and Campylobacter show an increasing number of cases from approximately age 10 down to age 1 before a drop for infants under 1 y old. Salmonella (excluding Enteritidis) is similar, though case counts appear to be increasing through age 0. However, nonmonotonicity becomes apparent when ages less than 1 y are binned more finely (SI Appendix, Fig. S1). Thus, the incidence of Salmonella also decreases as age approaches birth, but the downturn occurs at 4 mo of age and is not apparent in Fig. 3.

For Salmonella there is a subtle increase in older adult cases while for E. coli, there is a small increase in the 18 to 30 y age interval followed by a decrease to a plateau from age 40 upward. For Salmonella, E. coli, and Campylobacter we see a trough in case counts between the infant/young child peak and the teenage years. Listeria looks quite different, with case counts increasing with age along with a narrow peak associated with newborn listeriosis (15).

The age distributions for Salmonella in Fig. 3 are consistent with those reported in published surveillance studies though the latter do not break them down into, e.g., large and small cluster sizes (16). For Salmonella and E. coli, and to a lesser extent, Campylobacter, the elevation of case rates among the younger age groups is notably less pronounced for the larger cluster sizes (red line) than for total cases, i.e., there is a substantially higher fraction of young individuals in the smallest clusters which dominate the overall counts (Fig. 2).

Is a different composition of pathogen strains in the smaller clusters responsible for the greater proportion of young individuals in the small clusters? The three stacked bar charts in Fig. 4 show the serovar composition of all Salmonella cases, for individuals of age ≤1, and for cases in clusters of size 1 (all age groups). The serovars have been grouped into four abundance categories based on total counts:

1. Enteritidis (17.4% of cases),
2. the next five most common serovars (36% of cases with Newport: 11.55%, Typhimurium: 9.84%, Javiana: 5.84%, I 4:I:-: 4.74%, and Infantis: 4.04%),
3. the next 22 most common (29.6% of cases), and
4. the next 838 serovars (17.2% of cases).

Both the infant set and the cluster size =1 set have a higher proportion of rare serovars and a lower proportion of Enteritidis than the overall Salmonella set (Fig. 4). And the overall serovar compositions of these sets are extremely similar as shown in the scatter



[Figure 4: stacked bar chart. y-axis: Percentage of Cases; bars: All Cases, Age ≤1, Cluster Size 1; categories: Most Common, Common, Enteritidis, Rare.]
Fig. 4. Salmonella serovar composition for all cases, age ≤1 y, and cluster size =1 (all ages). Salmonella serovars were grouped into four abundance categories with respect to all cases: Enteritidis, the next five most common serovars (Newport: 11.6%, Typhimurium: 9.8%, Javiana: 5.8%, I 4:I:-: 4.7%, and Infantis: 4.0%), the next 22 most common, and the next 838 serovars.

plot in SI Appendix, Fig. S2. The inverse relationship between serovar diversity and age is consistent with previously published results (17) and we provide an analysis of diversity by cluster size in SI Appendix and Fig. S3. Thus a possible explanation for the age distribution results seen in Fig. 3 is that the younger age groups are more susceptible to and/or more frequently exposed to the serovars seen primarily in the smaller clusters, e.g., perhaps from environmental sources.

Fig. 5 compares the fraction of cluster size 1 cases by age range for the same categories of serovars. The age ranges were chosen to include approximately the same numbers of cases and the fractions were normalized using the mean for each serovar category. We see the same pattern for all four categories of serovars, i.e., there is a higher fraction of younger cases and of older cases in the smaller clusters. Thus, the higher frequencies of younger cases in small clusters seen in Fig. 3 (and to a lesser extent, older cases) seem to be an inherent property of small clusters and do not primarily depend on the mix of serovars.

Because we know the dates of collection of the clinical isolates, we can examine the persistence of the clusters greater than size 1, i.e., the number of days between the first isolate collected and the most recent isolate in the cluster. Fig. 6 shows the cumulative distribution of cluster persistence time among cases. Median persistence times are highest for Salmonella and lowest for E. coli. Note that we are underestimating the persistence of clusters beginning or ending outside of the dates of our sample collection. An even greater number of cases are likely to be missed because, as mentioned in the Introduction, they have escaped detection by PulseNet surveillance.

We can also examine nonsingleton clusters to see whether multiple US states are represented within them. The majority of cases were found in multistate clusters, with all four pathogens showing a high proportion of cases occurring in multiple states:

• Salmonella (78%)
• Listeria (70%)
• E. coli (65%)
• Campylobacter (63%)

While some contamination episodes are likely from a source in the local environment, a majority of the contamination episodes appear to be geographically dispersed. That is, the contaminated servings of food are likely to have been distributed from a central point source (e.g., a food production or packaging facility) because cases from multiple states are members of the same cluster. Geographical dispersion is likely to be underestimated for the same reasons noted above for cluster persistence.

The blue line in Fig. 7 shows how the fraction of clusters with cases from at least two different states changes with increasing cluster size. Larger clusters are more likely to be multistate for the trivial reason that, with more cases, there is a greater chance of including a case from a different state. We can control for this trivial explanation by examining all pairs of cases within clusters of a given size, to determine whether the fraction of multistate pairs varies systematically with cluster size (red line in Fig. 7). The red line shows that the contamination episodes underlying the larger clusters are inherently more likely to be distributed from a central site. While larger clusters are more likely to be geographically dispersed than smaller clusters, for Salmonella, over 30% of the clusters of size 2 are multistate and by cluster size 3, over 50% are multistate (Fig. 7, blue line). Similar results were obtained for Listeria, Campylobacter, and E. coli (SI Appendix, Fig. S4).

Fig. 8 compares the persistence of single-state versus multistate clusters for all four pathogens. We restrict the analysis to clusters of size 2 or 3 since the fraction of single-state clusters is substantially lower with increasing cluster size (Fig. 7). As noted above, while the contamination for some of the single-state clusters may be distributed from, e.g., a central site, most of the multistate clusters, however, are likely to have been distributed from central sources. The degree of shift to higher persistence of the multistate clusters compared to single-state appears roughly similar for all four pathogens and is highly significant (all P-values < 10⁻⁴, Mann–Whitney U test).

We can also examine how the geographical dispersion of cases within a cluster varies among age groups (SI Appendix, Fig. S5). The youngest and oldest age groups have the highest fraction of cases in single-state clusters, which is consistent with their skew toward smaller clusters (Figs. 3 and 5) and the relationship between cluster size and geographical dispersion (Fig. 7). The fraction of cases in clusters where all cases in the cluster are within the same age range correlates very well with the geographical dispersion (Pearson correlation 0.84, P-value 0.0024) and is only slightly higher than the fraction of cases in clusters where all cases are in the same state and same age range. Thus, although overall only 22% of Salmonella isolates are in single-state clusters, contamination episodes resulting in same-age clusters are virtually all single-state (SI Appendix, Fig. S5). This would be consistent with these contamination

[Figure 5: x-axis: Age Group; y-axis: Normalized % (1 = Avg.); series: Enteritidis, Most Common, Common, Rare.]
Fig. 5. Normalized fraction of cluster size 1 cases by age range and serovar category. The number of serovars and the percent of cases they account for are listed in the legend. Fractions for each serovar were normalized by the mean for each serovar category.
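The control for cluster size described in the text for Fig. 7 compares two quantities per cluster size: the fraction of clusters containing cases from more than one state, and the fraction of within-cluster case pairs whose two cases come from different states. A minimal sketch of both follows; the function name, the input layout (one list of state codes per cluster), and the toy data are assumptions for illustration, not the authors' code.

```python
from collections import defaultdict
from itertools import combinations


def multistate_fractions(clusters):
    """For each cluster size >= 2, return a tuple
    (fraction of multistate clusters, fraction of multistate case pairs).

    `clusters` is a list of clusters; each cluster is a list with one state
    code per case (hypothetical input layout).
    """
    n_clusters = defaultdict(int)
    n_multi_clusters = defaultdict(int)
    n_pairs = defaultdict(int)
    n_multi_pairs = defaultdict(int)

    for states in clusters:
        size = len(states)
        if size < 2:
            continue  # singletons have no pairs and cannot be multistate
        n_clusters[size] += 1
        if len(set(states)) > 1:
            n_multi_clusters[size] += 1
        for state_a, state_b in combinations(states, 2):
            n_pairs[size] += 1
            if state_a != state_b:
                n_multi_pairs[size] += 1

    return {
        size: (
            n_multi_clusters[size] / n_clusters[size],
            n_multi_pairs[size] / n_pairs[size],
        )
        for size in sorted(n_clusters)
    }


# Invented toy clusters, one state code per case.
toy = [
    ["TX", "TX"],
    ["TX", "CA"],
    ["NY", "NY", "NY"],
    ["NY", "NY", "NJ"],
]
for size, (cluster_frac, pair_frac) in multistate_fractions(toy).items():
    print(size, cluster_frac, round(pair_frac, 3))
```

In the toy data the cluster-level fraction is the same for sizes 2 and 3, while the pair-level fraction is lower for size 3, which shows why the two measures can diverge as clusters grow.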

[Figure 6: x-axis: Persistence (days), 0 to 4000; y-axis: Cumulative Percentage of Cases; series: Salmonella, Campylobacter, Escherichia coli, Listeria. A dashed line marks 50% of cases; values annotated on the plot: 190 days, 385 days, 841 days, and 1071 days.]
Fig. 6. Persistence of clusters. Cumulative distribution of cluster persistence: The cumulative fraction of cases within nonsingleton clusters, showing the range of cluster persistence measured by the number of days from the first to the last isolate collected within each cluster. The graph highlights the distribution of persistence times across all observed clusters, with a dashed horizontal line indicating the median persistence level, where 50% of cases are found in clusters with a duration exceeding this value.
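Persistence as defined for Fig. 6 is the number of days between the first and last isolate in a nonsingleton cluster, and the median is taken over cases rather than over clusters. A minimal sketch of that calculation is given here; the function names, the tie-handling rule for the median, and the dates are assumptions for illustration only.

```python
from datetime import date


def persistence_days(isolation_dates):
    """Days between the earliest and latest isolate in one cluster."""
    return (max(isolation_dates) - min(isolation_dates)).days


def case_weighted_median_persistence(clusters):
    """Smallest persistence p such that at least half of all cases in
    nonsingleton clusters belong to clusters with persistence <= p.

    Each cluster is a list of isolation dates; singletons are skipped.
    """
    weighted = sorted(
        (persistence_days(dates), len(dates))
        for dates in clusters
        if len(dates) > 1
    )
    if not weighted:
        raise ValueError("no nonsingleton clusters")
    total_cases = sum(n for _, n in weighted)
    running = 0
    for days, n_cases in weighted:
        running += n_cases
        if 2 * running >= total_cases:
            return days


# Invented toy clusters (hypothetical isolation dates).
toy_clusters = [
    [date(2019, 1, 1), date(2019, 1, 31)],                                    # 30 days, 2 cases
    [date(2018, 6, 1), date(2019, 6, 1), date(2020, 6, 1)],                   # 731 days, 3 cases
    [date(2021, 3, 1), date(2021, 3, 11), date(2021, 3, 21), date(2021, 4, 10)],  # 40 days, 4 cases
    [date(2022, 1, 1)],                                                       # singleton, ignored
]
print([persistence_days(c) for c in toy_clusters if len(c) > 1])
print(case_weighted_median_persistence(toy_clusters))
```

Weighting by cases means one long-lived cluster with many cases moves the median more than several short-lived pairs, which matches the "50% of cases" reading of the figure.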

episodes occurring in settings such as elder care facilities, daycare facilities, schools, and direct environmental exposure.

The diet of the age group less than or equal to 3 mo old is distinct from all other age groups in that it consists primarily of breast milk and/or infant formula: Only approximately 15.6% of infants receive complementary foods prior to 4 mo of age (18). Infant formula is distributed nationally from a small number of production facilities. If infant formula were a significant source of Salmonella contamination then one might expect to see a substantial fraction of multistate clusters solely composed of infants, contrary to the results described above (SI Appendix, Fig. S5). However, of the 7,994 infant cases in clusters ≥size 2, the largest infant-only cluster has eight cases and all clusters ≥size 4 are single-state clusters. Note that because we are focusing on the possibility of infant formula as a source of contamination, we are including those individuals less than 1 y old rather than ≤3 mo old since this is the age in which most children transition from infant formula to cow’s milk.

Both of these points provide strong support for the conclusion that Salmonella contamination of infant formula at the production site could only be responsible for a very small fraction of cases, if any, and any possible contamination episodes from this source would be exceedingly small.

If infant formula contaminated at the site of production is not a major source of salmonellosis then what is the source of contamination for cases at age ≤3 mo? The diets of infants under 4 mo of age are largely restricted to infant formula and breast milk, neither of which are commonly consumed by individuals of age 10 or older. Co-occurrence of these different age groups in a cluster therefore suggests cross-contamination between noninfant food and infant formula or breast milk, early complementary feeding, or exposure to a common nonfood source of contamination. Over 80% of the infants ≤3 mo of age are cluster members with individuals older than 10 y old (SI Appendix, Fig. S6). Furthermore, the fraction of cluster membership with older individuals only increases slightly with young children whose dietary intake is more similar to that of older children and adults, e.g., 2 to 5 y of age (SI Appendix, Fig. S6). This is consistent with data presented above and in SI Appendix, Fig. S5. Similar patterns are evident in Campylobacter and E. coli.

Discussion

Most of what we know about the contamination episodes that cause the estimated 48 million cases of food poisoning each year in the United States is from the small fraction of cases in the foodborne outbreaks that are investigated by the CDC, FDA, USDA, as well as state and local health authorities (10). The vast majority of food poisoning cases are classified as sporadic cases, which have primarily been studied through case–control methods and routine surveillance systems (3, 19–21). By analyzing the clusters formed from closely related pathogen genomes of clinical isolates within CDC’s PulseNet system, we can derive a more detailed picture of the contamination episodes underlying foodborne illness in the United States.

The clusters of a given size shown in Fig. 2 are a mixture of contamination episodes with a range of different sizes since, as noted in the Introduction, only a variable fraction of the cases would be reported within the PulseNet system. In addition, the SNP threshold used to generate the clusters may be too stringent in some instances and thus split cases from the same contamination episode into different clusters, while in other instances it may be too high and thus merge cases from different episodes. This is particularly an issue for Enteritidis. Eggs and poultry are the most common source of Enteritidis cases in the United States (22), and because the same breeder site may supply multiple poultry production facilities, identical or nearly identical strains may be found at these separate production facilities (23–25). Thus, Enteritidis clusters formed using low SNP thresholds may lump together multiple independent proximal contamination episodes and the interpretation of the Enteritidis results must account for that possibility (26, 27). Note also that some US states began the use of WGS before others so the coverage is more comprehensive later in the time interval of the sample set (Materials and Methods). Furthermore, since we are likely to be missing cases from contamination episodes toward the beginning and ends of the time interval corresponding to our sample collection, the clusters associated with these episodes will be incomplete. Despite these caveats, the results presented here demonstrate substantial differences in the epidemiological properties of the clusters of different sizes, i.e., there are strong, epidemiologically relevant signals associated with the computed clusters.

We see a consistent pattern for Campylobacter, E. coli, and Salmonella:

• The smallest clusters account for the largest fraction of cases, except for Enteritidis (Fig. 2);
• The diversity is greater, e.g., in terms of serovars for Salmonella, in the smallest cluster sizes and youngest age groups (Fig. 4 and SI Appendix, Fig. S3);



• the case rates are higher in the younger age ranges in both large and small clusters, albeit with the drop-off noted at ages <1 y (Fig. 3 and SI Appendix, Fig. S1);
• The youngest age groups, and to a lesser extent the oldest, are most overrepresented in the smallest clusters (Figs. 3 and 5).

The last finding is consistent with earlier work (3) that found that the youngest age groups were underrepresented in Salmonella outbreaks as compared with sporadic cases (i.e., in general, smaller clusters).

Fig. 7. Fraction of multistate Salmonella clusters versus cluster size. Multistate analysis by cluster size: The blue line shows the fraction of multistate clusters for cluster sizes 2 through 10. The red line shows the fraction of pairs of cases in each cluster size that are from different states. All pairs of cases for each cluster in each cluster size were compared for this analysis. [Plot: percent multistate (y axis) versus cluster size (x axis); series: fraction of multistate clusters, fraction of multistate pairs.]

One possible explanation for the dramatically higher case rate among the youngest children is that they are more likely to receive medical care and thus to be picked up in the PulseNet system. It is not clear, however, how this factor would explain the differences seen in Fig. 3 between smaller and larger contamination episodes. Another possible explanation is age-related differences in exposure or susceptibility to different genetic subgroups within the pathogen species, e.g., different Salmonella serovars. Consistent with this possibility, the composition of the clusters changes with cluster size, with increasing diversity of serovars or genetic subgroups in the smaller clusters (Fig. 4 and SI Appendix, Fig. S3). However, we see the increased case rates for the youngest age groups (and to a lesser extent, the oldest age groups) among both the common and less common Salmonella serovars (Fig. 5). And even Enteritidis, which is unique in its bias for larger clusters, follows the same pattern of relatively higher rates in the smallest clusters for the youngest and oldest age groups (Fig. 5).

A simple explanation that would integrate many of the reported findings for Salmonella, E. coli, and Campylobacter is based on two assumptions:

1. Susceptibility is age-related, with the youngest age groups, and, to a lesser extent, the oldest, having greater susceptibility;
2. There is a distribution of contamination levels in, e.g., food servings or environmental sources, associated with a contamination episode: episodes that contaminate smaller numbers of servings will, on average, generate a smaller fraction of servings with pathogen levels sufficiently high to be likely to cause symptoms in most individuals.

Regarding assumption #1, others have noted that the youngest age groups were more susceptible to foodborne illness (11). This is perhaps expected given our knowledge about the maturation of the immune system (28, 29). Note that the drop in infant cases at the very youngest age group may relate to some level of protection from maternal antibodies (30, 31). Another reason for the drop may relate to changes in the fraction of infants that are exclusively breast-fed: at birth, the rate is 63%, by 3 mo the rate is 45%, and by 6 mo, the rate is 25% (32). In addition, based on survey information, the fraction of children receiving complementary feeding drops below 16% for age groups younger than 4 mo (18).

It therefore seems reasonable to assume that there would be a distribution of pathogen levels in servings associated with a contamination episode (assumption #2). While a large contamination source might produce a substantial number of servings with a sufficiently high level of pathogen to sicken healthy individuals, such a source would also likely produce a number of additional servings with dosages that are only likely to cause symptoms in more susceptible consumers. Likewise, a smaller contamination source may yield a lower fraction of servings with pathogen levels sufficiently high to reliably cause symptomatic illness in most individuals. Because of the increased susceptibility of the younger age groups, a higher fraction of the contaminated servings are able to produce symptomatic illness in this age group, producing the age incidence curve in Fig. 3. Furthermore, this difference would be more pronounced for the smaller clusters if the distribution of pathogen levels per serving were shifted according to the second assumption.

Note that this explanation for Salmonella, E. coli, and Campylobacter is mostly based on the phenotype of the host, i.e., that overall, susceptibility is primarily age-related, along with a very general property of contamination episodes (i.e., assumption #2), rather than on the phenotype of the pathogen or other factors. Listeria, however, looks quite different, which shows that the phenotype of the pathogen can lead to very different age incidence.

Fig. 8. Comparing persistence of single-state versus multistate clusters. Comparison of the persistence of single-state versus multistate clusters of size 2 and size 3 for all four pathogens. [Plot: duration in days (y axis) for Salmonella, E. coli, Listeria, and Campylobacter; series: single-state, multistate.]

Of the major Salmonella serovars, Enteritidis stands out with a lower bias toward younger age groups and less difference in this bias between larger and smaller clusters. The latter may relate to the greater likelihood, as compared with other serovars, of merging separate contamination episodes of Enteritidis noted above. However, even for clusters of size 1, the infant fraction is lower for Enteritidis than for the other serovars (Fig. 4). Perhaps the susceptibility of younger age groups is only slightly increased for Enteritidis, or some aspect of its phenotype weakens the relationship between the number of contaminated servings and the concentration of pathogen. In conjunction with the inference that a high fraction of infant cases of Salmonella are due to cross-contamination in the
home, reduced environmental persistence could lead to incidence in younger age groups that is closer to adult incidence. Possibly relevant here is the observation that seasonal variability of Enteritidis is also markedly reduced compared to the other major serovars (e.g., ref. 33).

Although the smallest contamination episodes account for the majority of clinical cases of food poisoning, these do not appear to be primarily local, as would be expected from direct environmental exposure or exposure in a restaurant outbreak. Rather, a majority of these cases are in geographically dispersed clusters. For example, 78% of Salmonella cases (cluster size ≥2) are in multistate clusters. This implies that the servings that cause these cases are being distributed from central sites where the contamination is occurring, which would primarily be commercially distributed foodborne transmission, but could also include contact with commercially distributed animals, pet food, and returning travelers. Furthermore, a high fraction of cases are in clusters that are persistent, e.g., half the Salmonella cases are in clusters that persist for over 1,071 d. As discussed in the Materials and Methods section, though the results on the fraction of multistate clusters and on persistence of clusters are dependent on the choice of SNP thresholds for the generation of the clusters, the qualitative picture remains the same: over a wide range of thresholds, the majority of Salmonella cases are the result of contamination distributed from central sites and persisting for extended periods of time. As noted above, because we are missing isolates that could be members of clusters prior to the beginning and after the end of our sample window, as well as the larger number of cases that are not detected, e.g., by PulseNet, these are likely to be conservative estimates.

If infant formula contaminated at the production facility were a significant cause of salmonellosis, we would expect to see a substantial fraction of infant-only clusters (size ≥2). However, they represent only 17.6% of the cases. Furthermore, almost all infant-only clusters are single-state: there are only two infant-only clusters at the maximum size of eight cases, and both are single-state clusters (SI Appendix, Fig. S5 and above). Rather, as seen in SI Appendix, Fig. S6, it appears that most cases of salmonellosis in infants (age ≤3 mo), with the caveat that we can only analyze nonsingleton clusters, are due to contamination sources shared with individuals too old to consume infant formula: cross-contamination from noninfant food, pet food, complementary feeding, and environmental sources.

Our results imply that most cases of foodborne illness are due to many very small contamination episodes distributed from central sites over quite an extended period, similar to the Newport cluster shown in Fig. 1. These characteristics greatly increase the challenge of detecting and eliminating the source of contamination, e.g., in a production facility or the environment. This may help explain the slow progress in reducing the overall burden of foodborne illness in the United States (34). Given the high case rates of the youngest age groups, their increased susceptibility, and the likelihood that a high fraction of these cases are due to sources within the household, a greater emphasis on improving food safety in the consumer household should be considered. While improving food safety practices in the household would be quite challenging (35), identifying the minimal changes needed to reduce cross-contamination of infant formula may be more feasible. Currently, state laws require infant car seats, and hospitals provide training on installing and using a car seat for parents bringing their newborn home from the hospital. While breastfeeding is encouraged, much less attention is given to educating new parents and caregivers on the safe preparation of infant formula and the use of feeding bottles (36).

Salmonella Enteritidis seems to be a more feasible target for improving food safety outside the household than the other major pathogens because the contamination episodes causing most cases are larger and a significantly higher fraction of these are likely to have been distributed widely from central sites. Moreover, we know the primary source of Salmonella Enteritidis: poultry and eggs (37). Quantitative risk assessment models based on pathogen survival rates in the cooking process have been created, and they show fairly good correlations between the prevalence of Salmonella on, e.g., poultry and the fraction of cases in the US (38, 39). And while Enteritidis may, in principle, be a more feasible target for risk reduction at central sites, improved food safety practices in the household would be helpful here as well.

While WGS has already proven to be a useful tool for identifying and investigating foodborne outbreaks, we have demonstrated that the increasingly comprehensive set of pathogen genomes can also reveal important aspects of sporadic foodborne illness, which accounts for most cases of food poisoning.

Materials and Methods

Datasets. The pathogen isolates used in this project were collected and sequenced by CDC's PulseNet national laboratory network for foodborne outbreak detection. The pathogen genome data and SNP distances were downloaded from the National Center for Biotechnology Information (NCBI) Pathogen Detection site (40). The identifiers for the pathogen isolates used for these analyses are available in spreadsheets listed in Supplementary files (PDT*) along with identifiers for the clusters and the size of each cluster based on thresholds of two, four, and eight SNPs. For Salmonella, the serovars as identified in the NCBI pipeline are listed as well. The Pathogen Detection releases used for each pathogen are

• Campylobacter PDG000000003.2084 (11/2023)
• Ecoli_Shigella PDG000000004.4162 (11/2023)
• Listeria PDG000000001.3486 (11/2023)
• Salmonella PDG000000002.2848 (11/2023)

PulseNet clinical isolates in study set:

• Salmonella: 1961-08-23 to 2023-10-27, 265,449 isolates
  • >98% of all isolates are from 2015-01-01 onward
• E. coli: 1981-01-02 to 2023-11-04, 68,527 isolates
  • >96% of all isolates are from 2015-01-01 onward
• Campylobacter: 1983-01-01 to 2023-11-07, 23,577 isolates
  • >95% of all isolates are from 2015-01-01 onward
• Listeria: 1965-11-20 to 2023-12-15, 8,205 isolates
  • >90% of all isolates are from 2013-01-01 onward

Metadata. The PulseNet metadata used for these analyses was accessed from CDC's SEDRIC database under their data-use agreement (41). The fields used were

• the age of the affected individual,
• the date of collection, and
• the state where the isolate was collected.

Generating Clusters.

SNP distances. The SNP distances for these analyses were the patristic (i.e., tree-based) distances computed by the NCBI Pathogen Detection resource (40). This resource has been available since 2016 and has been used intensively by the FDA, CDC, USDA, and US state public health laboratories for outbreak detection and source tracking. A summary of the pipeline is available (https://ftp.ncbi.nlm.nih.gov/pathogen/Methods.txt).

SNP clustering. There are many ways to generate clusters using distance measures such as SNP distances. We start with the NCBI Pathogen Detection SNP clusters that are updated regularly. These SNP clusters include all genomes that are within 50 SNPs of each other (NCBI pipeline summary https://ftp.ncbi.nlm.nih.gov/pathogen/Methods.txt). We determine the case cluster for the genomes of each PulseNet clinical isolate within an NCBI SNP cluster as follows: Using single linkage clustering based on a threshold SNP distance, isolates are added to a case cluster as long as they are less than or equal to the threshold SNP distance
from any of the isolates in the cluster. Any isolate that is farther than the threshold from all other isolates forms a cluster of size 1.

SNP thresholds. Choosing a SNP threshold that implies that two isolates share a similar source of contamination in this analysis has been discussed in the context of outbreak detection and source tracking (8, 42). Details on assessing the robustness of observations with varying SNP thresholds are available in SI Appendix.

Data, Materials, and Software Availability. Some of the metadata for the CDC PulseNet data within the SEDRIC (System for Enteric Disease Response, Investigation, and Coordination) database (41) require registration for access. Therefore, some of the metadata (state, collection date, and age) are available only through SEDRIC (41). All other data are available on public websites.

ACKNOWLEDGMENTS. We wish to thank the PulseNet participating laboratories for isolating and sequencing the Salmonella, Listeria, E. coli, and Campylobacter isolates, uploading the sequences to the PulseNet National Database, and submitting the raw sequence data to the NCBI public databases. We also thank the CDC epidemiologists for linking isolates together during outbreak investigations and for their feedback on the manuscript. In addition, we thank the article referees for their in-depth review and for their suggestions and corrections, which have substantially improved the manuscript. This work was supported in part by the intramural research program of the National Library of Medicine, NIH. The opinions expressed in this article are those of the authors and do not reflect the view of the NIH, the Department of Health and Human Services, or the United States government.

Author affiliations: (a) Food and Drug Administration, Center for Food Safety and Applied Nutrition, Office of Regulatory Science, College Park, MD 20740; (b) National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20892; and (c) Division of International Epidemiology and Population Studies, Fogarty International Center, NIH, Bethesda, MD 20892
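
As an illustration of the single-linkage case clustering described under SNP clustering above, a minimal Python sketch follows. It is not the authors' pipeline: the isolate names, SNP distances, states, and collection dates are invented toy values, the function names `case_clusters` and `summarize` are hypothetical, and persistence is assumed here to mean the number of days between the earliest and latest collection dates in a cluster.

```python
from datetime import date
from itertools import combinations


def case_clusters(isolates, snp_distance, threshold):
    """Single-linkage clustering: link every pair of isolates whose SNP
    distance is <= threshold, then return the connected components."""
    parent = {iso: iso for iso in isolates}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(isolates, 2):
        if snp_distance(a, b) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for iso in isolates:
        clusters.setdefault(find(iso), []).append(iso)
    # Largest clusters first; ties broken by member names.
    return sorted(clusters.values(), key=lambda c: (-len(c), c))


def summarize(cluster, meta):
    """Per-cluster summary: size, multistate flag, fraction of case pairs
    from different states, and persistence in days (assumed definition)."""
    states = {meta[i]["state"] for i in cluster}
    dates = [meta[i]["collected"] for i in cluster]
    pairs = list(combinations(cluster, 2))
    cross = sum(meta[a]["state"] != meta[b]["state"] for a, b in pairs)
    return {
        "size": len(cluster),
        "multistate": len(states) > 1,
        "cross_state_pair_fraction": cross / len(pairs) if pairs else None,
        "persistence_days": (max(dates) - min(dates)).days,
    }


# Toy data (invented). Pairs not listed are treated as 50 SNPs apart.
ISOLATES = ["A", "B", "C", "D", "E", "F"]
DIST = {("A", "B"): 2, ("B", "C"): 4, ("A", "C"): 6, ("D", "E"): 3}
META = {
    "A": {"state": "NY", "collected": date(2019, 1, 10)},
    "B": {"state": "CA", "collected": date(2019, 6, 1)},
    "C": {"state": "NY", "collected": date(2021, 3, 15)},
    "D": {"state": "TX", "collected": date(2020, 2, 1)},
    "E": {"state": "TX", "collected": date(2020, 2, 20)},
    "F": {"state": "WA", "collected": date(2022, 7, 4)},
}


def snp_distance(a, b):
    return DIST.get((a, b), DIST.get((b, a), 50))


clusters = case_clusters(ISOLATES, snp_distance, threshold=4)
for c in clusters:
    print(c, summarize(c, META))
```

With a threshold of 4 SNPs, isolates A and C end up in the same case cluster even though they are 6 SNPs apart, because each is within the threshold of B. This chaining is a property of single linkage in general and is one reason the choice of threshold matters for the cluster sizes, multistate fractions, and persistence values that result.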

1. B. R. Jackson et al., Implementation of nationwide real-time whole-genome sequencing to enhance listeriosis outbreak detection and investigation. Clin. Infect. Dis. 63, 380–386 (2016).
2. B. Brown, M. Allard, M. C. Bazako, J. Blankenship, T. Minor, An economic evaluation of the Whole Genome Sequencing source tracking program in the U.S. PLoS ONE 16, e0258262 (2021). https://doi.org/10.1371/journal.pone.0258262.
3. E. D. Ebel et al., Comparing characteristics of sporadic and outbreak-associated foodborne illnesses, United States, 2004–2011. Emerg. Infect. Dis. 22, 1193–1200 (2016).
4. E. Brown, U. Dessai, S. McGarry, P. Gerner-Smidt, Use of whole-genome sequencing for food safety and public health in the United States. Foodborne Pathog. Dis. 16, 441–450 (2019).
5. B. Jagadeesan et al., The use of next generation sequencing for improving food safety: Translation into practice. Food Microbiol. 79, 96–115 (2019).
6. J. Ronholm, N. Nasheri, N. Petronella, F. Pagotto, Navigating microbiological food safety in the era of whole-genome sequencing. Clin. Microbiol. Rev. 29, 837–857 (2016).
7. E. L. Stevens et al., Use of whole genome sequencing by the federal interagency collaboration for genomics for food and feed safety in the United States. J. Food Prot. 85, 755–772 (2022).
8. A. W. Pightling et al., Interpreting whole-genome sequence analyses of foodborne bacteria for regulatory applications and outbreak investigations. Front. Microbiol. 9, 1482 (2018).
9. B. Tolar et al., An overview of PulseNet USA databases. Foodborne Pathog. Dis. 16, 457–462 (2019).
10. E. Scallan et al., Foodborne illness acquired in the United States–major pathogens. Emerg. Infect. Dis. 17, 7–15 (2011).
11. M. K. Thomas et al., Estimates of foodborne illness-related hospitalizations and deaths in Canada for 30 specified pathogens and unspecified agents. Foodborne Pathog. Dis. 12, 820–827 (2015).
12. E. Sarno, D. Pezzutto, M. Rossi, E. Liebana, V. Rizzi, A review of significant European foodborne outbreaks in the last decade. J. Food Prot. 84, 2059–2070 (2021).
13. P. Gerner-Smidt et al., Whole genome sequencing: Bridging one-health surveillance of foodborne diseases. Front. Public Health 7, 172 (2019).
14. United States Population by Age and Sex, https://www.census.gov/popclock/data_tables.php?component=pyramid. Accessed 24 October 2023.
15. C. Charlier, O. Disson, M. Lecuit, Maternal-neonatal listeriosis. Virulence 11, 391–397 (2020).
16. A. L. Boore et al., Salmonella enterica infections in the United States and assessment of coefficients of variation: A novel approach to identify epidemiologic characteristics of individual serotypes, 1996–2011. PLoS ONE 10, e0145416 (2015).
17. M. C. Judd, R. M. Hoekstra, B. E. Mahon, P. I. Fields, K. K. Wong, Epidemiologic patterns of human Salmonella serotype diversity in the USA, 1996–2016. Epidemiol. Infect. 147, e187 (2019).
18. K. V. Chiang, H. C. Hamner, R. Li, C. G. Perrine, Timing of introduction of complementary foods—United States, 2016–2018. MMWR Morb. Mortal. Wkly. Rep. 69, 1969–1973 (2023).
19. B. Devleesschauwer et al., Associating sporadic, foodborne illness caused by Shiga toxin-producing Escherichia coli with specific foods: A systematic review and meta-analysis of case-control studies. Epidemiol. Infect. 147, e235 (2019).
20. K. E. Fullerton et al., Case-control studies of sporadic enteric infections: A review and discussion of studies conducted internationally from 1990 to 2009. Foodborne Pathog. Dis. 9, 281–292 (2012).
21. A. R. Domingues, S. M. Pires, T. Halasa, T. Hald, Source attribution of human salmonellosis using a meta-analysis of case-control studies of sporadic infections. Epidemiol. Infect. 140, 959–969 (2012).
22. B. R. Jackson, P. M. Griffin, D. Cole, K. A. Walsh, S. J. Chai, Outbreak-associated Salmonella enterica serotypes and food commodities, United States, 1998–2008. Emerg. Infect. Dis. 19, 1239–1244 (2013).
23. J. Zhang et al., High genetic similarity of Salmonella Enteritidis as a predominant serovar by an independent survey in 3 large-scale chicken farms in China. Poult. Sci. 100, 100941 (2021).
24. T.-M. La et al., Whole-genome analysis of multidrug-resistant Salmonella Enteritidis strains isolated from poultry sources in Korea. Pathogens 10 (2021).
25. C.-W. Lei et al., Vertical transmission of Salmonella Enteritidis with heterogeneous antimicrobial resistance from breeding chickens to commercial chickens in China. Vet. Microbiol. 240, 108538 (2020).
26. D. J. Baker et al., Challenges associated with investigating Salmonella Enteritidis with low genomic diversity in New York State: The impact of adjusting analytical methods and correlation with epidemiological data. Foodborne Pathog. Dis. 20, 230–236 (2023).
27. T. Dallman et al., Phylogenetic structure of European Salmonella Enteritidis outbreak correlates with national and international egg distribution network. Microb. Genom. 2, e000070 (2016).
28. A. K. Simon, G. A. Hollander, A. McMichael, Evolution of the immune system in humans from infancy to old age. Proc. Biol. Sci. 282, 20143085 (2015).
29. A. Georgountzou, N. G. Papadopoulos, Postnatal innate immune development: From birth to adulthood. Front. Immunol. 8, 957 (2017).
30. R. de Alwis et al., The role of maternally acquired antibody in providing protective immunity against nontyphoidal Salmonella in urban Vietnamese infants: A birth cohort study. J. Infect. Dis. 219, 295–304 (2019).
31. S. Basha, N. Surendran, M. Pichichero, Immune responses in neonates. Expert Rev. Clin. Immunol. 10, 1171–1184 (2014).
32. CDC, Breastfeeding Report Card, Centers for Disease Control and Prevention (2023). https://www.cdc.gov/breastfeeding/data/reportcard.htm. Accessed 28 July 2023.
33. Salmonella Atlas (2020). https://www.cdc.gov/salmonella/reportspubs/salmonella-atlas/index.html. Accessed 19 October 2023.
34. H. J. Shah et al., Reported incidence of infections caused by pathogens transmitted commonly through food: Impact of increased use of culture-independent diagnostic tests - foodborne diseases active surveillance network, 1996–2023. MMWR Morb. Mortal. Wkly. Rep. 73, 584–593 (2024).
35. Kitchen Life 2, https://www.food.gov.uk/research/behaviour-and-perception/kitchen-life-2. Accessed 30 November 2023.
36. E. C. Redmond, C. J. Griffith, The importance of hygiene in the domestic kitchen: Implications for preparation and storage of food and infant formula. Perspect. Public Health 129, 69–76 (2009).
37. L. H. Gould et al., Surveillance for foodborne disease outbreaks—United States, 1998–2008. MMWR Surveill. Summ. 62, 1–34 (2013).
38. T. P. Oscar, A quantitative risk assessment model for Salmonella and whole chickens. Int. J. Food Microbiol. 93, 231–247 (2004).
39. K. Rajan, Z. Shi, S. C. Ricke, Current aspects of Salmonella contamination in the US poultry production chain and the potential application of risk strategies in understanding emerging hazards. Crit. Rev. Microbiol. 43, 370–392 (2017).
40. Home–Pathogen Detection–NCBI, https://www.ncbi.nlm.nih.gov/pathogens/. Accessed 16 August 2023.
41. SEDRIC: System for Enteric Disease Response, Investigation, and Coordination (2022). https://www.cdc.gov/foodsafety/outbreaks/tools/sedric.html. Accessed 6 January 2024.
42. M. A. Chattaway, A. Painset, G. Godbole, S. Gharbia, C. Jenkins, Evaluation of genomic typing methods in the Salmonella reference laboratory in public health, England, 2012–2020. Pathogens 12 (2023).
