|
Estimating the level of confidence in the predicted founding ST
In some cases, particularly in small groups with few SLVs, the predicted primary
founder of a group will have little statistical support. eBURST therefore uses a
bootstrapping procedure to estimate the level of confidence in the assignment of
the primary founder of each group. The bootstrap values are only available using
the default group definition as they are not appropriate with less stringent
group definitions where an eBURST group may include several unlinked clusters
of STs (clonal complexes).
eBURST first divides
the input population into groups. For
each group, one example of each ST
is extracted, and a user-defined number
(default = 1000) of random datasets of the
same size as the extracted ST set is
produced by re-sampling with replacement.
For each re-sampling the predicted primary
founder is computed, and a tally is kept
of the number of times that each ST in
the group is predicted to be the primary
founder of the group. The bootstrap values
shown for each ST are the percentage of
times the ST was predicted to be the primary
founder of the group in the bootstrap
re-samplings. As a ST cannot be the predicted
founder if it is not present in a re-sampled
dataset, the calculation of the percentage
of times each ST is predicted to be the
primary founder omits those re-samplings
in which that ST is absent. eBURST therefore
produces conditional bootstraps, which
scale the bootstrap confidence values
to 100%; a ST that is the predicted primary
founder in every re-sampling in which
it is present will have a bootstrap value
of 100%.
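
The conditional bootstrap described above can be sketched as follows. This is a minimal illustration, not the eBURST implementation. It assumes that the primary founder is the ST with the most SLVs in the dataset, that ties go to the lowest ST number (a simplification chosen only for this sketch), and that repeated draws of the same ST count once. The names n_differences, predict_founder and conditional_bootstrap are hypothetical.

```python
import random
from collections import Counter


def n_differences(a, b):
    """Number of loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_founder(profiles):
    """Return the ST with the most SLVs among the given profiles.

    Assumption for this sketch: ties are broken by the lowest ST label.
    """
    def slv_count(st):
        return sum(
            1
            for other in profiles
            if other != st and n_differences(profiles[st], profiles[other]) == 1
        )

    # max() keeps the first maximum, so sorting gives the lowest-label tie-break.
    return max(sorted(profiles), key=slv_count)


def conditional_bootstrap(profiles, n_resamples=1000, seed=0):
    """Percentage of re-samplings in which each ST is the predicted founder,
    counting only the re-samplings in which that ST is present."""
    rng = random.Random(seed)
    sts = sorted(profiles)
    times_present = Counter()
    times_founder = Counter()
    for _ in range(n_resamples):
        # Re-sample with replacement, same size as the extracted ST set.
        sample = set(rng.choices(sts, k=len(sts)))
        times_present.update(sample)
        founder = predict_founder({st: profiles[st] for st in sample})
        times_founder[founder] += 1
    return {
        st: 100.0 * times_founder[st] / times_present[st] if times_present[st] else 0.0
        for st in sts
    }


# Hypothetical group: ST 1 with four SLVs, each differing at a different locus.
example = {
    1: (1, 1, 1, 1, 1, 1, 1),
    2: (2, 1, 1, 1, 1, 1, 1),
    3: (1, 2, 1, 1, 1, 1, 1),
    4: (1, 1, 2, 1, 1, 1, 1),
    5: (1, 1, 1, 2, 1, 1, 1),
}
support = conditional_bootstrap(example)
print(support)
```

In this made-up group, ST 1 is the predicted founder in every re-sampling that contains it, so its conditional bootstrap value is 100% even though it is missing from some re-samplings.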
Bootstrap values for
the primary founders of the groups are
shown for re-samplings from the extracted
ST set, but are also available for re-samplings
taken from the set of all isolates in
the group, which often will include multiple
isolates of some STs. The bootstrap values
for a ST may differ considerably when
one of each ST is used, compared to when
all isolates are included, since the STs
represented in each re-sampling will be
influenced by the frequencies of the STs
in the database. Considerable caution
should therefore be used in evaluating
the bootstrap values using all isolates
as they are difficult to interpret, and
the default setting is to show only the
bootstrap values for the re-sampled datasets
obtained from the extracted set containing
one of each ST. An option can be set to
show the bootstraps calculated using all
isolates. We recommend that the bootstrap
value obtained using re-samplings from
a dataset containing one of each ST is
used to evaluate the robustness of the
assignment of the primary founders.
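
One way to see why the two sampling schemes can give different values is to compare how often a given ST appears in a re-sampling at all. The sketch below uses only the standard probability for drawing with replacement; the function name prob_present and the isolate counts are hypothetical and chosen for illustration.

```python
def prob_present(n_copies, dataset_size):
    """Probability that an ST with n_copies entries in a dataset of
    dataset_size entries appears at least once in a re-sampling of
    dataset_size draws made with replacement."""
    return 1.0 - (1.0 - n_copies / dataset_size) ** dataset_size


# One example of each ST: 10 STs, every ST has the same chance of being present.
one_per_st = prob_present(1, 10)

# All isolates: 100 isolates, of which 60 belong to a common ST and 1 to a rare ST.
common_st = prob_present(60, 100)
rare_st = prob_present(1, 100)

print(one_per_st, common_st, rare_st)
```

With one example of each ST, every ST is equally likely to be represented in a re-sampling. With all isolates, a frequently sampled ST is present in almost every re-sampling while a rare ST is often absent, so the set of STs competing to be the predicted founder depends on the sampling frequencies.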
In some cases two STs
may have relatively high bootstrap support
and the predicted primary founder may
not be the real founder of the group.
For example, a SLV of the real founder
of a group that becomes antibiotic-resistant,
and of much clinical concern, is likely
to be massively over-sampled – this
can result in a number of its antibiotic-resistant
SLVs being sampled. The over-sampling
of the latter SLVs can lead to the resistant
strain being selected as the founder of
the whole group. In this case, mapping
antibiotic resistance onto the eBURST
diagram would indicate that this may have
happened, but the sudden increase in the
success of a SLV that is not antibiotic-resistant
may cause a similar mis-assignment that
would not be so apparent. The presence
of two STs having substantial bootstrap
scores provides an alert that the assignment
of the primary founder cannot reliably
be discerned in the absence of additional
information. |