Estimating the level of confidence in the predicted founding ST

In some cases, particularly in small groups with few SLVs, the predicted primary founder of a group will have little statistical support. eBURST therefore uses a bootstrapping procedure to estimate the level of confidence in the assignment of the primary founder of each group. Bootstrap values are available only with the default group definition; they are not appropriate with less stringent group definitions, where an eBURST group may include several unlinked clusters of STs (clonal complexes).

eBURST first divides the input population into groups. For each group, one example of each ST is extracted, and a user-defined number (default = 1000) of random datasets of the same size as the extracted ST set are produced by re-sampling with replacement. For each re-sampling the predicted primary founder is computed, and a tally is kept of the number of times that each ST in the group is predicted to be the primary founder of the group. The bootstrap value shown for each ST is the percentage of re-samplings in which that ST was predicted to be the primary founder. As a ST cannot be the predicted founder if it is not present in a re-sampled dataset, the calculation of this percentage omits those re-samplings in which that ST is absent. eBURST therefore produces conditional bootstraps, which scale the bootstrap confidence values to 100%: a ST that is the predicted primary founder in every re-sampling in which it is present will have a bootstrap value of 100%.
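The conditional bootstrap described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the eBURST source code. It assumes that the primary founder is the ST with the largest number of SLVs among the STs present, and it breaks ties by taking the lowest ST number, which is an arbitrary choice made for this example. The function names, the allelic profiles and the `seed` parameter are all hypothetical.

```python
import random
from collections import Counter


def is_slv(profile_a, profile_b):
    """True if two allelic profiles differ at exactly one locus."""
    return sum(a != b for a, b in zip(profile_a, profile_b)) == 1


def predict_founder(profiles):
    """Return the ST with the most SLVs among the STs in `profiles`.

    Assumption for this sketch: ties go to the lowest ST number.
    """
    best_st, best_count = None, -1
    for st in sorted(profiles):
        count = sum(
            is_slv(profiles[st], profiles[other])
            for other in profiles
            if other != st
        )
        if count > best_count:  # strict '>' keeps the lowest ST on a tie
            best_st, best_count = st, count
    return best_st


def conditional_bootstrap(profiles, n_resamples=1000, seed=0):
    """Percentage of re-samplings in which each ST is the predicted founder,
    counting only the re-samplings in which that ST is present."""
    rng = random.Random(seed)
    sts = sorted(profiles)
    times_present = Counter()
    times_founder = Counter()
    for _ in range(n_resamples):
        # Re-sample with replacement, same size as the one-of-each-ST set.
        sample = set(rng.choices(sts, k=len(sts)))
        times_present.update(sample)
        founder = predict_founder({st: profiles[st] for st in sample})
        times_founder[founder] += 1
    return {
        st: 100.0 * times_founder[st] / times_present[st]
        if times_present[st]
        else 0.0
        for st in sts
    }


# Hypothetical group: ST 1 with four SLVs (STs 2 to 5), seven loci each.
example_profiles = {
    1: (1, 1, 1, 1, 1, 1, 1),
    2: (2, 1, 1, 1, 1, 1, 1),
    3: (1, 2, 1, 1, 1, 1, 1),
    4: (1, 1, 2, 1, 1, 1, 1),
    5: (1, 1, 1, 2, 1, 1, 1),
}

if __name__ == "__main__":
    for st, value in conditional_bootstrap(example_profiles).items():
        print(f"ST {st}: {value:.1f}%")
```

In this toy group, ST 1 is the predicted founder in every re-sampling that contains it, so its conditional bootstrap value is 100% even though it is absent from roughly a third of the re-samplings.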

Bootstrap values for the primary founders of the groups are shown for re-samplings from the extracted ST set, but are also available for re-samplings taken from the set of all isolates in the group, which will often include multiple isolates of some STs. The bootstrap values for a ST may differ considerably between the two approaches, since the STs represented in each re-sampling of all isolates will be influenced by the frequencies of the STs in the database. Bootstrap values obtained using all isolates are difficult to interpret and should be evaluated with considerable caution. The default setting is therefore to show only the bootstrap values for the re-sampled datasets obtained from the extracted set containing one of each ST, although an option can be set to show the bootstraps calculated using all isolates. We recommend that the bootstrap value obtained using re-samplings from a dataset containing one of each ST is used to evaluate the robustness of the assignment of the primary founders.
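The effect of ST frequencies on which STs appear in a re-sampling can be illustrated with a short calculation. The sketch below is not part of eBURST. It assumes that each re-sampled dataset is the same size as the set it is drawn from, which the text states only for the one-of-each-ST set. The function name and the isolate counts are hypothetical.

```python
def presence_probability(copies, pool_size, draws=None):
    """Probability that a ST with `copies` entries in a pool of `pool_size`
    entries appears at least once in `draws` draws with replacement.

    Assumption for this sketch: `draws` defaults to the pool size.
    """
    if draws is None:
        draws = pool_size
    return 1.0 - (1.0 - copies / pool_size) ** draws


# Hypothetical group of five STs with very uneven isolate counts.
isolate_counts = {1: 2, 2: 40, 3: 1, 4: 1, 5: 1}

if __name__ == "__main__":
    n_sts = len(isolate_counts)
    n_isolates = sum(isolate_counts.values())
    for st, copies in isolate_counts.items():
        one_of_each = presence_probability(1, n_sts)
        all_isolates = presence_probability(copies, n_isolates)
        print(f"ST {st}: one of each {one_of_each:.3f}, "
              f"all isolates {all_isolates:.3f}")
```

With one of each ST, every ST has the same chance of appearing in a re-sampling. With all isolates, a heavily sampled ST is present in almost every re-sampling while rare STs are often absent, which is one reason the two sets of bootstrap values can differ.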

In some cases two STs may have relatively high bootstrap support, and the predicted primary founder may not be the real founder of the group. For example, a SLV of the real founder that becomes antibiotic-resistant, and is therefore of much clinical concern, is likely to be massively over-sampled, which can result in a number of its own antibiotic-resistant SLVs being sampled. The over-sampling of these SLVs can lead to the resistant strain being selected as the founder of the whole group. In this case, mapping antibiotic resistance onto the eBURST diagram would indicate that this may have happened, but a sudden increase in the success of a SLV that is not antibiotic-resistant may cause a similar mis-assignment that would not be so apparent. The presence of two STs with substantial bootstrap scores provides an alert that the primary founder cannot be reliably assigned in the absence of additional information.

eBURSTv3 has been developed and is hosted at Imperial College London