Comparing two datasets

eBURSTv3 provides the capability to compare and differentially highlight two datasets. An initial dataset (REFERENCE) is loaded first, and a second dataset (QUERY) can then be loaded and compared to the reference dataset. This is a particularly useful enhancement as it allows comparison of a user dataset (QUERY) with the whole MLST database (REFERENCE) for that species, differentially highlighting those STs that are unique to the user data and those that are already present in the MLST database. Alternatively, two user datasets can be compared, highlighting STs unique to either dataset and those common to both datasets.

Datasets can be uploaded here or directly into eBURSTv3 through the File menu.

Species-specific websites contain the facility to upload query datasets for comparison against an entire database, allowing you to explore the predicted ancestry of your isolates prior to submission to curators.

The ability to run eBURST on Oxford MLST databases, or to compare datasets to those in Oxford databases, is provided. However, unlike the datasets provided from databases, these contain only one example of each ST.

After loading both datasets, eBURST initially compares the profiles within both to check that there are no identical allelic profiles assigned as different STs or, conversely, no differing allelic profiles assigned as the same ST. Should discrepancies be found, descriptions of the particular profiles are returned in the Profiles Window and you are asked to correct the data before reloading.
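The consistency check described above can be sketched as follows. This is only an illustration and not eBURST's own code: the function name `find_discrepancies`, the representation of a dataset as a list of (ST, allelic profile) pairs, and the pooling of both datasets into a single check are all assumptions made for the example.

```python
from collections import defaultdict


def find_discrepancies(reference, query):
    """Report inconsistent ST assignments across two datasets.

    Each dataset is assumed to be a list of (st, profile) pairs, where
    profile is a sequence of allele numbers (hypothetical layout).
    """
    profiles_by_st = defaultdict(set)
    sts_by_profile = defaultdict(set)

    # Pool both datasets so conflicts within or between them are found.
    for st, profile in list(reference) + list(query):
        profiles_by_st[st].add(tuple(profile))
        sts_by_profile[tuple(profile)].add(st)

    # Identical allelic profiles assigned as different STs.
    shared_profiles = {
        profile: sorted(sts)
        for profile, sts in sts_by_profile.items()
        if len(sts) > 1
    }
    # Differing allelic profiles assigned as the same ST.
    conflicting_sts = {
        st: sorted(profiles)
        for st, profiles in profiles_by_st.items()
        if len(profiles) > 1
    }
    return shared_profiles, conflicting_sts


# Hypothetical seven-locus profiles, for illustration only.
reference = [(1, (1, 1, 1, 1, 1, 1, 1)), (2, (1, 1, 1, 1, 1, 1, 2))]
query = [
    (1, (1, 1, 1, 1, 1, 1, 1)),
    (3, (1, 1, 1, 1, 1, 1, 2)),  # same profile as ST 2 in the reference
    (2, (1, 1, 1, 1, 1, 1, 3)),  # ST 2 with a different profile
]

shared_profiles, conflicting_sts = find_discrepancies(reference, query)
```

In this sketch, both results being empty would correspond to consistent data that can be analysed; anything else would need correcting before reloading.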

Once consistent data are loaded, both datasets are displayed in the Profiles Window, allowing analysis to begin as with a single dataset. STs in the Profiles Window are coloured differentially depending on their membership of the two datasets:

Black - STs found only in the Reference dataset
Green - STs found only in the Query dataset
Cyan - STs found in both the Reference and Query datasets
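The colour scheme above amounts to a set-membership test on each ST. A minimal sketch, assuming each dataset is reduced to a collection of ST numbers (the function name `colour_sts` is hypothetical and not part of eBURST):

```python
def colour_sts(reference_sts, query_sts):
    """Map each ST to the colour used for its dataset membership."""
    reference_sts = set(reference_sts)
    query_sts = set(query_sts)

    colours = {}
    for st in reference_sts | query_sts:
        if st in reference_sts and st in query_sts:
            colours[st] = "cyan"   # found in both datasets
        elif st in query_sts:
            colours[st] = "green"  # found only in the Query dataset
        else:
            colours[st] = "black"  # found only in the Reference dataset
    return colours


# Hypothetical ST numbers, for illustration only.
colours = colour_sts(reference_sts=[1, 2, 3], query_sts=[3, 4])
```

Here STs 1 and 2 would be black, ST 3 cyan and ST 4 green.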

When viewing the eBURST diagram of individual groups, or a population snapshot of the two combined datasets, the ST labels are also coloured in this fashion.

To visualise the differences and similarities between the two datasets more clearly, the ST labels can be turned off (from the ‘Diagram’ menu, uncheck ‘Show ST labels’). A ‘halo’ is then drawn around the ST circle, coloured as follows:

Green - STs found only in the Query dataset
Cyan - STs found in both the Reference and Query datasets

STs found only in the Reference dataset are shown as normal, in black, without a halo.

The border thickness of the halos can be changed through the Diagram menu to give the best visibility of the differential colouring of STs when saving a diagram for a publication or PowerPoint slide.

For further clarity, the colouring of the primary founders and subgroup founders can be removed (by selecting ‘black’ in the ‘colour options’ menu from the ‘Diagram’ menu).

Figure 6 depicts two datasets, one (REFERENCE) collected at one timepoint and another (QUERY) collected at a later timepoint. It can be seen that there are a number of minor clonal complexes where the SLVs of the predicted founder are present only in the query dataset, collected after the reference dataset.

Figure 6

