Statistic Section (Long DNA Sequences)
To verify the general operability of the MetaGenomeThreader, DNA sequencing data for each of the three species were simulated separately with the ReadSim simulator, without any sequencing errors or frame shifts. The average read length was 850 DNA bases at sevenfold sequence coverage (cp. the test data with short DNA sequences, which include frame shifts and sequencing errors).
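The read count behind a given coverage follows the usual relation C = N · L / G between coverage C, number of reads N, read length L and genome length G. The sketch below is a hedged illustration, not the ReadSim procedure: it computes how many error-free 850-base reads give sevenfold coverage and draws such reads from uniformly random start positions. The random genome, the fixed read length (the text gives only an average) and all function names are assumptions made for illustration; the uncovered-fraction estimate uses the idealized Lander-Waterman (Poisson) model.

```python
import math
import random


def reads_for_coverage(genome_length: int, read_length: int, coverage: float) -> int:
    """Smallest read count N with N * read_length / genome_length >= coverage."""
    return math.ceil(coverage * genome_length / read_length)


def expected_uncovered_fraction(coverage: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of bases with zero reads."""
    return math.exp(-coverage)


def simulate_error_free_reads(genome: str, read_length: int, coverage: float,
                              seed: int = 0) -> list[tuple[int, str]]:
    """Hypothetical toy sampler (NOT ReadSim): exact substrings taken at uniform
    random start positions, so no sequencing errors or frame shifts occur."""
    rng = random.Random(seed)
    n_reads = reads_for_coverage(len(genome), read_length, coverage)
    last_start = len(genome) - read_length
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, last_start)  # inclusive upper bound
        reads.append((start, genome[start:start + read_length]))
    return reads


def covered_fraction(genome_length: int, reads: list[tuple[int, str]]) -> float:
    """Fraction of genome positions covered by at least one read (difference array)."""
    diff = [0] * (genome_length + 1)
    for start, read in reads:
        diff[start] += 1
        diff[start + len(read)] -= 1
    depth = covered = 0
    for i in range(genome_length):
        depth += diff[i]
        if depth > 0:
            covered += 1
    return covered / genome_length


if __name__ == "__main__":
    rng = random.Random(42)
    # Assumed 100 kb random genome; the real target genomes are much longer.
    genome = "".join(rng.choice("ACGT") for _ in range(100_000))
    reads = simulate_error_free_reads(genome, read_length=850, coverage=7)
    print(len(reads))                                # 824 reads for 7x at 850 bases
    print(round(expected_uncovered_fraction(7), 5))  # Poisson estimate: 0.00091
    print(covered_fraction(len(genome), reads))
```

Under the Poisson model, about e^-7, i.e. roughly 0.09 % of the bases, stay uncovered at sevenfold coverage. In the toy simulation a few more bases near the genome ends remain uncovered, because fewer reads can start there.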
Tab. 1.2 shows, for each target species, the number of species that were used in the genome analysis to identify PCSs (column 2), as well as the two species that contributed most to the PCS identification in terms of DNA sequence length (column 3). The percentage in column 4 is the ratio of the length of the hit sequences from the species in column 3 to the overall length of all hit sequences that led to PCS identifications. The species from which the synthetic metagenome was built did not themselves appear in the MetaGenomeThreader result. This follows from the calculation of the combined scores: they are based on synonymous DNA base exchanges, and homologies without such exchanges are not scored. Because the DNA sequencing data were simulated without sequencing errors and without frame shifts, the quality of the MetaGenomeThreader results depended on the availability of DNA sequences from species that differ from the target species by synonymous DNA base exchanges and are therefore relatively closely related to it.
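The combined scores mentioned above rest on synonymous DNA base exchanges, i.e. codon changes that do not alter the encoded amino acid. The following sketch only illustrates that distinction and is not the MetaGenomeThreader scoring function. It assumes two gap-free coding sequences aligned in the same reading frame and the standard genetic code (NCBI translation table 1); the function name is hypothetical. It also shows why a sequence identical to the query contributes no synonymous exchanges at all.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in
# TCAG x TCAG x TCAG order, '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_differences(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Hypothetical helper: count differing codons as (synonymous, non-synonymous).

    Assumes aligned, gap-free, in-frame sequences of equal length divisible by 3.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, in frame and of equal length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue  # identical codon: no base exchange at all
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1  # bases differ, amino acid unchanged
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# CTT/CTC (Leu) and AAA/AAG (Lys) are synonymous; GAA (Glu) -> GAT (Asp) is not.
print(classify_codon_differences("ATGCTTAAAGAA", "ATGCTCAAGGAT"))  # (2, 1)
print(classify_codon_differences("ATGAAA", "ATGAAA"))              # (0, 0)
```

Counting whole codons keeps the example short; per-site methods that weight synonymous and non-synonymous sites separately are more common in practice.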
Target species (cp. Tab. 1.1) | Number of species in the genome analysis | Two most dominant species in the separate genome analysis | Proportion of the DNA sequences in the genome analysis (%)
Candidatus Pelagibacter | 2 | Rhodopseudomonas palustris HaA2 | 85.7143
  |  | Rhodopseudomonas palustris BisB5 | 14.2857
Vibrio cholerae | 76 | Vibrio cholerae O395 chromosome 2 | 18.8365
  |  | Vibrio vulnificus YJ016 DNA, chromosome I | 15.1935
Pyrococcus horikoshii | 5 | Pyrococcus furiosus DSM 3638, section 12 of 173 of the complete genome | 29.6239
  |  | Pyrococcus abyssi complete genome; segment 1/6 | 27.7330
Tab. 1.2: Summary of the MetaGenomeThreader statistic sections for the separate genome analyses (cp. the corresponding statistic section summary for short DNA sequences).
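Columns 2 to 4 of Tab. 1.2 can be derived from the list of hit sequences that led to PCS identifications, as described in the text above the table. A minimal sketch, assuming each hit is available as a hypothetical (species name, hit sequence length) pair and that the lengths of several hits from the same species are simply summed; the function names and the hit lengths in the example are invented and unrelated to the values in Tab. 1.2.

```python
from collections import defaultdict


def proportion_by_species(hits: list[tuple[str, int]]) -> dict[str, float]:
    """Percentage of the overall hit-sequence length contributed by each species."""
    totals: dict[str, int] = defaultdict(int)
    for species, length in hits:
        totals[species] += length
    overall = sum(totals.values())
    return {species: 100.0 * length / overall for species, length in totals.items()}


def top_species(hits: list[tuple[str, int]], n: int = 2) -> list[tuple[str, float]]:
    """The n species with the largest share, i.e. columns 3 and 4 of the table."""
    shares = proportion_by_species(hits)
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)[:n]


# Invented example hits: (species, length of the hit sequence in bases).
hits = [("Species A", 1200), ("Species B", 300), ("Species A", 900), ("Species C", 600)]
print(len(proportion_by_species(hits)))  # 3 species (column 2)
print(top_species(hits))                 # [('Species A', 70.0), ('Species C', 20.0)]
```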
The species that appear in the MetaGenomeThreader result allow a first assessment of whether the result is of good quality. Very closely related species in the result are a first sign of a good MetaGenomeThreader outcome, whereas a poor result is considerably more likely when only distantly related species appear. A good result can be expected for Vibrio cholerae and Pyrococcus horikoshii, because their PCSs were calculated from DNA sequences of closely related species that match the target species down to a low level of the taxonomic classification: the complete lineage for Vibrio cholerae and the genus Pyrococcus for Pyrococcus horikoshii (cp. Tab. 1.3, column 2). For Candidatus Pelagibacter, the target species could not be identified more precisely, because the BLAST hits of the identified PCSs come from two species that are only distantly related to it (cp. Tab. 1.3, column 2). The first taxonomic group that they have in common, the class Alphaproteobacteria, appears relatively early in the taxonomic classification, i.e. close to its root.
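The comparison above amounts to reading two lineages from the root downwards until they diverge. A minimal sketch of that step, using the lineages listed in Tab. 1.3; the function name is hypothetical and not part of MetaGenomeThreader. It assumes root-first lineages separated by " / " and compares names position by position, without explicit rank labels.

```python
from typing import Optional


def lowest_common_taxon(lineage_a: str, lineage_b: str) -> tuple[int, Optional[str]]:
    """Return (depth, name) of the deepest taxon shared by two root-first lineages."""
    ranks_a = [rank.strip() for rank in lineage_a.split("/")]
    ranks_b = [rank.strip() for rank in lineage_b.split("/")]
    depth, name = 0, None
    for a, b in zip(ranks_a, ranks_b):
        if a != b:
            break  # lineages diverge here
        depth, name = depth + 1, a
    return depth, name


# Lineages copied from Tab. 1.3 (target organism vs. most frequent species).
PELAGIBACTER = ("Bacteria / Proteobacteria / Alphaproteobacteria / Rickettsiales / "
                "SAR11 cluster / Candidatus Pelagibacter / Candidatus Pelagibacter ubique")
RHODOPSEUDOMONAS = ("Bacteria / Proteobacteria / Alphaproteobacteria / Rhizobiales / "
                    "Bradyrhizobiaceae / Rhodopseudomonas / Rhodopseudomonas palustris")
P_HORIKOSHII = ("Archaea / Euryarchaeota / Thermococci / Thermococcales / "
                "Thermococcaceae / Pyrococcus / Pyrococcus horikoshii")
P_FURIOSUS = ("Archaea / Euryarchaeota / Thermococci / Thermococcales / "
              "Thermococcaceae / Pyrococcus / Pyrococcus furiosus")

print(lowest_common_taxon(PELAGIBACTER, RHODOPSEUDOMONAS))  # (3, 'Alphaproteobacteria')
print(lowest_common_taxon(P_HORIKOSHII, P_FURIOSUS))        # (6, 'Pyrococcus')
```

A small shared depth relative to the length of the target lineage (3 of 7 levels for Candidatus Pelagibacter versus 6 of 7 for Pyrococcus horikoshii) mirrors the distinction drawn in the text between distantly and closely related hits.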
Conclusion: Only a few, distantly related DNA sequences are available for Candidatus Pelagibacter, whereas closely related DNA sequences are available for Vibrio cholerae and Pyrococcus horikoshii. Thus, good results can be expected for Vibrio cholerae and Pyrococcus horikoshii, while it may be difficult to obtain usable results for Candidatus Pelagibacter.
Target species (cp. Tab. 1.1) | Taxonomic classification of the target organism (upper row) and of the species that appears most frequently in the MetaGenomeThreader result (lower row)
Candidatus Pelagibacter | Bacteria / Proteobacteria / Alphaproteobacteria / Rickettsiales / SAR11 cluster / Candidatus Pelagibacter / Candidatus Pelagibacter ubique
  | Bacteria / Proteobacteria / Alphaproteobacteria / Rhizobiales / Bradyrhizobiaceae / Rhodopseudomonas / Rhodopseudomonas palustris
Vibrio cholerae | Bacteria / Proteobacteria / Gammaproteobacteria / Vibrionales / Vibrionaceae / Vibrio / Vibrio cholerae / Vibrio cholerae O1 / Vibrio cholerae O1 biovar eltor
  | Bacteria / Proteobacteria / Gammaproteobacteria / Vibrionales / Vibrionaceae / Vibrio / Vibrio cholerae / Vibrio cholerae O1 / Vibrio cholerae O1 biovar eltor
Pyrococcus horikoshii | Archaea / Euryarchaeota / Thermococci / Thermococcales / Thermococcaceae / Pyrococcus / Pyrococcus horikoshii
  | Archaea / Euryarchaeota / Thermococci / Thermococcales / Thermococcaceae / Pyrococcus / Pyrococcus furiosus
Tab. 1.3: Taxonomic classification of the species that occurred most frequently in the genome analysis.