Tallymer is a collection of flexible and memory-efficient programs for k-mer counting and indexing of large sequence sets.

Background

Unlike previous methods, Tallymer is based on enhanced suffix arrays. This allows much greater flexibility in the choice of the k-mer size. Tallymer can process large data sets of several billion bases. We used it in a variety of applications to study the genomes of maize and other plant species. In particular, Tallymer was used to index a set of whole genome shotgun sequences from maize (B73) (total size 10^9 bp). A manual can be found here.

Availability

Tallymer is available as part of the genometools software (version 1.2.2 and higher). The Perl scripts for post-processing the Tallymer output, developed by Apurva Narechania (apurva(at)cshl.org), are available here: Dan Bolser (dan.bolser(at)gmail.com) has developed a Tallymer-based pipeline to annotate repeats in a FASTA database. The scripts comprising the pipeline can be found here:

Developers

The Tallymer software was written by Stefan Kurtz (kurtz(at)zbh.uni-hamburg.de). The Perl scripts above were developed by Apurva Narechania and Dan Bolser.

Publication

S. Kurtz, A. Narechania, J.C. Stein and D. Ware:

Contact

Stefan Kurtz