Unwords
Unwords is an efficient software tool for computing the shortest strings that do not occur in a given set of DNA sequences. These strings are called unwords.
Background
The program unwords-bits implements a new algorithm for computing unwords. It is more efficient than previous algorithms and easier to use. The name reflects its use of a bit-vector encoding of strings. The program computes unwords directly, without requiring the user to specify a length estimate, and it avoids the space requirements of index structures such as suffix trees and suffix arrays. Our implementation is available as an open source package.
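To illustrate the general idea of a bit-vector encoding, the following is a minimal Python sketch. It is not the unwords-bits implementation and makes several assumptions: each base is packed into 2 bits, every string of length k is marked in a bit vector of 4^k bits, and k is increased until at least one string is unmarked. The function names (decode, mark_kmers, unwords) and the max_k cutoff are hypothetical and chosen only for this example. Characters other than A, C, G and T are treated as breaks in the sequence.

```python
from typing import Iterable, List

# Assumed 2-bit encoding of the DNA alphabet (illustrative choice).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
ALPHABET = "ACGT"


def decode(value: int, k: int) -> str:
    """Turn a 2k-bit integer back into the string of length k it encodes."""
    chars = []
    for shift in range(2 * (k - 1), -1, -2):
        chars.append(ALPHABET[(value >> shift) & 3])
    return "".join(chars)


def mark_kmers(seq: str, k: int, seen: bytearray) -> None:
    """Set one bit in `seen` for every length-k substring of `seq`."""
    mask = (1 << (2 * k)) - 1
    value = 0
    valid = 0  # number of consecutive A/C/G/T characters ending here
    for ch in seq.upper():
        code = CODE.get(ch)
        if code is None:
            # Any other character (e.g. N) interrupts the current window.
            value = 0
            valid = 0
            continue
        value = ((value << 2) | code) & mask
        valid += 1
        if valid >= k:
            seen[value >> 3] |= 1 << (value & 7)


def is_marked(seen: bytearray, value: int) -> bool:
    """Check whether the bit for `value` is set."""
    return bool((seen[value >> 3] >> (value & 7)) & 1)


def unwords(seqs: Iterable[str], max_k: int = 12) -> List[str]:
    """Return all shortest strings over ACGT absent from every sequence.

    `max_k` is a hypothetical safety limit for this sketch; an empty
    list is returned if no absent string of length <= max_k exists.
    """
    seqs = list(seqs)
    for k in range(1, max_k + 1):
        total = 4 ** k
        seen = bytearray((total + 7) // 8)  # one bit per possible string
        for seq in seqs:
            mark_kmers(seq, k, seen)
        absent = [decode(v, k) for v in range(total) if not is_marked(seen, v)]
        if absent:
            return absent
    return []


if __name__ == "__main__":
    print(unwords(["ACG"]))
    print(len(unwords(["ACGT"])))
```

The sketch needs no length estimate from the caller and builds no suffix tree or suffix array, but its memory use grows as 4^k bits, so it is only meant to convey the principle on small inputs.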
Availability
The source code is freely available here. It is released under a BSD-like open source licence.
Developers
Stefan Kurtz, kurtz(at)zbh.uni-hamburg.de
Publications
J. Herold, S. Kurtz and R. Giegerich:
Efficient Computation of Absent Words in Genomic Sequences.
BMC Bioinformatics, 9:167 (2008)
Contact
Stefan Kurtz
Center for Bioinformatics, University of Hamburg
Bundesstr. 43, 20146 Hamburg, Germany
Phone +49 40 42838 7311, Fax +49 40 42838 7312