unwords-bits is efficient software for computing the shortest strings that do not occur in a given set of DNA sequences. These strings are called unwords.
The program unwords-bits implements a new algorithm for computing unwords. It is more efficient than previous algorithms and easier to use. The name reflects its bit vector encoding of strings. It computes unwords directly, without requiring the user to specify a length estimate, and it avoids the space requirements of index structures such as suffix trees and suffix arrays. Our implementation is available as an open source package.
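To illustrate the general idea of a bit vector encoding, here is a minimal Python sketch. It is not the unwords-bits implementation, and it makes several assumptions that are not stated above: each base is packed into 2 bits, one bit per possible word of length k records whether that word occurs, k is increased from 1 until at least one word is missing, only the given strand is scanned, and any character other than A, C, G or T interrupts a word. The names `shortest_absent_words` and `decode` are hypothetical.

```python
from typing import List

# Assumed 2-bit code per base; the real program may use a different encoding.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
ALPHABET = "ACGT"


def decode(value: int, k: int) -> str:
    """Turn a 2k-bit integer back into the DNA word of length k it encodes."""
    chars = []
    for _ in range(k):
        chars.append(ALPHABET[value & 3])
        value >>= 2
    return "".join(reversed(chars))


def shortest_absent_words(sequences: List[str]) -> List[str]:
    """Return all shortest words over {A,C,G,T} absent from every sequence."""
    k = 1
    while True:
        total = 4 ** k                        # number of possible words of length k
        seen = bytearray((total + 7) // 8)    # one presence bit per possible word
        mask = total - 1                      # keeps only the last k bases (2k bits)
        for seq in sequences:
            value = 0
            valid = 0                         # consecutive valid bases read so far
            for ch in seq.upper():
                code = CODE.get(ch)
                if code is None:              # e.g. 'N': restart the window
                    value = 0
                    valid = 0
                    continue
                value = ((value << 2) | code) & mask
                valid += 1
                if valid >= k:
                    seen[value >> 3] |= 1 << (value & 7)
        absent = [
            decode(v, k)
            for v in range(total)
            if not (seen[v >> 3] & (1 << (v & 7)))
        ]
        if absent:
            return absent
        k += 1                                # every word of length k occurs


if __name__ == "__main__":
    print(shortest_absent_words(["ACG"]))
```

No length estimate is passed in: the loop stops at the first k for which some word is missing, and the bit vector for that k needs 4^k bits regardless of the input size. Rescanning the input for each k keeps the sketch short; it is a simplification, not a description of how unwords-bits itself proceeds.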
The source code is freely available here. It is released under a BSD-like open source licence.
Stefan Kurtz, kurtz(at)zbh.uni-hamburg.de
J. Herold, S. Kurtz and R. Giegerich:
Efficient Computation of Absent Words in Genomic Sequences.
BMC Bioinformatics, 9:167 (2008)
Center for Bioinformatics, University of Hamburg
Bundesstr. 43, 20146 Hamburg, Germany
Phone +49 40 42838 7311, Fax. +49 40 42838 7312