bloomtree
from Kingsford-Group/bloomtree
Last update: Mar 1, 2019
The Sequence Bloom Tree (SBT) is a method that will allow you to index a set of short-read sequencing experiments and then query them quickly for a given sequence. The code base provided here is an implementation of SBT written in C++ and is open source. To install using the source: 1. Install gcc (Version 4.9.1 or later) 2. Install Jellyfish (Version 2.2.0 or later) 3. Install SDSL-lite (Version 2.0 or later recommended) 4. Download the latest version of SBT using Github 5. Compile: cd bloomtree/src make For further details please refer to the user manual (user_manual/sbt-manual.pdf) FAQs: 1. Cannot find "bloomtreefile" in the example files. There is no "bloomtreefile" in the example files. But you can build one using these files "Example SBT Uncompressed Leaves" (http://download.srv.cs.cmu.edu/~carlk/sbt/uncompressedleafSBT.tar.gz). The steps are: - bt hashes [-k 20] hashfile nb_hashes # create "hashfile", the typical value for nb_hashes is 1 - ls *.bf.bv > filterlistfile # *.bf.bv files are from the downloaded example data - bt build hashfile filterlistfile bloomtreefile # create "bloomtreefile" for querying - bt compress bloomtreefile compressedbloomtreefile # create a compressed version that leads to a fater query time Then you can query using the command: - bt query bloomtreefile/compressedbloomtreefile queryfile outfile 2. Which file do I query? Refer to the user manual, you can query both "bloomtreefile" and "compressedbloomtreefile". Using the compressed file results in a substantially faster query time. If you use this work, please cite: Brad Solomon and Carl Kingsford. Fast search of thousands of short-read sequencing experiments. Nature biotechnology. 2016 doi: 10.1038/nbt.3442 The current SBT implementation uses the SDSL library under the GNU General Public License (GPLv3) and is also freely distributed under the same license. If you use SBT please also cite the following paper: Gog, Simon and Beller, Timo and Moffat, Alistair and petri, Matthias. From Theory to Practice: Plug and Play with Succinct Data Structures. 13th International Symposium on Experimental Algorithms (SEA 2014). doi: http://dx.doi.org/10.1007/978-3-319-07959-2_28