NVBIO

NVBIO is a GPU-accelerated C++ framework for high-throughput sequence analysis, covering both short- and long-read alignment. It is a modular library of data structures, algorithms, and utility routines for building complex computational genomics applications on both CPU-GPU and CPU-only systems. NVBIO is open source, documented, licensed under GPLv2, and available on GitHub.

NVBIO is intended for bioinformatics software developers. Its high-performance, GPU-accelerated implementations can be used within your own code as is, or can serve as how-to examples to be modified to fit your needs. nvBowtie, a fully functioning application built on top of NVBIO, is also available on GitHub.

NVBIO’s overall design focus

Optimization of the whole pipeline, not just a single component (e.g., data transfers and SAM, BAM, and CRAM I/O)
Flexibility and customizability; it is a template library
Parallelism at every level
Optimized throughput, server-like design
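
To make the template-library point concrete, here is a minimal CPU-only sketch of a Smith-Waterman local alignment score in which the scoring scheme is a template parameter. It is an illustration under stated assumptions, not NVBIO's API: the names SimpleScoring and local_alignment_score are hypothetical, gaps are scored linearly, and no traceback is computed.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical scoring policy (not an NVBIO type): fixed match, mismatch,
// and linear gap scores.
struct SimpleScoring {
    int match_score;
    int mismatch_score;
    int gap_score;

    int substitution(char a, char b) const {
        return a == b ? match_score : mismatch_score;
    }
    int gap() const { return gap_score; }
};

// Smith-Waterman local alignment score with linear gap penalties.
// Scoring is a template parameter, so any type providing substitution()
// and gap() can be plugged in and is resolved at compile time.
template <typename Scoring>
int local_alignment_score(const std::string& query,
                          const std::string& text,
                          const Scoring& scoring)
{
    const std::size_t m = query.size();
    const std::size_t n = text.size();

    // Only two rows of the dynamic-programming matrix are kept in memory.
    std::vector<int> prev(n + 1, 0), curr(n + 1, 0);
    int best = 0;

    for (std::size_t i = 1; i <= m; ++i) {
        curr[0] = 0;
        for (std::size_t j = 1; j <= n; ++j) {
            const int diag = prev[j - 1] +
                             scoring.substitution(query[i - 1], text[j - 1]);
            const int up   = prev[j] + scoring.gap();
            const int left = curr[j - 1] + scoring.gap();
            // Clamping at 0 is what makes the alignment local.
            curr[j] = std::max({0, diag, up, left});
            best = std::max(best, curr[j]);
        }
        std::swap(prev, curr);
    }
    return best;
}
```

With SimpleScoring{2, -1, -2}, aligning "ACGT" against "TTACGTTT" yields a score of 8 (four matches). Swapping in a different policy type changes the scoring without touching the alignment loop, which is the kind of customization a template library allows.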

NVBIO string search performance on GPUs (note logarithmic scale)

Modular Structure

Examples of NVBIO functionality

File I/O routines for many common file formats (read data, aligned reads, genome data)
- I/O routines are implemented so that they can pass data efficiently to and from the GPU
Fast CUDA implementations of Smith-Waterman, Edit Distance, and Gotoh aligners.
- Flexible control over scoring
- Flexible control over alignment calculation ({banded, traceback, batched} x {local, global, semi-global})
Text indices
- FM-Index, suffix tries, sorted dictionary tries, sampled suffix arrays, rank dictionaries
- Includes routines for constructing indices and routines for querying from CUDA kernels or CPU code
- New routines for GPU-based BWT construction of both large strings and large string collections (both many times faster than traditional CPU-based codes!)
Utilities built using the library, including
- nvBWT and nvSSA – tools for building a text index based on the Burrows-Wheeler Transform from FASTA files (with genome data)
- nvFM-server – loads an FM-index in a shared memory region accessible by other processes
- nvBowtie – a largely complete implementation of the Bowtie2 aligner on top of NVBIO, with good coverage of Bowtie2 features and comparable quality results
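
The text-index items above can be illustrated with a toy, CPU-only sketch of a Burrows-Wheeler transform and FM-index style backward search. This is not NVBIO's implementation or API: build_bwt and count_occurrences are hypothetical names, the BWT is built by naively sorting suffixes, and rank queries are answered by a linear scan where a real index would use a rank dictionary.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Build the BWT of text + '$' by sorting all suffixes.
// Assumption: '$' does not occur in text and sorts before every text symbol
// (true for A, C, G, T in ASCII). Naive and slow; for illustration only.
std::string build_bwt(const std::string& text)
{
    const std::string s = text + '$';
    std::vector<std::size_t> sa(s.size());
    std::iota(sa.begin(), sa.end(), std::size_t{0});
    std::sort(sa.begin(), sa.end(), [&s](std::size_t a, std::size_t b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });

    // BWT[i] is the symbol preceding the i-th smallest suffix (cyclically).
    std::string bwt(s.size(), '\0');
    for (std::size_t i = 0; i < sa.size(); ++i)
        bwt[i] = s[(sa[i] + s.size() - 1) % s.size()];
    return bwt;
}

// Number of occurrences of c in bwt[0, i). A linear scan here; a real
// FM-index answers this in constant time with a rank dictionary.
static std::size_t rank_of(const std::string& bwt, char c, std::size_t i)
{
    return static_cast<std::size_t>(
        std::count(bwt.begin(), bwt.begin() + static_cast<std::ptrdiff_t>(i), c));
}

// Count occurrences of a non-empty pattern using backward search on the BWT.
std::size_t count_occurrences(const std::string& bwt, const std::string& pattern)
{
    // first[c] = number of symbols in the BWT strictly smaller than c.
    std::array<std::size_t, 257> first{};
    for (unsigned char c : bwt) ++first[c + 1];
    for (std::size_t c = 1; c <= 256; ++c) first[c] += first[c - 1];

    // [lo, hi) is the range of sorted suffixes prefixed by the current
    // pattern suffix; the pattern is consumed from right to left.
    std::size_t lo = 0, hi = bwt.size();
    for (auto it = pattern.rbegin(); it != pattern.rend() && lo < hi; ++it) {
        const unsigned char c = static_cast<unsigned char>(*it);
        lo = first[c] + rank_of(bwt, *it, lo);
        hi = first[c] + rank_of(bwt, *it, hi);
    }
    return hi - lo;
}
```

For example, build_bwt("banana") returns "annb$aa", and count_occurrences(build_bwt("banana"), "ana") returns 2. The search touches the index once per pattern symbol regardless of text length, which is why the cost of constructing the BWT, rather than querying it, tends to dominate for large genomes.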

Examples of possible performance using NVBIO routines on GPUs

Speed & Flexibility

nvBowtie Performance

NVBIO BWT Construction

NVBIO Smith-Waterman