Genome Analysis Unit

NCI DNAnexus Pilot

NCI has established a Pilot program with DNAnexus to evaluate its cloud-based platform for next-generation sequence analysis. Although DNAnexus is based on a pay-as-you-go model, the OSTR has provided funding for this Pilot phase.

[DNAnexus provides a global network for sharing and management of genomic data and tools to accelerate genomics. The DNAnexus cloud-based platform is optimized to address the challenges of security, scalability, and collaboration, for organizations that are pursuing genomic-based approaches to health, in the clinic and in the research lab. Additionally, DNAnexus hosted the recent precisionFDA challenge.]

During this Pilot phase we hope to determine how useful this platform could be to the CCR user community. DNAnexus offers the following features and policies that make it an attractive product:

  1. A simple-to-use GUI for complex, prebuilt, standardized workflows.
  2. Complete access to powerful, virtually unlimited cloud-based resources via a command-line interface and scripting options that allow batch processing of large datasets.
  3. A significant portfolio of prebuilt software (mostly well-accepted tools and best-practice workflows) optimized for execution in a cloud environment.
  4. A number of unique partnerships that enable access to specific proprietary tools, e.g. Sentieon and IVA integration.
  5. An open system that allows integration of “homegrown” applications and workflows into the DNAnexus environment for execution via the GUI or CLI.
  6. The option of building enhanced data viewers to facilitate data exploration by non-expert users.
  7. A billing model that would allow CCR to set up a single “corporate account” and allocate specific funds to selected subgroups.
  8. DNAnexus appears very open to installing and optimizing new software tools on demand.
  9. Lastly, the platform could serve as a distribution medium for any homegrown tools or workflows, making them available to the entire DNAnexus user community.

With this in mind, we see this platform as potentially useful to the CCR community in the following ways:

  1. A cost-effective way of accessing proprietary high-performance tools such as Sentieon.
  2. An alternative to Biowulf for some high-throughput batch data analysis. [Keep in mind that while Biowulf appears to be free to the end user, it actually costs both the NIH, and CCR specifically, hundreds of thousands to millions of dollars each year in maintenance and updates.]
  3. Giving bench scientists access to performance-tuned, reliable, reproducible workflows, with enhanced viewer options.
  4. Finally, it seems possible that at some point this platform could be integrated into sequencing facilities’ workflows, allowing the automatic delivery of processed data in a user-friendly environment.

If there are any tests or use cases you would like to explore but don’t have the time, energy, or expertise to perform yourself, please pass these suggestions on to the group in general or to its managers.

To aid in this Pilot study we are offering the following resources:

  1. A number of Web pages (think wiki) offering our early insights into the platform, and further explaining some of its operation.
  2. A Slack workspace for communication among the individuals involved in this evaluation (i.e. a place to seek help and share one’s insights, problems, or frustrations).

We also encourage you to explore DNAnexus’s online documentation and tutorials. Please note that DNAnexus has made an application specialist available to assist in resolving issues, so please inform us of any problems that you encounter.