About rnaseq_salmon

RNAseq Salmon (rnaseq_salmon) is a DNAnexus workflow that combines two DNAnexus applets: Salmon Scatter-Process-Gather Workflow (salmon_spg_wf) and quant_sf2express_table. salmon_spg_wf processes a batch of paired-end FASTQ read files and runs Salmon to produce expression count files. quant_sf2express_table generates expression table files suitable for RNA-seq expression analysis tools (e.g., BioJupies or iDEP) from a set of quant.sf files produced by Salmon.

 

Required Input Files


  • A Set of quant.sf Files – Select a set of files named sample_name_quant.sf, where sample_name is replaced by a unique sample name containing only alphanumeric characters and no spaces. You can obtain quant.sf files in either of two ways:
    1. Use GAU’s DNAnexus applet salmon_spg_wf, or
    2. Run Salmon yourself to produce a set of quant.sf files, rename them to follow this convention, and upload them to a DNAnexus project.
  • FASTQ Gzip Compressed Paired-end Files – A batch of paired-end sample read files named sample_name_R1.fastq.gz and sample_name_R2.fastq.gz, where sample_name is replaced by a unique sample name containing only alphanumeric characters and no spaces.
  • Salmon Index tar.gz File – A Salmon-indexed genome file named genome_name_salmon_idx.tar.gz, where genome_name is replaced by a unique genome name containing only alphanumeric characters and no spaces. This file is generated using salmon_indexer.
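
The naming conventions above can be checked before uploading files. The following is a minimal Python sketch, not part of the workflow itself: the function name pair_fastq_files is hypothetical, and "alphanumeric, no spaces" is interpreted strictly as the characters A-Z, a-z and 0-9, which is an assumption.

```python
import re

# Patterns mirror the naming conventions described above.
# Assumption: "alphanumeric" means only A-Z, a-z and 0-9.
FASTQ_RE = re.compile(r"^([A-Za-z0-9]+)_R([12])\.fastq\.gz$")
QUANT_RE = re.compile(r"^([A-Za-z0-9]+)_quant\.sf$")
INDEX_RE = re.compile(r"^([A-Za-z0-9]+)_salmon_idx\.tar\.gz$")


def pair_fastq_files(filenames):
    """Group FASTQ file names by sample name.

    Returns (pairs, problems): pairs maps each sample name to its
    (R1, R2) file names; problems lists names that do not match the
    convention or that are missing their mate.
    """
    mates = {}
    problems = []
    for name in filenames:
        match = FASTQ_RE.match(name)
        if match is None:
            problems.append(name)
            continue
        sample, read = match.group(1), match.group(2)
        mates.setdefault(sample, {})[read] = name

    pairs = {}
    for sample, reads in sorted(mates.items()):
        if set(reads) == {"1", "2"}:
            pairs[sample] = (reads["1"], reads["2"])
        else:
            # Only one mate of the pair was found.
            problems.extend(sorted(reads.values()))
    return pairs, problems


if __name__ == "__main__":
    names = ["liver1_R1.fastq.gz", "liver1_R2.fastq.gz", "liver2_R1.fastq.gz"]
    pairs, problems = pair_fastq_files(names)
    print("paired samples:", pairs)
    print("needs attention:", problems)
```

Any name reported as needing attention should be renamed, or its missing mate added, before the batch is submitted.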

Output Files


  • Expression HTML File – An output file that provides useful links, DNAnexus job information, and instructions for submitting to BioJupies or iDEP, which provide downstream RNA-seq expression analysis.
  • Raw Counts Table File – A file containing a table of unprocessed raw counts.
  • TPM Counts Table File – A file containing a table of TPM (transcripts per million) counts.
  • Design Table File for iDEP – A file used by iDEP that contains a table of sample names and their conditions or treatments. The file can be opened in Excel or a text editor and customized.
  • Salmon Results Directory tar.gz File – A file of the form sample_name_salmon.tar.gz. This is a tar.gz-compressed directory that must be expanded using the command tar -xzf tarfile. These files are provided in case you wish to do custom analysis; otherwise, they can be ignored.
  • Salmon’s quant.sf File – A file of the form sample_name_quant.sf. This file contains Salmon’s per-transcript abundance estimates, including TPM values and estimated read counts.
  • Kallisto’s abundance.h5 File – A file of the form sample_name_abundance.h5. This file is the sample_name_quant.sf file converted into a Kallisto-style Hierarchical Data Format (HDF) file.
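
For custom analysis, the per-sample quant.sf files can be combined into a single table by hand. The Python sketch below illustrates the general idea only and is not the implementation used by quant_sf2express_table. It relies on the standard tab-separated quant.sf columns written by Salmon (Name, Length, EffectiveLength, TPM, NumReads); the function names read_quant_sf and build_table and the sample data are hypothetical.

```python
import csv
import io


def read_quant_sf(handle):
    """Parse an open quant.sf file into {transcript: (tpm, num_reads)}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["Name"]: (float(row["TPM"]), float(row["NumReads"]))
        for row in reader
    }


def build_table(samples, column):
    """Combine parsed samples into rows of a table.

    samples: {sample_name: result of read_quant_sf}
    column:  0 for TPM, 1 for raw read counts (NumReads)
    Transcripts missing from a sample are filled with 0.0.
    """
    names = sorted(set().union(*(s.keys() for s in samples.values())))
    rows = [["Name"] + list(samples)]
    for name in names:
        values = [samples[s].get(name, (0.0, 0.0))[column] for s in samples]
        rows.append([name] + values)
    return rows


if __name__ == "__main__":
    # Made-up example content standing in for a real sample_name_quant.sf file.
    demo = (
        "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
        "tx1\t1000\t800.0\t12.5\t40\n"
        "tx2\t500\t300.0\t0.0\t0\n"
    )
    parsed = {"sampleA": read_quant_sf(io.StringIO(demo))}
    for row in build_table(parsed, column=0):
        print("\t".join(str(value) for value in row))
```

With real data, each sample_name_quant.sf file would be opened with open(path, newline="") and passed to read_quant_sf under its sample name.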

 

For NCI Members


 

Developed by GAU