Trinity's In silico Read Normalization

###############################################################################
#
# Required:
#
#  --seqType <string>       :type of reads: ('fq' or 'fa')
#  --JM <string>            :(Jellyfish Memory) number of GB of system memory to use for
#                            k-mer counting by Jellyfish (e.g. 10G); include the 'G' character
#
#  --max_cov <int>          :targeted maximum coverage for reads.
#
#
#  If paired reads:
#      --left  <string>    :left reads
#      --right <string>    :right reads
#
#  Or, if unpaired reads:
#      --single <string>   :single reads
#
#  Or, if you have read collections in different files you can use 'list' files, where each line in a list
#  file is the full path to an input file.  This saves you the time of combining them just so you can pass
#  a single file for each direction.
#      --left_list  <string> :left reads, one file path per line
#      --right_list <string> :right reads, one file path per line
#
####################################
##  Misc:  #########################
#
#  --pairs_together                :process paired reads by averaging stats between pairs and retaining linking info.
#
#  --SS_lib_type <string>          :Strand-specific RNA-Seq read orientation.
#                                   if paired: RF or FR,
#                                   if single: F or R.   (dUTP method = RF)
#                                   See web documentation.
#  --output <string>               :name of directory for output (will be
#                                   created if it doesn't already exist)
#                                   default( "normalized_reads" )
#
#  --JELLY_CPU <int>               :number of threads for Jellyfish to use (default: 2)
#  --PARALLEL_STATS                :generate read stats in parallel for paired reads (expect roughly 2X the Inchworm memory requirement)
#
#  --KMER_SIZE <int>               :default 25
#
#  --min_kmer_cov <int>            :minimum kmer coverage for catalog construction (default: 1)
#
#  --max_pct_stdev <int>           :maximum stdev of kmer coverage across a read, as a pct of the mean (default: 100)
#
###############################################################################
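
To illustrate how the required options fit together, the following sketch assembles a paired-end invocation and prints it rather than running it. The script path in NORMALIZE_SCRIPT, the read file names, and the chosen values (10G, a max_cov of 30, 4 threads) are placeholders assumed for this example and do not come from the usage text above; substitute the normalization script shipped with your Trinity installation and your own data.

```shell
#!/bin/sh
# Dry-run sketch: assemble a paired-end normalization command and print it.
# NORMALIZE_SCRIPT, the read file names, and the option values below are
# hypothetical placeholders, not values taken from the Trinity documentation.
NORMALIZE_SCRIPT="${NORMALIZE_SCRIPT:-/path/to/normalization_script.pl}"
LEFT_READS="reads_1.fq"
RIGHT_READS="reads_2.fq"
JELLYFISH_MEM="10G"   # --JM expects the trailing 'G'
MAX_COV=30            # targeted maximum coverage
THREADS=4             # passed to --JELLY_CPU

# Build the command in pieces so each group of options stays readable.
CMD="$NORMALIZE_SCRIPT --seqType fq --JM $JELLYFISH_MEM --max_cov $MAX_COV"
CMD="$CMD --left $LEFT_READS --right $RIGHT_READS --pairs_together"
CMD="$CMD --JELLY_CPU $THREADS --output normalized_reads"

# Print the command for inspection instead of executing it.
echo "$CMD"
```

Once the printed command looks right, run the same arguments directly. For unpaired data, --single takes the place of --left and --right.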

Sample data

You can run the normalization process on the sample data provided at:

TRINITY_RNASEQ_ROOT/sample_data/test_Trinity_Assembly

by running

__run_read_normalization_pairs_together_fastq.sh

(Note that there are several such __run_read_normalization… scripts, each demonstrating an alternate execution scenario.)
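
The sample run described above amounts to changing into the sample directory and executing the script. The sketch below wraps that in a small helper; the function name run_normalization_demo is made up for this example, and it assumes that TRINITY_RNASEQ_ROOT is an environment variable pointing at the root of your Trinity installation and that the demo script is executable.

```shell
#!/bin/sh
# Sketch: run the paired-end FASTQ normalization demo from the sample data.
# Assumes TRINITY_RNASEQ_ROOT points at the root of the Trinity installation;
# run_normalization_demo is a hypothetical helper name.
run_normalization_demo() {
    demo_dir="${TRINITY_RNASEQ_ROOT:-.}/sample_data/test_Trinity_Assembly"
    demo_script="__run_read_normalization_pairs_together_fastq.sh"

    # Fail early with a clear message if the demo script is not where expected.
    if [ ! -f "$demo_dir/$demo_script" ]; then
        echo "demo script not found: $demo_dir/$demo_script" >&2
        return 1
    fi

    # Run in a subshell so the caller's working directory is left unchanged.
    ( cd "$demo_dir" && "./$demo_script" )
}
```

After exporting TRINITY_RNASEQ_ROOT, call run_normalization_demo to start the demo. The other demonstration scripts in the same directory can be run the same way by changing demo_script.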
