RSEM can be used for abundance estimation for transcriptome assemblies. The current version of RSEM, as of the release data of the corresponding Trinity software, is bundled with the Trinity software package.

To run the included version of RSEM, execute the following:

$TRINITY_HOME/util/RSEM_util/run_RSEM_align_n_estimate.pl
#########################################################################
#
#  --transcripts <string>           transcript fasta file
#  --seqType <string>              fq|fa
#
#  If Paired-end:
#
#  --left <string>
#  --right <string>
#
#    or Single-end:
#
#  --single <string>
#
#
#
# Optional:
#
# --prefix <string>                prefix for RSEM output files (default: 'RSEM')
#
# --SS_lib_type <string>           strand-specific library type:  paired('RF' or 'FR'), single('F' or 'R').
#
# --no_group_by_component          Not Trinity transcripts (use --gene_trans_map to specify gene/trans relationships)
#
# --thread_count                   number of threads to use (default = 4)
#
# --debug                  retain intermediate files
#
#####################
#  Non-Trinity options:
#
#  --gene_trans_map <string>        file containing 'gene(tab)transcript' identifiers per line.
#
#
#########################################################################
#
#  To pass additional parameters to rsem-calculate-expression,
#    type ' -- ' followed by additional pass-through params
#
#########################################################################

An example command with PE reads would be:

$TRINITY_HOME/util/RSEM_util/run_RSEM_align_n_estimate.pl --transcripts Trinity.fasta \
        --seqType fq --left left.reads.fq --right right.reads.fq
Note
If you have strand-specific data, be sure to include the --SS_lib_type parameter.
Note
The run_RSEM_align_n_estimate.pl script simply maps the familiar Trinity parameters to those of the RSEM software and then executes RSEM accordingly. The RSEM command generated and executed will be shown via stdout. If you should encounter problems with running RSEM, please contact the RSEM developers and provide them with the corresponding auto-generated RSEM command, not the run_RSEM_align_n_estimate.pl parameters.

Running the above command will generate two primary output files containing the abundance estimation information (note RSEM is the default output file prefix):

RSEM.isoforms.results  : EM read counts per Trinity transcript
RSEM.genes.results     : EM read counts on a per-Trinity-component (aka... gene) basis, 'gene' used loosely here.

The output for the isoforms file looks like so:

transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct

comp128_c0_seq1

comp128_c0

209

1.73

0.00

0.00

0.00

0.00

comp13_c0_seq1

comp13_c0

235

7.16

1.00

12561.51

5282.75

100.00

comp22_c0_seq1

comp22_c0

215

2.62

0.00

0.00

0.00

0.00

comp28_c0_seq1

comp28_c0

329

54.60

4.00

6591.85

2772.21

100.00

comp33_c0_seq1

comp33_c0

307

40.30

3.00

6697.56

2816.66

100.00

comp35_c0_seq1

comp35_c0

219

3.33

0.00

0.00

0.00

0.00

comp35_c1_seq1

comp35_c1

204

1.19

1.00

75295.99

31665.75

100.00

comp39_c0_seq1

comp39_c0

348

68.20

1.00

1319.32

554.84

100.00

comp39_c0_seq2

comp39_c0

255

13.97

0.00

0.00

0.00

0.00

comp41_c0_seq1

comp41_c0

592

295.77

12.00

3650.37

1535.16

100.00

comp44_c0_seq1

comp44_c0

361

78.10

1.00

1151.96

484.46

100.00

comp44_c1_seq1

comp44_c1

280

25.22

1.00

3568.05

1500.54

100.00

and the genes file provides expression results on a per-Trinity component basis:

gene_id transcript_id(s) length effective_length expected_count TPM FPKM

comp128_c0

comp128_c0_seq1

0.00

0.00

0.00

0.00

0.00

comp13_c0

comp13_c0_seq1

235.00

7.16

1.00

12561.51

5282.75

comp22_c0

comp22_c0_seq1

0.00

0.00

0.00

0.00

0.00

comp28_c0

comp28_c0_seq1

329.00

54.60

4.00

6591.85

2772.21

comp33_c0

comp33_c0_seq1

307.00

40.30

3.00

6697.56

2816.66

comp35_c0

comp35_c0_seq1

0.00

0.00

0.00

0.00

0.00

comp35_c1

comp35_c1_seq1

204.00

1.19

1.00

75295.99

31665.75

comp39_c0

comp39_c0_seq1,comp39_c0_seq2

348.00

68.20

1.00

1319.32

554.84

comp41_c0

comp41_c0_seq1

592.00

295.77

12.00

3650.37

1535.16

comp44_c0

comp44_c0_seq1

361.00

78.10

1.00

1151.96

484.46

comp44_c1

comp44_c1_seq1

280.00

25.22

1.00

3568.05

1500.54

comp45_c0

comp45_c0_seq1

0.00

0.00

0.00

0.00

0.00

comp47_c1

comp47_c1_seq1

562.00

265.78

8.00

2708.23

1138.95

comp48_c0

comp48_c0_seq1

433.00

139.70

5.00

3220.28

1354.29

comp49_c0

comp49_c0_seq1

272.00

21.31

3.00

12667.38

5327.27

comp49_c1

comp49_c1_seq1

324.00

51.21

2.00

3514.23

1477.91

comp52_c0

comp52_c0_seq1

301.00

36.70

2.00

4902.98

2061.95

comp53_c0

comp53_c0_seq1

304.00

38.48

1.00

2337.98

983.24

Filtering lowly supported transcripts

If you want to filter out the likely transcript artifacts and lowly expressed transcripts, you might consider retaining only those that represent at least 1% of the per-component (IsoPct) expression level. Because Trinity transcripts are not currently scaffolded across sequencing gaps, there will be cases where smaller transcript fragments may lack enough properly-paired read support to show up as expressed, but are still otherwise supported by the read data. Therefore, filter cautiously and we don’t recommend discarding such lowly expressed (or seemingly unexpressed) transcripts, but rather putting them aside for further study.

The utility script TRINITY_RNASEQ_ROOT/util/filter_fasta_by_rsem_values.pl can be used to filter your FASTA file of assembled transcripts using the RSEM values within the RSEM.isoforms.results file, according to min IsoPct, min FPKM, and min TPM. Note, you can provide a list of RSEM output files, one for each sample, and filter out those transcripts that do not meet the specified requirements in any of the samples given.

Sample Data

Under TRINITY_RNASEQ_ROOT/sample_data/test_Trinity_Assembly, execute

% runMe.sh 1

to build Trinity transcript assemblies using the sample data, and then run through the downstream alignment and abundance estimation steps.