RSEM can be used for abundance estimation for transcriptome assemblies. The current version of RSEM, as of the release data of the corresponding Trinity software, is bundled with the Trinity software package.
To run the included version of RSEM, execute the following:
$TRINITY_HOME/util/RSEM_util/run_RSEM_align_n_estimate.pl
#########################################################################
#
# --transcripts <string> transcript fasta file
# --seqType <string> fq|fa
#
# If Paired-end:
#
# --left <string>
# --right <string>
#
# or Single-end:
#
# --single <string>
#
#
#
# Optional:
#
# --prefix <string> prefix for RSEM output files (default: 'RSEM')
#
# --SS_lib_type <string> strand-specific library type: paired('RF' or 'FR'), single('F' or 'R').
#
# --no_group_by_component Not Trinity transcripts (use --gene_trans_map to specify gene/trans relationships)
#
# --thread_count number of threads to use (default = 4)
#
# --debug retain intermediate files
#
#####################
# Non-Trinity options:
#
# --gene_trans_map <string> file containing 'gene(tab)transcript' identifiers per line.
#
#
#########################################################################
#
# To pass additional parameters to rsem-calculate-expression,
# type ' -- ' followed by additional pass-through params
#
#########################################################################
An example command with PE reads would be:
$TRINITY_HOME/util/RSEM_util/run_RSEM_align_n_estimate.pl --transcripts Trinity.fasta \
--seqType fq --left left.reads.fq --right right.reads.fq
Note
|
If you have strand-specific data, be sure to include the --SS_lib_type parameter. |
Note
|
The run_RSEM_align_n_estimate.pl script simply maps the familiar Trinity parameters to those of the RSEM software and then executes RSEM accordingly. The RSEM command generated and executed will be shown via stdout. If you should encounter problems with running RSEM, please contact the RSEM developers and provide them with the corresponding auto-generated RSEM command, not the run_RSEM_align_n_estimate.pl parameters. |
Running the above command will generate two primary output files containing the abundance estimation information (note RSEM is the default output file prefix):
RSEM.isoforms.results : EM read counts per Trinity transcript
RSEM.genes.results : EM read counts on a per-Trinity-component (aka... gene) basis, 'gene' used loosely here.
The output for the isoforms file looks like so:
transcript_id | gene_id | length | effective_length | expected_count | TPM | FPKM | IsoPct |
---|---|---|---|---|---|---|---|
comp128_c0_seq1 |
comp128_c0 |
209 |
1.73 |
0.00 |
0.00 |
0.00 |
0.00 |
comp13_c0_seq1 |
comp13_c0 |
235 |
7.16 |
1.00 |
12561.51 |
5282.75 |
100.00 |
comp22_c0_seq1 |
comp22_c0 |
215 |
2.62 |
0.00 |
0.00 |
0.00 |
0.00 |
comp28_c0_seq1 |
comp28_c0 |
329 |
54.60 |
4.00 |
6591.85 |
2772.21 |
100.00 |
comp33_c0_seq1 |
comp33_c0 |
307 |
40.30 |
3.00 |
6697.56 |
2816.66 |
100.00 |
comp35_c0_seq1 |
comp35_c0 |
219 |
3.33 |
0.00 |
0.00 |
0.00 |
0.00 |
comp35_c1_seq1 |
comp35_c1 |
204 |
1.19 |
1.00 |
75295.99 |
31665.75 |
100.00 |
comp39_c0_seq1 |
comp39_c0 |
348 |
68.20 |
1.00 |
1319.32 |
554.84 |
100.00 |
comp39_c0_seq2 |
comp39_c0 |
255 |
13.97 |
0.00 |
0.00 |
0.00 |
0.00 |
comp41_c0_seq1 |
comp41_c0 |
592 |
295.77 |
12.00 |
3650.37 |
1535.16 |
100.00 |
comp44_c0_seq1 |
comp44_c0 |
361 |
78.10 |
1.00 |
1151.96 |
484.46 |
100.00 |
comp44_c1_seq1 |
comp44_c1 |
280 |
25.22 |
1.00 |
3568.05 |
1500.54 |
100.00 |
and the genes file provides expression results on a per-Trinity component basis:
gene_id | transcript_id(s) | length | effective_length | expected_count | TPM | FPKM |
---|---|---|---|---|---|---|
comp128_c0 |
comp128_c0_seq1 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
comp13_c0 |
comp13_c0_seq1 |
235.00 |
7.16 |
1.00 |
12561.51 |
5282.75 |
comp22_c0 |
comp22_c0_seq1 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
comp28_c0 |
comp28_c0_seq1 |
329.00 |
54.60 |
4.00 |
6591.85 |
2772.21 |
comp33_c0 |
comp33_c0_seq1 |
307.00 |
40.30 |
3.00 |
6697.56 |
2816.66 |
comp35_c0 |
comp35_c0_seq1 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
comp35_c1 |
comp35_c1_seq1 |
204.00 |
1.19 |
1.00 |
75295.99 |
31665.75 |
comp39_c0 |
comp39_c0_seq1,comp39_c0_seq2 |
348.00 |
68.20 |
1.00 |
1319.32 |
554.84 |
comp41_c0 |
comp41_c0_seq1 |
592.00 |
295.77 |
12.00 |
3650.37 |
1535.16 |
comp44_c0 |
comp44_c0_seq1 |
361.00 |
78.10 |
1.00 |
1151.96 |
484.46 |
comp44_c1 |
comp44_c1_seq1 |
280.00 |
25.22 |
1.00 |
3568.05 |
1500.54 |
comp45_c0 |
comp45_c0_seq1 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
comp47_c1 |
comp47_c1_seq1 |
562.00 |
265.78 |
8.00 |
2708.23 |
1138.95 |
comp48_c0 |
comp48_c0_seq1 |
433.00 |
139.70 |
5.00 |
3220.28 |
1354.29 |
comp49_c0 |
comp49_c0_seq1 |
272.00 |
21.31 |
3.00 |
12667.38 |
5327.27 |
comp49_c1 |
comp49_c1_seq1 |
324.00 |
51.21 |
2.00 |
3514.23 |
1477.91 |
comp52_c0 |
comp52_c0_seq1 |
301.00 |
36.70 |
2.00 |
4902.98 |
2061.95 |
comp53_c0 |
comp53_c0_seq1 |
304.00 |
38.48 |
1.00 |
2337.98 |
983.24 |
Filtering lowly supported transcripts
If you want to filter out the likely transcript artifacts and lowly expressed transcripts, you might consider retaining only those that represent at least 1% of the per-component (IsoPct) expression level. Because Trinity transcripts are not currently scaffolded across sequencing gaps, there will be cases where smaller transcript fragments may lack enough properly-paired read support to show up as expressed, but are still otherwise supported by the read data. Therefore, filter cautiously and we don’t recommend discarding such lowly expressed (or seemingly unexpressed) transcripts, but rather putting them aside for further study.
The utility script TRINITY_RNASEQ_ROOT/util/filter_fasta_by_rsem_values.pl can be used to filter your FASTA file of assembled transcripts using the RSEM values within the RSEM.isoforms.results file, according to min IsoPct, min FPKM, and min TPM. Note, you can provide a list of RSEM output files, one for each sample, and filter out those transcripts that do not meet the specified requirements in any of the samples given.
Sample Data
Under TRINITY_RNASEQ_ROOT/sample_data/test_Trinity_Assembly, execute
% runMe.sh 1
to build Trinity transcript assemblies using the sample data, and then run through the downstream alignment and abundance estimation steps.