lncRNA Expression - P.generosa lncRNA Expression Using StringTie

After identifying lncRNAs in P.generosa, Steven asked that I generate a tissue-specific expression/count matrix (GitHub Issue). After looking through the StringTie documentation, I decided that StringTie would work for this. The overall approach:
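A minimal Python sketch of what a per-tissue StringTie run could look like. It assumes one aligned BAM per tissue; the `{tissue}.sorted.bam` and `pgen-lncRNA.gtf` names are hypothetical placeholders, not the files actually used. It only assembles command lines: `stringtie -e -B -G <annotation>` estimates abundances of the supplied lncRNA transcripts and writes ballgown `.ctab` tables next to each output GTF. It also builds a sample list in the `<sample_ID> <path_to_GTF>` format that StringTie's `prepDE.py` helper accepts via `-i` to produce count matrices.

```python
from pathlib import Path

# Tissue names taken from the output directory tree in this post.
TISSUES = ["ctenidia", "gonad", "heart", "juvenile", "larvae"]


def output_gtf(tissue: str) -> Path:
    """Per-tissue GTF path, mirroring the directory tree in this post."""
    return Path(tissue) / f"{tissue}-pgen-lncRNA-stringtie.gtf"


def stringtie_cmd(tissue: str, bam: str, annotation_gtf: str, threads: int = 8) -> list[str]:
    """Build (but do not run) a StringTie command for one tissue.

    -e  only estimate abundances of transcripts in the -G annotation
    -B  write ballgown .ctab tables in the same directory as the -o GTF
    """
    return [
        "stringtie",
        "-e",
        "-B",
        "-p", str(threads),
        "-G", annotation_gtf,
        "-o", str(output_gtf(tissue)),
        bam,
    ]


def prepde_sample_list(tissues: list[str]) -> str:
    """Return prepDE.py sample-list text: '<sample_ID> <path_to_GTF>' per line."""
    return "".join(f"{t} {output_gtf(t)}\n" for t in tissues)


if __name__ == "__main__":
    # Hypothetical input names; substitute the real BAMs and lncRNA GTF.
    for t in TISSUES:
        print(" ".join(stringtie_cmd(t, f"{t}.sorted.bam", "pgen-lncRNA.gtf")))
    print(prepde_sample_list(TISSUES), end="")
```

Printing the commands rather than running them keeps the sketch easy to inspect before handing the lines to a shell or job scheduler.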

This was all run on Raven, using a Jupyter Notebook. Links below:

Jupyter Notebook (NB Viewer):


This produced ballgown expression files, as well as a transcript read count matrix (and a corresponding gene count matrix) with a column for each tissue/sample. Because of the number of samples and the redundant ballgown files/structure, I'm only linking directly to the final matrix file. The organization of the output directory is shown in the directory tree below.
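The StringTie manual describes how its `prepDE.py` helper derives the read counts in these matrices from StringTie's per-transcript coverage: reads ≈ coverage × transcript length / read length, where transcript length is the summed length of the transcript's exons and the read length defaults to 75 (`-l`). The sketch below illustrates only that arithmetic; it is not `prepDE.py` itself, which additionally reports whole-number counts. The exon coordinates and coverage value are hypothetical.

```python
def transcript_length(exons: list[tuple[int, int]]) -> int:
    """Sum exon lengths from 1-based, inclusive GTF (start, end) coordinates."""
    return sum(end - start + 1 for start, end in exons)


def coverage_to_reads(coverage: float, exons: list[tuple[int, int]], read_len: int = 75) -> float:
    """Approximate read count: coverage * transcript_length / read_length."""
    if read_len <= 0:
        raise ValueError("read_len must be positive")
    return coverage * transcript_length(exons) / read_len


# Hypothetical transcript with two exons (500 + 250 = 750 nt) and coverage 10.0.
exons = [(1001, 1500), (2001, 2250)]
print(coverage_to_reads(10.0, exons))  # 10.0 * 750 / 75 = 100.0
```

Because the count scales with the assumed read length, the `-l` value should match the actual sequencing reads for the counts to be meaningful.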

Output folder:

Directory tree

├── [4.0K]  ctenidia
│   ├── [3.9M]  ctenidia-pgen-lncRNA-stringtie.gtf
│   ├── [137K]  e2t.ctab
│   ├── [997K]  e_data.ctab
│   ├── [  10]  i2t.ctab
│   ├── [  48]  i_data.ctab
│   └── [1.2M]  t_data.ctab
├── [316K]  gene_count_matrix.csv
├── [4.0K]  gonad
│   ├── [137K]  e2t.ctab
│   ├── [1002K]  e_data.ctab
│   ├── [3.9M]  gonad-pgen-lncRNA-stringtie.gtf
│   ├── [  10]  i2t.ctab
│   ├── [  48]  i_data.ctab
│   └── [1.2M]  t_data.ctab
├── [4.0K]  heart
│   ├── [137K]  e2t.ctab
│   ├── [990K]  e_data.ctab
│   ├── [3.9M]  heart-pgen-lncRNA-stringtie.gtf
│   ├── [  10]  i2t.ctab
│   ├── [  48]  i_data.ctab
│   └── [1.2M]  t_data.ctab
├── [4.0K]  juvenile
│   ├── [137K]  e2t.ctab
│   ├── [1.0M]  e_data.ctab
│   ├── [  10]  i2t.ctab
│   ├── [  48]  i_data.ctab
│   ├── [3.9M]  juvenile-pgen-lncRNA-stringtie.gtf
│   └── [1.2M]  t_data.ctab
├── [4.0K]  larvae
│   ├── [137K]  e2t.ctab
│   ├── [1001K]  e_data.ctab
│   ├── [  10]  i2t.ctab
│   ├── [  48]  i_data.ctab
│   ├── [3.9M]  larvae-pgen-lncRNA-stringtie.gtf
│   └── [1.2M]  t_data.ctab
└── [409K]  transcript_count_matrix.csv

5 directories, 32 files
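To work with the final matrix, here is a minimal sketch using only Python's standard library. It assumes the layout `prepDE.py` writes: a header row whose first column is the feature ID (`transcript_id` or `gene_id`), followed by one integer count column per sample (here, one per tissue directory above). It loads the CSV into a per-feature dict and totals each tissue column.

```python
import csv
from typing import TextIO


def read_count_matrix(handle: TextIO) -> tuple[list[str], dict[str, dict[str, int]]]:
    """Parse a prepDE.py-style count matrix.

    Returns the sample (tissue) names and a mapping of
    feature ID -> {sample: count}.
    """
    reader = csv.reader(handle)
    header = next(reader)
    samples = header[1:]  # first column holds transcript_id / gene_id
    counts: dict[str, dict[str, int]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        feature_id, values = row[0], row[1:]
        counts[feature_id] = {s: int(v) for s, v in zip(samples, values)}
    return samples, counts


def column_totals(samples: list[str], counts: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum counts per sample column (e.g. total lncRNA reads per tissue)."""
    return {s: sum(row[s] for row in counts.values()) for s in samples}
```

Usage would be along the lines of `with open("transcript_count_matrix.csv", newline="") as fh: samples, counts = read_count_matrix(fh)`, followed by `column_totals(samples, counts)` for a quick per-tissue sanity check.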