lncRNA Identification - P.generosa lncRNAs using CPC2 and bedtools

After trimming P.generosa RNA-seq reads on 20230426 and then aligning and annotating them to the Panopea-generosa-v1.0 genome on 20230426, I proceeded with the final step of lncRNA identification. For guidance, I used Zach’s notebook entry on lncRNA identification. I started from the annotated GTF generated by gffcompare during the alignment/annotation step on 20230426. I then used `bedtools getfasta` (https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html) and `CPC2`, with an arbitrary 200 bp minimum length, to identify lncRNAs. All of this was done in a Jupyter Notebook (links below).
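The step between the gffcompare GTF and `bedtools getfasta` amounts to turning transcript records into BED intervals and dropping short ones. The sketch below illustrates that step; it is not the notebook's actual code. The function name `gtf_to_bed`, the BED6 layout, and the choice to measure length as the transcript's genomic span (introns included) are all assumptions.

```python
import re

MIN_LENGTH = 200  # arbitrary minimum length (bp), as used in this entry

# Captures the value of the transcript_id attribute, e.g. transcript_id "MSTRG.1.1";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_to_bed(gtf_lines, min_length=MIN_LENGTH):
    """Yield BED6 lines for GTF 'transcript' features spanning >= min_length bp.

    Assumption: length is the genomic span (end - start + 1), so introns count.
    GTF is 1-based and inclusive; BED is 0-based and half-open, so the BED
    start is (GTF start - 1) and the BED end equals the GTF end.
    """
    for line in gtf_lines:
        # Skip blank lines and comment/header lines
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Keep only full 9-column records for whole transcripts (not exons)
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        seqid, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        if end - start + 1 < min_length:
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        # Fall back to a coordinate label if no transcript_id is present
        name = match.group(1) if match else f"{seqid}:{start}-{end}"
        yield f"{seqid}\t{start - 1}\t{end}\t{name}\t0\t{strand}"
```

The BED lines can be written to a file (for example `bed.writelines(l + "\n" for l in gtf_to_bed(gtf))`) and passed to a call such as `bedtools getfasta -fi genome.fa -bed transcripts_200bp.bed -name -s -fo transcripts_200bp.fa`. The file names here are hypothetical. `-name` labels records from the BED name column and `-s` reverse-complements minus-strand features. Without `-split` and BED12 input, the extracted sequences include introns.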

Jupyter Notebook (GitHub):

Jupyter Notebook (NB Viewer):


RESULTS

Some very brief “stats”:

Total P.generosa transcripts ID’d by HISAT2/StringTie: 79,269

Total P.generosa lncRNAs ID’d by CPC2 (>= 200 bp): 13,606

Percentage of transcripts identified as lncRNAs: ~17% (13,606 / 79,269)
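Counts like these can be pulled from the CPC2 result file along the following lines. This sketch assumes the standalone CPC2 tab-delimited output, with a `#ID` header field, a `transcript_length` column, and a `label` column holding `coding` or `noncoding`. `summarize_cpc2` and `percent` are hypothetical helper names. Re-applying the 200 bp cutoff is only a safeguard in case shorter sequences reached CPC2.

```python
import csv


def summarize_cpc2(lines, min_length=200):
    """Return (rows read, IDs labelled noncoding with length >= min_length).

    Assumes CPC2's tab-delimited output with '#ID', 'transcript_length',
    and 'label' columns; 'lines' is any iterable of text lines (e.g. an open file).
    """
    reader = csv.DictReader(lines, delimiter="\t")
    total = 0
    noncoding = []
    for row in reader:
        total += 1
        if row["label"] == "noncoding" and int(row["transcript_length"]) >= min_length:
            noncoding.append(row["#ID"])
    return total, noncoding


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole
```

With the numbers above, `round(percent(13_606, 79_269))` gives 17, matching the reported percentage. The denominator here is the 79,269 HISAT2/StringTie transcripts, not the number of rows in the CPC2 file, which can be smaller after the length filter.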

Output folder: