Sam’s Notebook

Trimming and QC - E5 Coral sRNA-seq Trimming Parameter Tests and Comparisons

In preparation for FastQC and trimming of the E5 coral sRNA-seq data, I noticed that my “default” trimming settings didn’t produce the results I expected. Specifically, since these are sRNAs and the NEBNext® Multiplex Small RNA Library Prep Set for Illumina (PDF) protocol indicates that the sRNAs should be ~21-30bp, it seemed odd that I was still ending up with read lengths of 150bp. So, I tried a couple of quick trimming comparisons on just a single pair of sRNA FastQs to use as examples to get feedback on how trimming should proceed.
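One quick way to see whether a trimming run leaves reads in the expected ~21-30bp range is to tally read lengths directly from the trimmed FastQ. The following is a minimal sketch, not the method used in the post; the function names are hypothetical and it assumes standard four-line FastQ records.

```python
from collections import Counter
from typing import Iterable


def read_length_counts(fastq_lines: Iterable[str]) -> Counter:
    """Tally read lengths from FastQ lines (assumes 4 lines per record)."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts


def fraction_in_range(counts: Counter, lo: int = 21, hi: int = 30) -> float:
    """Fraction of reads whose length falls within [lo, hi] bp."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    in_range = sum(n for length, n in counts.items() if lo <= length <= hi)
    return in_range / total
```

For a gzipped file, the lines could be supplied with `gzip.open(path, "rt")`; a low fraction in the 21-30bp window would suggest that adapters are not being removed.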

Data Wrangling - P.meandrina Genome GFF to GTF Using gffread

As part of getting P.meandrina genome info added to our Lab Handbook Genomic Resources page, I will index the P.meandrina genome file (Pocillopora_meandrina_HIv1.assembly.fasta) using HISAT2, but I also need a GTF file to identify exon/intron splice sites. Since a GTF file is not available, but a GFF file is, I needed to convert the GFF to GTF. I used gffread to do this on my computer. The process is documented in the Jupyter Notebook linked below.
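The conversion itself is a single gffread call: the `-T` flag requests GTF output and `-o` names the output file. Below is a minimal sketch of wrapping that call; the file names passed in are placeholders, the helper names are hypothetical, and the exact options used in the linked notebook may differ.

```python
import shutil
import subprocess
from typing import List


def gffread_gtf_command(gff_path: str, gtf_path: str) -> List[str]:
    """Build a gffread command that writes GTF (-T) to gtf_path (-o)."""
    return ["gffread", gff_path, "-T", "-o", gtf_path]


def convert_gff_to_gtf(gff_path: str, gtf_path: str) -> None:
    """Run gffread if it is installed; raise a clear error otherwise."""
    if shutil.which("gffread") is None:
        raise FileNotFoundError("gffread not found on PATH")
    subprocess.run(gffread_gtf_command(gff_path, gtf_path), check=True)
```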

FastQ QC and Trimming - E5 Coral RNA-seq Data for A.pulchra P.evermanni and P.meandrina Using FastQC fastp and MultiQC on Mox

After downloading and then reorganizing the E5 coral RNA-seq data from Azenta project 30-789513166, I ran FastQC for initial quality checks, followed by trimming with fastp, and then final QC with FastQC/MultiQC. This was performed on all three species in the data sets: A.pulchra, P.evermanni, and P.meandrina. All aspects were run on Mox.

Data Management - E5 Coral RNA-seq and sRNA-seq Reorganizing and Renaming

Downloaded the E5 coral sRNA-seq data from Azenta project 30-852430235 on 20230515 and the E5 coral RNA-seq data from Azenta project 30-789513166 on 20230516. The data required some reorganization, as the projects included data from three different species (Acropora pulchra, Pocillopora meandrina, and Porites evermanni). Additionally, since the same exact samples were sequenced with both RNA-seq and sRNA-seq, the resulting FastQ file names ended up being the same. This seemed like it could lead to downstream mistakes and/or difficulty tracking whether someone was actually using an RNA-seq or an sRNA-seq FastQ.
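One way to remove that ambiguity is to tag each file name with its library type. The sketch below is an illustration only: the prefix scheme, the function name, and the example file name are all assumptions, not the naming convention actually adopted in the post.

```python
from pathlib import PurePosixPath


def tagged_name(fastq_path: str, library_type: str) -> str:
    """Prefix a FastQ file name with its library type (hypothetical scheme)."""
    if library_type not in {"RNA", "sRNA"}:
        raise ValueError(f"unknown library type: {library_type}")
    path = PurePosixPath(fastq_path)
    # Only the file name changes; the directory part is left untouched.
    return str(path.with_name(f"{library_type}-{path.name}"))
```

With this scheme, two otherwise identical names from the two projects become distinguishable at a glance, for example `sRNA-sample1_R1_001.fastq.gz` versus `RNA-sample1_R1_001.fastq.gz`.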

Data Received - Coral RNA-seq Data from Azenta Project 30-789513166

RNA-seq data was made available from the coral E5 Azenta project 30-789513166. Sample sheet:

Data Received - Coral sRNA-seq Data from Azenta Project 30-852430235

Small RNA-seq (sRNA-seq) data was made available from the coral E5 Azenta project 30-852430235. Sample sheet is below.

lncRNA Expression - P.generosa lncRNA Expression Using StringTie

After identifying lncRNA in P.generosa, Steven asked that I generate a tissue-specific expression/count matrix (GitHub Issue). Looking through the documentation for StringTie, I decided that StringTie would work for this. The overall approach:

Daily Bits - May 2023

20230517

lncRNA Identification - P.generosa lncRNAs using CPC2 and bedtools

After trimming P.generosa RNA-seq reads on 20230426 and then aligning and annotating them to the Panopea-generosa-v1.0 genome on 20230426, I proceeded with the final step of lncRNA identification. To do this, I used Zach’s notebook entry on lncRNA identification for guidance. I utilized the annotated GTF generated by gffcompare during the alignment/annotation step on 20230426. I used bedtools getfasta (https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html) and CPC2 with an arbitrary 200bp minimum length to identify lncRNAs. All of this was done in a Jupyter Notebook (links below).
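The 200bp minimum-length cutoff can be pictured as a simple filter over the FASTA sequences extracted by bedtools getfasta. This is a minimal sketch with hypothetical function names; the linked notebook may apply the cutoff at a different step or with different tools.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into a {header: sequence} dictionary."""
    parts: Dict[str, list] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)  # sequences may span several lines
    return {header: "".join(chunks) for header, chunks in parts.items()}


def filter_min_length(records: Dict[str, str], min_len: int = 200) -> Dict[str, str]:
    """Keep only sequences that are at least min_len bp long."""
    return {name: seq for name, seq in records.items() if len(seq) >= min_len}
```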

Containers - Apptainer Explorations

At some point, our HPC nodes on Mox will be retired. When that happens, we’ll likely purchase new nodes on the newest UW cluster, Klone. Additionally, the coenv nodes are no longer available on Mox. One was decommissioned and one was “migrated” to Klone. The primary issue at hand is that the base operating system for Klone appears to be very, very basic. I’d previously attempted to build/install some bioinformatics software on Klone, but could not due to a variety of missing libraries; these libraries are available by default on Mox… Part of this isn’t surprising, as UW IT has been making a concerted effort to get users to switch to containerization - specifically using Apptainer (formerly Singularity) containers.