Sam’s Notebook

University of Washington - Fishery Sciences - Roberts Lab


Trimming and QC - E5 Coral sRNA-seq Trimming Parameter Tests and Comparisons

  • 2 min read

While preparing to run FastQC and trim the E5 coral sRNA-seq data, I noticed that my “default” trimming settings didn’t produce the results I expected. Specifically, since these are sRNAs and the NEBNext® Multiplex Small RNA Library Prep Set for Illumina (PDF) protocol indicates that the sRNAs should be ~21-30bp, it seemed odd that I was still ending up with read lengths of 150bp. So, I ran a couple of quick trimming comparisons on a single pair of sRNA FastQs to use as examples for getting feedback on how trimming should proceed.
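One quick way to see whether trimming is leaving reads at 150bp instead of the expected ~21-30bp is to tally read lengths directly from a FastQ file. The following is a minimal sketch, not the approach used in the post: it assumes standard four-line FastQ records, and the function names and the 21-30bp default window are illustrative choices.

```python
import gzip
from collections import Counter
from typing import Iterable


def read_lengths(handle: Iterable[str]) -> Counter:
    """Count read lengths in a FastQ stream (assumes 4 lines per record)."""
    counts: Counter = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts


def fraction_in_range(counts: Counter, low: int = 21, high: int = 30) -> float:
    """Fraction of reads whose length falls within [low, high] (inclusive)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    in_range = sum(n for length, n in counts.items() if low <= length <= high)
    return in_range / total


def read_lengths_from_file(path: str) -> Counter:
    """Tally read lengths from a plain or gzipped FastQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return read_lengths(handle)
```

Running `fraction_in_range(read_lengths_from_file(path))` on the same FastQ before and after each trimming test gives a single number to compare across parameter sets.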


Data Wrangling - P.meandrina Genome GFF to GTF Using gffread

  • ~1 min read

As part of getting P.meandrina genome info added to our Lab Handbook Genomic Resources page, I will index the P.meandrina genome file (Pocillopora_meandrina_HIv1.assembly.fasta) using HISAT2, but I also need a GTF file to identify exon/intron splice sites. Since a GTF file is not available, but a GFF file is, I needed to convert the GFF to GTF. I used gffread to do this on my computer. The process is documented in the Jupyter Notebook linked below.
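For reference, the GFF to GTF conversion can be sketched as a small Python wrapper around gffread. This is an illustration only, not the notebook's actual code: it relies on gffread's `-T` (write GTF) and `-o` (output file) options, and the file names in the usage note are placeholders.

```python
import shutil
import subprocess
from typing import List


def gffread_gtf_command(gff_path: str, gtf_path: str) -> List[str]:
    """Build the gffread command that converts a GFF file to GTF."""
    # -T asks gffread for GTF output; -o names the output file.
    return ["gffread", gff_path, "-T", "-o", gtf_path]


def convert_gff_to_gtf(gff_path: str, gtf_path: str) -> None:
    """Run the conversion, failing early if gffread is not on the PATH."""
    if shutil.which("gffread") is None:
        raise FileNotFoundError("gffread was not found on the PATH")
    subprocess.run(gffread_gtf_command(gff_path, gtf_path), check=True)
```

Calling `convert_gff_to_gtf("input.gff3", "output.gtf")` with real paths substituted for the placeholders would write the GTF next to wherever `output.gtf` points.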


FastQ QC and Trimming - E5 Coral RNA-seq Data for A.pulchra P.evermanni and P.meandrina Using FastQC fastp and MultiQC on Mox

  • 5 min read

After downloading and then reorganizing the E5 coral RNA-seq data from Azenta project 30-789513166, I ran FastQC for initial quality checks, followed by trimming with fastp, and then final QC with FastQC/MultiQC. This was performed on all three species in the data sets: A.pulchra, P.evermanni, and P.meandrina. All aspects were run on Mox.
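The FastQC, fastp, FastQC, MultiQC sequence described above can be laid out as an ordered list of commands. This is a rough sketch under my own assumptions, not the script that ran on Mox: the directory layout, thread count, and the choice of fastp's `--detect_adapter_for_pe` option are illustrative, and the output directories would need to exist before anything is run.

```python
from pathlib import Path
from typing import List


def qc_trim_plan(r1: str, r2: str, out_dir: str, threads: int = 4) -> List[List[str]]:
    """Return the commands for raw QC, trimming, trimmed QC, and a summary report."""
    out = Path(out_dir)
    raw_qc = out / "fastqc_raw"          # hypothetical directory names
    trimmed = out / "trimmed"
    trimmed_qc = out / "fastqc_trimmed"
    t1 = trimmed / Path(r1).name
    t2 = trimmed / Path(r2).name
    return [
        # 1. Initial quality check on the raw reads.
        ["fastqc", "--threads", str(threads), "--outdir", str(raw_qc), r1, r2],
        # 2. Paired-end trimming with adapter auto-detection.
        ["fastp", "--in1", r1, "--in2", r2,
         "--out1", str(t1), "--out2", str(t2),
         "--detect_adapter_for_pe", "--thread", str(threads),
         "--html", str(trimmed / "fastp.html"),
         "--json", str(trimmed / "fastp.json")],
        # 3. Quality check on the trimmed reads.
        ["fastqc", "--threads", str(threads), "--outdir", str(trimmed_qc),
         str(t1), str(t2)],
        # 4. Aggregate all reports found under the output directory.
        ["multiqc", "--outdir", str(out / "multiqc"), str(out)],
    ]
```

Each inner list can be passed to `subprocess.run(..., check=True)` in order, once per sample pair.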


Data Management - E5 Coral RNA-seq and sRNA-seq Reorganizing and Renaming

  • ~1 min read

Downloaded the E5 coral sRNA-seq data from Azenta project 30-852430235 on 20230515 and the E5 coral RNA-seq data from Azenta project 30-789513166 on 20230516. The data required some reorganization, as each project included data from three different species (Acropora pulchra, Pocillopora meandrina, and Porites evermanni). Additionally, since the same exact samples were sequenced with both RNA-seq and sRNA-seq, the resulting FastQ file names ended up being the same. This seemed like it could lead to downstream mistakes and/or difficulty tracking whether someone was actually using an RNA-seq or an sRNA-seq FastQ.
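One way to make the library type visible in each file name is to prepend a tag. The sketch below is hypothetical: the `RNA-`/`sRNA-` prefix scheme and the function names are assumptions for illustration, not necessarily the naming convention adopted in the post.

```python
from pathlib import Path
from typing import Dict, Iterable

LIBRARY_TYPES = ("RNA", "sRNA")  # assumed tags, for illustration only


def tagged_name(fastq: Path, library_type: str) -> Path:
    """Return the same path with the library type prepended to the file name."""
    if library_type not in LIBRARY_TYPES:
        raise ValueError(f"unknown library type: {library_type}")
    return fastq.with_name(f"{library_type}-{fastq.name}")


def plan_renames(fastqs: Iterable[Path], library_type: str) -> Dict[Path, Path]:
    """Map each original path to its tagged path without touching the disk."""
    return {fastq: tagged_name(fastq, library_type) for fastq in fastqs}
```

Building the mapping first, then reviewing it before calling `Path.rename` on each pair, keeps the renaming step easy to audit.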


lncRNA Identification - P.generosa lncRNAs using CPC2 and bedtools

  • ~1 min read

After trimming P.generosa RNA-seq reads on 20230426 and then aligning and annotating them to the Panopea-generosa-v1.0 genome on 20230426, I proceeded with the final step of lncRNA identification. To do this, I used Zach’s notebook entry on lncRNA identification for guidance. I utilized the annotated GTF generated by gffcompare during the alignment/annotation step on 20230426. I used bedtools getfasta (https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html) and CPC2 with an arbitrary 200bp minimum length to identify lncRNAs. All of this was done in a Jupyter Notebook (links below).
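The 200bp minimum length cutoff can be illustrated on its own with a small FASTA filter. This sketch covers only the length threshold; it does not reproduce CPC2's coding-potential classification or the bedtools getfasta extraction, and treating the cutoff as inclusive (at least 200bp) is my assumption.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequences may wrap across several lines
    if header is not None:
        yield header, "".join(chunks)


def filter_min_length(
    records: Iterable[Tuple[str, str]], min_length: int = 200
) -> List[Tuple[str, str]]:
    """Keep records whose sequence is at least min_length bases long."""
    return [(name, seq) for name, seq in records if len(seq) >= min_length]
```

Applied to the FASTA produced by bedtools getfasta, `filter_min_length(parse_fasta(handle))` would return only the transcripts long enough to be considered as lncRNA candidates.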


Containers - Apptainer Explorations

  • 6 min read

At some point, our HPC nodes on Mox will be retired. When that happens, we’ll likely purchase new nodes on the newest UW cluster, Klone. Additionally, the coenv nodes are no longer available on Mox. One was decommissioned and one was “migrated” to Klone. The primary issue at hand is that the base operating system for Klone appears to be very, very basic. I’d previously attempted to build/install some bioinformatics software on Klone, but could not due to a variety of missing libraries; these libraries are available by default on Mox… Part of this isn’t surprising, as UW IT has been making a concerted effort to get users to switch to containerization - specifically using Apptainer (formerly Singularity) containers.
