Daily Bits - July 2023
File Conversion - M.magister MEGANized DAA to RMA6
After the initial DIAMOND BLASTx and subsequent MEGANization (notebook) ran for 41 days, I attempted to open the extremely large files in the MEGAN6 GUI to get an overview of the taxonomic breakdown. Because of the large file sizes (the smallest is 68GB!), the GUI consistently crashed, and each attempt took an hour or two before it did. Looking into this a bit more, I realized that I needed to convert the MEGANized DAA files to RMA6 format before attempting to import them into the MEGAN6 GUI! Gah! The RMA6 files are significantly smaller (“only” about 2GB), so there should be ample memory to import them.
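As a rough sketch of the conversion step, the Python below only builds one command per MEGANized DAA file and pairs it with an RMA6 output path; it does not run anything. It assumes MEGAN's `daa2rma` tool is on the `PATH` and accepts `-i` for input and `-o` for output; the function name, file names, and output directory are hypothetical, and any mapping or LCA options are left out, so check `daa2rma -h` before using it.

```python
from pathlib import Path


def build_daa2rma_commands(daa_files, out_dir):
    """Return one argument list per DAA file, each paired with an .rma6 output path.

    Assumption: `daa2rma` takes `-i <input>` and `-o <output>`; confirm with `daa2rma -h`.
    """
    out_dir = Path(out_dir)
    commands = []
    for daa in map(Path, daa_files):
        # Keep the original base name and swap the extension for .rma6
        rma6 = out_dir / (daa.stem + ".rma6")
        commands.append(["daa2rma", "-i", str(daa), "-o", str(rma6)])
    return commands


if __name__ == "__main__":
    # Hypothetical file name, for illustration only
    for cmd in build_daa2rma_commands(["sample_01.daa"], "rma6"):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the options have been confirmed for the installed MEGAN version.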
Data Received - Pacific Cod (G.macrocephalus) Sequencing Data from Novogene
Downloaded Pacific cod (G.macrocephalus) sequencing data. I believe this is a NOAA project, and I don’t have any info about it. So, not much to report here.
sRNA-seq Alignments - E5 Coral A.pulchra P.meandrina Using ShortStack on Mox
Steven had asked that I align the coral E5 sRNA-seq reads using ShortStack (GitHub Issue). I previously trimmed the sRNA-seq reads to 35bp in length (notebook). Next up was to actually perform the alignments using ShortStack4. A.pulchra was aligned to the P.millepora genome, per this GitHub Issue. This was run on Mox.
Trimming and QC - E5 Coral sRNA-seq Data from A.pulchra P.evermanni and P.meandrina Using FastQC flexbar and MultiQC on Mox
After downloading (notebook) and then reorganizing the E5 coral RNA-seq data from Azenta project 30-789513166 (notebook), and after testing some trimming options for sRNA-seq data (notebook), I opted to use the trimming software flexbar. I ran FastQC for initial quality checks, followed by trimming with flexbar, and then final QC with FastQC/MultiQC. This was performed on all three species in the data sets: A.pulchra, P.evermanni, and P.meandrina. All aspects were run on Mox. Final trimming length was set to 35bp.
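To illustrate what a 35bp final trimming length means for a single read, here is a minimal Python sketch. It is not flexbar and makes no claim about flexbar's options; it simply truncates a FASTQ record's sequence and quality string to the same maximum length, assuming the extra bases are removed from the 3' end. The function name and the example record are hypothetical.

```python
def trim_record(header, seq, qual, max_len=35):
    """Truncate a FASTQ record to at most max_len bases (3' end removed; an assumption)."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must be the same length")
    # Sequence and quality are cut at the same position so they stay in sync
    return header, seq[:max_len], qual[:max_len]


if __name__ == "__main__":
    # Hypothetical 50 bp read, for illustration only
    header, seq, qual = trim_record("@read_1", "ACGTA" * 10, "I" * 50)
    print(header, len(seq), len(qual))
```

Reads already at or below the limit pass through unchanged.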
ORF Identification - L.staminea De Novo Transcriptome Assembly v1.0 Using Transdecoder on Mox
After performing a de novo transcriptome assembly with L.staminea RNA-seq data, the Trinity assembly stats were quite a bit more “exaggerated” than normally expected. In an attempt to get a better sense of which contigs might be more useful candidates for downstream analysis, I decided to run the assembly through Transdecoder to identify open reading frames (ORFs). This was run on Mox.
Transcriptome Assembly - De Novo L.staminea Trimmed RNAseq Using Trinity on Mox
As part of this GitHub Issue to create a de novo transcriptome assembly from L.staminea RNA-seq data, I trimmed the reads earlier today. Next up is the actual de novo assembly. I performed this using Trinity on Mox.
Trimming - L.staminea RNA-seq Using FastQC fastp and MultiQC on Mox
Per this GitHub Issue, Steven asked me to perform a de novo transcriptome assembly on one set of paired FastQ files from some littleneck clam (L.staminea) RNA-seq. Prior to assembly, I needed to trim the FastQs.
Daily Bits - June 2023
20230605
Repeats Identification - P.meandrina Using RepeatMasker on Mox
Steven asked me to run RepeatMasker on the P.meandrina genome (GitHub Issue) for eventual use in developing a count matrix for transposable elements. The P.meandrina genome is from here: