Tag Archives: Panopea generosa

Transcriptome Assembly – Geoduck RNAseq data

Used all of our current geoduck RNAseq data to assemble a transcriptome using Trinity.

Trinity was run on our Mox HPC node. Specifically, I had to use just a single node with 500GB of RAM; Trinity could not run with much less than that. Initially, I attempted to run with two nodes, but our smaller node (120GB) ended up limiting the available RAM (the system only uses the RAM available on the smallest node; it cannot combine RAM or dynamically allocate computing to a node with larger RAM when needed), and Trinity consistently crashed due to memory limitations.

Reads were trimmed using the built-in version of Trimmomatic with the default settings.

SBATCH script:

20180827_geo_trinity.sh

Due to the huge number of input files, I won’t post the entire script contents here. Instead, here’s a snippet of the script showing the commands used to start the Trinity run:


#!/bin/bash
## Job Name
#SBATCH --job-name=20180829_trinity
## Allocation Definition 
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=30-00:00:00
## Memory per node
#SBATCH --mem=500G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/scrubbed/samwhite/20180827_trinity_geoduck_RNAseq

# Load Python Mox module for Python module availability

module load intel-python3_2017

# Document programs in PATH (primarily for program version ID)

date >> system_path.log
echo "" >> system_path.log
echo "System PATH for $SLURM_JOB_ID" >> system_path.log
echo "" >> system_path.log
printf "%0.s-" {1..10} >> system_path.log
echo ${PATH} | tr : \\n >> system_path.log


# Run Trinity
/gscratch/srlab/programs/trinityrnaseq-Trinity-v2.8.3/Trinity \
--trimmomatic \
--seqType fq \
--max_memory 500G \
--CPU 28 \

Despite the naming conventions, this job was submitted to the Mox scheduler on 20180829 and finished on 20180901.

After job completion, the entire folder was gzipped (the following method of gzipping is SUPER fast, btw):

tar -c 20180827_trinity_geoduck_RNAseq | pigz > 20180827_trinity_geoduck_RNAseq.tar.gz
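A rough sketch of the same idea as a reusable function (this was not part of the original job): it pipes tar through pigz when pigz is installed, falls back to plain gzip otherwise, and then lists the archive to confirm it can be read back. The function name archive_dir is made up for illustration.

```shell
# Hypothetical helper: tar a directory and compress it with pigz if available,
# otherwise fall back to single-threaded gzip.
archive_dir() {
  local dir="$1"
  local out="${dir%/}.tar.gz"
  local compressor="gzip"
  if command -v pigz >/dev/null 2>&1; then
    compressor="pigz"
  fi
  # Write the tar stream to stdout explicitly and compress it on the fly.
  tar -cf - "$dir" | "$compressor" > "$out" || return 1
  # Listing the archive reads it end to end, so a truncated file fails here.
  tar -tzf "$out" >/dev/null || return 1
  echo "$out"
}
```

Running archive_dir 20180827_trinity_geoduck_RNAseq from the parent directory would produce the same 20180827_trinity_geoduck_RNAseq.tar.gz file name used above.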

RESULTS:

Output folder:

20180827_trinity_geoduck_RNAseq/

Trinity assembly (FastA):

20180827_trinity_geoduck_RNAseq/Trinity.fasta

Next up, I’ll get some annotations going by running through TransDecoder and blastx.

Assembly Stats – Geoduck Hi-C Final Assembly Comparison

0000-0002-2747-368X

We received the final geoduck genome assembly data from Phase Genomics, in which they updated the assembly by performing some manual curation:

geoduck_manual_scaffolds.fasta

Additional files with supplementary assembly data are in the following directory:

20180822_phase_genomics_geoduck_Results/geoduck_manual/

The actual sequencing data and two earlier assemblies were received on 20180421.

All assembly data (both old and new) from Phase Genomics was downloaded in full from the Google Drive link provided by them and stored here on Owl:

20180822_phase_genomics_geoduck_Results/

Ran Quast to compare all three assemblies provided (command run on Swoose):


/home/sam/software/quast-4.5/quast.py \
-t 24 \
--labels 20180403_pga,20180421_pga,20180810_geo_manual \
"/mnt/owl/Athaliana/20180421_geoduck_hi-c/Results/geoduck_roberts results 2018-04-03 11:05:41.596285/PGA_assembly.fasta" \
"/mnt/owl/Athaliana/20180421_geoduck_hi-c/Results/geoduck_roberts results 2018-04-21 18:09:04.514704/PGA_assembly.fasta" \
/mnt/owl/Athaliana/20180822_phase_genomics_geoduck_Results/geoduck_manual/geoduck_manual_scaffolds.fasta

Results:

Quast output folder: results_2018_08_23_07_38_28/

Quast report (HTML): results_2018_08_23_07_38_28/report.html

Library Construction – Geoduck Water Filter Metagenome with Nextera DNA Flex Kit (Illumina)

Made Illumina libraries with geoduck metagenome water filter DNA I previously isolated on:

We used a free Nextera DNA Flex Kit (Illumina) that we won in a contest held by Illumina!

Followed the manufacturer’s protocol for input DNA quantities <10ng with the following changes/notes:

PCR steps performed in 200uL thin-walled PCR tubes.
Magnetic separations were performed in 1.7mL snap cap tubes.
Thermal cycler: PTC-200 (MJ Research)
Magnet: DynaMag 2 (Invitrogen)

See the Library Calcs sheet (link below) for original sample names and subsequent library sample names.

IMPORTANT!

The sheet also contains the indexes used for each library. This info will be necessary for the sequencing facility.

Library Calcs (Google Sheet):

20180606_nextera_library_geoduck_metagenome_calcs

Links to the Illumina manuals are below:

After library construction was completed, individual libraries were quantified on the Roberts Lab Qubit 3.0 (Invitrogen) with the Qubit 1x dsDNA HS Assay Kit.

2uL of each sample was used for each assay.

Library quality was assessed using the Seeb Lab 2100 Bioanalyzer (Agilent) with a High Sensitivity DNA Kit, using 1uL of each sample.

Libraries were stored in the small -20C in FTR213:

Sam’s gDNA Box #2
Slots H6 – I3

Results:

Qubit Raw Data (Google Sheet):

20180606_qubit_geoduck_metagenome_libraries

Bioanalyzer File (XAD):

20180606_133725.xad

All libraries have DNA in them, so that’s good!

With the exception of Library Geoduck MG #04, which is bad, the libraries look OK (i.e. not great). Compared to the example on Pg. 12 in the manual, these libraries all have some extra high molecular weight stuff.

When selecting the range listed in the Nextera Kit manual, the average fragment size is ~530bp – the expected size should be ~600bp.

Spoke with Steven about Library Geoduck MG #04 and we’ve opted to just leave it out.

All other samples were pooled into a single sample according to the manufacturer’s protocol.

This pooled sample was stored in the same -20C box as above, in position I4.

UPDATE 20180808

After some confusion with the sequencing facility, I contacted Illumina regarding adapter sequences. I used the sequences provided for the Nextera DNA 24 CD Indexes (which was the index kit we used) on p.18 of the Illumina Index Adapter Pooling Guide.

As it turns out, these sequences are incorrect. The correct sequences are on p.12 of that document (the Nextera DNA 96 CD Indexes).

I’ve updated the Google Sheet (linked above) to reflect the correct index sequences.

Email from Illumina is below. Even though he specifically references the H705 adapter, the correct sequence information for all i7 index adapters is found on p.12.

Hi Sam,

Thanks for the clarification! For the index sequence H705, this sequence is incorrect in the Index Adapters Pooling Guide. The correct information is found on page 12 of the same document and should be:

H705 “AGGAGTCC” (Bases in Adapter) and “GGACTCCT” (bases for sample sheet).

This is also consistent with the Illumina Adapters letter.

We have provided this feed back to our colleagues to update the document so that all the information is consistent.

Thanks for your patience and understanding while we evaluated this issue. If we do have any other questions or concerns, please let us know and we would be happy to discuss this further.

Best,

Russell

Russell Chan, Ph.D.

Technical Applications Scientist

Illumina Technical Support

Telephone available 24 hours

Monday through Friday

Technical Bulletins: https://support.illumina.com/bulletins.html

Trainings: http://support.illumina.com/traidexes
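Side note on the two H705 sequences in the email above: the sample sheet sequence is the reverse complement of the bases in the adapter. Here is a quick way to check that on the command line. The function name revcomp is just for illustration, and it assumes the rev and tr utilities are available.

```shell
# Reverse-complement a DNA sequence: reverse the string, then swap A<->T and C<->G.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

revcomp AGGAGTCC   # prints GGACTCCT
```

The same check could be run on the other i7 indexes in the Library Calcs sheet to see which orientation each entry is in.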

TrimGalore/FastQC/MultiQC – TrimGalore! RRBS Geoduck BS-seq FASTQ data (directional)

Earlier this week, I ran TrimGalore! but, due to a copy/paste mistake, incorrectly set the trimming as --non-directional, so I re-ran it with the correct settings.

Steven requested that I trim the Geoduck RRBS libraries that we have, in preparation to run them through Bismark.

These libraries were originally created by Hollie Putnam using the TruSeq DNA Methylation Kit (Illumina):

project_juvenile_geoduck_OA/Sample_Processing (GitHub)

All analysis is documented in a Jupyter Notebook; see link below.

Overview of process:

Run TrimGalore! with --paired and --rrbs settings.
Run FastQC and MultiQC on trimmed files.
Copy all data to owl (see Results below for link).
Confirm data integrity via MD5 checksums.

Jupyter Notebook:

20180516_roadrunner_geoduck_RRBS_trimming.ipynb (GitHub)

Results:

FastQC – RRBS Geoduck BS-seq FASTQ data

Earlier today I finished trimming Hollie’s RRBS BS-seq FastQ data.

However, the original files were never analyzed with FastQC, so I ran it on the original files.

These libraries were originally created by Hollie Putnam using the TruSeq DNA Methylation Kit (Illumina):

project_juvenile_geoduck_OA/Sample_Processing (GitHub)

FastQC was run, followed by MultiQC. Analysis was run on Roadrunner.

All analysis is documented in a Jupyter Notebook; see link below.

Jupyter Notebook:

20180516_roadrunner_geoduck_EPI_fastqc

Results:

FastQC output folder:

20180516_geoduck_EPI_fastqc/

MultiQC output folder:

20180516_geoduck_EPI_fastqc/multiqc_data

MultiQC report (HTML):

multiqc_report.html

TrimGalore/FastQC/MultiQC – TrimGalore! RRBS Geoduck BS-seq FASTQ data

20180516 – UPDATE!!

THIS WAS RUN WITH THE INCORRECT SETTING IN TRIMGALORE! `--non-directional`

WILL RE-RUN

Steven requested that I trim the Geoduck RRBS libraries that we have, in preparation to run them through Bismark.

These libraries were originally created by Hollie Putnam using the TruSeq DNA Methylation Kit (Illumina):

project_juvenile_geoduck_OA/Sample_Processing (GitHub)

All analysis is documented in a Jupyter Notebook; see link below.

Overview of process:

Copy EPI* FastQ files from owl/P_generosa to roadrunner.
Confirm data integrity via MD5 checksums.
Run TrimGalore! with --paired, --rrbs, and --non-directional settings.
Run FastQC and MultiQC on trimmed files.
Copy all data to owl (see Results below for link).
Confirm data integrity via MD5 checksums.

Jupyter Notebook:

20180514_roadrunner_geoduck_RRBS_trimming.ipynb (GitHub)

Results:

TrimGalore! output folder:

20180514_geoduck_trimgalore_rrbs

FastQC output folder:

20180514_geoduck_trimgalore_rrbs/20180514_geoduck_trimmed_fastqc/

MultiQC output folder:

20180514_geoduck_trimgalore_rrbs/20180514_geoduck_trimmed_fastqc/multiqc_data

MultiQC report (HTML):

multiqc_report.html

Data Management – Illumina NovaSeq Geoduck Genome Sequencing

Illumina’s end goal in our collaborative geoduck genome sequencing project has always been to sequence the genome in a single run.

They’ve finally attempted this by running 10x Genomics, Hi-C, Nextera, and TruSeq libraries in a single run of the NovaSeq.

I downloaded the data with the BaseSpace downloader in Chrome on a Windows 7 computer. The downloader is not available on Ubuntu, and the command line tools that Illumina does provide are too confusing for me to bother spending the time figuring them out just to download the data.

Data was saved here:

nightingales/P_generosa/

Generated MD5 checksums (using md5sum on Ubuntu) and appended to the checksums file:

nightingales/P_generosa/checksums.md5

Illumina was unable to provide MD5 checksums on their end, so I was unable to confirm data integrity post-download.
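For reference, a minimal sketch of the checksum bookkeeping described above, using md5sum. The function names are hypothetical; checksums.md5 matches the file named above. Since there were no checksums from Illumina to compare against, this only catches corruption that happens after the download (e.g. during later copies), not during it.

```shell
# Append MD5 checksums for every gzipped FastQ file in the current directory
# to a running checksums file (default: checksums.md5).
append_checksums() {
  local sums="${1:-checksums.md5}"
  md5sum -- *.fastq.gz >> "$sums"
}

# Re-check files against the recorded checksums, e.g. after copying them elsewhere.
# md5sum -c exits non-zero if any file is missing or does not match.
verify_checksums() {
  local sums="${1:-checksums.md5}"
  md5sum -c "$sums"
}
```

Both functions are meant to be run from inside the directory holding the FastQ files, so the relative file names recorded in checksums.md5 resolve correctly.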

Illumina sample info is here:

20180403GeoDuckSamples.csv (GitHub)

Will add info to:

List of files received:

10x-Genomics-Libraries-Geo10x5-A3-MultipleA_S10_L001_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleA_S10_L001_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleA_S10_L002_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleA_S10_L002_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleB_S11_L001_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleB_S11_L001_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleB_S11_L002_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleB_S11_L002_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleC_S12_L001_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleC_S12_L001_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleC_S12_L002_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleC_S12_L002_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleD_S13_L001_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleD_S13_L001_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleD_S13_L002_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleD_S13_L002_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleA_S14_L001_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleA_S14_L001_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleA_S14_L002_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleA_S14_L002_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleB_S15_L001_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleB_S15_L001_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleB_S15_L002_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleB_S15_L002_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleC_S16_L001_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleC_S16_L001_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleC_S16_L002_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleC_S16_L002_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleD_S17_L001_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleD_S17_L001_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleD_S17_L002_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleD_S17_L002_R2_001.fastq.gz
HiC-Libraries-GeoHiC-C3-N701_S18_L001_R1_001.fastq.gz
HiC-Libraries-GeoHiC-C3-N701_S18_L001_R2_001.fastq.gz
HiC-Libraries-GeoHiC-C3-N701_S18_L002_R1_001.fastq.gz
HiC-Libraries-GeoHiC-C3-N701_S18_L002_R2_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP10-B2-AD013_S7_L001_R1_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP10-B2-AD013_S7_L001_R2_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP10-B2-AD013_S7_L002_R1_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP10-B2-AD013_S7_L002_R2_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP11-C2-AD014_S8_L001_R1_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP11-C2-AD014_S8_L001_R2_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP11-C2-AD014_S8_L002_R1_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP11-C2-AD014_S8_L002_R2_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP12-D2-AD015_S9_L001_R1_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP12-D2-AD015_S9_L001_R2_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP12-D2-AD015_S9_L002_R1_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP12-D2-AD015_S9_L002_R2_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP9-A2-AD002_S6_L001_R1_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP9-A2-AD002_S6_L001_R2_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP9-A2-AD002_S6_L002_R1_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP9-A2-AD002_S6_L002_R2_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA1-A1-NR006_S1_L001_R1_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA1-A1-NR006_S1_L001_R2_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA1-A1-NR006_S1_L002_R1_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA1-A1-NR006_S1_L002_R2_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA3-C1-NR012_S2_L001_R1_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA3-C1-NR012_S2_L001_R2_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA3-C1-NR012_S2_L002_R1_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA3-C1-NR012_S2_L002_R2_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA5-E1-NR005_S3_L001_R1_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA5-E1-NR005_S3_L001_R2_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA5-E1-NR005_S3_L002_R1_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA5-E1-NR005_S3_L002_R2_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA7-G1-NR019_S4_L001_R1_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA7-G1-NR019_S4_L001_R2_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA7-G1-NR019_S4_L002_R1_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA7-G1-NR019_S4_L002_R2_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA8-H1-NR021_S5_L001_R1_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA8-H1-NR021_S5_L001_R2_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA8-H1-NR021_S5_L002_R1_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA8-H1-NR021_S5_L002_R2_001.fastq.gz

Assembly – Geoduck Hi-C Assembly Subsetting

Steven asked me to create a couple of subsets of our Phase Genomics Hi-C geoduck genome assembly (pga_02):

Contigs >10kbp
Contigs >30kbp

I used pyfaidx on Roadrunner and the following commands:

faidx --size-range 10000,100000000 PGA_assembly.fasta > PGA_assembly_10k_plus.fasta

faidx --size-range 30000,100000000 PGA_assembly.fasta > PGA_assembly_30k_plus.fasta

Ran Quast afterwards to get stats on the new FastA files just to confirm that the upper cutoff value was correct and didn’t get rid of the largest contig(s).

Results:

faidx Output folder: 20180512_geoduck_fasta_subsets/

10kbp contigs (FastA): 20180512_geoduck_fasta_subsets/PGA_assembly_10k_plus.fasta

30kbp contigs (FastA): 20180512_geoduck_fasta_subsets/PGA_assembly_30k_plus.fasta

Quast output folder: results_2018_05_14_06_26_26/

Quast report (HTML): results_2018_05_14_06_26_26/report.html

Everything looks good. The main thing I wanted to confirm by running Quast was that the largest contig in each subset was the same as the original PGA assembly (95,480,635bp).
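Quast is what I actually used here, but the same check can be sketched with awk alone. The helper below (hypothetical name longest_contig) prints the length of the longest sequence in a FastA file and handles sequences wrapped over multiple lines. It assumes plain Unix line endings.

```shell
# Print the length (in bases) of the longest sequence in a FastA file.
longest_contig() {
  awk '
    /^>/ { if (len > max) max = len; len = 0; next }  # new record: close out the previous one
         { len += length($0) }                         # sequence line: accumulate bases
    END  { if (len > max) max = len; print max + 0 }   # close out the final record
  ' "$1"
}
```

Running it on PGA_assembly.fasta and on each subset file should print the same number if the upper cutoff did not drop the largest contig.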

Read Mapping – Mapping Illumina Data to Geoduck Genome Assemblies with Bowtie2

We have an upcoming meeting with Illumina to discuss how the geoduck genome project is coming along and to decide how we want to proceed.

So, we wanted to get a quick idea of how good our geoduck assemblies are by performing some quick alignments using Bowtie2.

Used the following assemblies as references:

sn_ph_01 : SuperNova assembly of 10x Genomics data
sparse_03 : SparseAssembler assembly of BGI and Illumina project data
pga_02 : Hi-C assembly of Phase Genomics data

The analysis is documented in a Jupyter Notebook.

Jupyter Notebook (GitHub):

20180508_roadrunner_geoduck_bowtie2_genome_mapping.ipynb

NOTE: Due to the large amount of stdout from the first genome index command, the notebook does not render well on GitHub. I recommend downloading the notebook and opening it in a locally installed version of Jupyter.

Here’s a brief overview of the process:

Generate Bowtie2 indexes for each of the genome assemblies.

Map 1,000,000 reads from the following Illumina NovaSeq FastQ files:

NR013_AD013_S2_L001_R1_001_val_1_val_1.fq.gz

NR013_AD013_S2_L001_R2_001_val_2_val_2.fq.gz
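One simple way to take the first 1,000,000 reads from each file is sketched below. This may not be how the notebook did it (Bowtie2 can also stop after a set number of reads via its -u option); the function name first_n_reads is hypothetical.

```shell
# Print the first N reads of a gzipped FastQ file.
# Each FastQ record is 4 lines, so N reads = N * 4 lines.
first_n_reads() {
  local fastq_gz="$1"
  local n="$2"
  gzip -dc "$fastq_gz" | head -n "$((n * 4))"
}
```

Because R1 and R2 files list read pairs in the same order, taking the first N records from each file keeps the pairs in sync.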

Results:

Bowtie2 Genome Indexes:

20180508_geoduck_assemblies_bowtie2_indexes/

Bowtie2 sn_ph_01 alignment folder:

20180508_geoduck_mapping_nova_to_10x/

Bowtie2 sparse_03 alignment folder:

20180508_geoduck_mapping_nova_to_sparse/

Bowtie2 pga_02 alignment folder:

20180508_geoduck_mapping_nova_to_Hi-C/

MAPPING SUMMARY TABLE

All mapping data was pulled from the respective *.err file in the Bowtie2 alignment folders.

sequence_ID	Assembler	Alignment Rate (%)
sn_ph_01	SuperNova (10x)	79.89
sparse_03	SparseAssembler	85.83
pga_02	Hi-C (Phase Genomics)	79.90
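Bowtie2 writes its alignment summary to stderr, ending in a line such as "79.89% overall alignment rate". Assuming each *.err file holds that summary, the rate can be pulled out with a small awk helper. The function name alignment_rate and the loop in the comment are illustrative, not taken from the notebook.

```shell
# Extract the overall alignment rate (as a bare number, no % sign)
# from a file containing Bowtie2's stderr summary.
alignment_rate() {
  awk '/overall alignment rate/ { sub(/%/, "", $1); print $1 }' "$1"
}

# Hypothetical usage over the alignment folders listed above:
# for err in 20180508_geoduck_mapping_nova_to_*/*.err; do
#   printf '%s\t%s\n' "$err" "$(alignment_rate "$err")"
# done
```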

Mapping efficiency is similar for all assemblies. After speaking with Steven, we’ve decided we’ll begin exploring genome annotation pipelines.

Posted in Geoduck Genome Sequencing and tagged assembly, bowtie2, geoduck, jupyter notebook, Panopea generosa on May 9, 2018 by kubu4.

Assembly & Stats – SparseAssembler (k95) on Geoduck Sequence Data > Quast for Stats

Had a successful assembly with SparseAssembler k101, but figured I’d just tweak the kmer setting and throw it in the queue and see how it compares; minimal effort/time needed.

Initiated an assembly run using SparseAssembler on our Mox HPC node on all of our geoduck genomic sequencing data:

BGI HiSeq Data

Illumina Mate Pair HiSeq Data

Illumina NovaSeq Data

Kmer size set to 95.

Slurm script: 20180423_sparse_assembler_kmer95_geoduck_slurm.sh

After the run finished, I copied the files to our server (Owl) and then ran Quast on my computer to gather some assembly stats, using the following command:

/home/sam/software/quast-4.5/quast.py \
-t 24 \
--labels 20180423_sparse_k95 \
/mnt/owl/Athaliana/20180423_sparseassembler_kmer95_geoduck/Contigs.txt

Results:

SparseAssembler output folder: 20180423_sparseassembler_kmer95_geoduck/

SparseAssembler assembly (FastA; 15GB): 20180423_sparseassembler_kmer95_geoduck/Contigs.txt

Quast output folder: quast_results/results_2018_05_10_15_04_07

Quast report (HTML): quast_results/results_2018_05_10_15_04_07/report.html

I’ve embedded the Quast HTML report below, but it may be easier to view by using the link above.

Well, it’s remarkable how different this is from the previous SparseAssembler assembly with the k101 setting!

This assembly doesn’t have a single contig >50,000bp, while the previous one has four contigs over that threshold!
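The contig counts above come from the Quast reports. As a rough cross-check that does not need Quast, contigs over a length threshold can be counted directly from the FastA file with awk. The function name count_contigs_over is hypothetical.

```shell
# Count sequences in a FastA file that are strictly longer than a given length.
count_contigs_over() {
  local fasta="$1"
  local min_len="$2"
  awk -v min="$min_len" '
    function tally() { if (len > min) n++ }   # count the record just finished
    /^>/ { tally(); len = 0; next }           # header line: close out the previous record
         { len += length($0) }                # sequence line: accumulate bases
    END  { tally(); print n + 0 }             # close out the final record
  ' "$fasta"
}
```

For example, count_contigs_over Contigs.txt 50000 would be expected to print 0 for this k95 assembly, going by the Quast result above.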

Definitely shows what a large impact the kmer setting in assembly software can have on the final assembly!

Posted in Geoduck Genome Sequencing and tagged geoduck, mox, Panopea generosa, QUAST, SparseAssembler on May 1, 2018 by kubu4.


Open Notebook Science Network

Sam's Notebook

University of Washington – Fishery Sciences – Roberts Lab
