Data Management – Convert Oly PacBio H5 to FASTQ

After working with all of this Olympia oyster genome sequencing data, I remembered that we had a single, older PacBio SMRT cell file (from June 2013). This file doesn’t seem to have been included in any of Sean’s or my recent assemblies, most likely because we have it in the PacBio H5 format and not in FASTQ.

I installed PacBio’s pbh5tools on my computer (swoose), converted the file to FASTQ, and moved the result to owl/nightingales/O_lurida:

python bash5tools.py /mnt/owl/nightingales/O_lurida/m130619_081336_42134_c100525122550000001823081109281326_s1_p0.bas.h5 --outType fastq 

I generated an MD5 checksum and appended it to the checksums.md5 file in /owl/nightingales/O_lurida using the following command:

md5sum m130619_081336_42134_c100525122550000001823081109281326_s1_p0.fastq | awk '{print $2 " = " $1}' >> checksums.md5

The command above pipes the md5sum output (hash, then filename) to awk, which swaps the two fields to match the existing format of the checksums.md5 file (i.e. filename = hash).
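Because md5sum's own check mode expects its native "hash  filename" layout, a file stored as "filename = hash" has to be flipped back before it can be verified. The following is a minimal sketch of that round trip, run in a scratch directory. The file demo.fastq and the variable names are hypothetical stand-ins for illustration, not the real data on owl, and the sketch assumes GNU md5sum and filenames without spaces.

```shell
#!/bin/sh
# Sketch only: works in a temporary directory with a tiny made-up FASTQ file.
set -eu
workdir=$(mktemp -d)
cd "$workdir"
printf '@read1\nACGT\n+\nIIII\n' > demo.fastq   # hypothetical stand-in file

# Append "filename = hash", mirroring the command used in this post.
md5sum demo.fastq | awk '{print $2 " = " $1}' >> checksums.md5

# Flip each line back to md5sum's native "hash  filename" layout and
# let md5sum verify the files; "-" tells it to read the list from stdin.
verify_result=$(awk -F' = ' '{print $2 "  " $1}' checksums.md5 | md5sum -c -)
echo "$verify_result"
```

For an intact file this prints "demo.fastq: OK". A filename containing spaces would be truncated by the first awk call, since only $2 is kept, so this approach suits the space-free sequencer filenames used here.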

I’ve also updated our Nightingales spreadsheet (Google Sheet) to reflect this.

Will generate updated PacBio assemblies with Canu and/or Racon.

