Sam's Notebook (http://onsnetwork.org/kubu4), University of Washington - Fishery Sciences - Roberts Lab

Data Management – Integrity Check of Final BGI Olympia Oyster & Geoduck Data

http://onsnetwork.org/kubu4/2016/12/15/data-management-integrity-check-of-final-bgi-olympia-oyster-geoduck-data/

Thu, 15 Dec 2016 22:46:19 +0000

After completing the downloads of these files from BGI, I needed to verify that the downloaded copies matched the originals. Below is a Jupyter Notebook detailing how I verified file integrity via MD5 checksums. It also highlights the importance of doing this check when working with large sequencing files (or large files in general), as a few of them had mismatched MD5 checksums!
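For readers who want to run the same kind of check outside the notebook, here is a minimal Python sketch of MD5 verification. It is not the code from the notebook. It assumes the provider's checksums come in an `md5sum`-style text file (one `<hex digest>  <file name>` pair per line), and the function names are hypothetical, chosen only for illustration.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat for large sequencing files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_file(checksum_path):
    """Parse md5sum-style lines into a {file name: digest} dict.

    Assumption: each non-empty line is "<hex digest>  <file name>".
    A leading "*" on the name (md5sum binary mode) is dropped.
    """
    expected = {}
    for line in Path(checksum_path).read_text().splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def find_mismatches(data_dir, checksum_path):
    """Return names of files that are missing or whose MD5 differs."""
    data_dir = Path(data_dir)
    mismatches = []
    for name, digest in parse_checksum_file(checksum_path).items():
        target = data_dir / name
        if not target.is_file() or md5_of_file(target) != digest:
            mismatches.append(name)
    return mismatches
```

Calling `find_mismatches` with the download directory and the checksum file returns the list of files to download again; an empty list means every listed file matched its expected checksum.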

Although the notebook is embedded below, it might be easier to view via the notebook link (hosted on GitHub).

At the end of the day, I had to re-download some files, but all the MD5 checksums now match and these data are ready for analysis:

Final Ostrea lurida genome files

Final Panopea generosa genome files

Jupyter Notebook: 20161214_docker_BGI_data_integrity_check.ipynb
