Downloaded data from the BGI project portal to our server, Owl, using the Synology Download Station. Although the BGI portal is aesthetically nice, it’s set up poorly for bulk downloads, so it took a few tries to download all of the files.
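The downloads were actually handled through the Download Station GUI; purely as an illustration, a scripted bulk download with retries could look like the sketch below (the url_list.txt file and downloads/ directory are hypothetical, not what was used).

```python
import time
import urllib.request
from pathlib import Path

OUT_DIR = Path("downloads")                 # hypothetical destination directory
OUT_DIR.mkdir(exist_ok=True)

# url_list.txt is a hypothetical file with one BGI portal download URL per line
urls = Path("url_list.txt").read_text().split()

for url in urls:
    dest = OUT_DIR / url.rsplit("/", 1)[-1]
    if dest.exists():                       # skip files that already finished
        continue
    for attempt in range(3):                # the portal needed a few tries per file
        try:
            urllib.request.urlretrieve(url, dest)
            break
        except OSError:
            time.sleep(30)                  # back off briefly before retrying
```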
Data integrity was assessed and read counts for each file were generated. The files were moved to their permanent storage location on Owl: http://owl.fish.washington.edu/nightingales/O_lurida
The readme.md file was updated to include project/file information.
The file manipulations were performed in a Jupyter notebook (see below).
Total reads generated for this project: 1,225,964,680
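As a rough sketch of how the integrity check, per-file read counts, and project total could be generated (not the actual notebook code; the data directory, *.fq.gz filename pattern, and md5.txt manifest name are assumptions):

```python
import gzip
import hashlib
from pathlib import Path

DATA_DIR = Path("/volume1/web/nightingales/O_lurida")  # assumed path to the files on Owl


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_reads(fastq_gz):
    """Count reads in a gzipped FASTQ file (four lines per record)."""
    with gzip.open(fastq_gz, "rt") as fh:
        return sum(1 for _ in fh) // 4


# Expected checksums from the manifest shipped with the data
# (assumed name/format: md5.txt with "<checksum>  <filename>" on each line).
expected = {}
for line in (DATA_DIR / "md5.txt").read_text().splitlines():
    if line.strip():
        checksum, name = line.split()
        expected[Path(name).name] = checksum

total_reads = 0
for fq in sorted(DATA_DIR.glob("*.fq.gz")):
    ok = md5sum(fq) == expected.get(fq.name)
    n_reads = count_reads(fq)
    total_reads += n_reads
    print(f"{fq.name}\tMD5 {'OK' if ok else 'MISMATCH'}\t{n_reads:,} reads")

print(f"Total reads: {total_reads:,}")  # reported above as 1,225,964,680
```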
BGI provided us with the raw data files to play around with; they are also currently in the process of performing the genome assembly.
Jupyter Notebook file: 20160126_Olurida_BGI_data_handling.ipynb
Notebook Viewer: 20160126_Olurida_BGI_data_handling.ipynb
What is the difference between all of the files, i.e., read length and insert size?
I believe that info should go in this spreadsheet, maybe? https://docs.google.com/spreadsheets/d/1r4twxfBHpWfQoznbn2dAQhgMvmlZvQqW9I2_uVZX_aU/edit#gid=0
Here’s the breakdown. I’ve added this info to the readme.md file on Owl.
F15FTSUSAT0327
Library                   Insert size    Read length
wHAXPI023905_96           300 bp         PE150
wHAIPI023992_37           500 bp         PE150
wHAMPI023991_66           800 bp         PE150
WHOSTibkDCAADWAAPEI_74    2 kb           PE50
WHOSTibkDCABDLAAPEI_62    5 kb           PE50
WHOSTibkDCACDTAAPEI_75    10 kb          PE50