We’re in the process of organizing files for a manuscript on the geoduck genome assembly/annotation we’ve done. As part of that, we need to upload the StringTie BAM file that was used with GenSAS for the Pgenerosa_v074 annotation to the Open Science Framework (OSF) repository for this project. Unfortunately, at 73GB, the file far exceeds OSF's individual file size limit (5GB), so I split it into 5GB chunks. See the following notebook for deets:
Jupyter Notebook (GitHub):
TL;DR:
- Use the Bash command `split` to split the file into the desired chunk sizes.
- Reassemble the chunks into the full-size BAM using the Bash `cat` command.
- Run `md5sum` on the original BAM and the reassembled BAM to confirm the two files are the same.
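
The three steps above can be sketched roughly as follows. This is a minimal illustration and not the exact commands from the notebook: the file names, the `.part_` prefix, and the chunk size are all hypothetical, and a small random file stands in for the real 73GB BAM so the sketch runs quickly. It assumes GNU coreutils (`split`, `md5sum`).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical paths for illustration only; substitute the real BAM.
workdir="$(mktemp -d)"
orig="${workdir}/example.bam"
reassembled="${workdir}/example.reassembled.bam"

# Stand-in for the real BAM: 1MiB of random bytes.
head -c 1048576 /dev/urandom > "${orig}"

# Split into fixed-size chunks with numeric suffixes (part_00, part_01, ...).
# 256K is used here only because the stand-in file is tiny; for the real
# file, choose a size at or below the repository's per-file limit.
split --bytes=256K --numeric-suffixes "${orig}" "${orig}.part_"

# Reassemble. The glob expands in sorted order, which matches the
# order in which split wrote the zero-padded chunks.
cat "${orig}.part_"* > "${reassembled}"

# Compare checksums of the original and reassembled files.
orig_md5="$(md5sum < "${orig}" | cut -d' ' -f1)"
new_md5="$(md5sum < "${reassembled}" | cut -d' ' -f1)"

if [ "${orig_md5}" = "${new_md5}" ]; then
  echo "MD5 match: files are identical"
else
  echo "MD5 mismatch: reassembly failed" >&2
  exit 1
fi
```

Anyone downloading the chunks would only need the `cat` and `md5sum` steps, plus the checksum of the original file to compare against.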
RESULTS
Output folder:
The split files will be uploaded to the OSF repository.