Post Manuscript Review - data crunch

In [1]:
#file ID
#fid="CgM1"

#TIMESTAMP (IPython returns the shell output as a list, so {date} expands with brackets, e.g. [0403_1441])
date=!date +%m%d_%H%M
#working directory (parent)
#wd="/Volumes/web/cnidarian/BiGo_larvae_merge/"

#where is bsmap
#bsmap="/Users/Shared/Apps/bsmap-2.73/"
bsmap="/Volumes/Bay3/Software/BSMAP/bsmap-2.74/"

#R1 fastq file location
R1="/Volumes/web/trilobite/Crassostrea_gigas_HTSdata/BiGoRNA_GTGTCTAC_1.fastq"

#R2 fastq file location
#comment out for single-end (SE) data
R2="/Volumes/web/trilobite/Crassostrea_gigas_HTSdata/BiGoRNA_GTGTCTAC_2.fastq"
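
Because `date=!date +%m%d_%H%M` stores a list, the SAM file in the log is named with brackets. As a minimal sketch, the same timestamp could be built in plain Python instead. `timestamped_sam_path` is a hypothetical helper, not part of this notebook or of BSMAP.

```python
from datetime import datetime


def timestamped_sam_path(prefix, when=None):
    """Return '<prefix>_<MMDD_HHMM>.sam', mirroring `date +%m%d_%H%M`."""
    when = when or datetime.now()
    return "{}_{}.sam".format(prefix, when.strftime("%m%d_%H%M"))


# Output prefix taken from the bsmap call in this notebook;
# the fixed datetime is only there to make the example reproducible.
print(timestamped_sam_path("/Volumes/web/cnidarian/BiGo_bsmap_v9",
                           datetime(2014, 4, 3, 14, 41)))
# prints /Volumes/web/cnidarian/BiGo_bsmap_v9_0403_1441.sam
```

Passing the resulting string to `-o` would give a file name without the list brackets.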
In []:
#-p sets the number of processors (threads); 3 are used here
!{bsmap}bsmap -a {R1} -b {R2} -d /Volumes/web/cnidarian/oyster.v9.fa -o /Volumes/web/cnidarian/BiGo_bsmap_v9_{date}.sam -p 3

BSMAP v2.74
Start at:  Thu Apr  3 14:41:53 2014

Input reference file: /Volumes/web/cnidarian/oyster.v9.fa 	(format: FASTA)
Load in 11969 db seqs, total size 558601156 bp. 24 secs passed
total_kmers: 43046721
Create seed table. 91 secs passed
max number of mismatches: read_length * 8% 	max gap size: 0
kmer cut-off ratio: 5e-07
max multi-hits: 100	max Ns: 5	seed size: 16	index interval: 4
quality cutoff: 0	base quality char: '!'
min fragment size:28	max fragemt size:500
start from read #1	end at read #4294967295
additional alignment: T in reads => C in reference
mapping strand (read_1): ++,-+
mapping strand (read_2): +-,--
Pair-end alignment(3 threads)
Input read file #1: /Volumes/web/trilobite/Crassostrea_gigas_HTSdata/BiGoRNA_GTGTCTAC_1.fastq 	(format: FASTQ)
Input read file #2: /Volumes/web/trilobite/Crassostrea_gigas_HTSdata/BiGoRNA_GTGTCTAC_2.fastq 	(format: FASTQ)
Output file: /Volumes/web/cnidarian/BiGo_bsmap_v9_[0403_1441].sam	 (format: SAM)
Thread #1: 	50000 read pairs finished. 190 secs passed
Thread #2: 	150000 read pairs finished. 191 secs passed
Thread #0: 	100000 read pairs finished. 192 secs passed
Thread #1: 	200000 read pairs finished. 234 secs passed
Thread #2: 	250000 read pairs finished. 235 secs passed
Thread #0: 	300000 read pairs finished. 236 secs passed
Thread #1: 	350000 read pairs finished. 288 secs passed
Thread #2: 	400000 read pairs finished. 290 secs passed
Thread #0: 	450000 read pairs finished. 293 secs passed
Thread #1: 	500000 read pairs finished. 336 secs passed
Thread #2: 	550000 read pairs finished. 339 secs passed
Thread #0: 	600000 read pairs finished. 343 secs passed
Thread #1: 	650000 read pairs finished. 384 secs passed
Thread #2: 	700000 read pairs finished. 389 secs passed
Thread #0: 	750000 read pairs finished. 393 secs passed
Thread #1: 	800000 read pairs finished. 439 secs passed
Thread #2: 	850000 read pairs finished. 445 secs passed
Thread #0: 	900000 read pairs finished. 453 secs passed
Thread #1: 	950000 read pairs finished. 497 secs passed
Thread #2: 	1000000 read pairs finished. 503 secs passed
Thread #0: 	1050000 read pairs finished. 505 secs passed
Thread #1: 	1100000 read pairs finished. 547 secs passed
Thread #2: 	1150000 read pairs finished. 550 secs passed
Thread #0: 	1200000 read pairs finished. 554 secs passed
Thread #1: 	1250000 read pairs finished. 589 secs passed
Thread #2: 	1300000 read pairs finished. 592 secs passed
Thread #0: 	1350000 read pairs finished. 596 secs passed
Thread #1: 	1400000 read pairs finished. 631 secs passed
Thread #2: 	1450000 read pairs finished. 633 secs passed
Thread #0: 	1500000 read pairs finished. 639 secs passed
Thread #1: 	1550000 read pairs finished. 680 secs passed
Thread #2: 	1600000 read pairs finished. 683 secs passed
Thread #0: 	1650000 read pairs finished. 687 secs passed
Thread #1: 	1700000 read pairs finished. 727 secs passed
Thread #2: 	1750000 read pairs finished. 733 secs passed
Thread #0: 	1800000 read pairs finished. 743 secs passed
Thread #1: 	1850000 read pairs finished. 784 secs passed
Thread #2: 	1900000 read pairs finished. 787 secs passed
Thread #0: 	1950000 read pairs finished. 794 secs passed
Thread #1: 	2000000 read pairs finished. 828 secs passed
Thread #2: 	2050000 read pairs finished. 833 secs passed
Thread #0: 	2100000 read pairs finished. 843 secs passed
Thread #1: 	2150000 read pairs finished. 881 secs passed
Thread #2: 	2200000 read pairs finished. 885 secs passed
Thread #0: 	2250000 read pairs finished. 892 secs passed
Thread #1: 	2300000 read pairs finished. 938 secs passed
Thread #2: 	2350000 read pairs finished. 948 secs passed
Thread #0: 	2400000 read pairs finished. 957 secs passed
Thread #1: 	2450000 read pairs finished. 1003 secs passed
Thread #2: 	2500000 read pairs finished. 1012 secs passed
Thread #0: 	2550000 read pairs finished. 1025 secs passed
Thread #1: 	2600000 read pairs finished. 1079 secs passed
Thread #2: 	2650000 read pairs finished. 1091 secs passed
Thread #0: 	2700000 read pairs finished. 1102 secs passed
Thread #1: 	2750000 read pairs finished. 1150 secs passed
Thread #2: 	2800000 read pairs finished. 1161 secs passed
Thread #0: 	2850000 read pairs finished. 1170 secs passed
Thread #1: 	2900000 read pairs finished. 1221 secs passed
Thread #2: 	2950000 read pairs finished. 1236 secs passed
Thread #0: 	3000000 read pairs finished. 1245 secs passed
Thread #1: 	3050000 read pairs finished. 1299 secs passed
Thread #2: 	3100000 read pairs finished. 1318 secs passed
Thread #0: 	3150000 read pairs finished. 1332 secs passed
Thread #1: 	3200000 read pairs finished. 1401 secs passed
Thread #2: 	3250000 read pairs finished. 1416 secs passed
Thread #0: 	3300000 read pairs finished. 1428 secs passed
Thread #1: 	3350000 read pairs finished. 1496 secs passed
Thread #2: 	3400000 read pairs finished. 1507 secs passed
Thread #0: 	3450000 read pairs finished. 1521 secs passed
Thread #1: 	3500000 read pairs finished. 1585 secs passed
Thread #2: 	3550000 read pairs finished. 1597 secs passed
Thread #0: 	3600000 read pairs finished. 1623 secs passed
Thread #1: 	3650000 read pairs finished. 1682 secs passed
Thread #2: 	3700000 read pairs finished. 1690 secs passed
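
A rough mapping rate can be read off progress lines like the ones above. The sketch below is one way to do it, assuming the "read pairs finished" number can be treated as cumulative progress (an approximation, since threads report out of order). `mapping_rate` is a hypothetical helper name.

```python
import re

# Matches lines such as "Thread #1:  50000 read pairs finished. 190 secs passed"
PROGRESS = re.compile(
    r"Thread #(\d+):\s+(\d+) read pairs finished\. (\d+) secs passed")


def mapping_rate(log_text):
    """Return read pairs per second between the earliest and latest progress lines."""
    # Collect (seconds, read_pairs) tuples and order them by elapsed time.
    points = sorted((int(m.group(3)), int(m.group(2)))
                    for m in PROGRESS.finditer(log_text))
    if len(points) < 2:
        return None
    (t0, n0), (t1, n1) = points[0], points[-1]
    if t1 == t0:
        return None
    return (n1 - n0) / float(t1 - t0)


# First and last progress lines copied from the local run log
sample = (
    "Thread #1: \t50000 read pairs finished. 190 secs passed\n"
    "Thread #2: \t3700000 read pairs finished. 1690 secs passed\n"
)
print("%.0f read pairs per second" % mapping_rate(sample))
```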

BSMAP run on iPlant


BSMAP v2.74
Start at:  Thu Apr  3 14:49:12 2014

Input reference file: oyster.v9.fa.gz   (format: gzipped FASTA)
Load in 11969 db seqs, total size 558601156 bp. 12 secs passed
total_kmers: 43046721
Create seed table. 34 secs passed
max number of mismatches: read_length * 8%  max gap size: 0
kmer cut-off ratio: 5e-07
max multi-hits: 100 max Ns: 5   seed size: 16   index interval: 4
quality cutoff: 0   base quality char: '!'
min fragment size:28    max fragemt size:500
start from read #1  end at read #4294967295
additional alignment: T in reads => C in reference
mapping strand (read_1): ++,-+
mapping strand (read_2): +-,--
Pair-end alignment(8 threads)
Input read file #1: BiGoRNA_GTGTCTAC_1.fastq    (format: FASTQ)
Input read file #2: BiGoRNA_GTGTCTAC_2.fastq    (format: FASTQ)
Output file: bsmap_out.sam   (format: SAM)
Thread #2:  50000 read pairs finished. 47 secs passed
Total number of aligned reads: 
pairs:       14245353 (56%)
single a:    6073960 (24%)
single b:    5624917 (22%)
Done.
Finished at Thu Apr  3 15:03:24 2014
Total time consumed:  852 secs
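
To compare runs, the summary block at the end of a BSMAP log can be pulled into Python. This is a minimal sketch that assumes the summary keeps the layout shown above. `parse_bsmap_summary` is a hypothetical helper, and the sample text is copied from this log.

```python
import re

SUMMARY = """\
Total number of aligned reads:
pairs:       14245353 (56%)
single a:    6073960 (24%)
single b:    5624917 (22%)
Done.
Total time consumed:  852 secs
"""


def parse_bsmap_summary(text):
    """Return ({category: (count, percent)}, total_seconds) from a BSMAP log tail."""
    counts = {}
    pattern = r"^(pairs|single a|single b):\s+(\d+)\s+\((\d+)%\)"
    for m in re.finditer(pattern, text, re.MULTILINE):
        counts[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    t = re.search(r"Total time consumed:\s+(\d+) secs", text)
    seconds = int(t.group(1)) if t else None
    return counts, seconds


counts, seconds = parse_bsmap_summary(SUMMARY)
aligned_pairs = counts["pairs"][0]
print(counts)
print("aligned pairs per second: %.0f" % (aligned_pairs / float(seconds)))
```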
In []:
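#reference used for the bsmap call above (assumed to be the one methratio.py should read)
genome="/Volumes/web/cnidarian/oyster.v9.fa"
#-u unique hits only, -z report zero-methylation loci, -g combine CpG on both strands
!python {bsmap}methratio.py -d {genome} -u -z -g -o methratio_out.txt -s {bsmap}samtools bsmap_out.sam
#command for only obtaining the context '__CG_'
!grep "[A-Z][A-Z]CG[A-Z]" < methratio_out.txt > methratio_out_CG.txt
#5x coverage: keep rows with column 8 >= 5; print chr, pos-1, pos+1, label, ratio
!awk '{if ($8 >= 5) print $1,$2-1,$2+1,"CpG",$5}' < methratio_out_CG.txt > filt_methratio_out_CG.igv
#spaces to tabs; fid must be set first (it is commented out in the first cell)
!tr ' ' "\t" < filt_methratio_out_CG.igv > filt_methratio_{fid}.igv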
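
The grep, awk and tr steps above could also be done in one pass in Python. The sketch below assumes the methratio.py column order is chr, pos, strand, context, ratio, eff_CT_count, C_count, CT_count, so that column 8 is the depth used for the 5x filter. `methratio_to_igv` is a hypothetical helper and the sample rows are invented for illustration. Unlike the grep, it tests the pattern against the context column only, not the whole line.

```python
import re

CG_CONTEXT = re.compile(r"[A-Z][A-Z]CG[A-Z]")


def methratio_to_igv(lines, min_depth=5, label="CpG"):
    """Yield tab-separated chr, pos-1, pos+1, label, ratio for covered CG loci."""
    for line in lines:
        fields = line.split()
        # Skip short lines, the header, and loci outside a CG context.
        if len(fields) < 8 or not CG_CONTEXT.search(fields[3]):
            continue
        # Column 8 (index 7) is assumed to hold the coverage count.
        if float(fields[7]) < min_depth:
            continue
        pos = int(fields[1])
        yield "\t".join([fields[0], str(pos - 1), str(pos + 1), label, fields[4]])


# Invented rows: one passing locus, one below 5x, one outside a CG context
rows = [
    "chr\tpos\tstrand\tcontext\tratio\teff_CT_count\tC_count\tCT_count",
    "scaffold1\t100\t+\tAACGT\t0.800\t10.00\t8\t10",
    "scaffold1\t200\t+\tAACGT\t0.500\t4.00\t2\t4",
    "scaffold1\t300\t+\tAACAT\t0.000\t9.00\t0\t9",
]
for out in methratio_to_igv(rows):
    print(out)
```

For a real run, the rows would come from `open("methratio_out.txt")` and the yielded lines would be written to the `.igv` file.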