BSMAP-methratio-methylKit workflow

In [42]:
#Setting Variables
#file ID
fid="M1_nov"
#where is bsmap
bsmap="/Users/Shared/Apps/bsmap-2.73/"
#location of the R1 fastq file
R1="/Volumes/web/trilobite/Crassostrea_gigas_HTSdata/batterbox/FCC39EM/Sample_BS_CgM1/filtered_BS_CgM1_ACTTGA_L004_R1.fastq.gz"
#genome file 
genome="/Volumes/web/whale/ensembl/ftp.ensemblgenomes.org/pub/release-21/metazoa/fasta/crassostrea_gigas/dna/Crassostrea_gigas.GCA_000297895.1.21.dna_sm.genome.fa"
#location of sqlshare python client tools
spt="/Users/Mackenzie/sqlshare-pythonclient/tools/"
In [5]:
cd /Volumes/web/Mollusk/bs_larvae_exp/
/Volumes/web/Mollusk/bs_larvae_exp

In [6]:
mkdir {fid}
In [7]:
cd {fid}
/Volumes/web/Mollusk/bs_larvae_exp/M1_nov

In [8]:
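#map reads with bsmap: -a input reads, -d reference genome, -o output file, -p number of processors
!{bsmap}bsmap -a {R1} -d {genome} -o bsmap_out.sam -p 1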

BSMAP v2.73
Start at:  Thu Feb 13 09:15:47 2014

Input reference file: /Volumes/web/whale/ensembl/ftp.ensemblgenomes.org/pub/release-21/metazoa/fasta/crassostrea_gigas/dna/Crassostrea_gigas.GCA_000297895.1.21.dna_sm.genome.fa 	(format: FASTA)
Load in 7658 db seqs, total size 557717710 bp. 25 secs passed
total_kmers: 43046721
Create seed table. 40 secs passed
max number of mismatches: read_length * 8% 	max gap size: 0
kmer cut-off ratio:5e-07
max multi-hits: 100	max Ns: 5	seed size: 16	index interval: 4
quality cutoff: 0	base quality char: '!'
min fragment size:28	max fragemt size:500
start from read #1	end at read #4294967295
additional alignment: T in reads => C in reference
mapping strand: ++,-+
Single-end alignment(1 threads)
Input read file: /Volumes/web/trilobite/Crassostrea_gigas_HTSdata/batterbox/FCC39EM/Sample_BS_CgM1/filtered_BS_CgM1_ACTTGA_L004_R1.fastq.gz 	(format: gzipped FASTQ)
Output file: bsmap_out.sam	 (format: SAM)
Thread #0: 	50000 reads finished. 45 secs passed
Thread #0: 	100000 reads finished. 50 secs passed
Thread #0: 	150000 reads finished. 56 secs passed
Thread #0: 	200000 reads finished. 61 secs passed
Thread #0: 	250000 reads finished. 65 secs passed
Thread #0: 	300000 reads finished. 70 secs passed
Thread #0: 	350000 reads finished. 75 secs passed
Thread #0: 	400000 reads finished. 80 secs passed
Thread #0: 	450000 reads finished. 85 secs passed
Thread #0: 	500000 reads finished. 90 secs passed
Thread #0: 	550000 reads finished. 95 secs passed
Thread #0: 	600000 reads finished. 100 secs passed
Thread #0: 	650000 reads finished. 113 secs passed
Thread #0: 	700000 reads finished. 118 secs passed
Thread #0: 	750000 reads finished. 123 secs passed
Thread #0: 	800000 reads finished. 128 secs passed
Thread #0: 	850000 reads finished. 134 secs passed
Thread #0: 	900000 reads finished. 140 secs passed
Thread #0: 	950000 reads finished. 145 secs passed
Thread #0: 	1000000 reads finished. 150 secs passed
Thread #0: 	1050000 reads finished. 155 secs passed
Thread #0: 	1100000 reads finished. 161 secs passed
Thread #0: 	1150000 reads finished. 166 secs passed
Thread #0: 	1200000 reads finished. 171 secs passed
Thread #0: 	1250000 reads finished. 176 secs passed
Thread #0: 	1300000 reads finished. 181 secs passed
Thread #0: 	1350000 reads finished. 186 secs passed
Thread #0: 	1400000 reads finished. 191 secs passed
Thread #0: 	1450000 reads finished. 196 secs passed
Thread #0: 	1500000 reads finished. 201 secs passed
Thread #0: 	1550000 reads finished. 206 secs passed
Thread #0: 	1600000 reads finished. 211 secs passed
Thread #0: 	1650000 reads finished. 216 secs passed
Thread #0: 	1700000 reads finished. 222 secs passed
Thread #0: 	1750000 reads finished. 228 secs passed
Thread #0: 	1800000 reads finished. 234 secs passed
Thread #0: 	1850000 reads finished. 239 secs passed
Thread #0: 	1900000 reads finished. 244 secs passed
Thread #0: 	1950000 reads finished. 249 secs passed
Thread #0: 	2000000 reads finished. 254 secs passed
Thread #0: 	2050000 reads finished. 260 secs passed
Thread #0: 	2100000 reads finished. 265 secs passed
Thread #0: 	2150000 reads finished. 270 secs passed
Thread #0: 	2200000 reads finished. 276 secs passed
Thread #0: 	2250000 reads finished. 281 secs passed
Thread #0: 	2300000 reads finished. 286 secs passed
Thread #0: 	2350000 reads finished. 291 secs passed
Thread #0: 	2400000 reads finished. 296 secs passed
Thread #0: 	2450000 reads finished. 303 secs passed
Thread #0: 	2500000 reads finished. 308 secs passed
Thread #0: 	2550000 reads finished. 313 secs passed
Thread #0: 	2600000 reads finished. 318 secs passed
Thread #0: 	2650000 reads finished. 324 secs passed
Thread #0: 	2700000 reads finished. 329 secs passed
Thread #0: 	2750000 reads finished. 334 secs passed
Thread #0: 	2754039 reads finished. 334 secs passed
Total number of aligned reads: 1736531 (63%)
Done.
Finished at Thu Feb 13 09:21:22 2014
Total time consumed:  335 secs
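The mapping rate that BSMAP rounds to 63% can be recomputed from the log itself. This is a minimal sketch, assuming a single-thread run like the one above, where the last "reads finished" line carries the total read count; mapping_rate is a hypothetical helper name, not part of BSMAP.

```python
import re

def mapping_rate(log_text):
    """Return (aligned, total, percent) parsed from BSMAP console output."""
    # Assumption: with one thread, the last "reads finished" line is the total.
    finished = re.findall(r"(\d+) reads finished", log_text)
    aligned = re.search(r"Total number of aligned reads: (\d+)", log_text)
    if not finished or aligned is None:
        raise ValueError("log does not look like BSMAP output")
    total = int(finished[-1])
    n_aligned = int(aligned.group(1))
    return n_aligned, total, 100.0 * n_aligned / total

# Two lines copied from the run above.
log = """Thread #0: \t2754039 reads finished. 334 secs passed
Total number of aligned reads: 1736531 (63%)"""

aligned, total, pct = mapping_rate(log)
print("%d of %d reads aligned (%.2f%%)" % (aligned, total, pct))
```

For this run the ratio is 1736531 / 2754039, or about 63.05%.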

In [9]:
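#call methylation ratios: -u unique mappings only, -z report loci with zero methylation, -g combine CpG methylation from both strands, -s path to samtools
!python {bsmap}methratio.py -d {genome} -u -z -g -o methratio_out.txt -s {bsmap}samtools bsmap_out.sam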
@ Thu Feb 13 09:21:22 2014: reading reference /Volumes/web/whale/ensembl/ftp.ensemblgenomes.org/pub/release-21/metazoa/fasta/crassostrea_gigas/dna/Crassostrea_gigas.GCA_000297895.1.21.dna_sm.genome.fa ...
@ Thu Feb 13 09:21:58 2014: reading bsmap_out.sam ...
[samopen] SAM header is present: 7658 sequences.
@ Thu Feb 13 09:22:55 2014: combining CpG methylation from both strands ...
@ Thu Feb 13 09:23:25 2014: writing methratio_out.txt ...
@ Thu Feb 13 09:27:34 2014: done.
total 1265415 valid mappings, 11761356 covered cytosines, average coverage: 1.18 fold.

In [27]:
!python {spt}singleupload.py -u che625@washington.edu -p 5234162537ce6a35236569c28ab62f65 -d _methratio{fid} methratio_out.txt 
processing chunk line 0 to 1828607 (2.19254803658 s elapsed)
pushing methratio_out.txt...
parsing 08FF7CA6...
processing chunk line 1828607 to 3623132 (160.471320152 s elapsed)
pushing methratio_out.txt...
parsing 03ACD3F3...
processing chunk line 3623132 to 5428168 (318.861622095 s elapsed)
pushing methratio_out.txt...
parsing 0FFE828B...
processing chunk line 5428168 to 7226496 (492.835301161 s elapsed)
pushing methratio_out.txt...
parsing A77FABFA...
processing chunk line 7226496 to 9005242 (685.053544044 s elapsed)
pushing methratio_out.txt...
parsing 58DDBA60...
processing chunk line 9005242 to 10826451 (923.542622089 s elapsed)
pushing methratio_out.txt...
parsing 593DF5E3...
processing chunk line 10826451 to 11761357 (1274.3491621 s elapsed)
pushing methratio_out.txt...
parsing 336DCB4F...
finished _methratioM1_nov
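A quick sanity check on the upload: the chunk ranges should be contiguous, and the last line number should equal the number of covered cytosines reported by methratio.py plus the header. This is a sketch only; chunk_ranges and is_contiguous are hypothetical helpers, and the single header row in methratio_out.txt is an assumption.

```python
import re

def chunk_ranges(log_text):
    """Extract (start, end) line ranges from the upload progress output."""
    pattern = r"processing chunk line (\d+) to (\d+)"
    return [(int(a), int(b)) for a, b in re.findall(pattern, log_text)]

def is_contiguous(ranges):
    """True when every chunk starts where the previous one ended."""
    return all(prev[1] == cur[0] for prev, cur in zip(ranges, ranges[1:]))

# Progress lines copied from the upload above.
upload_log = """processing chunk line 0 to 1828607 (2.19254803658 s elapsed)
processing chunk line 1828607 to 3623132 (160.471320152 s elapsed)
processing chunk line 3623132 to 5428168 (318.861622095 s elapsed)
processing chunk line 5428168 to 7226496 (492.835301161 s elapsed)
processing chunk line 7226496 to 9005242 (685.053544044 s elapsed)
processing chunk line 9005242 to 10826451 (923.542622089 s elapsed)
processing chunk line 10826451 to 11761357 (1274.3491621 s elapsed)"""

ranges = chunk_ranges(upload_log)
covered_cytosines = 11761356  # reported by methratio.py above
header_lines = 1              # assumption: the file starts with one header row

print(is_contiguous(ranges))
print(ranges[-1][1] == covered_cytosines + header_lines)
```

Both checks print True for this run, which suggests the whole file was pushed.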

In [45]:
!python {spt}fetchdata.py -s "SELECT * FROM [che625@washington.edu].[_methratioM1_nov] WHERE context like '__CG_' and CT_Count >= 5" -f tsv -o filtered_methratio{fid}.txt
Traceback (most recent call last):
  File "/Users/Mackenzie/sqlshare-pythonclient/tools/fetchdata.py", line 76, in <module>
    main()
  File "/Users/Mackenzie/sqlshare-pythonclient/tools/fetchdata.py", line 71, in main
    data = fetchdata(args.sql, args.format, args.output)
  File "/Users/Mackenzie/sqlshare-pythonclient/tools/fetchdata.py", line 30, in fetchdata
    return conn.download_sql_result(sql, format, output)
  File "build/bdist.macosx-10.8-intel/egg/sqlshare/__init__.py", line 337, in download_sql_result
  File "build/bdist.macosx-10.8-intel/egg/sqlshare/__init__.py", line 264, in poll_selector
sqlshare.SQLShareError: code: 400 : {"Detail":"Invalid object name 'che625@washington.edu._methratioM1_nov'."}
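Since the SQLShare query above failed, the same filter can be applied locally as a fallback. This is a minimal sketch, assuming methratio_out.txt is tab-separated with a header row containing columns named context and CT_count (adjust the names if the header differs); filter_methratio is a hypothetical helper and the sample rows are made up for illustration.

```python
import csv
import io

def filter_methratio(rows, min_ct=5):
    """Yield rows in CpG context with at least min_ct covering reads."""
    for row in rows:
        ctx = row["context"]
        # SQL LIKE '__CG_' means five characters with "CG" at positions 3-4.
        if len(ctx) == 5 and ctx[2:4] == "CG" and float(row["CT_count"]) >= min_ct:
            yield row

# Made-up rows standing in for methratio_out.txt (column subset is illustrative).
sample = io.StringIO(
    "chr\tpos\tstrand\tcontext\tratio\tC_count\tCT_count\n"
    "scaffoldA\t10\t+\tAACGT\t0.800\t8\t10\n"   # CpG, well covered: kept
    "scaffoldA\t25\t+\tTTCGA\t0.500\t1\t2\n"    # CpG, low coverage: dropped
    "scaffoldA\t40\t-\tGGCAT\t0.000\t0\t9\n"    # not CpG: dropped
)

kept = list(filter_methratio(csv.DictReader(sample, delimiter="\t")))
print([(r["chr"], r["pos"]) for r in kept])
```

To run it on the real file, replace the StringIO object with open("methratio_out.txt") and write the kept rows out with csv.DictWriter.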

In [43]:
!python {spt}fetchdata.py -h
usage: fetchdata.py [-h] (--dataset SQL | --sql SQL) [--format {tsv,csv}]
                    --output OUTPUT

Download a dataset or query from SQLShare.

optional arguments:
  -h, --help            show this help message and exit
  --dataset SQL, -d SQL
                        The name of the dataset to be downloaded.
  --sql SQL, -s SQL     The SQL query, the answer to which will be downloaded.
  --format {tsv,csv}, -f {tsv,csv}
                        The format in which the data will be downloaded
                        (default: csv).
  --output OUTPUT, -o OUTPUT
                        Where to save the downloaded file.
