Posted by & filed under Cgigas DNA Methylation.

tldr
bar


In an effort to find out where heat stress-induced differentially methylated loci (DMLs) are in the oyster genome (to ultimately inform on function), I have been using bedtools to see where the DMLs lie on the genome. As this was done on an array platform, I also felt I needed to take into consideration where the probes were, noting that they were not randomly distributed across the genome but rather targeted to genes.

I have determined the proportion of DMLs (n=10028, 10148, 11690) for each oyster that fall within a given genomic feature and compared that to the proportion of total probes (n=697753) that fall within each genomic feature. For example, in just looking at Oyster 2 DMLs and DEGs …

!intersectbed \
 -wb \
 -a ./data/2014.07.02.colson/genomeBrowserTracks/logFC_HS-preHS/2014.07.02.2M_sig.bedGraph \
 -b /Users/sr320/data-genomic/tentacle/Cuffdiff_geneexp.sig.gtf \
 | cut -f 6 \
 | sort | uniq -c
!intersectbed \
 -wb \
 -a /Users/sr320/git-repos/paper-Temp-stress/ipynb/data/array-design/OID40453_probe_locations.gff \
 -b /Users/sr320/data-genomic/tentacle/Cuffdiff_geneexp.sig.gtf \
 | cut -f 11 \
 | sort | uniq -c
880 Cufflinks
117460 Cufflinks

#Import the array constructor and the stats module used below
from numpy import array
from scipy import stats
#Enter the data comparing Oyster 2 then Probes
obs = array([[880, 10028], [117460, 697753]])
#Calculate the chi-square test
chi2_corrected = stats.chi2_contingency(obs, correction=True)
chi2_uncorrected = stats.chi2_contingency(obs, correction=False)
#Print the result
print('CHI SQUARE')
print('The corrected chi2 value is {0:5.3f}, with p={1:5.3f}'.format(chi2_corrected[0], chi2_corrected[1]))
print('The uncorrected chi2 value is {0:5.3f}, with p={1:5.3f}'.format(chi2_uncorrected[0], chi2_uncorrected[1]))
CHI SQUARE
The corrected chi2 value is 352.138, with p=0.000
The uncorrected chi2 value is 352.654, with p=0.000

~ jupyter notebook
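
One nuance that may be worth checking: a 2x2 contingency table is usually laid out as "in feature" versus "not in feature", rather than "in feature" versus "total". The sketch below is only an illustration of that alternative layout using the Oyster 2 and probe counts quoted above. It computes the uncorrected Pearson chi-square by hand so it needs nothing beyond the standard library. The function name `chi2_2x2` is made up for this example, and it assumes the 880 and 117460 intersect records can be treated as counts of loci, which may not hold if a locus overlaps more than one feature.

```python
import math

def chi2_2x2(table):
    """Uncorrected Pearson chi-square for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row_total in enumerate(row_totals):
        for j, col_total in enumerate(col_totals):
            expected = row_total * col_total / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Counts taken from the post: Oyster 2 DMLs, then all probes
dml_in, dml_total = 880, 10028
probe_in, probe_total = 117460, 697753

# Rows: DMLs, probes. Columns: inside feature, outside feature.
table = [
    [dml_in, dml_total - dml_in],
    [probe_in, probe_total - probe_in],
]

chi2, p = chi2_2x2(table)
print("DML proportion in feature:   {:.4f}".format(dml_in / dml_total))
print("Probe proportion in feature: {:.4f}".format(probe_in / probe_total))
print("chi2 = {:.3f}, p = {:.3g}".format(chi2, p))
```

Since the DMLs are themselves a subset of the probes, the two rows are not fully independent; comparing DML probes against non-DML probes might be a cleaner design, but that is a judgment call.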


To be honest, I feel like I am missing some nuance in the analysis. However, at this point I believe I will keep pushing through by seeing if the results break out based on whether the DML is hypo- or hypermethylated. If you forgot, here is the breakdown.

Oyster Hypo-methylated Hyper-methylated Hypo-3plus-merged Hyper-3plus-merged
2 7224 2803 108 4
4 6560 3587 48 10
6 7645 4044 53 9

This also sheds light on the fact that I am currently ignoring clustering (3-plus), something else to put on the list!
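
A minimal sketch of how the hypo/hyper split could be done before re-running the feature intersections. It assumes the fourth bedGraph column holds logFC (HS minus preHS), so that negative values mean hypomethylated after heat shock and positive values mean hypermethylated. The function name and the example records are hypothetical and not taken from the real data.

```python
def split_by_direction(bedgraph_lines):
    """Split bedGraph records into (hypo, hyper) lists by the sign of column 4."""
    hypo, hyper = [], []
    for line in bedgraph_lines:
        # Skip blank lines and track/comment headers
        if not line.strip() or line.startswith(("track", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        record = (chrom, int(start), int(end), float(value))
        if record[3] < 0:
            hypo.append(record)
        elif record[3] > 0:
            hyper.append(record)
    return hypo, hyper

# Hypothetical records, for illustration only
example = [
    "track type=bedGraph",
    "scaffold1\t100\t150\t-1.8",
    "scaffold1\t400\t450\t2.1",
    "scaffold2\t10\t60\t-0.9",
]

hypo, hyper = split_by_direction(example)
print(len(hypo), "hypomethylated;", len(hyper), "hypermethylated")
```

Each list could then be written back out as its own bedGraph and passed to intersectbed in the same way as the combined file.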

Posted by & filed under qdod.

Below is an updated version of canonical genome tracks as part of the qdod project – @ github. Updates include details on version 25 gff files and adding the TE track derived via WU-Blast.


Canonical Feature Tracks (Ensembl)

Ensembl provides feature tracks that are updated on a regular basis.
They can be directly accessed at
ftp://ftp.ensemblgenomes.org/pub/current/metazoa/gff3/crassostrea_gigas/
ftp://ftp.ensemblgenomes.org/pub/current/metazoa/gtf/crassostrea_gigas/
Note this will ensure you have the most current version.

Version 25
GTF

http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_ensembl_tracks/Crassostrea_gigas.GCA_000297895.1.25.gtf

GFF3

http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_ensembl_tracks/Crassostrea_gigas.GCA_000297895.1.25.gff3

Screenshot
igv_en

List of gff features (v25)

5 EnsemblGenomes RNA
2530 EnsemblGenomes exon
13 EnsemblGenomes gene
28 EnsemblGenomes miRNA
28 EnsemblGenomes miRNA_gene
1410 EnsemblGenomes pseudogenic_tRNA
13 EnsemblGenomes rRNA
13 EnsemblGenomes rRNA_gene
47 EnsemblGenomes snRNA
47 EnsemblGenomes snRNA_gene
20 EnsemblGenomes snoRNA
20 EnsemblGenomes snoRNA_gene
994 EnsemblGenomes tRNA_gene
2422 EnsemblGenomes transcript
186890 GigaDB CDS
186938 GigaDB exon
26101 GigaDB gene
26101 GigaDB transcript
650376 dust repeat_region
224899 trf repeat_region
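
A tally like the one above can be reproduced by counting the source (column 2) and feature type (column 3) of every record in the GFF3 file. The sketch below is one way to do that in Python. How the original list was actually produced is not stated, so this is only an illustration, and the example records are hypothetical.

```python
from collections import Counter

def count_features(gff_lines):
    """Count (source, type) pairs across tab-delimited GFF records."""
    counts = Counter()
    for line in gff_lines:
        # Skip comments, directives and blank lines
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        counts[(fields[1], fields[2])] += 1
    return counts

# Hypothetical records, for illustration only
example = [
    "##gff-version 3",
    "scaffoldA\tGigaDB\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "scaffoldA\tGigaDB\texon\t100\t300\t.\t+\t.\tParent=gene1",
    "scaffoldA\tGigaDB\texon\t500\t900\t.\t+\t.\tParent=gene1",
    "scaffoldA\tdust\trepeat_region\t950\t990\t.\t.\t.\t.",
]

counts = count_features(example)
for (source, feature), n in sorted(counts.items()):
    print(n, source, feature)
```

For a real file, pass an open file handle instead of the list, since the function only needs an iterable of lines.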

Canonical Feature Tracks (version 9)

Gene

http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff

Exons

http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff

Intron

http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff

Promoter (= 1kbp 5′ of genes)

http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff
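
To make the promoter definition concrete, here is a small sketch of a strand-aware 1 kb upstream flank that is clipped to the ends of the scaffold. The function name is made up for this example and coordinates are treated as 1-based inclusive, as in GFF. The scaffold lengths used in the example calls are assumptions, chosen so that the results line up with the first two promoter records in the quicklook output further down.

```python
def promoter_interval(gene_start, gene_end, strand, scaffold_length, flank=1000):
    """Return the (start, end) of the upstream flank, or None if there is no room.

    Coordinates are 1-based and inclusive.
    """
    if strand == "+":
        # Upstream is to the left of the gene start
        start = max(1, gene_start - flank)
        end = gene_start - 1
    elif strand == "-":
        # Upstream is to the right of the gene end
        start = gene_end + 1
        end = min(scaffold_length, gene_end + flank)
    else:
        raise ValueError("strand must be '+' or '-'")
    if start > end:
        return None
    return start, end

# Gene on the plus strand at 31-363 (scaffold length of 400 is assumed)
print(promoter_interval(31, 363, "+", 400))   # (1, 30)

# Gene on the minus strand at 35-385 (scaffold length of 395 is assumed)
print(promoter_interval(35, 385, "-", 395))   # (386, 395)
```

Genes that sit close to a scaffold edge end up with promoters much shorter than 1 kb, which is worth keeping in mind when counting overlaps.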

Transposable Elements

http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE-WUBLASTX.gff

Complement to Gene, Promoter, and TE tracks

http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_COMP_gene_prom_TE.bed
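
For reference, the idea behind a complement track is simply "everything on the scaffold that is not covered by any of the input features". The sketch below shows that logic for a single scaffold using BED-style coordinates (0-based, half-open). It is an illustration of the concept, not the command used to build the file above, and the function name and intervals are hypothetical.

```python
def complement(intervals, scaffold_length):
    """Return the gaps left uncovered by intervals on one scaffold.

    Intervals are (start, end) tuples, 0-based and half-open, and may overlap.
    """
    gaps = []
    cursor = 0
    for start, end in sorted(intervals):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < scaffold_length:
        gaps.append((cursor, scaffold_length))
    return gaps

# Hypothetical gene, promoter and TE intervals on a 100 bp scaffold
features = [(10, 20), (15, 30), (50, 60)]
print(complement(features, 100))   # [(0, 10), (30, 50), (60, 100)]
```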

All CGs

http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff
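
As a quick illustration of what the CG track contains, the sketch below lists every CG dinucleotide in a sequence as 1-based start and end positions, the same two-base span seen in the fuzznuc records in the quicklook output. It is a plain Python stand-in for the motif search, not the fuzznuc call itself, and the function name and test sequence are made up.

```python
def find_cg(sequence):
    """Return 1-based inclusive (start, end) positions of every CG dinucleotide."""
    seq = sequence.upper()  # treat soft-masked (lowercase) bases the same way
    hits = []
    pos = seq.find("CG")
    while pos != -1:
        hits.append((pos + 1, pos + 2))
        pos = seq.find("CG", pos + 1)
    return hits

print(find_cg("ACGTCGCG"))   # [(2, 3), (5, 6), (7, 8)]
```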

Screenshot:
shot

Details regarding the development of these tracks can be found in this IPython Notebook as well as in this methods section.

quicklook

==> /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff <==
C16582  flankbed    promoter    386 395 .   -   .   ID=CGI_10000001;
C17212  flankbed    promoter    1   30  .   +   .   ID=CGI_10000002;
C17316  flankbed    promoter    1   29  .   +   .   ID=CGI_10000003;

==> /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff <==
scaffold38980   fuzznuc nucleotide_motif    63420   63421   2   +   .   ID=scaffold38980.741;note=*pat pattern:CG
scaffold38980   fuzznuc nucleotide_motif    63670   63671   2   +   .   ID=scaffold38980.742;note=*pat pattern:CG

==> /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE-WUBLASTX.gff <==
scaffold1479    WUBlastX    LTR_Gypsy   2608    4209    104 +   .   .
C33730  WUBlastX    LTR_Pao 1960    2589    652 -   .   .
C33730  WUBlastX    LTR_Pao 3358    5868    1471    -   .   .

==> /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE.gff <==
C21242  TRF Tandem_Repeat   38  100 72  +   .   .
C21306  TRF Tandem_Repeat   35  143 112 +   .   .
C21306  TRF Tandem_Repeat   574 947 208 +   .   .

==> /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TEx.gff <==
scaffold1479    WUBlastX    LTR_Gypsy   2608    4209    104 +   .   .
C33730  WUBlastX    LTR_Pao 1960    2589    652 -   .   .
C33730  WUBlastX    LTR_Pao 3358    5868    1471    -   .   .

==> /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff <==
C16582  GLEAN   CDS 35  385 .   -   0   Parent=CGI_10000001;
C17212  GLEAN   CDS 31  363 .   +   0   Parent=CGI_10000002;
C17316  GLEAN   CDS 30  257 .   +   0   Parent=CGI_10000003;

==> /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff <==
C16582  GLEAN   mRNA    35  385 0.555898    -   .   ID=CGI_10000001;
C17212  GLEAN   mRNA    31  363 0.999572    +   .   ID=CGI_10000002;
C17316  GLEAN   mRNA    30  257 0.555898    +   .   ID=CGI_10000003;

==> /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff <==
C17476  subtractBed intrn   75  103 .   -   .   Parent=CGI_10000004;
C19392  subtractBed intrn   184 451 .   +   .   Parent=CGI_10000015;
C20262  subtractBed intrn   539 641 .   -   .   Parent=CGI_10000025;

Posted by & filed under Cgigas DNA Methylation.

tldr
RNA-seq feature tracks


In an effort to better visualize the RNA-seq data from the heat shock experiment all accepted_hits.bam files from tophat2 analysis of the 6 libraries (3 pre, 3 post) were converted to bedgraphs.

!/Applications/bedtools2/bin/genomeCoverageBed \
-bg \
-split \
-ibam accepted_hits.bam \
-g /Volumes/web/halfshell/qdod3/Cg.GCA_000297895.1.25.dna_sm.toplevel.genomee \
> 2M-HS.bedgraph
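
For anyone unfamiliar with the bedGraph format, the sketch below shows the basic idea: consecutive bases with the same read depth are collapsed into one interval, and zero-coverage stretches are left out (which is the behavior of the -bg option). This is a toy illustration on a hypothetical list of per-base depths, not a replacement for genomeCoverageBed, and it does not deal with spliced reads.

```python
def depth_to_bedgraph(chrom, depths):
    """Collapse per-base depths into (chrom, start, end, depth) runs.

    Coordinates are 0-based and half-open; zero-depth runs are skipped.
    """
    records = []
    run_start, run_depth = 0, 0
    # A trailing None acts as a sentinel that closes the final run
    for pos, depth in enumerate(list(depths) + [None]):
        if depth != run_depth:
            if run_depth:
                records.append((chrom, run_start, pos, run_depth))
            run_start, run_depth = pos, depth
    return records

# Hypothetical per-base depths on a short scaffold
for record in depth_to_bedgraph("scaffoldA", [0, 2, 2, 3, 0, 0, 1]):
    print("\t".join(str(field) for field in record))
```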

As per IGV recommendations, files were further converted to .tdf files.

tdf

I tried to do this at the command line but the bedgraph input format seemed to be a problem.

Ultimately this renders as

igv
  • [IGV xml File](http://owl.fish.washington.edu/halfshell/2015-02-hs-bedgraph/20150226-igv_session.xml) _only renders locally_

The next step still seems to be going back to the DMRs and characterizing where they exist in the genome. Nothing yet seems obvious, particularly in relation to differentially expressed genes.

Steven Roberts

February 20, 2015

Created a gtf based on Cuffdiff gene expression output.

!python {sqls}fetchdata.py \
-s "SELECT Column1, Column2, Column3, Column4, Column5, Column6, Column7, Column8, Column9 \
FROM [sr320@washington.edu].[_cuffdiffgenes.sorted_by_expression.sig.txt]sig \
left join \
[sr320@washington.edu].[_rebuilt.gtf.geneIDend]id \
on \
sig.gene_ID=id.Column10" \
-f tsv \
-o /Users/sr320/data-genomic/tentacle/diffgene.gtf
!head /Users/sr320/data-genomic/tentacle/diffgene.gtf
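
The join above can also be pictured in plain Python: for each significant gene ID, pull the first nine columns of every GTF row whose tenth column carries that ID. The sketch below is only an illustration of that left-join logic. The function name, gene IDs and rows are hypothetical, and it assumes the GTF table has the gene ID in its tenth column as the query implies.

```python
def left_join(sig_gene_ids, gtf_rows, id_index=9):
    """Left join significant gene IDs to GTF rows keyed on the gene ID column.

    Returns the first id_index columns of each match; genes with no match
    produce a single row of None values, as a SQL left join would.
    """
    rows_by_id = {}
    for row in gtf_rows:
        rows_by_id.setdefault(row[id_index], []).append(row[:id_index])
    joined = []
    for gene_id in sig_gene_ids:
        matches = rows_by_id.get(gene_id)
        if matches:
            joined.extend(matches)
        else:
            joined.append([None] * id_index)
    return joined

# Hypothetical rows: nine GTF columns plus a tenth column with the gene ID
gtf_rows = [
    ["scaffoldA", "Cufflinks", "exon", "100", "300", ".", "+", ".", "attrs1", "geneX"],
    ["scaffoldA", "Cufflinks", "exon", "500", "900", ".", "+", ".", "attrs2", "geneX"],
    ["scaffoldB", "Cufflinks", "exon", "10", "80", ".", "-", ".", "attrs3", "geneY"],
]
joined = left_join(["geneX", "geneZ"], gtf_rows)
for row in joined:
    print(row)
```

Rows of None values flag significant genes that have no matching record in the GTF, which is a quick way to spot ID mismatches.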

Available @

http://owl.fish.washington.edu/halfshell/bu-data-genomic/tentacle/Cuffdiff_geneexp.sig.gtf