Comments on: Data Analysis – Continued O.lurida Fst Analysis from GBS Data http://onsnetwork.org/kubu4/2016/12/01/data-analysis-continued-o-lurida-fst-analysis-from-gbs-data/ University of Washington - Fishery Sciences - Roberts Lab Tue, 09 Apr 2019 18:37:58 +0000 hourly 1 http://wordpress.org/?v=4.0 By: kubu4 http://onsnetwork.org/kubu4/2016/12/01/data-analysis-continued-o-lurida-fst-analysis-from-gbs-data/#comment-4504 Wed, 07 Dec 2016 16:11:54 +0000 http://onsnetwork.org/kubu4/?p=2374#comment-4504

Thanks for all of this! Much appreciated!

VCF was generated from PyRad with no alterations.

Mincov setting in PyRad was 4.

Thanks for the Biostar link; I think I’ve looked at that (and the other thread linked in that post) a few times.

I’ll check out your other notebooks; thanks!

I’ll also look into running Stacks’ “Populations” program.

Thanks again for the info on this. Much appreciated!

]]>
By: Katherine Silliman http://onsnetwork.org/kubu4/2016/12/01/data-analysis-continued-o-lurida-fst-analysis-from-gbs-data/#comment-4494 Sat, 03 Dec 2016 17:59:34 +0000 http://onsnetwork.org/kubu4/?p=2374#comment-4494 Nice job getting vcftools to run! Are you using the vcf file that pyrad outputs, or ipyrad? Pyrad will output a VCF file with all variant sites, while ipyrad will output a VCF file with ALL sites variant and invariant. In either case, you want to thin your VCF to 1 variant site per GBS loci when looking at mean FST. In the notebook you are looking at, I am using a VCF file that has already been thinned. See the end of this notebook to see how I did that using scripts from Mikhail Matz’s github: https://github.com/ksil91/2016_Notebook/blob/master/Mapping2Genome.md

What was your MinCov setting in pyrad? Are loci included that are missing in a lot of individuals? I’ve found after digging some more that the “NaN” arise for invariant sites. These can occur when the SNP is only in 1 of the populations, and therefore when comparing the other 2 populations you are left with an invariant site. Make sure to excluded NaN when doing mean.

For negative values, I usually see these with loci that with very uneven coverage between populations (like a loci was only sequenced in 1 individual from Pop A and 10 individuals from Pop B). I found a Biostars post about negative Fst values from Weir-Cockerham, that in the edit suggests throwing out negative Fst value loci as they may be due to calculation errors.
https://www.biostars.org/p/132253/

I have lately been using a different approach for my GBS data with pairwise Fst, using a different Fst formula modified by a friend specifically for the uneven coverage you often get with GBS data. It only calculates pairwise Fst for a site that is genotyped in at least 2 individuals and is variant when considering only individuals from those 2 populations. The notebook is found here: https://github.com/ksil91/2016_Notebook/blob/master/Exploring_GBS_Data_With_R.ipynb

Another alternative that I haven’t explored much yet is the populations program in Stacks. It now accepts a VCF file from another program, without requiring you to use the rest of the Stacks pipeline to assemble and genotype. They have their own modified Fst and nucleotide diversity formulas that I think also account for missingness in GBS-like data.

Let me know if you have any questions!

]]>