This is an old revision of the document!


GWAS

GWAS is one of Lifelines' additional assessments and contains the genome-wide SNP data of 15,400 adult Lifelines participants, derived from the Illumina CytoSNP-12v2 array.
The name GWAS might be confusing: the assessment is not a genome-wide association study in itself, but a database of SNP data that can be used to perform such studies.
Note that a second set of participants is genotyped in the UGLI project.

Subcohort

The GWAS subcohort consists of 15,400 independent (no biological family relations) adult Lifelines participants of Caucasian ancestry. DNA samples were collected at 1a visit 2 (the second visit of the baseline assessment).

Table 1: General information on the GWAS subcohort.

GWAS population general information
Ratio male/female | 41.8% / 58.2%
Average age* | 47.8
Minimum age* | 18
Maximum age* | 89
Age category <18 | N=0
Age category 18-64 | N=13,890
Age category 65+ | N=1,510

* Age at baseline, 2nd visit.

SNP array

The Illumina HumanCytoSNP-12 BeadChip was chosen to study genetic variation in the Lifelines GWAS cohort. This 12-sample BeadChip is a whole-genome scanning panel designed for high-throughput analysis of the genetic and structural variation most relevant to human disease. It can detect many types and sizes of structural variation that affect phenotypes, including duplications, deletions, amplifications, copy-neutral loss of heterozygosity (LOH), and mosaicism. The BeadChip includes a complete panel of genome-wide tag SNPs and markers targeting all regions of known cytogenetic importance. It incorporates 200,000 SNPs, which cover around 250 genomic regions commonly screened in cytogenetics laboratories, including subtelomeric regions, pericentromeric regions, and sex chromosomes, with targeted coverage of around 400 additional disease-related genes.

Quality checks

Quality control of the data is based on SNP filtering with PLINK 1) (minor allele frequency (MAF) above 0.001, Hardy-Weinberg equilibrium (HWE) P-value > 1e-4, and call rate of at least 0.95) and on principal component analysis (PCA) to check for population outliers. Descriptions and criteria for the quality checks are listed in Table 2.
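The SNP filters named above can be sketched for a single biallelic SNP as follows. This is an illustration only, not the Lifelines pipeline: the function names and the genotype-count inputs are hypothetical, and the HWE P-value uses a simple chi-square approximation (1 degree of freedom) rather than whatever test was actually run. The default thresholds are the values quoted in the paragraph above; Table 2 lists different cut-offs for some checks, so they are kept as parameters.

```python
import math


def snp_qc_metrics(n_aa, n_ab, n_bb, n_missing):
    """Return (call_rate, maf, hwe_p) for one biallelic SNP.

    Inputs are genotype counts (hypothetical input format):
    n_aa, n_ab, n_bb are called genotypes, n_missing are no-calls.
    """
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        # Nothing was called: call rate 0, no allele frequency to estimate.
        return 0.0, 0.0, 1.0
    call_rate = n_called / (n_called + n_missing)

    # Allele frequencies from the called genotypes only.
    p = (2 * n_aa + n_ab) / (2 * n_called)
    q = 1.0 - p
    maf = min(p, q)
    if maf == 0.0:
        # Monomorphic SNP: the HWE test is undefined, report P = 1.
        return call_rate, maf, 1.0

    # Chi-square goodness of fit against Hardy-Weinberg proportions
    # (an approximation chosen for brevity; assumption, not the source's method).
    expected = (p * p * n_called, 2 * p * q * n_called, q * q * n_called)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))
    return call_rate, maf, hwe_p


def passes_snp_qc(n_aa, n_ab, n_bb, n_missing,
                  min_maf=0.001, min_hwe_p=1e-4, min_call_rate=0.95):
    """Apply the three SNP-level filters; thresholds are adjustable."""
    call_rate, maf, hwe_p = snp_qc_metrics(n_aa, n_ab, n_bb, n_missing)
    return call_rate >= min_call_rate and maf > min_maf and hwe_p > min_hwe_p
```

In PLINK 1.x, filters of this kind correspond to the `--maf`, `--hwe` and `--geno` options, where `--geno` takes the maximum missing rate (0.05 for a call rate of 0.95).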

Table 2: Description and criteria for quality checks

Description | Criteria | Action
Hybridisation | normal range in GenomeStudio from Illumina | the samples will be hybridised again
Call rate | call-rate < 95% | exclude data from analysis
Call rate | samples with call-rates < 80% | excluded before reviewing clustering
SNP genotype calling (samples with a call rate < 80% are excluded) | visual inspection of clusters for all SNPs with a GT score of < 0.51 | These clusters were changed to give the best results. After these checks a new cluster file is exported and stored with the raw data to enable exact reproduction and checking in future. This file will be used as the reference cluster file in the next data release.

Sample
Duplicate sample identification | included twice or sample mix-up? | remove data from sample with the lowest call rate
Duplicate sample identification | no relationship or mix-up can be determined | remove data from both samples
Excess or deficit of heterozygote SNPs | there are consistently more or fewer heterozygote SNPs than expected across the chromosomes | remove data from both samples
Excess or deficit of heterozygote SNPs | there is one chromosome where there are more or fewer heterozygote SNPs than expected | remove data from both samples
Sex check | verify if sex is according to Lifelines database | exclude mismatches
Sex check | verify lab workflow to determine sex | exclude all possible mismatches

SNP
Call-rate per SNP | call-rate < 95% | remove data from SNP
HWE equilibrium | HWE-P < 0.001 | discard for classical SNP analysis
Minor allele frequency | MAF < 1% | exclude SNP
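Some of the sample-level rules in Table 2 can be expressed as a small decision sketch. The `Sample` record, its field names, and both helper functions are hypothetical and only illustrate the call-rate, sex-check, and duplicate rows; the heterozygosity checks and the lab-workflow verification are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical record layout, used for illustration only.
    sample_id: str
    call_rate: float      # fraction of SNPs called for this sample
    recorded_sex: str     # sex according to the cohort database
    genotyped_sex: str    # sex inferred from the genotype data


def sample_exclusion_reasons(sample, min_call_rate=0.95):
    """List the Table 2 criteria that this sample fails (empty list = keep)."""
    reasons = []
    if sample.call_rate < min_call_rate:
        reasons.append("call rate below threshold")
    if sample.recorded_sex != sample.genotyped_sex:
        reasons.append("sex mismatch")
    return reasons


def resolve_duplicate(a, b, cause_determined=True):
    """Return the sample IDs to remove for a pair flagged as duplicates.

    If the cause (included twice or mix-up) is known, drop the sample with
    the lowest call rate; otherwise drop both. On a call-rate tie this
    sketch drops the second sample, which is an arbitrary choice.
    """
    if not cause_determined:
        return [a.sample_id, b.sample_id]
    lowest = a if a.call_rate < b.call_rate else b
    return [lowest.sample_id]
```

For example, a pair with call rates 0.99 and 0.97 whose duplication is explained loses only the 0.97 sample, while an unexplained pair loses both.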
1)
Purcell S, Neale B, Todd-Brown K, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007;81:559–75.
gwas.1592838516.txt.gz · Last modified: 2020/06/22 17:08 by trynke