ugli

Differences

This shows you the differences between two versions of the page.

ugli [2025/05/15 10:49] sylviaugli [2026/02/16 14:40] (current) sylvia
Line 5: Line 5:
 ===== Background ===== ===== Background =====
 Genome-wide association (GWAS) data is highly valuable for biobanks such as Lifelines in identifying disease/trait associations, predicting future disease development and personalized treatment.\\ Genome-wide association (GWAS) data is highly valuable for biobanks such as Lifelines in identifying disease/trait associations, predicting future disease development and personalized treatment.\\
-To facilitate the generation, analysis and study of genetic data in Lifelines, the UGLI consortium was founded. UGLI brings together many groups and PIs within the UMCG, RUG and beyond that are interested in performing such research with Lifelines data. They have brought the funding together which led to the initial genotyping of a total of 38,030 Lifelines participants, including [[children]], as part of the [[http://glimdna.org|HUGE]] consortium in Rotterdam on the Infinium Global Screening Array® (GSA) MultiEthnic Disease Version 1.0. Together with 15,400 samples already genotyped on the Cytochip ([[GWAS]]), two quality controlled GWAS datasets with a combined sample size of n~50,000 subjects will be made available to members of the UGLI consortium based on specific proposals approved by the UGLI steering committee and by Lifelines.\\ +To facilitate the generation, analysis and study of genetic data in Lifelines, the UGLI consortium was founded. UGLI brings together many groups and PIs within the UMCG, RUG and beyond that are interested in performing such research with Lifelines data. Their pooled funding led to the initial genotyping of 38,030 + 29,366 Lifelines participants, including [[children]], as part of the [[http://glimdna.org|HUGE]] consortium in Rotterdam, on the Infinium Global Screening Array® (GSA) MultiEthnic Disease Version 1.0 and the FinnGen Thermo Fisher Axiom® custom array, respectively. Together with 15,400 samples already genotyped on the CytoSNP array ([[GWAS]]), three quality-controlled GWAS datasets with a combined sample size of n~110,000 subjects are available. Genotyping of the remaining Lifelines participants is still ongoing.\\
-The UGLI consortium is actively raising funding for the genotyping of additional samples. The genotyped additional samples are generally referred to as UGLI2. With additional funding of new UGLI members, the consortium will increase the number of genotyped Lifelines participants. These efforts will make Lifelines a more interesting partner for national and international collaborations as well as with non-academic partners that work on healthy ageing. +
  
 +Important to note: **UGLI3 currently has a publication restriction**. The UGLI consortium is currently preparing its manuscript describing the dataset and its primary analyses. Other manuscripts using UGLI3 data may not be submitted **before 31 December 2026**.
 +\\
 +\\
  
 ===== UGLI1 - GSA ===== ===== UGLI1 - GSA =====
Line 30: Line 32:
 {{:ugli_age_distribution.jpg?400|}} {{:ugli_age_distribution.jpg?400|}}
  
-===== Quality Checks =====+==== Quality Checks ====
 An UGLI1 - GSA (release 2.0) Quality Control Report is available, describing in detail the QC steps that were taken during the quality control (QC) process of the first release of UGLI comprising the genotype of 38,030 participants An UGLI1 - GSA (release 2.0) Quality Control Report is available, describing in detail the QC steps that were taken during the quality control (QC) process of the first release of UGLI comprising the genotype of 38,030 participants
 assessed using the Infinium Global Screening Array® (GSA) MultiEthnic Disease Version 1.0. {{ :qc_report_ugli1_release_2_-v1.pdf |}} assessed using the Infinium Global Screening Array® (GSA) MultiEthnic Disease Version 1.0. {{ :qc_report_ugli1_release_2_-v1.pdf |}}
Line 45: Line 47:
 Raw intensity data from the GSA will be made available to the researchers.  Raw intensity data from the GSA will be made available to the researchers. 
  
 +====Updated version - v3====
 +Changes in the GSA genotype and imputed data version 3 (Oct 2025)
 +
 +1. During the QC of the Affymetrix data (UGLI2+3) genetic relationships of all samples genotyped with the CytoSNP, GSA or Affymetrix array were determined and compared with the reported pedigree relations. Some DNA samples appeared not to be from the expected individual and could also not be matched to any other individual using the observed genetic relationships. These samples were therefore excluded. This resulted in a new sample size of N=36,233.
 +
 +2. A new phasing and imputation strategy was adopted. The online HRC imputation on the Sanger Imputation Server makes use of the SHAPEIT2 or EAGLE2 phasing tools and the PBWT imputation tool. These software tools are relatively old (2014). Instead, an in-house phasing and imputation approach was used, using a subset of ~11,000 HRC samples as a reference panel with the EAGLE2 phasing and IMPUTE5 imputation tools.
 +
 +3. A new PCA on the combined data of the CytoSNP, GSA and Affymetrix arrays was used to identify non-European individuals, resulting in a new list of non-European GSA individuals (PCs). It is up to the researcher whether to exclude them or to correct for population ethnicity in the analysis.
 +
 +\\
 +\\
  
 ===== UGLI2 - Affymetrix ===== ===== UGLI2 - Affymetrix =====
 As of March 2023, data of an additional 28,149 genotyped participants has been made available. Samples in this release, called UGLI2, were genotyped using the FinnGen Thermo Fisher Axiom® custom array. As of March 2023, data of an additional 28,149 genotyped participants has been made available. Samples in this release, called UGLI2, were genotyped using the FinnGen Thermo Fisher Axiom® custom array.
  
-29,366 participants were selected for UGLI 2 release and assessed using the pre mentioned array. All genotypes were included for QC screening, but the QC focussed on the the autosomes and chromosomes X for which there are N=617,715 and 22,346 markers available, respectively. A final set of 28,149 samples and 441,596 markers on autosomal and 18,450 X chromosomes markers passed the QC steps described in  +29,366 participants were selected for the UGLI2 release and assessed using the aforementioned array. All genotypes were included for QC screening, but the QC focused on the autosomes and the X chromosome, for which N=617,715 and 22,346 markers are available, respectively. A final set of 28,149 samples, 441,596 autosomal markers and 18,450 X-chromosome markers passed the QC steps described in the quality control report.
-{{ :qc_report_ugli2_release_1_-v1.pdf |}}.+
  
 ^ UGLI2 - Affymetrix cohort - samples that passed QC           || ^ UGLI2 - Affymetrix cohort - samples that passed QC           ||
Line 64: Line 76:
 Please note that the array used for UGLI2 differs from the one used in UGLI1. Overlap in SNPs between these two arrays (GSA chip from Illumina=UGLI1 and FinnGen array from Affymetrix/Thermo Fisher=UGLI2) is small, namely 1,000-10,000 SNPs. Please note that the array used for UGLI2 differs from the one used in UGLI1. Overlap in SNPs between these two arrays (GSA chip from Illumina=UGLI1 and FinnGen array from Affymetrix/Thermo Fisher=UGLI2) is small, namely 1,000-10,000 SNPs.
  
-===== Quality Checks ===== +==== Quality Checks ==== 
-An UGLI2 - Affymetrix (release 2.0) Quality Control Report is available, describing in detail the QC steps that were taken during the quality control (QC) process of the second release of UGLI comprising the genotype of 29,366 participants assessed using the FinnGen Thermo Fisher Axiom® custom array. {{ :qc_report_ugli2_release_1_-v1.pdf |}}+An UGLI2 - Affymetrix (release 2.0) Quality Control Report is available, describing in detail the QC steps that were taken during the quality control (QC) process of the second release of UGLI comprising the genotype of 29,366 participants assessed using the FinnGen Thermo Fisher Axiom® custom array. {{ :qc_report_ugli2_release_2_-v1.pdf |}}
  
 ====Imputation==== ====Imputation====
Line 72: Line 84:
 ====SNP array intensity files==== ====SNP array intensity files====
 Raw intensity data from the FinnGen Thermo Fisher Axiom® custom array will be made available to the researchers.  Raw intensity data from the FinnGen Thermo Fisher Axiom® custom array will be made available to the researchers. 
 +\\
 +\\
  
-===== UGLI3 ===== +===== UGLI2+3 - Affymetrix ===== 
-TBA+Important to note: **UGLI3 currently has a publication restriction**. The UGLI consortium is currently preparing its manuscript describing the dataset and its primary analyses. Other manuscripts using UGLI3 data may not be submitted **before 31 December 2026**.
  
 +As of December 2025, data of 60,157 genotyped participants have been made available. Samples in this release, called UGLI2+3, were genotyped using the FinnGen Thermo Fisher Axiom® custom array. The sample includes the previously genotyped participants from UGLI2 (see above). The QC and imputation were done on the combined dataset of UGLI2 and UGLI3.
 +
 +63,553 participants were selected for the UGLI2+3 release and assessed using the aforementioned array. All genotypes were included for QC screening, but the QC focused on the autosomes and the X chromosome, for which N=615,682 and 22,346 markers are available, respectively. A final set of 60,157 samples and 476,693 markers on the autosomes and the X chromosome passed the QC steps described in the quality control report.
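
The QC counts quoted on this page can be turned into retention rates with a few lines of arithmetic. The sketch below is illustrative only and not part of the UGLI QC pipeline: the class and function names are hypothetical, and summing the autosomal and X marker counts (615,682 + 22,346) as the pre-QC marker total for UGLI2+3 is an assumption, since the page reports a single combined post-QC marker count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QcCounts:
    """Counts before and after QC, as quoted on this page (hypothetical helper)."""
    label: str
    selected: int  # number entering QC
    passed: int    # number remaining after QC

    @property
    def removed(self) -> int:
        return self.selected - self.passed

    @property
    def retention(self) -> float:
        return self.passed / self.selected


RELEASES = [
    QcCounts("UGLI2 samples", 29_366, 28_149),
    QcCounts("UGLI2+3 samples", 63_553, 60_157),
    # Assumption: pre-QC marker total = autosomal + X chromosome markers.
    QcCounts("UGLI2+3 markers", 615_682 + 22_346, 476_693),
]


def summarize(c: QcCounts) -> str:
    """Format one line with retained, total, retention rate and removed counts."""
    return (f"{c.label}: {c.passed:,} of {c.selected:,} retained "
            f"({c.retention:.1%}), {c.removed:,} removed")


if __name__ == "__main__":
    for counts in RELEASES:
        print(summarize(counts))
```

These rates only restate the counts above; the reasons why samples or markers were removed are described in the QC reports.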
 +
 +^ UGLI2+3 - Affymetrix cohort - samples that passed QC           ||
 +| Subgroup                                        | N       |
 +| Total                                           | 60,157  |
 +| Male                                            | TBA  |
 +| Female                                          | TBA  |
 +| Age* [[children|8-17]]                          | TBA  |
 +| Age* 18-64                                      | TBA  |
 +| Age* >64                                        | TBA   |
 +Table 3: UGLI2+3 - Affymetrix cohort information. These are samples that passed QC. 
 +
 +Please note that the array used for UGLI2+3 differs from the one used in UGLI1. Overlap in SNPs between these two arrays (GSA chip from Illumina=UGLI1 and FinnGen array from Affymetrix/Thermo Fisher=UGLI2+3) is small, namely 1,000-10,000 SNPs.
 +
 +==== Quality Checks ====
 +An UGLI2+3 - Affymetrix (release 3.0) Quality Control Report is available, describing in detail the QC steps that were taken during the quality control (QC) process of the third release of UGLI comprising the genotype of 60,157 participants assessed using the FinnGen Thermo Fisher Axiom® custom array. {{ ::qcreport_ugli2and3_oct2025.pdf |}}
 +
 +====Imputation====
 +A final set of 60,157 samples and 476,693 markers on the autosomes and the X chromosome passed all QC steps described in the QC report and were used for genetic phasing and imputation. Genetic imputation was done through the 
 +Sanger imputation service using the Haplotype Reference Consortium (http://www.haplotype-reference-consortium.org) panel. 
 +Method: see the QC report.
 +
 +====SNP array intensity files====
 +Raw intensity data from the FinnGen Thermo Fisher Axiom® custom array will be made available to the researchers. 
 +\\
 +\\
 +
 +\\
  
 ===== Overlap between studies===== ===== Overlap between studies=====
Line 84: Line 128:
 | GWAS4       | 938        | | GWAS4       | 938        |
 Table 2: A number of participants in UGLI1 also participated in other studies, i.e. [[deep|DAG1]], [[dag3|DAG3]], [[dag2|DAG2]]/[[http://www.nlgenome.nl/|GoNL]] and [[gwas|GWAS4]]. In the second column the sample sizes that overlap between these studies and UGLI1 can be found. For DAG1 and DAG3 these are approximations.   Table 2: A number of participants in UGLI1 also participated in other studies, i.e. [[deep|DAG1]], [[dag3|DAG3]], [[dag2|DAG2]]/[[http://www.nlgenome.nl/|GoNL]] and [[gwas|GWAS4]]. In the second column the sample sizes that overlap between these studies and UGLI1 can be found. For DAG1 and DAG3 these are approximations.  
 +\\ 
 +\\
  
 =====UGLI-data release===== =====UGLI-data release=====
Line 107: Line 152:
 | PLINK    | PLINK is a command line program written in C/C++  | | PLINK    | PLINK is a command line program written in C/C++  |
  
 +
 +===== Publications with UGLI data =====
 +
 +  * Li et al. 2024 [[https://academic.oup.com/gpb/article/22/2/qzae031/7649324?login=false|Genome-wide Studies Reveal Genetic Risk Factors for Hepatic Fat Content]], Genomics, Proteomics & Bioinformatics, 22(2):qzae031
 +  * Keaton et al. 2024 [[https://www.nature.com/articles/s41588-024-01714-w | Genome-wide analysis in over 1 million individuals of European ancestry yields improved polygenic risk scores for blood pressure traits]] Nature Genetics. 56(5):778-791
 +  * Qiao et al. 2023 [[https://www.nature.com/articles/s41467-023-36013-1 | Estimation and implications of the genetic architecture of fasting and non-fasting blood glucose]] Nature Communications. 14(1):451
 +  * Warmerdam et al. 2022 [[https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1010135 | Increased genetic contribution to wellbeing during the COVID-19 pandemic]] PLoS Genetics. 18(5):e1010135
 +  * Nolte et al. 2017 [[https://www.nature.com/articles/ejhg201750 | Missing heritability: is the gap closing? An analysis of 32 complex traits in the Lifelines Cohort Study]] Eur J Hum Genet. 25(7):877-885
  
  
ugli.1747306142.txt.gz · Last modified: by sylvia