UGLI

UGLI (UMCG Genetics Lifelines Initiative) is one of Lifelines' additional assessments. It aims to facilitate and accelerate the generation and analysis of genetic data, and thereby the scientific output based on the Lifelines genomics data.

Background

Genome-wide association (GWAS) data is highly valuable for biobanks such as Lifelines in identifying disease/trait associations, predicting future disease development and supporting personalized treatment.
To facilitate the generation, analysis and study of genetic data in Lifelines, the UGLI consortium was founded. UGLI brings together many groups and PIs within the UMCG, RUG and beyond who are interested in performing such research with Lifelines data. Their pooled funding led to the initial genotyping of 38,030 and 29,366 Lifelines participants, including children, as part of the HUGE consortium in Rotterdam, on the Infinium Global Screening Array® (GSA) MultiEthnic Disease Version 1.0 and the FinnGen Thermo Fisher Axiom® custom array, respectively. Together with 15,400 samples already genotyped on the CytoSNP array (GWAS), three quality-controlled GWAS datasets with a combined sample size of n~110,000 subjects are available. Genotyping of the remaining Lifelines participants is still ongoing.

Important to note: UGLI3 currently has a publication restriction. The UGLI consortium is preparing its manuscript describing the dataset and its primary analyses. Other manuscripts using UGLI3 data may not be submitted before 31 December 2026.

UGLI1 - GSA

38,030 Lifelines participants were selected for UGLI1 using the following criteria:

availability of isolated DNA-samples of adequate volume and concentration at Lifelines
Caucasian-ancestry samples

The genotypes of 38,030 participants were assessed using the Infinium Global Screening Array® (GSA) MultiEthnic Disease Version 1.0¹⁾. All genotyped samples were included in the QC screening, and the QC of genetic markers focused on the autosomes and the X chromosome (N=691,072 markers). A final set of 36,339 samples and 548,029 markers on the autosomes and the X chromosome passed the QC steps described in qc_report_ugli1_release_2_-v1.pdf.

UGLI1 - GSA cohort - samples that passed QC
Subgroup	N
Total	36,339
Male	15,098
Female	21,241
Age* 8-17	3,522**
Age* 18-64	30,416
Age* >64	2,401

Table 1: UGLI1 - GSA cohort information. These are samples that passed QC. *Age at the first visit of the Baseline assessment. **One participant did not visit during Baseline but did visit during the 2nd screening. Since this participant was under 18 years of age at visit 1 of the 2nd screening, this participant has been added to the children 8-17 group.
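As a quick arithmetic check on the UGLI1 figures, the following minimal Python sketch computes the share of samples and markers retained by the QC and verifies that the subgroup counts in Table 1 add up to the total. All numbers are taken from this page; the function and variable names are illustrative only.

```python
def retained_pct(before: int, after: int) -> float:
    """Percentage of items that remain after QC."""
    return 100.0 * after / before

# Counts reported on this page for UGLI1 (GSA).
SAMPLES_GENOTYPED, SAMPLES_PASSED = 38_030, 36_339
MARKERS_TESTED, MARKERS_PASSED = 691_072, 548_029

# Subgroup counts from Table 1.
BY_SEX = {"Male": 15_098, "Female": 21_241}
BY_AGE = {"8-17": 3_522, "18-64": 30_416, ">64": 2_401}

samples_pct = retained_pct(SAMPLES_GENOTYPED, SAMPLES_PASSED)
markers_pct = retained_pct(MARKERS_TESTED, MARKERS_PASSED)

# Both breakdowns should sum to the QC-passed total.
sex_ok = sum(BY_SEX.values()) == SAMPLES_PASSED
age_ok = sum(BY_AGE.values()) == SAMPLES_PASSED

print(f"samples retained: {samples_pct:.1f}%")   # 95.6%
print(f"markers retained: {markers_pct:.1f}%")   # 79.3%
print(f"table consistent: {sex_ok and age_ok}")  # True
```

Roughly 96% of samples and 79% of markers passed the QC, and both the sex and the age breakdown sum to 36,339.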

Quality Checks

A Quality Control Report for UGLI1 - GSA (release 2.0) is available. It describes in detail the steps taken during the quality control (QC) process of the first release of UGLI, which comprises the genotypes of 38,030 participants assessed using the Infinium Global Screening Array® (GSA) MultiEthnic Disease Version 1.0: qc_report_ugli1_release_2_-v1.pdf

Imputation

The final set of 36,339 samples and 548,029 markers on the autosomes and the X chromosome that passed all QC steps described in qc_report_ugli1_release_2_-v1.pdf was used for genetic imputation. Genetic imputation was done through the Sanger Imputation Service using the Haplotype Reference Consortium (https://www.sanger.ac.uk/collaboration/haplotype-reference-consortium/) panel. The dataset was formatted following the instructions on the Sanger webpage (https://www.sanger.ac.uk/science/tools/sanger-imputation-service).

SNP array intensity files

Raw intensity data from the GSA will be made available to the researchers.

Updated version - v3

Changes in the GSA genotype and imputed data version 3 (Oct 2025)

1. During the QC of the Affymetrix data (UGLI2+3) genetic relationships of all samples genotyped with the CytoSNP, GSA or Affymetrix array were determined and compared with the reported pedigree relations. Some DNA samples appeared not to be from the expected individual and could also not be matched to any other individual using the observed genetic relationships. These samples were therefore excluded. This resulted in a new sample size of N=36,233.

2. A new phasing and imputation strategy was adopted. The online HRC imputation on the Sanger Imputation Server makes use of the SHAPEIT2 or EAGLE2 phasing tools and the PBWT imputation tool. These software tools are relatively old (2014). Instead, an in-house phasing and imputation approach was used, using a subset of ~11,000 HRC samples as a reference panel with the EAGLE2 phasing and IMPUTE5 imputation tools.

3. A new PCA on the combined data of the CytoSNP, GSA and Affymetrix chips was performed to identify non-European individuals. Based on the principal components (PCs), a new list of non-European GSA individuals has been constructed. It is up to the researcher whether to exclude these individuals or to correct for ancestry in the analysis.
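The two options mentioned in point 3 can be sketched as follows. This is a minimal illustration in Python, not the consortium's procedure: the sample IDs, covariate and PC values, and function names are hypothetical, and the actual file formats of the non-European list and the PCs are not specified on this page.

```python
from typing import Dict, List, Set

def exclude_flagged(samples: List[str], flagged: Set[str]) -> List[str]:
    """Option 1: drop every sample on the non-European list."""
    return [s for s in samples if s not in flagged]

def with_pc_covariates(
    covariates: Dict[str, List[float]],
    pcs: Dict[str, List[float]],
    n_pcs: int,
) -> Dict[str, List[float]]:
    """Option 2: keep all samples and append the first n_pcs PCs as covariates."""
    return {s: values + pcs[s][:n_pcs] for s, values in covariates.items()}

# Hypothetical toy data: three samples, one of them flagged as non-European.
samples = ["S1", "S2", "S3"]
flagged = {"S2"}
covariates = {"S1": [34.0], "S2": [51.0], "S3": [47.0]}  # e.g. age
pcs = {"S1": [0.01, -0.02], "S2": [0.35, 0.10], "S3": [0.00, 0.03]}

kept = exclude_flagged(samples, flagged)                 # S2 is removed
adjusted = with_pc_covariates(covariates, pcs, n_pcs=2)  # all samples kept
```

Exclusion yields a more homogeneous sample at the cost of sample size, whereas adding PCs as covariates keeps every sample and leaves it to the analysis model to account for ancestry differences.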

UGLI2 - Affymetrix

As of March 2023, data of an additional 28,149 genotyped participants has been made available. Samples in this release, called UGLI2, were genotyped using the FinnGen Thermo Fisher Axiom® custom array.

29,366 participants were selected for the UGLI2 release and assessed using the aforementioned array. All genotypes were included in the QC screening, but the QC focused on the autosomes and the X chromosome, for which N=617,715 and 22,346 markers are available, respectively. A final set of 28,149 samples, 441,596 autosomal markers and 18,450 X-chromosome markers passed the QC steps described in the quality control report.

UGLI2 - Affymetrix cohort - samples that passed QC
Subgroup	N
Total	28,149
Male	TBA
Female	TBA
Age* 8-17	TBA
Age* 18-64	TBA
Age* >64	TBA

Table 3: UGLI2 - Affymetrix cohort information. These are samples that passed QC.

Please note that the array used for UGLI2 differs from the one used for UGLI1. The overlap in SNPs between these two arrays (the GSA chip from Illumina for UGLI1 and the FinnGen array from Affymetrix/Thermo Fisher for UGLI2) is small, namely 1,000-10,000 SNPs.
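To see how many markers two arrays share in a given dataset, one option is to compare marker positions from PLINK .bim files. The sketch below is an illustration only: it assumes the data are available as .bim files on the same genome build, matches on chromosome and base-pair position (variant IDs often differ between arrays), and uses made-up example lines rather than real UGLI data.

```python
from typing import Iterable, Set, Tuple

Position = Tuple[str, int]

def bim_positions(lines: Iterable[str]) -> Set[Position]:
    """Collect (chromosome, base-pair position) pairs from .bim lines.

    A .bim line has six whitespace-separated columns: chromosome,
    variant ID, genetic distance, base-pair position, allele 1, allele 2.
    """
    positions = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        positions.add((fields[0], int(fields[3])))
    return positions

def shared_positions(bim_a: Iterable[str], bim_b: Iterable[str]) -> Set[Position]:
    """Positions present on both arrays."""
    return bim_positions(bim_a) & bim_positions(bim_b)

# Made-up example lines standing in for two array manifests.
array_a = ["1 rsA 0 1000 A G", "1 rsB 0 2000 C T"]
array_b = ["1 AX-1 0 1000 A G", "2 AX-2 0 5000 G T"]

shared = shared_positions(array_a, array_b)
print(len(shared))  # 1
```

With real files, the same functions can be called on open file handles, for example `shared_positions(open(path_a), open(path_b))`, where both paths are placeholders.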

Quality Checks

A Quality Control Report for UGLI2 - Affymetrix (release 2.0) is available. It describes in detail the steps taken during the quality control (QC) process of the second release of UGLI, which comprises the genotypes of 29,366 participants assessed using the FinnGen Thermo Fisher Axiom® custom array: qc_report_ugli2_release_2_-v1.pdf

Imputation

The final set of 28,149 samples and 460,136 markers on the autosomes and the X chromosome that passed all QC steps described above was used for genetic imputation. Genetic imputation was done through the Sanger Imputation Service using the Haplotype Reference Consortium (https://www.sanger.ac.uk/collaboration/haplotype-reference-consortium/) panel.

SNP array intensity files

Raw intensity data from the FinnGen Thermo Fisher Axiom® custom array will be made available to the researchers.

UGLI2+3 - Affymetrix

As of December 2025, data of 60,157 genotyped participants have been made available. Samples in this release, called UGLI2+3, were genotyped using the FinnGen Thermo Fisher Axiom® custom array. The sample includes the previously genotyped participants from UGLI2 (see above). QC and imputation were performed on the combined dataset of UGLI2 and UGLI3.

63,553 participants were selected for the UGLI2+3 release and assessed using the aforementioned array. All genotypes were included in the QC screening, but the QC focused on the autosomes and the X chromosome, for which N=615,682 and 22,346 markers are available, respectively. A final set of 60,157 samples and 476,693 markers on the autosomes and the X chromosome passed the QC steps described in the quality control report.

UGLI2+3 - Affymetrix cohort - samples that passed QC
Subgroup	N
Total	60,157
Male	TBA
Female	TBA
Age* 8-17	TBA
Age* 18-64	TBA
Age* >64	TBA

Table 4: UGLI2+3 - Affymetrix cohort information. These are samples that passed QC.

Please note that the array used for UGLI2+3 differs from the one used for UGLI1. The overlap in SNPs between these two arrays (the GSA chip from Illumina for UGLI1 and the FinnGen array from Affymetrix/Thermo Fisher for UGLI2+3) is small, namely 1,000-10,000 SNPs.

Quality Checks

A Quality Control Report for UGLI2+3 - Affymetrix (release 3.0) is available. It describes in detail the steps taken during the quality control (QC) process of the third release of UGLI, which comprises the genotypes of 60,157 participants assessed using the FinnGen Thermo Fisher Axiom® custom array: qcreport_ugli2and3_oct2025.pdf

Imputation

The final set of 60,157 samples and 476,693 markers on the autosomes and the X chromosome that passed all QC steps described in the QC report was used for genetic phasing and imputation. Genetic imputation was done through the Sanger Imputation Service using the Haplotype Reference Consortium (https://www.sanger.ac.uk/collaboration/haplotype-reference-consortium/) panel. Method: see QC report.

SNP array intensity files

Raw intensity data from the FinnGen Thermo Fisher Axiom® custom array will be made available to the researchers.

Overlap between studies

Study name	N in UGLI1	N in UGLI2+3
DEEP (DAG1)	~600	~50
GoNL (DAG2)	143
DAG3	~8000

Table 2: A number of participants in UGLI also participated in other studies, i.e. DAG1, DAG2/GoNL and DAG3. The second and third columns give the sample sizes that overlap between these studies and UGLI. For more on overlap between studies, see Omics studies overlap.

UGLI-data release

UGLI data is available on the HPC (Linux environment) of the UMCG. The data will not be accessible through the Lifelines workspace. The applicant’s proposal will be reviewed by both Lifelines and the UGLI steering committee (UGLI SC).

Requesting UGLI data: The applicant applies via the regular Lifelines application procedure. This means the applicant submits the proposal together with the dataset order using our online catalogue (https://data-catalogue.lifelines.nl/). UGLI data cannot be selected through the online catalogue. Instead, the applicant requests UGLI data by stating this in the application form (Appendix: Request for Source Data (Not in catalogue)).

Abbreviations

GWAS	Genome Wide Association Study
UGLI	UMCG Genetics Lifelines Initiative
UGLI SC	UGLI steering committee
GSA	Global Screening Array
SNP	Single-nucleotide polymorphism
HW	Hardy-Weinberg Equilibrium
WGS	Whole Genome Sequencing
MAF	Minor allele frequency
PCA	Principal Component Analysis
HPC	High Performance Computing
PLINK	Command-line toolset for whole-genome association analysis, written in C/C++

Publications with UGLI data

Li et al. 2024. Genome-wide Studies Reveal Genetic Risk Factors for Hepatic Fat Content. Genomics, Proteomics & Bioinformatics. 22(2):qzae031
Keaton et al. 2024. Genome-wide analysis in over 1 million individuals of European ancestry yields improved polygenic risk scores for blood pressure traits. Nature Genetics. 56(5):778-791
Qiao et al. 2023. Estimation and implications of the genetic architecture of fasting and non-fasting blood glucose. Nature Communications. 14(1):451
Warmerdam et al. 2022. Increased genetic contribution to wellbeing during the COVID-19 pandemic. PLoS Genetics. 18(5):e1010135
Nolte et al. 2017. Missing heritability: is the gap closing? An analysis of 32 complex traits in the Lifelines Cohort Study. Eur J Hum Genet. 25(7):877-885

¹⁾ https://www.illumina.com/content/dam/illumina-marketing/documents/products/datasheets/infinium-commercial-gsa-data-sheet-370-2016-016.pdf
