Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Genetic classification of populations using supervised learning

Bridges, Michael, Heron, Elizabeth A., O'Dushlaine, Colm, Segurado, Ricardo, Morris, Derek, Corvin, Aiden, Gill, Michael, Pinto, Carlos, O'Donovan, Michael Conlon ORCID: https://orcid.org/0000-0001-7073-2379, Kirov, George ORCID: https://orcid.org/0000-0002-3427-3950, Craddock, Nicholas John ORCID: https://orcid.org/0000-0003-2171-0610, Holmans, Peter Alan ORCID: https://orcid.org/0000-0003-0870-9412, Williams, Nigel Melville ORCID: https://orcid.org/0000-0003-1177-6931, Georgieva, Lyudmila, Nikolov, Ivan, Norton, Nadine ORCID: https://orcid.org/0000-0002-3848-4288, Williams, Hywel John ORCID: https://orcid.org/0000-0001-7758-0312, Toncheva, Draga, Milanova, Vihra and Owen, Michael John ORCID: https://orcid.org/0000-0003-4798-0862. 2011. Genetic classification of populations using supervised learning. PLoS ONE 6 (5), e14802. doi:10.1371/journal.pone.0014802

PDF - Published Version
Available under License Creative Commons Attribution.

Abstract

There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case–control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large-scale genome-wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations, are termed unsupervised. Supervised methods, on the other hand, are able to utilise this prior knowledge when it is available. In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results, that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large-scale genome-wide association studies.
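
To make the supervised idea concrete, the following is a minimal sketch, not the method used in the paper. It swaps in a nearest-centroid classifier for the neural networks and support vector machines named in the abstract, and it runs on simulated genotypes rather than the Scottish or Bulgarian data. All function names, the number of SNPs, the sample sizes and the allele-frequency shift are assumptions chosen for illustration. It shows the general workflow only: use the known population labels to fit a classifier on a training set, then measure how well held-out individuals are assigned to their population.

```python
import random


def simulate_population(rng, freqs, n):
    """Draw n individuals; each genotype is an allele count (0, 1 or 2) per SNP."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n)]


def centroid(rows):
    """Mean genotype vector of a group of individuals."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def predict(x, c0, c1):
    """Assign x to the population whose training centroid is closer."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return 0 if d0 <= d1 else 1


def holdout_accuracy(n_snps=500, n_train=200, n_test=100, shift=0.05, seed=0):
    """Fit on labelled training data, then score on unseen individuals.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Hypothetical allele frequencies for population 0.
    base = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]
    # Population 1: every frequency nudged up or down by `shift`.
    other = [p + rng.choice((-shift, shift)) for p in base]

    train0 = simulate_population(rng, base, n_train)
    train1 = simulate_population(rng, other, n_train)
    test0 = simulate_population(rng, base, n_test)
    test1 = simulate_population(rng, other, n_test)

    # Supervised step: the population labels define the two centroids.
    c0, c1 = centroid(train0), centroid(train1)

    correct = sum(predict(x, c0, c1) == 0 for x in test0)
    correct += sum(predict(x, c0, c1) == 1 for x in test1)
    return correct / (2 * n_test)


if __name__ == "__main__":
    print("held-out accuracy, shifted frequencies:", holdout_accuracy(shift=0.05))
    print("held-out accuracy, identical populations:", holdout_accuracy(shift=0.0))
```

Held-out accuracy well above 0.5 indicates that the two labelled groups are distinguishable. Setting the shift to zero gives a sanity check, since identical populations should be classified at roughly chance level.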

Item Type: Article
Date Type: Publication
Status: Published
Schools: Neuroscience and Mental Health Research Institute (NMHRI); Medicine; MRC Centre for Neuropsychiatric Genetics and Genomics (CNGG); Systems Immunity Research Institute (SIURI)
Subjects: Q Science > QH Natural history > QH426 Genetics; R Medicine > R Medicine (General)
Publisher: Public Library of Science
ISSN: 1932-6203
Date of First Compliant Deposit: 30 March 2016
Last Modified: 11 Oct 2023 20:20
URI: https://orca.cardiff.ac.uk/id/eprint/28767

Citation Data

Cited 15 times in Scopus.
