Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

The evaluation of tools used to predict the impact of missense variants is hindered by two types of circularity

Grimm, Dominik G., Azencott, Chloé-Agathe, Aicheler, Fabian, Gieraths, Udo, MacArthur, Daniel G., Samocha, Kaitlin E., Cooper, David Neil ORCID: https://orcid.org/0000-0002-8943-8484, Stenson, Peter Daniel, Daly, Mark J., Smoller, Jordan W., Duncan, Laramie E. and Borgwardt, Karsten M. 2015. The evaluation of tools used to predict the impact of missense variants is hindered by two types of circularity. Human Mutation 36 (5), pp. 513-523. doi:10.1002/humu.22768

PDF - Published Version
Available under License Creative Commons Attribution Non-commercial.


Abstract

Prioritizing missense variants for further experimental investigation is a key challenge in current sequencing studies for exploring complex and Mendelian diseases. A large number of in silico tools have been employed for the task of pathogenicity prediction, including PolyPhen-2, SIFT, FatHMM, MutationTaster-2, MutationAssessor, Combined Annotation Dependent Depletion, LRT, phyloP, and GERP++, as well as optimized methods of combining tool scores, such as Condel and Logit. Given the wealth of these methods, an important practical question is which of these tools generalize best, that is, correctly predict the pathogenic character of new variants. Here we demonstrate, in a study of 10 tools on five datasets, that such a comparative evaluation is hindered by two types of circularity. These arise when (1) the same variants or (2) different variants from the same protein occur both in the datasets used for training and in those used for evaluation of these tools, which may lead to overly optimistic results. We show that comparative evaluations of predictors that do not address these types of circularity may erroneously conclude that circularity-confounded tools are the most accurate among all tools, and may even outperform optimized combinations of tools.
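
As an illustration of the two overlap conditions described in the abstract, the following minimal sketch partitions an evaluation set according to whether each variant shares an exact variant, or only a protein, with a training set. It is not the authors' procedure: the `Variant` record, its fields, the function name `split_by_circularity`, and the example identifiers are all hypothetical and chosen only for this example.

```python
from typing import List, NamedTuple, Tuple


class Variant(NamedTuple):
    """Hypothetical record for a missense variant (illustrative fields only)."""
    protein: str   # protein identifier
    position: int  # residue position
    ref: str       # reference amino acid
    alt: str       # substituted amino acid


def split_by_circularity(
    train: List[Variant], evaluation: List[Variant]
) -> Tuple[List[Variant], List[Variant], List[Variant]]:
    """Partition evaluation variants by their overlap with the training set.

    Returns three lists:
      same_variant: the identical variant also occurs in training (case 1)
      same_protein: a different variant of a training protein (case 2)
      clean:        the protein does not occur in the training set at all
    """
    train_variants = set(train)
    train_proteins = {v.protein for v in train}

    same_variant = [v for v in evaluation if v in train_variants]
    same_protein = [
        v for v in evaluation
        if v not in train_variants and v.protein in train_proteins
    ]
    clean = [v for v in evaluation if v.protein not in train_proteins]
    return same_variant, same_protein, clean


# Toy data, invented for illustration.
train = [Variant("P1", 10, "A", "V"), Variant("P2", 5, "G", "D")]
evaluation = [
    Variant("P1", 10, "A", "V"),  # identical variant is in training
    Variant("P1", 42, "L", "P"),  # same protein, different variant
    Variant("P3", 7, "R", "H"),   # protein unseen in training
]

same_variant, same_protein, clean = split_by_circularity(train, evaluation)
```

Under these assumptions, only the `clean` subset is free of both kinds of overlap; comparing a tool's accuracy on `clean` with its accuracy on the other two subsets is one simple way to see how much an evaluation might be inflated.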

Item Type: Article
Date Type: Publication
Status: Published
Schools: Medicine
Subjects: Q Science > QH Natural history > QH426 Genetics
Uncontrolled Keywords: pathogenicity prediction tools; exome sequencing
Publisher: Wiley-Blackwell
ISSN: 1059-7794
Date of First Compliant Deposit: 30 March 2016
Date of Acceptance: 6 February 2015
Last Modified: 19 Jun 2023 16:28
URI: https://orca.cardiff.ac.uk/id/eprint/84063

