Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

AVADA: toward automated pathogenic variant evidence retrieval directly from the full-text literature

Birgmeier, Johannes, Deisseroth, Cole A., Hayward, Laura E., Galhardo, Luisa M. T., Tierno, Andrew P., Jagadeesh, Karthik A., Stenson, Peter D., Cooper, David N. ORCID: https://orcid.org/0000-0002-8943-8484, Bernstein, Jonathan A., Haeussler, Maximilian and Bejerano, Gill 2020. AVADA: toward automated pathogenic variant evidence retrieval directly from the full-text literature. Genetics in Medicine 22 (2), pp. 362-370. 10.1038/s41436-019-0643-6

PDF - Accepted Post-Print Version

Abstract

Purpose: Both monogenic pathogenic variant cataloging and clinical patient diagnosis start with variant-level evidence retrieval, followed by expert evidence integration in search of diagnostic variants and genes. Here, we aim to accelerate pathogenic variant evidence retrieval with an automated approach.

Methods: Automatic VAriant evidence DAtabase (AVADA) is a novel machine learning tool that uses natural language processing to automatically identify pathogenic genetic variant evidence in full-text primary literature about monogenic disease and convert it to genomic coordinates.

Results: AVADA automatically retrieved almost 60% of likely disease-causing variants deposited in the Human Gene Mutation Database (HGMD), a 4.4-fold improvement over the current best open-source automated variant extractor. AVADA contains over 60,000 likely disease-causing variants that are in HGMD but not in ClinVar. AVADA also highlights the challenges of automated variant mapping and pathogenicity curation. However, when combined with manual validation on 245 diagnosed patients, AVADA provides valuable evidence for an additional 18 diagnostic variants, on top of ClinVar’s 21, versus only 2 using the best current automated approach.

Conclusion: AVADA advances automated retrieval of pathogenic monogenic variant evidence from full-text literature. Although far from perfect, it is much faster than PubMed/Google Scholar search, and careful curation of AVADA-retrieved evidence can aid both database curation and patient diagnosis.

Item Type: Article
Date Type: Publication
Status: Published
Schools: Medicine
Publisher: Springer Nature
ISSN: 1098-3600
Date of First Compliant Deposit: 12 September 2019
Date of Acceptance: 13 August 2019
Last Modified: 18 Nov 2023 16:46
URI: https://orca.cardiff.ac.uk/id/eprint/125422

Citation Data

Cited 17 times in Scopus.
