Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

SenseDefs: a multilingual corpus of semantically annotated textual definitions

Camacho-Collados, Jose, Delli Bovi, Claudio, Raganato, Alessandro and Navigli, Roberto 2018. SenseDefs: a multilingual corpus of semantically annotated textual definitions. Language Resources and Evaluation. 10.1007/s10579-018-9421-3

PDF - Published Version
Available under License Creative Commons Attribution Non-commercial.


Abstract

Definitional knowledge has proved to be essential in various Natural Language Processing tasks and applications, especially when information at the level of word senses is exploited. However, the few sense-annotated corpora of textual definitions available to date are of limited size: this is mainly due to the expensive and time-consuming process of annotating a wide variety of word senses and entity mentions at a reasonably high scale. In this paper we present SenseDefs, a large-scale high-quality corpus of disambiguated definitions (or glosses) in multiple languages, comprising sense annotations of both concepts and named entities from a wide-coverage unified sense inventory. Our approach for the construction and disambiguation of this corpus builds upon the structure of a large multilingual semantic network and a state-of-the-art disambiguation system: first, we gather complementary information of equivalent definitions across different languages to provide context for disambiguation; then we refine the disambiguation output with a distributional approach based on semantic similarity. As a result, we obtain a multilingual corpus of textual definitions featuring over 38 million definitions in 263 languages, and we publicly release it to the research community. We assess the quality of SenseDefs’s sense annotations both intrinsically and extrinsically on Open Information Extraction and Sense Clustering tasks.

Item Type: Article
Date Type: Published Online
Status: Published
Schools: Computer Science & Informatics
Publisher: Springer Verlag (Germany)
ISSN: 1574-020X
Date of First Compliant Deposit: 9 August 2018
Date of Acceptance: 23 July 2018
Last Modified: 04 May 2023 09:49
URI: https://orca.cardiff.ac.uk/id/eprint/114012

Citation Data

Cited 4 times in Scopus.
