A hybrid approach for paraphrase identification based on knowledge-enriched semantic heuristics |
|
Author: | Mohamed, Muhidin¹; Oussalah, Mourad² |
Organizations: |
¹Department of Computer Science, EAS, Aston University, Birmingham B4 7ET, UK; ²Centre for Ubiquitous Computing, Faculty of Information Technology Computer Science, University of Oulu, P.O. Box 4500, 90014 Oulu, Finland |
Format: | article |
Version: | published version |
Access: | open |
Online Access: | PDF Full Text (PDF, 0.7 MB) |
Persistent link: | http://urn.fi/urn:nbn:fi-fe202001202588 |
Language: | English |
Published: | Springer Nature, 2020 |
Publish Date: | 2020-01-20 |
Description: |
Abstract: In this paper, we propose a hybrid approach for sentence paraphrase identification. The proposal addresses the problem of evaluating sentence-to-sentence semantic similarity when the sentences contain named entities. The essence of the proposal is to compute the semantic similarity of named-entity tokens separately from that of the rest of the sentence text. More specifically, the approach integrates word semantic similarity derived from WordNet taxonomic relations with named-entity semantic relatedness inferred from Wikipedia entity co-occurrences and underpinned by the Normalized Google Distance. In addition, the WordNet similarity measure is enriched with word part-of-speech (PoS) conversion aided by the Categorial Variation database (CatVar), which enhances the lexico-semantics of words. We validated our hybrid approach on two datasets: the Microsoft Research Paraphrase Corpus (MSRPC) and TREC-9 Question Variants. In our empirical evaluation, we showed that our system outperforms the baselines and most of the related state-of-the-art systems for paraphrase detection. We also conducted a misidentification analysis to disclose the primary sources of our system's errors.
|
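To make the abstract's two ingredients concrete, the following is a minimal sketch, not the authors' implementation. The first function is the standard Normalized Google Distance computed from occurrence counts. The abstract does not say how the named-entity score and the WordNet-based word score are combined, so `hybrid_similarity` and its token-count weighting are assumptions for illustration only, as are all function and parameter names.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Standard NGD from counts.

    fx, fy: number of documents containing each term (e.g. Wikipedia articles),
    fxy: number of documents containing both terms,
    n: total number of documents in the collection.
    """
    if fx <= 0 or fy <= 0 or n <= 0:
        raise ValueError("fx, fy and n must be positive")
    if fxy == 0:
        # Terms never co-occur: treat them as maximally distant.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    denominator = math.log(n) - min(log_fx, log_fy)
    if denominator == 0:
        # Degenerate case: the rarer term occurs in every document.
        return 0.0
    return (max(log_fx, log_fy) - log_fxy) / denominator


def hybrid_similarity(word_sim, entity_sim, n_words, n_entities):
    """Hypothetical combination (an assumption, not taken from the paper):
    weight each partial score by the number of tokens it covers."""
    total = n_words + n_entities
    if total == 0:
        return 0.0
    return (word_sim * n_words + entity_sim * n_entities) / total


if __name__ == "__main__":
    # Illustrative counts only; they are not drawn from any real corpus.
    print(normalized_google_distance(1000, 100, 10, 100000))
    print(hybrid_similarity(0.8, 0.4, n_words=3, n_entities=1))
```

A distance of 0 means the two terms always co-occur, and larger values mean weaker relatedness; turning the distance into a similarity score requires a further mapping that this record does not specify.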
Series: | Language resources and evaluation |
ISSN: | 1574-020X |
ISSN-E: | 1574-0218 |
ISSN-L: | 1574-020X |
Volume: | 54 |
Pages: | 457 - 485 |
DOI: | 10.1007/s10579-019-09466-4 |
OADOI: | https://oadoi.org/10.1007/s10579-019-09466-4 |
Type of Publication: | A1 Journal article – refereed |
Field of Science: | 113 Computer and information sciences |
Copyright information: |
© The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. |
https://creativecommons.org/licenses/by/4.0/ |