University of Oulu

P. Kostakos, "Strings and Things: A Semantic Search Engine for news quotes using Named Entity Recognition," 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2020, pp. 835-839, doi: 10.1109/ASONAM49781.2020.9381383

Strings and things : a semantic search engine for news quotes using named entity recognition

Author: Kostakos, Panos (1)
Organizations: (1) Center of Ubiquitous Computing, University of Oulu, Oulu, Finland
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 0.8 MB)
Language: English
Published: IEEE Computer Society, 2021
Publish Date: 2021-05-12


Emerging methods for content delivery, such as quote-searching and entity-searching, enable users to quickly identify novel and relevant information in unstructured texts, news articles, and media sources. These methods have widespread applications in web surveillance and crime informatics, and can help improve intention disambiguation, character evaluation, threat analysis, and bias detection. Furthermore, quote-based and entity-based searching is an empowering information retrieval tool that enables non-technical users to gauge the quality of public discourse, allowing for more fine-grained analysis of core sociological questions. The paper presents a prototype search engine that allows users to search a news database of quotes using a combination of strings and things. The ingestion pipeline, which forms the backend of the service, comprises the following modules: (i) a crawler that ingests data from the GDELT Global Quotation Graph; (ii) a named entity recognition (NER) filter that labels data on the fly; (iii) an indexing mechanism that serves the data to an Elasticsearch cluster; and (iv) a user interface that allows users to formulate queries. The paper presents the high-level configuration of the pipeline and reports basic metrics and aggregations.
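
The four modules named in the abstract can be illustrated with a minimal, self-contained Python sketch. It is not the paper's implementation: a hard-coded list of quotes stands in for the GDELT crawler, a small hypothetical gazetteer (`GAZETTEER`) stands in for the NER model, an in-memory class (`QuoteIndex`) stands in for the Elasticsearch cluster, and a `search` method stands in for the user interface. Reading "strings" as free-text tokens and "things" as named entities is also an assumption made for this illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical gazetteer standing in for a trained NER model.
GAZETTEER = {
    "angela merkel": "PERSON",
    "european commission": "ORG",
    "finland": "GPE",
}


@dataclass
class QuoteDoc:
    doc_id: int
    quote: str
    entities: list = field(default_factory=list)  # (name, label) pairs


def label_entities(text):
    """Toy NER filter: return gazetteer entries that occur in the text."""
    lowered = text.lower()
    return [(name, label) for name, label in GAZETTEER.items() if name in lowered]


class QuoteIndex:
    """In-memory stand-in for the search cluster (illustration only)."""

    def __init__(self):
        self.docs = {}
        self.by_token = defaultdict(set)   # "strings": token -> doc ids
        self.by_entity = defaultdict(set)  # "things": entity name -> doc ids

    def ingest(self, quote):
        """Label a quote on the fly and add it to both indexes."""
        doc = QuoteDoc(len(self.docs), quote, label_entities(quote))
        self.docs[doc.doc_id] = doc
        for token in quote.lower().split():
            self.by_token[token.strip('.,;:!?"')].add(doc.doc_id)
        for name, _label in doc.entities:
            self.by_entity[name].add(doc.doc_id)
        return doc

    def search(self, text=None, entity=None):
        """Return docs matching every query token and, if given, the entity."""
        candidates = set(self.docs)
        if text:
            for token in text.lower().split():
                candidates &= self.by_token.get(token, set())
        if entity:
            candidates &= self.by_entity.get(entity.lower(), set())
        return [self.docs[i] for i in sorted(candidates)]


# Hard-coded quotes standing in for crawled data.
index = QuoteIndex()
for q in [
    "Angela Merkel said the talks with Finland were constructive.",
    "The European Commission will review the proposal, a spokesperson said.",
    "Talks in Finland ended without agreement.",
]:
    index.ingest(q)

# Combine a string ("talks") with a thing (the entity "Finland").
hits = index.search(text="talks", entity="Finland")
```

Keeping the token index and the entity index separate lets a query intersect the two result sets, so a keyword match can be narrowed to quotes that mention a specific entity.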


Series: International Conference on Advances in Social Network Analysis and Mining
ISSN: 2473-991X
ISSN-E: 2473-991X
ISSN-L: 2473-991X
ISBN: 978-1-7281-1056-1
ISBN Print: 978-1-7281-1057-8
Pages: 835 - 839
Article number: 9381383
DOI: 10.1109/ASONAM49781.2020.9381383
Host publication: 12th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2020
Host publication editor: Atzmuller, M.; Coscia, M.; Missaoui, R.
Conference: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
Type of Publication: A4 Article in conference proceedings
Field of Science: 113 Computer and information sciences
Funding: Partly funded by European Commission grants CUTLER (770469) and PRINCE (815362), and by Academy of Finland 6Genesis Flagship (318927).
EU Grant Number: (770469) CUTLER - Coastal Urban developmenT through the LEnses of Resiliency
Academy of Finland Grant Number: 318927
Detailed Information: 318927 (Academy of Finland Funding decision)
Copyright information: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.