University of Oulu

J. Sui, Z. Liu, L. Liu, A. Jung and X. Li, "Dynamic Sparse Subspace Clustering for Evolving High-Dimensional Data Streams," in IEEE Transactions on Cybernetics, doi: 10.1109/TCYB.2020.3023973

Dynamic sparse subspace clustering for evolving high-dimensional data streams

Saved in:
Author: Sui, Jinping1,2; Liu, Zhen1; Liu, Li1,3;
Organizations: 1College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China
2Department of Computer Science, Aalto University, 02150 Espoo, Finland
3Center for Machine Vision and Signal Analysis, University of Oulu, 02150 Oulu, Finland
Format: article
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 1.8 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe2022022220354
Language: English
Published: Institute of Electrical and Electronics Engineers, 2020
Publish Date: 2022-02-22
Description:

Abstract

In an era of ubiquitous large-scale evolving data streams, data stream clustering (DSC) has received lots of attention because the scale of the data streams far exceeds the ability of expert human analysts. It has been observed that high-dimensional data are usually distributed in a union of low-dimensional subspaces. In this article, we propose a novel sparse representation-based DSC algorithm, called evolutionary dynamic sparse subspace clustering (EDSSC). It can cope with the time-varying nature of subspaces underlying the evolving data streams, such as subspace emergence, disappearance, and recurrence. The proposed EDSSC consists of two phases: 1) static learning and 2) online clustering. During the first phase, a data structure for storing the statistic summary of data streams, called EDSSC summary, is proposed which can better address the dilemma between the two conflicting goals: 1) saving more points for accuracy of subspace clustering (SC) and 2) discarding more points for the efficiency of DSC. By further proposing an algorithm to estimate the subspace number, the proposed EDSSC does not need to know the number of subspaces. In the second phase, a more suitable index, called the average sparsity concentration index (ASCI), is proposed, which dramatically promotes the clustering accuracy compared to the conventionally utilized SCI index. In addition, the subspace evolution detection model based on the Page-Hinkley test is proposed where the appearing, disappearing, and recurring subspaces can be detected and adapted. Extinct experiments on real-world data streams show that the EDSSC outperforms the state-of-the-art online SC approaches.

see all

Series: IEEE transactions on cybernetics
ISSN: 2168-2267
ISSN-E: 2168-2275
ISSN-L: 2168-2267
Issue: In press
Pages: 1 - 14
DOI: 10.1109/TCYB.2020.3023973
OADOI: https://oadoi.org/10.1109/TCYB.2020.3023973
Type of Publication: A1 Journal article – refereed
Field of Science: 113 Computer and information sciences
213 Electronic, automation and communications engineering, electronics
Subjects:
Funding: This work was supported in part by the National Natural Science Foundation of China under Grant 62022091, Grant 61921001, Grant 61872379, Grant 71701205, and Grant 91846301; in part by the Hunan Science and Technology Plan Project under Grant 2019GK2131; and in part by the Academy of Finland under Grant 331883.
Academy of Finland Grant Number: 331883
Detailed Information: 331883 (Academy of Finland Funding decision)
Copyright information: © The Author(s) 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
  https://creativecommons.org/licenses/by/4.0/