Human motion detection and gesture recognition using computer vision methods

Liu, Xin

Liu, Xin (2019-02-21)

University of Oulu

This item is protected by copyright and/or related rights. You may use the item in ways permitted by the copyright and related rights legislation that applies to your use. For other uses, you need permission from the rights holders.

The permanent address of this publication is
https://urn.fi/URN:ISBN:9789526222011

Description

Academic dissertation to be presented with the assent of the Doctoral Training Committee of Information Technology and Electrical Engineering of the University of Oulu for public defence in the OP auditorium (L10), Linnanmaa, on 8 March 2019, at 12 noon

Abstract

Gestures are present in most daily human activities, and automatic gesture analysis is a significant research topic whose goal is to make interaction between humans and computers as natural as communication between humans. From a computer vision perspective, a gesture analysis system typically comprises two stages: a low-level stage for human motion detection and a high-level stage for understanding human gestures. Accordingly, this thesis contributes to research on gesture analysis in two respects: 1) Detection: human motion segmentation from video sequences, and 2) Understanding: gesture cue extraction and recognition.

In the first part of this thesis, two human motion detection methods based on sparse signal recovery are presented. In real videos, the foreground (human motion) pixels are often not randomly distributed but exhibit group structure in both the spatial and temporal domains. Based on this observation, a spatio-temporal group sparsity recovery model is proposed, which explicitly considers the foreground pixels' group clustering priors of spatial coherence and temporal contiguity. Moreover, a pixel should be treated as a multi-channel signal: if a pixel is equal to its adjacent pixels, then all three of its RGB coefficients should be equal to theirs. Motivated by this observation, a multi-channel fused Lasso regularizer is developed to explore the smoothness of multi-channel signals.
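
As a rough illustration of the two ideas in the paragraph above, the following sketch shows generic textbook forms of a group-sparsity shrinkage step and a multi-channel smoothness penalty. It is not the formulation used in the thesis, which this abstract does not give. The function names, the use of non-overlapping groups, and the choice of an l2 norm across the RGB channels are all assumptions made for illustration only.

```python
import numpy as np


def group_soft_threshold(x, groups, lam):
    """Proximal operator of lam * sum_g ||x_g||_2 for non-overlapping groups.

    Assumption: groups are disjoint index lists. Each group is either
    shrunk as a whole or set to zero as a whole, which is what makes
    the recovered support clustered rather than scattered.
    """
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for idx in groups:
        norm = np.linalg.norm(x[idx])
        if norm > lam:
            out[idx] = (1.0 - lam / norm) * x[idx]
    return out


def multichannel_fused_penalty(img):
    """Hypothetical multi-channel smoothness penalty on an H x W x 3 image.

    Sums, over horizontally and vertically adjacent pixel pairs, the l2
    norm of the difference of their RGB vectors, so the three channels of
    a pixel are compared with its neighbor jointly, not one at a time.
    """
    img = np.asarray(img, dtype=float)
    dx = img[:, 1:, :] - img[:, :-1, :]   # horizontal neighbor differences
    dy = img[1:, :, :] - img[:-1, :, :]   # vertical neighbor differences
    return np.linalg.norm(dx, axis=2).sum() + np.linalg.norm(dy, axis=2).sum()
```

In this sketch a group with a small joint magnitude is zeroed entirely, while a constant-color image incurs zero smoothness penalty because every neighboring RGB difference vanishes.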

In the second part of this thesis, two human gesture recognition methods are presented to address temporal dynamics, which are crucial to the interpretation of observed gestures. In the first study, a gesture skeletal sequence is characterized as a trajectory on a Riemannian manifold, and a time-warping invariant metric on that manifold is proposed. Furthermore, a sparse coding scheme for skeletal trajectories is presented that explicitly considers the labelling information, with the aim of enforcing the discriminant validity of the dictionary. In the second study, based on the observation that a gesture is a time series with distinctly defined phases, a low-rank matrix decomposition model is proposed to build temporal compositions of gestures. In this way, a more appropriate alignment of hidden states for a hidden Markov model can be achieved.
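
To make the temporal-dynamics problem concrete, the sketch below uses classic dynamic time warping (DTW) in Euclidean space. DTW is swapped in here purely as a familiar stand-in: it is not the time-warping invariant Riemannian metric proposed in the thesis, and the toy trajectories and the function name are assumptions for illustration.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic dynamic time warping cost between two sequences of vectors.

    a and b are arrays of shape (n, d) and (m, d). The cost is the minimum
    sum of frame-to-frame Euclidean distances over all monotonic alignments.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # repeat a frame of b
                                 cost[i, j - 1],      # repeat a frame of a
                                 cost[i - 1, j - 1])  # advance both
    return cost[n, m]


# Toy 2-D "gesture" and the same path performed at half speed
# (every frame repeated twice), plus the path traversed in reverse.
gesture = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
slow = np.repeat(gesture, 2, axis=0)
reverse = gesture[::-1]
```

Here `dtw_distance(gesture, slow)` is zero because the alignment absorbs the change in execution speed, whereas `dtw_distance(gesture, reverse)` is strictly positive. A frame-by-frame comparison could not even be applied to `gesture` and `slow`, since they differ in length.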

Abstract (translated from Finnish)

Gestures are present in most everyday human activities. Automatic gesture analysis is needed to improve interaction between devices and humans, with the goal of making that interaction as natural as interaction between humans. From a computer vision perspective, a gesture analysis system consists of human motion detection and gesture recognition. This dissertation advances gesture analysis research from two perspectives in particular: 1) Detection: segmentation of human motion from a video sequence. 2) Understanding: extraction and recognition of gesture cues.

The first part of the dissertation presents two motion detection methods based on sparse signal reconstruction. The foreground (human motion) pixels of a video are usually not randomly distributed but have mutually dependent properties in the spatial and temporal domains. Based on this observation, a spatio-temporal sparse reconstruction model is presented that accounts for the clustering of foreground pixels in terms of spatial coherence and temporal continuity. In addition, a pixel is assumed to be a multi-channel signal (RGB color values). When a pixel is similar to its neighboring pixels, their color channel values are similar as well. Building on this observation, a channel-fusing Lasso regularization was developed, which makes it possible to explore the smoothness of a multi-channel signal.

The second part of the dissertation presents two methods for recognizing human gestures. The methods can be used to address problems in the temporal dynamics of gestures (variation in gesture speed), which is of prime importance for interpreting observed gestures correctly. In the first method, a gesture is described as the trajectory of a skeleton model on a Riemannian manifold, using a metric that is tolerant of time warping. In addition, sparse coding for skeleton-model trajectories is presented. The sparse coding is based on label information, with the aim of ensuring the mutual independence of the codebook. The starting point of the second method is the observation that a gesture is a time series of clearly definable phases. A low-rank matrix decomposition model is proposed for combining the phases, so that the hidden states can be fitted better to a hidden Markov model.

Original papers

Original papers are not included in the electronic version of the dissertation.

Liu, X., Yao, J., Hong, X., Huang, X., Zhou, Z., Qi, C., & Zhao, G. (2018). Background subtraction using spatio-temporal group sparsity recovery. IEEE Transactions on Circuits and Systems for Video Technology, 28(8), 1737–1751. IEEE. https://doi.org/10.1109/TCSVT.2017.2697972 Self-archived version available.
Liu, X., & Zhao, G. (in press). Background subtraction using multi-channel fused Lasso. In Proceedings of the IS&T 2019 International Symposium on Electronic Imaging (EI 2019). IS&T. https://doi.org/10.2352/ISSN.2470-1173.2019.11.IPAS-269 Self-archived version available.
Liu, X., & Zhao, G. (2019). 3D skeletal gesture recognition using sparse coding of time-warping invariant Riemannian trajectories. In Proceedings of the 2019 International Conference on Multimedia Modeling (MMM 2019), 678–690. Springer, Cham. https://doi.org/10.1007/978-3-030-05710-7_56 Self-archived version available.
Liu, X., Shi, H., Hong, X., Chen, H., Tao, D., & Zhao, G. (2019). Hidden states exploration for 3D skeleton-based gesture recognition. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV 2019). IEEE. Accepted for publication. https://doi.org/10.1109/WACV.2019.00201 Self-archived version available.
