University of Oulu

Raisuddin, A.M., Vaattovaara, E., Nevalainen, M. et al. Critical evaluation of deep neural networks for wrist fracture detection. Sci Rep 11, 6006 (2021). https://doi.org/10.1038/s41598-021-85570-2

Critical evaluation of deep neural networks for wrist fracture detection

Author: Raisuddin, Abu Mohammed (1); Vaattovaara, Elias (1,2); Nevalainen, Mika (1,2); et al.
Organizations: (1) University of Oulu, Oulu, Finland
(2) Oulu University Hospital, Oulu, Finland
(3) City of Oulu, Oulu, Finland
(4) Ailean Technologies Oy, Oulu, Finland
Format: article
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 1.3 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe2021042311496
Language: English
Published: Springer Nature, 2021
Publish Date: 2021-04-23
Description:

Abstract

Wrist fracture is the most common type of fracture and has a high incidence rate. Conventional radiography (i.e., X-ray imaging) is routinely used for wrist fracture detection, but fracture delineation is occasionally difficult, and additional confirmation by computed tomography (CT) is then needed for diagnosis. Recent advances in Deep Learning (DL), a subfield of Artificial Intelligence (AI), have shown that wrist fracture detection can be automated using Convolutional Neural Networks. However, previous studies did not pay close attention to the difficult cases that can only be confirmed via CT imaging. In this study, we developed and analyzed DeepWrist, a state-of-the-art DL-based pipeline for wrist (distal radius) fracture detection, and evaluated it against one general population test set and one challenging test set comprising only cases requiring confirmation by CT. Our results reveal that a typical state-of-the-art approach such as DeepWrist, while having near-perfect performance on the general independent test set, has substantially lower performance on the challenging test set: average precision of 0.99 (0.99–0.99) versus 0.64 (0.46–0.83), respectively. Similarly, the area under the ROC curve was 0.99 (0.98–0.99) versus 0.84 (0.72–0.93), respectively. Our findings highlight the importance of a meticulous analysis of DL-based models before clinical use and reveal the need for more challenging settings for testing medical AI systems.
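
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how average precision and the area under the ROC curve can be computed from binary labels and model scores, together with an interval estimate. It is an illustration only: this record does not state how the reported intervals were obtained, so the percentile bootstrap used here is an assumption, and the function names (`roc_auc`, `average_precision`, `bootstrap_ci`) and the toy data are hypothetical rather than taken from DeepWrist.

```python
import random


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values taken at the rank of each positive case."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive case")
    return total / hits


def bootstrap_ci(metric, labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (an assumed method, not the paper's)."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to resample")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # skip resamples that contain only one class
            continue
        stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (hypothetical): 1 = fracture, 0 = no fracture.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
auc_ci = bootstrap_ci(roc_auc, labels, scores)
print(f"AUROC {auc:.2f} {auc_ci}, AP {ap:.2f}")
```

With a small or difficult test set, intervals computed this way tend to be wide, which is consistent with the much wider ranges quoted for the challenging test set than for the general one.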


Series: Scientific reports
ISSN: 2045-2322
ISSN-E: 2045-2322
ISSN-L: 2045-2322
Volume: 11
Issue: 1
Article number: 6006
DOI: 10.1038/s41598-021-85570-2
OADOI: https://oadoi.org/10.1038/s41598-021-85570-2
Type of Publication: A1 Journal article – refereed
Field of Science: 3126 Surgery, anesthesiology, intensive care, radiology
Funding: This project was supported by the internal funds of the Research Unit of Medical Imaging, Physics and Technology, University of Oulu.
Copyright information: © The Author(s) 2021. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.