Abstract
This study describes the development of a smartphone mixed-reality (MR) educational app, built on a deep-learning computer vision framework, intended to improve the experience of using physical 3D models in classrooms by identifying and labeling anatomical features on those models. Research at the intersection of MR and anatomy education has repeatedly demonstrated a role for MR-based modalities in improving learning, but most MR apps rely on custom illustrated projections of 3D models into user and screen space. These virtual assets are subject to device-intrinsic or developer-dependent differences in display fidelity and specimen art quality. Creating the digital 3D models themselves poses a further barrier, as they are not trivial to produce. Existing evidence also suggests that virtual models may yield inferior learning outcomes in some use cases compared with physical models, which argues for instead emphasizing improvements to the experience of using existing physical models. The application described here is therefore intended to improve the experience of using real models by labeling anatomical features of interest for the user. The current implementation is trained solely on skull-base anatomy (with class labels including selected bones of the calvarium, paranasal sinuses, and skull processes), but may be extended to other anatomical areas of interest. This labeling is made possible by a hybrid depth-estimation and semantic-segmentation machine learning (ML) architecture, which is deployed on consumer-grade smartphones to promote student uptake. When creating ML-based tools, the primary barrier is often the generation of quality ground-truth data. Image collections of anatomical specimens should ideally be captured under varied conditions, with augmentations applied to improve the application's ability to robustly recognize and label different parts of a specimen. Manual collection and annotation of the hundreds or thousands of such images required for training is infeasible. However, by procedurally generating images from a 3D-modelled skull (here, in the open-source computer graphics software Blender), developers can produce arbitrarily large, photorealistic ground-truth training datasets with pixel-perfect semantic segmentation of anatomical features. The generalizability of the ML classifier to different models was improved by augmenting individual renders in Blender: randomizing model textures, lighting, background environments, skull topology (via displacement mapping), the presence of "distractor" objects, and camera angles. This work demonstrates the feasibility of developing an anatomical landmark classifier that operates on RGB image data and is trained entirely on synthetic data. Future steps include optimization of data augmentations to emphasize shape recognition over texture recognition, formal characterization of segmentation accuracy on cadaveric specimens, and training of alternative models that incorporate depth data, leveraging the depth-sensing capabilities available on select higher-end mobile devices.
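As a rough illustration of the domain-randomization approach summarized above, the sketch below varies lighting, texture, displacement-driven topology, and camera pose per render using Blender's Python API (bpy), and enables an object-index pass as pixel-perfect segmentation ground truth. This is a minimal sketch, not the authors' pipeline: the object and node names ("Skull", "Camera", "Sun", "Displace", "Principled BSDF"), all numeric ranges, and the render count are illustrative assumptions, and background-environment and distractor-object randomization are omitted.

```python
# Minimal domain-randomization sketch for synthetic segmentation data in Blender.
# Run inside Blender (e.g. blender --background skull.blend --python this_script.py).
# Object/node names and numeric ranges are illustrative assumptions, not the authors' setup.
import math
import random

import bpy

scene = bpy.context.scene
scene.render.engine = 'CYCLES'
scene.view_layers[0].use_pass_object_index = True  # per-object index pass for pixel-perfect masks

skull = bpy.data.objects["Skull"]
cam = bpy.data.objects["Camera"]  # assumed to have a Track To constraint aimed at the skull
sun = bpy.data.objects["Sun"]

skull.pass_index = 1  # class id written into the object-index pass (one id per labeled structure)

for i in range(1000):
    # Randomize lighting intensity and direction.
    sun.data.energy = random.uniform(2.0, 10.0)
    sun.rotation_euler = (random.uniform(0.0, math.pi / 2), 0.0,
                          random.uniform(0.0, 2 * math.pi))

    # Randomize the skull texture via the material's Principled BSDF base color.
    bsdf = skull.active_material.node_tree.nodes["Principled BSDF"]
    bsdf.inputs["Base Color"].default_value = (random.uniform(0.6, 1.0),
                                               random.uniform(0.5, 0.9),
                                               random.uniform(0.4, 0.8), 1.0)

    # Randomize skull topology through an existing Displace modifier.
    skull.modifiers["Displace"].strength = random.uniform(0.0, 0.01)

    # Randomize the camera position on a hemisphere around the skull (assumed at the origin).
    theta = random.uniform(0.0, 2 * math.pi)
    phi = random.uniform(math.radians(20), math.radians(90))
    r = random.uniform(0.4, 0.9)
    cam.location = (r * math.sin(phi) * math.cos(theta),
                    r * math.sin(phi) * math.sin(theta),
                    r * math.cos(phi))

    # Render the RGB image; the object-index pass (segmentation ground truth) can be
    # written alongside it with a File Output node in the compositor (not shown here).
    scene.render.filepath = f"//renders/skull_{i:04d}.png"
    bpy.ops.render.render(write_still=True)
```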