Abstract

Multi-instance learning, a technique commonly used in artificial intelligence for analyzing whole slides, can be applied to diagnose thyroid cancer from cytological smears. Since smears lack the rich multidimensional histological features of histopathology slides, mining latent contextual information and diverse features is crucial for better classification performance. In this paper, we propose a pyramid multi-loss vision transformer model called PyMLViT, a novel algorithm with two core modules that address these issues. Specifically, we design a pyramid token extraction module to acquire latent contextual information from smears: the pyramid token structure extracts multi-scale local features, and the vision transformer structure further captures global information through the self-attention mechanism. Furthermore, we construct a multi-loss fusion module on top of the conventional multi-instance learning framework. With carefully designed bag- and patch-level weight allocation strategies, we incorporate slide-level annotations as pseudo-labels for patches during training, thus enriching the diversity of supervised information. Extensive experimental results on a real-world dataset show that PyMLViT achieves high performance with a competitive number of parameters compared to popular methods for diagnosing thyroid cancer in cytological smears.
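The abstract gives only a high-level description of the architecture; the following is a minimal PyTorch sketch of the two ideas it names. This is not the authors' implementation: the module names, token dimensions, attention-based bag pooling, and the pseudo-label weighting scheme below are illustrative assumptions.

```python
# Minimal sketch of a pyramid token extractor plus a multi-loss fusion over
# slide (bag) and patch predictions. All hyperparameters are assumed values.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidTokenExtractor(nn.Module):
    """Embed each smear patch at several scales to capture multi-scale local features."""

    def __init__(self, in_ch=3, dim=256, patch_sizes=(16, 32, 64)):
        super().__init__()
        # One strided convolution per scale acts as a patch-embedding layer.
        self.embeds = nn.ModuleList(
            nn.Conv2d(in_ch, dim, kernel_size=p, stride=p) for p in patch_sizes
        )

    def forward(self, x):                                 # x: (B, C, H, W)
        tokens = [e(x).flatten(2).transpose(1, 2) for e in self.embeds]
        return torch.cat(tokens, dim=1)                   # (B, N_tokens, dim)


class PyMLViTSketch(nn.Module):
    """Pyramid tokens -> transformer encoder -> bag (slide) and patch heads."""

    def __init__(self, dim=256, depth=4, heads=8, num_classes=2):
        super().__init__()
        self.tokens = PyramidTokenExtractor(dim=dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.attn_pool = nn.Linear(dim, 1)                # learns per-patch weights
        self.bag_head = nn.Linear(dim, num_classes)
        self.patch_head = nn.Linear(dim, num_classes)

    def forward(self, patches):                           # patches: (N, C, H, W), one slide
        feats = self.encoder(self.tokens(patches)).mean(dim=1)   # (N, dim)
        patch_logits = self.patch_head(feats)                    # per-patch predictions
        w = torch.softmax(self.attn_pool(feats), dim=0)          # patch weight allocation
        bag_logits = self.bag_head((w * feats).sum(dim=0, keepdim=True))
        return bag_logits, patch_logits, w.squeeze(-1)


def multi_loss(bag_logits, patch_logits, patch_w, slide_label, alpha=0.5):
    """Fuse the slide-level loss with a pseudo-labelled, weight-modulated patch loss."""
    bag_loss = F.cross_entropy(bag_logits, slide_label.view(1))
    # The slide annotation is reused as a pseudo-label for every patch; the
    # learned patch weights down-weight patches judged uninformative.
    pseudo = torch.full((patch_logits.size(0),), int(slide_label), dtype=torch.long)
    patch_loss = (patch_w.detach() *
                  F.cross_entropy(patch_logits, pseudo, reduction="none")).sum()
    return bag_loss + alpha * patch_loss


if __name__ == "__main__":
    model = PyMLViTSketch()
    patches = torch.randn(12, 3, 256, 256)                # 12 patches from one smear slide
    slide_label = torch.tensor(1)                         # slide-level annotation
    bag_logits, patch_logits, w = model(patches)
    print(multi_loss(bag_logits, patch_logits, w, slide_label))
```

Under these assumptions, the bag-level term supervises the slide prediction while the weighted patch-level term injects the slide annotation as a pseudo-label for individual patches, which is one plausible reading of the multi-loss fusion described above.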
