Abstract

The Connectionist Temporal Classification (CTC) loss function [1] enables end-to-end training of a neural network for sequence-to-sequence tasks without requiring a prior alignment between the input and the output. CTC is traditionally used to train sequential, single-label problems, in which each element of the sequence has only one class. In this work, we show that CTC is not suitable for multi-label tasks, and we present a novel Multi-label Connectionist Temporal Classification (MCTC) loss function for multi-label, sequence-to-sequence classification. Multi-label classes can represent meaningful attributes of a single element; for example, in Optical Music Recognition (OMR), a music note can have separate duration and pitch attributes. Our approach achieves state-of-the-art results on Joint Handwritten Text Recognition and Named Entity Recognition, Asian Character Recognition, and OMR.
