Abstract

Supervised machine learning generally requires pre-labelled data. Although several open-access, pre-annotated datasets are available for training machine learning algorithms, most contain a limited number of object classes and may not suit specific tasks. Because previously available pre-annotated data are rarely sufficient for custom models, most real-world applications require collecting and preparing training data. There is an obvious trade-off between annotation quality and quantity: time and resources can be allocated to ensuring superior data quality or to increasing the quantity of annotated data. We test the degree to which annotation errors degrade model performance. We conclude that while results deteriorate when annotations are erroneous, the effect, at least with relatively homogeneous sequential video data, is limited. The benefit of a larger annotated dataset (created with imperfect auto-annotation methods) outweighs the deterioration caused by annotation errors.
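The quality-versus-quantity experiment described above can be illustrated with a minimal sketch. The code below is not the paper's method; it is an assumed toy setup: two Gaussian clusters stand in for the training frames, `flip_labels` injects synthetic annotation errors at a given rate, and a nearest-centroid classifier measures how accuracy degrades as the error rate rises.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for labelled video frames: two Gaussian clusters in 2-D.
n = 500
X = np.concatenate([rng.normal(0, 1, (n, 2)), rng.normal(3, 1, (n, 2))])
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

def flip_labels(y, error_rate, rng):
    """Simulate annotation errors by flipping a fraction of binary labels."""
    y_noisy = y.copy()
    idx = rng.choice(len(y), size=int(error_rate * len(y)), replace=False)
    y_noisy[idx] = 1 - y_noisy[idx]
    return y_noisy

def centroid_accuracy(X_train, y_train, X_test, y_test):
    """Fit a nearest-centroid classifier and return test accuracy."""
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
    pred = np.argmin(((X_test[:, None, :] - centroids) ** 2).sum(-1), axis=1)
    return (pred == y_test).mean()

# Fresh test set drawn from the same two clusters, with clean labels.
X_test = np.concatenate([rng.normal(0, 1, (n, 2)), rng.normal(3, 1, (n, 2))])
y_test = y.copy()

for rate in (0.0, 0.1, 0.3):
    acc = centroid_accuracy(X, flip_labels(y, rate, rng), X_test, y_test)
    print(f"label-error rate {rate:.0%}: test accuracy {acc:.3f}")
```

In this toy setting symmetric label flips shift both class centroids toward each other by roughly the same amount, so the decision boundary barely moves and accuracy degrades only mildly, mirroring the abstract's observation that the impact of annotation errors is limited.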
