Proactive Labeling of Examples for Domain Adaptation

M.A. Ryndin, D.Y. Turdakov

doi:10.15514/ispras-2019-31(5)-11

Abstract

Obtaining labeled data is an expensive and time-consuming process. There are several approaches to reducing the number of examples needed for training. For example, active learning methods aim to select only the most difficult examples for labeling. Active learning can achieve results similar to those of supervised learning while using far less labeled data. However, such methods often have high variance and depend heavily on the choice of the initial approximation, and the optimal strategies for selecting examples to label either depend on the type of classifier or are computationally expensive. Another approach is domain adaptation. Most approaches in this area are unsupervised and work by bringing the data distributions of the domains closer together, either by solving an optimal transport problem or by extracting domain-independent features. Supervised learning approaches are not robust to changes in the distribution of the target variable. This is one of the reasons for posing the task of semi-supervised domain adaptation: there is labeled data in the source domain, a large amount of unlabeled data in the target domain, and the ability to obtain labels for some of the target-domain data. In this work, we show how proactive labeling can help transfer knowledge from a source domain to a different but related target domain. We propose using a machine learning model trained on the source domain as a free, fallible oracle. This oracle can estimate the difficulty of a training example in order to make several decisions. First, should this example be added to the training dataset? Second, have we learned enough from the source domain to label this example ourselves, or do we need to call a trusted expert? We present an algorithm that uses these ideas; one of its features is the ability to work with any classifier whose outputs have a probabilistic interpretation. Experimental evaluation on the Amazon review dataset demonstrates the effectiveness of the proposed method.

Highlights

Obtaining labeled data is an expensive and labor-intensive process
Examples from the source domain; the target variable for the source-domain examples; examples from the target domain; a model built on source-domain data; the target model; a paid oracle
On each run of the algorithm, the pseudorandom number generator seeds for each of the sources were changed

Summary

Introduction

There are several approaches to reducing the number of examples needed for training. Active learning can achieve results similar to those of supervised learning while using far less labeled data. Another open problem is the active labeling of examples by several low-skilled annotators, which combines the ideas of active learning and crowdsourcing [3]. This is one of the reasons for posing the task of semi-supervised domain adaptation: there is labeled data in the source domain, a large amount of unlabeled data in the target domain, and the ability to obtain labels for some of the target-domain data. A natural way to solve this task is to combine active learning methods with domain adaptation [7]. One problem with the cited studies is that the algorithm depends on the type of machine learning model, whereas nonlinear models usually achieve better results. In addition, most active learning algorithms require retraining the model many times on the data newly selected for labeling.

General scheme of the proposed solution

Oracle models

Algorithm for proactive example selection and labeling

4: Compute y
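
The abstract describes two decisions that the source-trained model makes for each target-domain example: whether to add the example to the training set, and whether to label it with the free oracle or send it to a trusted expert. The sketch below is one possible reading of that idea, not the authors' algorithm. The function name `proactive_label`, the two thresholds `skip_above` and `self_label_above`, the `paid_oracle` callable, and the rule of skipping very confident examples are all assumptions made for illustration. The only requirement on the classifier is that it produces class probabilities, which matches the abstract's claim.

```python
from typing import Callable, List, Sequence, Tuple


def proactive_label(
    probs: Sequence[Sequence[float]],
    paid_oracle: Callable[[int], int],
    skip_above: float = 0.95,        # hypothetical threshold, not from the paper
    self_label_above: float = 0.75,  # hypothetical threshold, not from the paper
) -> Tuple[List[Tuple[int, int]], int]:
    """Pick target-domain examples and labels from source-model probabilities.

    probs[i] is the class-probability vector that the source-trained model
    (the free, fallible oracle) assigns to target example i.
    Returns (index, label) pairs and the number of paid oracle calls.
    """
    selected: List[Tuple[int, int]] = []
    paid_calls = 0
    for i, p in enumerate(probs):
        confidence = max(p)
        if confidence >= skip_above:
            # Decision 1 (assumed rule): the example is easy for the
            # source model, so it is not added to the training set.
            continue
        if confidence >= self_label_above:
            # Decision 2, free branch: trust the source model's own label.
            label = max(range(len(p)), key=p.__getitem__)
        else:
            # Decision 2, paid branch: ask the trusted expert.
            label = paid_oracle(i)
            paid_calls += 1
        selected.append((i, label))
    return selected, paid_calls


# Toy usage: four target examples, two classes, made-up probabilities.
probs = [[0.98, 0.02], [0.80, 0.20], [0.55, 0.45], [0.10, 0.90]]
true_labels = [0, 0, 1, 1]  # stands in for the human expert
selected, paid = proactive_label(probs, paid_oracle=lambda i: true_labels[i])
print(selected, paid)
```

In this toy run the first example is skipped, the second and fourth are labeled by the free oracle, and only the third triggers a paid query. The selected pairs would then be used to train the target model; how the thresholds are chosen in practice is left open here.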

Dataset

Machine learning model used

Results

Conclusion

Journal: Proceedings of the Institute for System Programming of the RAS
Publication Date: Jan 1, 2019
License type: cc-by
