Abstract

Machine learning (ML) programs are widely used in various human-related applications. However, testing them remains a challenging problem: one can hardly decide whether, and how well, the knowledge extracted from training scenarios suits new scenarios. Existing approaches typically have restricted applicability due to their assumptions about the availability of an oracle, a comparable implementation, or manual inspection effort. We address this problem with a novel program-synthesis-based approach, SynEva, which systematically constructs an oracle-alike mirror program for similarity measurement and automatically compares it with the existing knowledge on new scenarios to decide how well the knowledge suits them. SynEva is lightweight and fully automated. Our experimental evaluation on real-world data sets validates SynEva's effectiveness, showing strong correlation results with little overhead. We expect that SynEva can apply to, and help evaluate, more ML programs for new scenarios.
