Abstract

To make the support vector machine (SVM) applicable to large-scale data sets, safe screening rules have been developed in recent years. The main idea is to reduce the scale of the SVM problem by safely discarding redundant training samples. Among existing safe screening rules, the dual screening method with variational inequalities (DVI) and the duality-gap-based dynamic screening rule (DSR) are two representative strategies. DVI is efficient, but its safety may be compromised by inexact solving algorithms; DSR is guaranteed to be safe, but it requires accurate feasible solutions to be efficient. Building on these studies, this paper proposes a two-stage screening (TSS) rule that exploits the advantages of both approaches while mitigating their shortcomings. First, DVI is applied before training to screen samples; this reduces the scale of the SVM and, at the same time, yields a better initial feasible solution for DSR. Then, DSR is embedded into the solving algorithm, which makes the solver more accurate and thereby strengthens the safety of DVI. Finally, to guarantee safety, a postchecking step is added that finds wrongly discarded samples and retrains with them. To analyze the safety of DVI theoretically, an upper bound on the deviation in DVI is estimated, and a Safe-DVI rule is derived from it. To establish the efficiency of TSS, the superiority of DVI over the initial stage of DSR is verified. In addition, a kernel version of TSS is given for the nonlinear SVM. Numerical experiments on synthetic data sets and 12 real-world data sets verify the efficiency and safety of TSS.
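To give a concrete feel for the kind of screening the abstract describes, below is a minimal sketch of the dynamic, duality-gap-based stage (in the spirit of DSR) combined with a postchecking step, on a toy linear SVM. It uses the standard gap-safe sphere construction, where a duality gap G certifies that the optimal primal solution lies within radius sqrt(2G) of the current iterate; this is a generic illustration, not the paper's exact TSS rules, and the DVI prescreening stage is omitted because it requires a reference solution at a neighboring regularization parameter. All function and variable names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def primal(w, X, y, C):
    # P(w) = 0.5*||w||^2 + C * sum of hinge losses
    return 0.5 * dot(w, w) + C * sum(
        max(0.0, 1.0 - yi * dot(w, xi)) for xi, yi in zip(X, y))

def dual(alpha, w):
    # D(alpha) = sum(alpha) - 0.5*||w||^2, with w = sum_i alpha_i*y_i*x_i
    return sum(alpha) - 0.5 * dot(w, w)

def train_with_gap_screening(X, y, C, epochs=200, tol=1e-8):
    """Dual coordinate ascent with duality-gap-based dynamic screening."""
    n, d = len(X), len(X[0])
    alpha = [0.0] * n
    w = [0.0] * d
    active = set(range(n))
    gap = float("inf")
    for _ in range(epochs):
        for i in sorted(active):
            xi, yi = X[i], y[i]
            q = dot(xi, xi)
            if q == 0.0:
                continue
            g = yi * dot(w, xi) - 1.0           # coordinate-wise gradient
            a_new = min(max(alpha[i] - g / q, 0.0), C)
            delta = a_new - alpha[i]
            if delta != 0.0:
                alpha[i] = a_new
                for k in range(d):
                    w[k] += delta * yi * xi[k]
        gap = primal(w, X, y, C) - dual(alpha, w)
        r = math.sqrt(2.0 * max(gap, 0.0))      # safe-sphere radius around w*
        for i in sorted(active):
            m = y[i] * dot(w, X[i])
            nx = math.sqrt(dot(X[i], X[i]))
            # freeze a sample only when its alpha_i already sits at the
            # bound that the sphere test proves optimal
            if m - r * nx > 1.0 and alpha[i] == 0.0:
                active.discard(i)               # provably alpha_i* = 0
            elif m + r * nx < 1.0 and alpha[i] == C:
                active.discard(i)               # provably alpha_i* = C
        if gap < tol:
            break
    return w, alpha, active, gap

def kkt_violations(w, alpha, X, y, C, tol=1e-6):
    """Postchecking step: indices whose KKT conditions are violated."""
    bad = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        m = yi * dot(w, xi)
        if m > 1.0 + tol and alpha[i] > tol:
            bad.append(i)
        elif m < 1.0 - tol and alpha[i] < C - tol:
            bad.append(i)
    return bad

# Toy separable data: the optimum is w = (0.5, 0.5), and only the two
# closest points (indices 0 and 10) are support vectors.
X = [(1.0 + 0.2 * i, 1.0 + 0.1 * i) for i in range(10)] + \
    [(-1.0 - 0.2 * i, -1.0 - 0.1 * i) for i in range(10)]
y = [1] * 10 + [-1] * 10
w, alpha, active, gap = train_with_gap_screening(X, y, C=1.0)
violations = kkt_violations(w, alpha, X, y, C=1.0)
```

On this toy problem the 18 non-support samples are all screened out, leaving only the two support vectors active, and the postcheck finds no KKT violations; in the paper's TSS any violator found here would be re-added and retrained.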
