Abstract
The problem of fitting a straight line to a finite collection of points in the plane is an important problem in statistical estimation. Robust estimators are particularly important because of their lack of sensitivity to outlying data points. The basic measure of the robustness of an estimator is its breakdown point, that is, the fraction (up to 50%) of outlying data points that can corrupt the estimator. Rousseeuw's least median-of-squares (LMS) regression (line) estimator is among the best known 50% breakdown-point estimators. The best exact algorithms known for this problem run in O(n^2) time, where n is the number of data points. Because of this high running time, many practitioners prefer to use a simple O(n log n) Monte Carlo algorithm, which is quite efficient but provides no guarantees of accuracy (even probabilistic) unless the data set satisfies certain assumptions. In this paper, we present two algorithms in an attempt to close the gap between theory and practice. The first is a conceptually simple randomized Las Vegas approximation algorithm for LMS, which runs in O(n log n) time. However, this algorithm relies on somewhat complicated data structures to achieve its efficiency. The second is a practical randomized algorithm for LMS that uses only simple data structures. It can be run as either an exact or an approximation algorithm. This algorithm runs no slower than O(n^2 log n) time, but we present empirical evidence that its running time on realistic data sets is much better. This algorithm provides an attractive option for practitioners, combining both the efficiency of a Monte Carlo algorithm and guarantees on the accuracy of the result.
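To make the LMS objective concrete, the following is a minimal sketch of the generic random-sampling heuristic that the abstract contrasts with exact methods. It is not either of the two algorithms presented in the paper. The function names (`lms_objective`, `monte_carlo_lms`), the `trials` and `seed` parameters, and the choice of drawing candidate lines through random pairs of points are assumptions made for illustration. The objective is taken here to be the median of the squared vertical residuals; for an even number of points, `statistics.median` averages the two middle values, which may differ slightly from the order statistic used in a formal definition.

```python
import random
from statistics import median


def lms_objective(points, slope, intercept):
    """Median of squared vertical residuals for the line y = slope*x + intercept."""
    return median((y - (slope * x + intercept)) ** 2 for x, y in points)


def monte_carlo_lms(points, trials=200, seed=0):
    """Illustrative random-sampling heuristic (hypothetical helper, not the paper's method).

    Repeatedly draws two distinct points, forms the line through them, and
    keeps the candidate with the smallest LMS objective. Returns a tuple
    (score, slope, intercept), or None if no non-vertical candidate was drawn.
    As the abstract notes for Monte Carlo approaches, there is no accuracy
    guarantee unless the data satisfy additional assumptions.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical line: cannot be written as y = slope*x + intercept
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        score = lms_objective(points, slope, intercept)
        if best is None or score < best[0]:
            best = (score, slope, intercept)
    return best
```

Each trial evaluates the objective over all n points, so this sketch costs roughly O(n log n) per trial because of the sort inside the median. The number of trials needed for a good fit depends on the fraction of outliers, which is why such a heuristic offers no accuracy guarantee in general.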