Abstract

The logistic regression estimator is known to inflate the magnitude of its coefficients if the sample size $n$ is small, the dimension $p$ is (moderately) large, or the signal-to-noise ratio $1/\sigma$ is large (probabilities of observing a label are close to 0 or 1). With this in mind, we study the logistic regression estimator with $p \ll n/\log n$, assuming Gaussian covariates and labels generated by the Gaussian link function, with a mild optimization constraint on the estimator's length to ensure existence. We provide finite-sample guarantees for its direction, which serves as a classifier, and its Euclidean norm, which is an estimator of the signal-to-noise ratio. We distinguish between two regimes. In the low-noise/small-sample regime ($\sigma \lesssim (p\log n)/n$), we show that the estimator's direction (and consequently the classification error) achieves the rate $(p\log n)/n$, which, up to the log term, matches the noiseless case. In this case, the norm of the estimator is at least of order $n/(p\log n)$. If instead $(p\log n)/n \lesssim \sigma \lesssim 1$, the estimator's direction achieves the rate $\sqrt{\sigma p\log n/n}$, whereas its norm converges to the true norm at the rate $\sqrt{p\log n/(n\sigma^3)}$. As a corollary, the data are not linearly separable with high probability in this regime. In either regime, logistic regression provides a competitive classifier.
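To make the setup concrete, here is a minimal sketch of the estimator studied in the abstract: Gaussian covariates, labels drawn through the Gaussian (probit) link with noise level $\sigma$, and a logistic maximum-likelihood fit constrained to a Euclidean ball so that a minimizer exists. All names, parameter values, and the constraint radius `R` are illustrative assumptions, not the paper's choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Illustrative setup (all values are assumptions, not from the paper).
rng = np.random.default_rng(0)
n, p, sigma = 2000, 10, 0.5              # sample size, dimension, noise level (SNR = 1/sigma)
theta_star = rng.standard_normal(p)
theta_star /= np.linalg.norm(theta_star)  # unit-norm true direction

X = rng.standard_normal((n, p))           # Gaussian covariates
# Labels through the Gaussian (probit) link: P(y=1 | x) = Phi(<x, theta*> / sigma).
y = (rng.uniform(size=n) < norm.cdf(X @ theta_star / sigma)).astype(float)

def neg_log_lik(beta):
    """Logistic negative log-likelihood for labels in {0, 1}, computed stably."""
    z = X @ beta
    return np.sum(np.logaddexp(0.0, z) - y * z)

# Mild length constraint ||beta|| <= R to guarantee the estimator exists;
# R is an arbitrary illustrative choice.
R = 50.0
res = minimize(neg_log_lik, x0=np.zeros(p),
               constraints=[{"type": "ineq",
                             "fun": lambda b: R - np.linalg.norm(b)}])

beta_hat = res.x
direction = beta_hat / np.linalg.norm(beta_hat)  # classifier: sign(<x, direction>)
snr_hat = np.linalg.norm(beta_hat)               # norm estimates the SNR 1/sigma

print("direction error:", np.linalg.norm(direction - theta_star))
print("estimated SNR:", snr_hat, "true SNR:", 1.0 / sigma)
```

With the choices above, $\sigma = 0.5$ falls in the moderate-noise regime $(p\log n)/n \lesssim \sigma \lesssim 1$, so by the abstract's results one would expect `snr_hat` to track $1/\sigma$ at the rate $\sqrt{p\log n/(n\sigma^3)}$.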
