Abstract

We describe the R np package via a series of applications that may be of interest to applied econometricians. The np package implements a variety of nonparametric and semiparametric kernel-based estimators that are popular among econometricians. There are also procedures for nonparametric tests of significance and consistent model specification tests for parametric mean regression models and parametric quantile regression models, among others. The np package focuses on kernel methods appropriate for the mix of continuous, discrete, and categorical data often found in applied settings. Data-driven methods of bandwidth selection are emphasized throughout, though we caution the user that data-driven bandwidth selection methods can be computationally demanding.

Highlights

  • Devotees of R (R Development Core Team 2008) are likely to be aware of a number of nonparametric kernel smoothing methods that exist in R base and in certain R packages

  • One approach towards handling the presence of both continuous and categorical data is called a ‘frequency’ approach, whereby the data are broken up into subsets (‘cells’) corresponding to the values assumed by the categorical variables, and only then is, say, density or locpoly applied to the continuous data remaining in each cell

  • The np package offers users of R a variety of nonparametric and semiparametric kernel-based methods that are capable of handling the mix of categorical and continuous data typically encountered by applied researchers

Summary

Introduction

Devotees of R (R Development Core Team 2008) are likely to be aware of a number of nonparametric kernel smoothing methods that exist in R base (e.g., density) and in certain R packages (e.g., locpoly in the KernSmooth package; Wand and Ripley 2008). In applied settings we often encounter a combination of categorical and continuous datatypes. Those familiar with traditional nonparametric kernel smoothing methods will appreciate that these methods presume the underlying data to be continuous. One way to handle both kinds of data is the ‘frequency’ approach, whereby the data are split into cells defined by the values of the categorical variables and a continuous-data method is applied within each cell, at a cost in efficiency. Recent theoretical developments offer practitioners a variety of kernel-based methods for categorical data only (i.e., unordered and ordered factors), or for a mix of continuous and categorical data. These methods have the potential to recapture the efficiency losses associated with nonparametric frequency approaches because they do not rely on sample splitting; instead, they smooth the categorical variables in an appropriate manner. See Li and Racine (2007a) and the references therein for an in-depth treatment of these methods, and see also the references listed in the bibliography. The np package implements recently developed kernel methods that seamlessly handle the mix of continuous, unordered, and ordered factor datatypes often found in applied settings.
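The contrast between sample splitting and smoothing the categorical variable can be illustrated with a small base R sketch. It does not use the np package and is not the package's implementation: the simulated data, the hand-picked bandwidth h, the smoothing parameter lambda, and the helper mixed_density are all assumptions made for illustration, and the Aitchison-Aitken-type kernel for the unordered factor is one common choice that the text above does not specify. In practice the bandwidths would be chosen by a data-driven method rather than fixed by hand.

```r
# Illustrative sketch only: simulated data and hand-picked smoothing parameters.
set.seed(42)
n <- 300
d <- factor(sample(c("a", "b", "c"), n, replace = TRUE))       # unordered factor
x <- rnorm(n, mean = c(a = 0, b = 1, c = 2)[as.character(d)])  # continuous variable

# Frequency approach: split the sample into cells defined by the factor
# and apply a continuous-data estimator (here base R's density) in each cell.
cells <- split(x, d)
freq_fits <- lapply(cells, density)

# Hypothetical helper: joint density estimate at (x0, d0) from a product kernel,
# Gaussian for the continuous variable and Aitchison-Aitken-type for the factor.
# lambda = 0 gives weight only to observations in cell d0 (frequency estimator);
# lambda > 0 also borrows information from observations in the other cells.
mixed_density <- function(x0, d0, x, d, h, lambda) {
  n_levels <- nlevels(d)
  k_cont <- dnorm((x0 - x) / h) / h
  k_cat <- ifelse(d == d0, 1 - lambda, lambda / (n_levels - 1))
  mean(k_cont * k_cat)
}

h <- 0.4  # assumed bandwidth, not data-driven
f_freq   <- mixed_density(1, "b", x, d, h, lambda = 0)    # no smoothing across cells
f_smooth <- mixed_density(1, "b", x, d, h, lambda = 0.1)  # smooths across cells
```

With lambda = 0 the estimate at a given cell uses only that cell's observations, so small cells yield noisy estimates; a positive lambda trades a little bias for lower variance by using the full sample.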

Important implementation details
The primacy of the bandwidth
Data-driven bandwidth selection methods
Interacting with np functions
Writing your own functions
Generalized product kernels
Nonparametric regression
Univariate regression
Multivariate regression with qualitative and quantitative data
Nonparametric binary outcome and count data models
Nonparametric unconditional PDF and CDF estimation
Nonparametric conditional PDF and CDF estimation
Nonparametric quantile regression
Semiparametric partially linear models
Semiparametric single-index models
Semiparametric varying coefficient models
Writing your own kernel-based functions
A parallel implementation
Summary
