Abstract
Several epidemiological studies have demonstrated short-term associations between high pollution levels and increased acute mortality and morbidity. Vehicle emissions are an important source of environmental pollution, so reducing pollution requires estimating the emissions produced by different classes of vehicles under different conditions (traffic, road, etc.). The analysis is based on research conducted by the Italian National Research Council (CNR) on the relationship between the pollutants emitted by motor vehicles and kinematic parameters, under different traffic and road situations (driving cycles). The model, based on the vehicle dynamics equation, involves strongly correlated variables, missing data and few observations, so the most appropriate statistical methodology for data with these characteristics is Partial Least Squares (PLS) regression. The results of the CNR analysis showed that different driving cycles (traffic, road, etc.) can produce outliers, because of the different kinematic variables they generate. The aim of this thesis is to analyse the proposed model while taking the outliers into account, by applying a robust approach to PLS regression. We proceed as follows. In the first chapter we show that multicollinearity among the independent variables in regression analysis makes Ordinary Least Squares (OLS) inapplicable, so other techniques must be used, such as Ridge Regression, Principal Component Regression, Latent Root Regression Analysis and Partial Least Squares (PLS) regression. It has been stated that in many cases PLS is the better solution. However, its results are affected by outliers. In the second chapter we describe the most important robust methods for estimating the regression parameters and the variance/covariance matrix.
Unfortunately, several affine equivariant estimators with a high breakdown point cannot be applied when the number of units is smaller than the number of variables. We therefore propose an approach that combines “leave-one-out” methods and Singular Value Decomposition (SVD). We call this method SSVD. In the third chapter we show that both algorithms for PLS regression, NIPALS and SIMPLS, are affected by outliers. The SIMPLS algorithm’s sensitivity to outliers is due to its use of the cross-covariance matrix between the independent and dependent variables, as well as its use of least squares regressions. The NIPALS algorithm’s sensitivity to outliers is due to its use of least squares regressions. There are two ways to deal with outliers. The first is to use regression diagnostics to detect them; because of the multivariate nature of the data, however, this can be very difficult. The second is to use a robust procedure for PLS regression. Several procedures have been proposed, but evidence of their use in the statistical literature is still scarce. A first class of robust alternatives for PLS regression applies robust regression within the NIPALS algorithm. A second class includes methods that use a robust cross-covariance matrix together with a robust regression method. We describe the different robust alternatives with their advantages and disadvantages, propose a robust approach and end with a simulation study. In the fourth chapter we apply some robust methods for PLS regression, together with our approach, to the CNR environmental data in order to compare the results and show that our approach is a valid alternative in the presence of multicollinearity and outliers.
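As a rough illustration of the sensitivity discussed for the third chapter, the following Python sketch implements classical (non-robust) single-response PLS with NIPALS-style deflation and fits it to simulated collinear data, once clean and once with a single contaminated observation. It is not the code used in the thesis: the function name `nipals_pls1`, the simulated data and the contaminated values are all assumptions made for illustration, and the proposed SSVD approach is not reproduced here.

```python
import numpy as np


def nipals_pls1(X, y, n_components):
    """Classical (non-robust) PLS regression for a single response.

    Hypothetical helper for illustration only. Returns the coefficient
    vector and the intercept on the original (uncentred) scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean

    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_components))  # weight vectors
    P = np.zeros((n_vars, n_components))  # X loadings
    q = np.zeros(n_components)            # y loadings

    for a in range(n_components):
        # Weights: proportional to the cross-covariance X'y, i.e. least
        # squares regressions of the columns of X on y, up to scale.
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w                 # scores
        tt = t @ t
        p_a = Xk.T @ t / tt        # least squares regression of X on t
        q_a = yk @ t / tt          # least squares regression of y on t
        Xk = Xk - np.outer(t, p_a)  # deflate X
        yk = yk - q_a * t           # deflate y
        W[:, a], P[:, a], q[a] = w, p_a, q_a

    beta = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ beta
    return beta, intercept


# Simulated data (an assumption): three nearly identical predictors
# driven by one latent variable, and few observations.
rng = np.random.default_rng(0)
z = rng.normal(size=20)
X = z[:, None] * np.ones(3) + 0.05 * rng.normal(size=(20, 3))
y = X @ np.ones(3) + 0.1 * rng.normal(size=20)

beta_clean, _ = nipals_pls1(X, y, n_components=1)

# Contaminate one observation in both X and y (a bad leverage point).
X_out, y_out = X.copy(), y.copy()
X_out[0] = 5.0
y_out[0] = -50.0

beta_out, _ = nipals_pls1(X_out, y_out, n_components=1)

print("clean coefficients:       ", beta_clean)
print("contaminated coefficients:", beta_out)
```

Every step inside the loop is a least squares fit or a plain inner product, so a single aberrant observation enters the weights, the loadings and the final coefficients without any downweighting. The robust alternatives mentioned above replace these steps with robust counterparts.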