Articles published on Semiparametric regression
1531 Search results
- New
- Research Article
- 10.1371/journal.pone.0338425
- Dec 5, 2025
- PLOS One
- Jan Porthun + 1 more
Introduction: Survival time models are commonly employed in medicine and health sciences when analysing data. In these time-to-event analyses, it is often necessary to dichotomise variables that are metrically measured. One example could be to assign patients to different risk groups based on an occurring event. Besides univariable methods, multivariable approaches also exist for establishing cutpoints. Up to now, these multivariable approaches have hardly been investigated. Methods: Using a Monte Carlo simulation study, we analysed eight multivariable methods from the literature to establish a cutpoint of a biomarker in the context of a semiparametric Cox regression model. The methods are the following: maximising the chi-square statistic, maximising the chi-square statistic with a split-sample approach, maximising the c-index using either the AddFor- or Genetic algorithm, maximising the concordance probability estimator (CPE) with the AddFor- or Genetic algorithm, and minimising the Akaike information criterion (AIC). We compared these methods with each other and in addition with the univariable log-rank minimum p-value approach. The simulation parameters analysed included the cutpoint’s distance from the biomarker’s median, sample size, total censoring, censoring before the end of the follow-up time (drop-outs), and the survival time distribution. Bias and empirical standard error were used as the primary performance measures. Furthermore, each method is illustrated using two practical data examples. Results: All analysed methods are biased towards the biomarker’s median. Multivariable methods that estimate the cutpoint by using the lowest AIC or the maximum of the chi-square statistic have the lowest bias and empirical standard error in most simulation scenarios. The difference in bias between the methods based on maximising the c-index or maximising the CPE is minimal. Regardless of the distribution used (Weibull, Gompertz, or exponential), the respective bias shows similar dependencies on the simulation parameters. Conclusions: Multivariable methods to estimate a biomarker’s cutpoint in survival time analyses using the Cox regression model may represent a good alternative to univariable methods. Our simulation has shown that methods maximising the chi-square statistic or minimising the AIC, respectively, perform better than the univariable method using the minimum p-value approach and outperform multivariable methods based on the c-index or CPE.
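The AIC-based cutpoint search described in this abstract can be sketched as follows. The notation is ours and not taken from the paper: $x_i$ is the biomarker, $z_i$ the remaining covariates, $\delta_i$ the event indicator, $c$ a candidate cutpoint, and $k$ the number of regression parameters. Tied event times are assumed absent.

```latex
\ell_p(\gamma, \beta; c) = \sum_{i:\,\delta_i = 1}
  \left[ \eta_i(c) - \log \sum_{j:\, t_j \ge t_i} \exp\{\eta_j(c)\} \right],
\qquad
\eta_i(c) = \gamma\, \mathbf{1}\{x_i > c\} + \beta^{\top} z_i

\hat{c} = \arg\min_{c}\ \mathrm{AIC}(c),
\qquad
\mathrm{AIC}(c) = -2\, \ell_p(\hat{\gamma}_c, \hat{\beta}_c; c) + 2k
```

If $k$ is the same for every candidate $c$, minimising the AIC in this sketch is equivalent to maximising the fitted log partial likelihood over the cutpoint grid.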
- New
- Research Article
- 10.32674/z0ng7c58
- Nov 28, 2025
- American Journal of STEM Education
- Wilberforce Jahonga
This paper explores the relationship between the level of STEM academic programs and job search duration for graduates of national polytechnics in Kenya. Specifically, it examines how different certification levels (Artisan, Craft, Diploma, and Higher Diploma) affect the time it takes for graduates to secure employment. The study focused on the 2016 cohort of graduates from selected national polytechnics. Using stratified sampling and simple random sampling, a sample of 1834 respondents was drawn from a target population of 21,151. The study employed a semiparametric Cox regression survival analysis. The findings indicated that graduates with an Artisan certificate had a median employment time of 60.92 months, Craft certificate holders 50.62 months, Diploma holders 31.93 months, and Higher Diploma holders 47.93 months. However, these were not statistically significantly different. When controlling for number of applications, gender, student geographical mobility, course duration, and exam grade, the academic qualifications were statistically significant for all the levels with reference to the Artisan level. The paper concludes that geographic mobility, academic performance, and extended training can positively influence employment outcomes by increasing graduates' chances of securing a job more quickly after graduation.
- New
- Research Article
- 10.30598/barekengvol20iss1pp0255-0270
- Nov 24, 2025
- BAREKENG: Jurnal Ilmu Matematika dan Terapan
- Bambang Widjanarko Otok + 6 more
Diabetes mellitus is a chronic disease with a rising global prevalence, including in Indonesia. Early detection and accurate modeling are crucial for effective prevention and management. Binary Logistic Regression (BLR) is commonly used for binary outcome modeling; however, in practice, the relationship between binary outcomes and continuous predictors is often nonlinear, making BLR less suitable. To address these limitations, alternative methods such as Binary Probit Regression (BPR) and Flexible Semiparametric Nonlinear Binary Logistic Regression (FSNBLR) have been developed. This study aims to compare the performance of BLR, BPR, and FSNBLR models in classifying diabetes mellitus cases at Hajj General Hospital Surabaya. All three models were estimated using the Maximum Likelihood Estimation (MLE) method. Since the resulting estimators do not have closed-form solutions, numerical iteration using the Newton-Raphson method was applied. Model performance was assessed using Area Under the Curve (AUC), accuracy, sensitivity, and specificity. The FSNBLR model outperformed both the BLR and BPR models. It achieved the highest AUC value, 81.86%, compared with 66.30% for both BLR and BPR, indicating the superior discriminative ability of FSNBLR. In addition, the FSNBLR model recorded higher accuracy, sensitivity, and specificity compared to the other two models. The FSNBLR model demonstrated better predictive performance in identifying diabetes mellitus cases, especially in scenarios involving nonlinear relationships between predictors and the outcome variable. These findings suggest that flexible semiparametric approaches offer greater effectiveness in medical classification tasks, particularly for chronic conditions like diabetes mellitus.
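To illustrate the Newton-Raphson step mentioned in this abstract, here is a minimal sketch for the plain binary logistic model with an intercept and one predictor. It is not the paper's FSNBLR estimator; the function name `fit_logistic_newton` and its parameters are hypothetical.

```python
import math

def fit_logistic_newton(x, y, tol=1e-10, max_iter=50):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by Newton-Raphson on the log-likelihood.

    Illustrative only: one predictor, no safeguards against separable data.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Accumulate the score vector g and the information matrix H
        # (H is the negative Hessian of the log-likelihood).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system H * delta = g in closed form.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

At convergence the score equations hold, so the residuals `y - p` sum to approximately zero, both unweighted and weighted by `x`.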
- New
- Research Article
- 10.1515/ijb-2024-0011
- Nov 20, 2025
- The international journal of biostatistics
- S Ejaz Ahmed + 2 more
This paper considers semiparametric estimation strategies for the nonlinear semiparametric regression model (NSRM) under the sparsity assumption by modifying the Gauss-Newton method for both low- and high-dimensional data scenarios. In the low-dimensional case, coefficients are partitioned into two parts that represent nonzero (strong signals) and sparse coefficients. In the high-dimensional case, a weighted-ridge approach is employed, and coefficients are partitioned into three parts, adding weak signals as well. Shrinkage estimators are then obtained in both cases. More importantly, in this paper, we assume that a nonlinear structure is present in the parametric component of the model, which makes the direct application of penalized least squares to the NSRM impossible. To solve this problem, we employ the iterative Gauss-Newton method to obtain the final NSRM estimators. We provide both theoretical and practical details for the suggested estimators. Asymptotic results are derived for both low- and high-dimensional cases. We conduct an extensive simulation study to evaluate the performance of the estimators in a practical setting. Moreover, we substantiate our findings with data examples from two distinct breast cancer datasets: the Breast Cancer in the United States (BCUS) and Wisconsin datasets. By demonstrating the effectiveness of our introduced estimators in these particular biostatistical contexts, our numerical study provides support for the theoretical efficacy of shrinkage estimators, suggesting their potential relevance to breast cancer research and biostatistical methodologies.
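For orientation, the textbook Gauss-Newton update for a nonlinear least-squares fit of $f(x;\theta)$ is shown below. This is the standard unmodified step, not the modified version the paper develops, and the symbols $\theta$, $J_k$, and $r_k$ are our notation.

```latex
\theta^{(k+1)} = \theta^{(k)} + \left(J_k^{\top} J_k\right)^{-1} J_k^{\top} r_k,
\qquad
r_k = y - f\!\left(x;\theta^{(k)}\right),
\qquad
J_k = \left.\frac{\partial f(x;\theta)}{\partial \theta^{\top}}\right|_{\theta=\theta^{(k)}}
```

Each iteration is an ordinary least-squares regression of the current residuals on the Jacobian, which is the point at which a ridge-type penalty could be attached to the step.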
- Research Article
- 10.1177/15741699251390496
- Oct 30, 2025
- Model Assisted Statistics and Applications
- Sthitadhi Das
Semiparametric regression models provide a powerful framework that combines the parametric and nonparametric paradigms, particularly effective for analyzing complex data structures. In practical scenarios, missing data is a pervasive issue that complicates statistical inference. This paper addresses semiparametric estimation when the response variable is subject to missingness under the Missing at Random (MAR) mechanism. We develop a kernel-based estimation strategy for the nonparametric component and employ partial regression methods—specifically, an adaptation of Robinson’s approach—to estimate the parametric part. The estimation procedure incorporates inverse probability weighting and nonparametric imputation to account for missing responses. Theoretical properties such as asymptotic bias, consistency, and variance are derived. The methodology is validated through two real-data analyses using the Abalone and Airfoil Self-Noise datasets, where missingness is artificially induced, demonstrating the effectiveness of the proposed strategy in preserving estimation accuracy. Our results underline the robustness and flexibility of semiparametric models in the presence of incomplete data.
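A minimal complete-data sketch of Robinson-style partial regression for the model y = x*beta + g(t) + error, using a Gaussian-kernel Nadaraya-Watson smoother. The paper's inverse probability weighting and imputation steps for missing responses are deliberately omitted, and the names `nw_smooth` and `robinson_beta` are hypothetical.

```python
import math

def nw_smooth(t, v, h):
    """Nadaraya-Watson estimate of E[v | t] at each observed t (Gaussian kernel, bandwidth h)."""
    fitted = []
    for t0 in t:
        w = [math.exp(-0.5 * ((ti - t0) / h) ** 2) for ti in t]
        fitted.append(sum(wi * vi for wi, vi in zip(w, v)) / sum(w))
    return fitted

def robinson_beta(y, x, t, h):
    """Estimate beta in y = x*beta + g(t) + error for a scalar x.

    Step 1: remove the kernel estimates of E[y | t] and E[x | t].
    Step 2: regress the y-residuals on the x-residuals (no intercept).
    """
    y_res = [yi - mi for yi, mi in zip(y, nw_smooth(t, y, h))]
    x_res = [xi - mi for xi, mi in zip(x, nw_smooth(t, x, h))]
    return sum(a * b for a, b in zip(x_res, y_res)) / sum(a * a for a in x_res)
```

The nonparametric component can then be recovered by smoothing `y - x*beta_hat` against `t` with the same `nw_smooth` helper.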
- Research Article
- 10.1111/sjos.70027
- Oct 27, 2025
- Scandinavian Journal of Statistics
- Jose Ameijeiras‐Alonso + 1 more
A regression model for a circular response variable depending on a linear or a circular predictor is presented in this paper. The conditional density belongs to a parametric flexible family that allows for asymmetry and varying peakedness around the modal direction. The modal direction and concentration depend on the covariate and are nonparametrically modeled via local polynomial fitting with a kernel weight. The asymptotic normality of the estimators for the conditional modal direction and concentration is established. Furthermore, from these theoretical results, the expression of the optimal smoothing parameter and a proposed data‐driven estimator are derived. An application concerns the orientation of migratory birds according to the flight altitude and the wind direction.
- Research Article
- 10.1177/00080683251374811
- Oct 24, 2025
- Calcutta Statistical Association Bulletin
- Mohamed R Abonazel + 2 more
This article proposes two estimators for two semiparametric count regression models, namely semiparametric partially Poisson (SPPO) and semiparametric partially zero-inflated Poisson (SPZIP), via the penalized smoothing (Ps) spline and P-spline (Pb) estimations to address the common issue of nonparametric relationships between the response variable and covariates. Additionally, the SPZIP model incorporates a zero-inflation component to handle excess zeros in count data. Through extensive Monte Carlo simulations, we rigorously evaluate the performance of the proposed penalized spline estimators by comparing them against traditional parametric estimators using multiple statistical criteria, including the Akaike information criterion, Bayesian information criterion, deviance statistic, mean squared error and root mean squared error (RMSE). The results indicate that our estimators are more efficient than other estimators. Also, the SPZIP and SPPO models consistently outperform parametric (Poisson and zero-inflated Poisson) regression models, particularly in scenarios with high levels of zero inflation, demonstrating their superior ability to model complex data structures. Our findings highlight the practical utility of these models for analyzing complex count data with excess zeros and nonparametric covariate effects. A real-life data application further illustrates the capabilities of the SPPO and SPZIP models, showing that they can provide more accurate and adaptable statistical analysis in challenging data settings. AMS Subject Classification: 62G08, 62J20, 62J05
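As a reminder of what the zero-inflation component does, the sketch below evaluates the standard zero-inflated Poisson probability mass function. The function name `zip_pmf` and the parameter names `lam` (Poisson mean) and `pi` (probability of a structural zero) are our own labels, not the paper's notation.

```python
import math

def zip_pmf(k, lam, pi):
    """P(Y = k) when Y is a structural zero with probability pi, else Poisson(lam)."""
    pois = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros come from two sources: structural zeros and Poisson zeros.
        return pi + (1.0 - pi) * pois
    return (1.0 - pi) * pois
```

With `pi = 0` this reduces to the ordinary Poisson model. In a semiparametric version, `lam` (and possibly `pi`) would depend on covariates through a linear part plus smooth functions.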
- Research Article
- 10.19139/soic-2310-5070-2704
- Oct 14, 2025
- Statistics, Optimization & Information Computing
- Any Tsalasatul Fitriyah + 4 more
The Least Squares Spline (LS-Spline) method offers a flexible approach for modeling fluctuating time series data by adaptively positioning knots at points of structural change. This study develops an LS-Spline estimation method for the Semiparametric Time Series Regression (STSR) model, combining an autoregressive structure as the parametric component and multiple nonparametric functions to capture nonlinear effects. The model is applied to predict the Indonesia Composite Index (ICI), a key indicator of sustainable economic growth. In this framework, the ICI at lag-1 is modeled parametrically, while the BI Rate and Inflation are modeled nonparametrically. Four data splitting schemes, with 6, 12, 18, and 24 months of testing data, are used to evaluate forecasting performance over short-, medium-, and long-term horizons. Results show that the LS-Spline STSR model consistently achieves high predictive accuracy, with MAPE and sMAPE below 10% and MASE below 1. Residual diagnostics using ACF and PACF confirm that the model satisfies the white noise assumption. These findings emphasize the potential of the LS-Spline STSR model as an economic forecasting tool that can support policies related to one of the Sustainable Development Goals (SDGs), namely sustainable economic growth.
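The three accuracy measures quoted in this abstract can be computed as in the sketch below. Several sMAPE variants exist and the abstract does not say which one was used, so the symmetric form here is an assumption. The MASE scaling uses the in-sample one-step naive forecast, which is the usual convention.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

def smape(actual, forecast):
    """Symmetric MAPE, in percent (variant with 2|a - f| / (|a| + |f|))."""
    terms = (2.0 * abs(a - f) / (abs(a) + abs(f)) for a, f in zip(actual, forecast))
    return 100.0 * sum(terms) / len(actual)

def mase(actual, forecast, train):
    """Mean absolute scaled error: test MAE divided by the in-sample naive-forecast MAE."""
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive_mae = sum(abs(train[i] - train[i - 1]) for i in range(1, len(train))) / (len(train) - 1)
    return mae / naive_mae
```

A MASE below 1 means the forecasts have a smaller mean absolute error than a naive "last value" forecast had on the training data.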
- Research Article
- 10.1080/03610926.2025.2571662
- Oct 8, 2025
- Communications in Statistics - Theory and Methods
- Nur Farahiyah Che Lah + 3 more
Multicollinearity in logistic semiparametric regression models inflates the variance of maximum likelihood estimators, leading to unreliable parameter estimates. While these models offer flexibility by combining parametric and nonparametric components, they are sensitive to multicollinearity. To address this, we propose a restricted ridge estimator for logistic semiparametric regression models with exact linear restrictions, extending the method introduced by Asar (2017) for standard logistic regression. This approach effectively reduces variance and provides more stable estimates. Additionally, we use generalized cross-validation (GCV) to select the optimal ridge penalty and kernel bandwidth, ensuring a balanced bias-variance tradeoff for both components. We validate the proposed method through simulation and real-data applications. The simulation results show that the restricted ridge estimator outperforms traditional maximum likelihood estimation (MLE) in terms of stability and accuracy under multicollinearity. The real-data application demonstrates its practical advantages in producing reliable and interpretable estimates.
- Research Article
- 10.30598/barekengvol19iss4pp2597-2608
- Sep 1, 2025
- BAREKENG: Jurnal Ilmu Matematika dan Terapan
- Sri Sulistijowati Handajani + 5 more
Climate change can affect rice production through changes in temperature, precipitation patterns, extreme weather events, and atmospheric carbon dioxide levels. A statistical model can be used to understand the relationship between rice production and the factors that affect it. A parametric model is appropriate when the data pattern is regular, whereas irregular data patterns are better modeled with nonparametric regression. Because some predictors show a regular pattern in their relationship with the response while others show no particular pattern, owing to the volatility of climate data, truncated spline semiparametric regression is more appropriate. This research aims to model rice production in several regions of East Java Province in 2022 using a semiparametric regression model. The data were obtained from the Meteorology, Climatology, and Geophysics Agency and the Central Statistics Agency for East Java Province. The response variable (Y) is rice production (tons) in 2022 in the Tuban, Gresik, Nganjuk, Malang, Banyuwangi, and Pasuruan Regencies. The predictor variables are paddy harvested area (hectares), average temperature (℃), humidity (percent), and rainfall (mm). The truncated spline semiparametric regression model is obtained by combining a parametric model with a nonparametric model based on truncated splines. The analysis yielded a truncated spline semiparametric regression model with a knot-point combination of (3,3,1) and a minimum GCV value of 12,642,272. The variables significantly affecting rice production were harvested area, temperature, air humidity, and rainfall, with an adjusted R² value of 98.522%.
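A generic form of the truncated spline semiparametric model and the GCV criterion used for knot selection is sketched below. The abstract does not state the spline degree or which predictors enter parametrically, so the degree $p$, the knot counts $r_j$, the knots $\kappa_{jk}$, and the hat matrix $A(\kappa)$ are our notation.

```latex
y_i = x_i^{\top}\beta + \sum_{j=1}^{q} f_j(t_{ij}) + \varepsilon_i,
\qquad
f_j(t) = \sum_{l=1}^{p} \alpha_{jl}\, t^{l} + \sum_{k=1}^{r_j} \gamma_{jk}\, (t - \kappa_{jk})_{+}^{p}

\mathrm{GCV}(\kappa) =
\frac{n^{-1} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^{2}}
     {\left[\, n^{-1}\, \mathrm{tr}\!\left(I - A(\kappa)\right) \right]^{2}},
\qquad
\hat{y} = A(\kappa)\, y
```

Here $(u)_+ = \max(u, 0)$. Under this reading, a knot combination such as (3,3,1) would correspond to $r_1 = 3$, $r_2 = 3$, $r_3 = 1$ for three nonparametric predictors.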
- Research Article
- 10.1080/02664763.2025.2541252
- Aug 27, 2025
- Journal of Applied Statistics
- Mahdi Roozbeh
Binary logistic semiparametric regression analysis is a commonly used statistical technique when the dependent variable is dichotomous or binary. In this analysis, the relationship between the success probability and certain explanatory variables is assumed to have a linear form, while the relationship to other variables is unknown. Multicollinearity is a serious problem that arises when explanatory variables in logistic semiparametric regression are highly correlated. It is well known that the variance of the maximum likelihood estimator is inflated due to multicollinearity in the semiparametric logistic regression model. Therefore, a novel stochastic restricted iterative weighted ridge estimator for logistic semiparametric regression is introduced, and its asymptotic statistical properties are derived. Moreover, an extension of the generalized cross validation (GCV) function is introduced and applied for choosing the best values of the ridge parameter and the bandwidth of the kernel smoother. Additionally, some theorems are developed to illustrate the convergence of the GCV mean. Finally, Monte Carlo simulation studies and a real-data analysis are conducted to support our theoretical discussion, and the findings indicate that the new estimator outperforms the other estimators under consideration.
- Research Article
- 10.1177/09622802251356592
- Jul 14, 2025
- Statistical methods in medical research
- Yichen Lou + 3 more
This article discusses regression analysis of interval-censored failure time data in the presence of a cure fraction and nonignorable missing covariates. To address the challenges caused by interval censoring, missing covariates and the existence of a cure subgroup, we propose a joint semiparametric modeling framework that simultaneously models the failure time of interest and the missing covariates. In particular, we present a class of semiparametric nonmixture cure models for the failure time and a semiparametric density ratio model for the missing covariates. A two-step likelihood-based estimation procedure is developed and the large sample properties of the resulting estimators are established. An extensive numerical study demonstrates the good performance of the proposed method in practical settings and the proposed approach is applied to an Alzheimer's disease study that motivated this study.
- Research Article
- 10.1093/biomtc/ujaf121
- Jul 3, 2025
- Biometrics
- Benny Ren + 2 more
Bayesian Cox semiparametric regression is an important problem in many clinical settings. The elliptical information geometry of Cox models is underutilized in Bayesian inference but can effectively bridge survival analysis and hierarchical Gaussian models. Survival models should be able to incorporate multilevel modeling such as case weights, frailties, and smoothing splines, in a straightforward manner similar to Gaussian models. To tackle these challenges, we propose the Cox-Pólya-Gamma algorithm for Bayesian multilevel Cox semiparametric regression and survival functions. Our novel computational procedure succinctly addresses the difficult problem of monotonicity-constrained modeling of the nonparametric baseline cumulative hazard along with multilevel regression. We develop two key strategies based on the elliptical geometry of Cox models that allow computation to be implemented in a few lines of code. First, we exploit an approximation between Cox models and negative binomial processes through the Poisson process to reduce Bayesian computation to iterative Gaussian sampling. Next, we appeal to sufficient dimension reduction to address the difficult computation of nonparametric baseline cumulative hazards, allowing for the collapse of the Markov transition within the Gibbs sampler based on beta sufficient statistics. We explore conditions for uniform ergodicity of the Cox-Pólya-Gamma algorithm. We provide software and demonstrate our multilevel modeling approach using open-source data and simulations.
- Research Article
- 10.1093/biomtc/ujaf093
- Jul 3, 2025
- Biometrics
- Nate Wiecha + 2 more
Public health data are often spatially dependent, but standard spatial regression methods can suffer from bias and invalid inference when the independent variable is associated with spatially correlated residuals. This could occur if, for example, there is an unmeasured environmental contaminant associated with the independent and outcome variables in a spatial regression analysis. Geoadditive structural equation modeling (gSEM), in which an estimated spatial trend is removed from both the explanatory and response variables before estimating the parameters of interest, has previously been proposed as a solution, but there has been little investigation of gSEM’s properties with point-referenced data. We link gSEM to results on double machine learning and semiparametric regression based on two-stage procedures. We propose using these semiparametric estimators for spatial regression using Gaussian processes with Matérn covariance to estimate the spatial trends and term this class of estimators double spatial regression (DSR). We derive regularity conditions for root-n asymptotic normality and consistency and closed-form variance estimation, and show that in simulations where standard spatial regression estimators are highly biased and have poor coverage, DSR can mitigate bias more effectively than competitors and obtain nominal coverage.
- Research Article
- 10.30598/barekengvol19iss3pp1525-1536
- Jul 1, 2025
- BAREKENG: Jurnal Ilmu Matematika dan Terapan
- Andi Tenri Ampa + 4 more
The issue of gender equality in Southeast Sulawesi still needs further attention, as indicated by the uneven value of the Gender Development Index (GDI) across the districts and cities of the region. Therefore, an in-depth analysis is needed to identify factors that affect the GDI. One method that can be used is semiparametric regression with the Nadaraya-Watson estimator, which allows the relationship between variables to be modeled more flexibly. This study aims to build a semiparametric regression model to identify factors that contribute to the GDI in Southeast Sulawesi Province. The analysis yielded optimal bandwidth values of h1 = 1.57, h2 = 0.49, h3 = 2.50, and h4 = 4.61. The resulting model has R² and MSE values of 99.8% and 0.14%, respectively, indicating that the model has high accuracy in explaining the overall variation in GDI.
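For reference, the univariate Nadaraya-Watson estimator with kernel $K$ and bandwidth $h$ has the form below. The abstract does not say how the four bandwidths $h_1, \dots, h_4$ are combined across predictors, so this single-predictor version is only a sketch in our own notation.

```latex
\hat{m}_h(t) =
\frac{\sum_{i=1}^{n} K\!\left(\dfrac{t - t_i}{h}\right) y_i}
     {\sum_{i=1}^{n} K\!\left(\dfrac{t - t_i}{h}\right)}
```

A larger $h$ averages over more neighbouring observations and gives a smoother fit, while a smaller $h$ follows the data more closely.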
- Research Article
- 10.17345/rio34.476
- Jun 30, 2025
- Revista Internacional de Organizaciones
- Junghyun Baik
This study investigates the wage returns to Korean as the official language in South Korea's labor market, focusing on its influence on earnings and the potential nonlinearity in this relationship. Using nationally representative survey data from the Korean Education and Employment Panel Survey 1 (KEEP1), Korean proficiency is measured through reverse-coded levels of the College Scholastic Ability Test (CSAT) Korean Subject. Ordinary Least Squares (OLS) regression, spline functions, and semi-parametric kernel regression are applied to capture both linear and nonlinear wage effects. A differencing method is employed to control for confounding variables such as education and work experience, isolating the independent impact of Korean proficiency. The findings reveal that higher Korean proficiency levels are linked to accelerated wage premiums, possibly reflecting the broader importance of advanced linguistic skills in the Korean labor market. In contrast, lower proficiency levels could be associated with wage penalties, possibly due to linguistic difficulties that limit job opportunities and productivity. Notably, the sample comprises young adults aged 25 to 30 (with an average age of 27.56), which suggests that the observed effects may reflect early career dynamics where linguistic skills play a particularly pronounced role. These results underscore the dual role of Korean proficiency as both a component of human capital and a signaling mechanism, influencing hiring decisions and wage determination in the labor market. This study contributes to the literature by providing empirical evidence on the wage effects of official language proficiency, highlighting its nonlinear influence on earnings. The findings suggest that higher Korean proficiency yields increasing wage premiums, emphasizing the role of advanced language skills in professional success. Additionally, the study underscores the importance of aligning language education policies with labor market demands. 
Expanding beyond basic literacy, targeted educational and training programs should incorporate advanced linguistic competencies to enhance both academic and workplace language proficiencies, ultimately reducing linguistic disparities in economic opportunities.
- Research Article
- 10.26740/jetis.v1i02.35288
- Jun 25, 2025
- Journal of Education Technology and Information System
- Andrea Dani + 5 more
This research develops an information system based on the R-Shiny Dashboard, allowing users to perform nonparametric regression modeling. Internet-Regression Analysis (I-Regs) is the name of a dashboard that has been successfully developed. I-Regs provides a complete model library in regression analysis modeling, including parametric, nonparametric, and semiparametric regression. It is hoped that I-Regs can become a valuable tool for researchers, practitioners, and students in modeling regression analysis and solving various data analysis problems.
- Research Article
- 10.1371/journal.pone.0325130
- Jun 5, 2025
- PloS one
- Drini Morina + 2 more
The prevailing narrative in the management literature views R&D as a high-risk, high-return activity. Although firms with varying risk-return preferences pursue R&D, this conventional perspective continues to influence decision-making in both corporate strategy and economic policy. This paper questions the narrative by using a novel statistical framework that accounts for competitive strategy and environmental turbulences. Drawing on firm innovation data from the Community Innovation Survey (CIS), we apply semiparametric regression for location and scale to model both the mean and the variance of turnover growth as a function of the interaction between R&D intensity and environmental turbulence, across four common competitive strategy regimes. The findings reveal that for firms prioritizing price leadership across a broad product range, R&D is associated with reduced risk and minimal impact on average growth. Only for firms specifically focused on high quality or small product ranges, the results align with prior research, confirming the expected high-risk, high-return relationship associated with R&D.
- Research Article
- 10.1007/s10986-025-09681-3
- Jun 3, 2025
- Lithuanian Mathematical Journal
- Xufei Tang + 2 more
The Berry–Esseen bounds of wavelet estimator for semiparametric regression model whose errors form a linear process based on ANA sequences
- Research Article
- 10.1016/j.canep.2025.102796
- Jun 1, 2025
- Cancer epidemiology
- Marcela Guadalupe Canale + 3 more
Favorable trends in lung cancer incidence with unfavorable survival prognosis: A spatiotemporal analysis by histology in Córdoba, Argentina.