Machine Learning Approach for Analyzing Mixed Case Interval Censored Data with a Cured Subgroup.
We introduce a novel two-component framework for analyzing mixed case interval censored (MCIC) data featuring a cured subgroup. In such data, the time-to-event is known only to lie within intervals determined by multiple random examination time points. Moreover, a portion of the subjects will never experience the event. The first component of our model estimates the probability of being cured (incidence), departing from the conventional generalized linear model to adopt a more flexible support vector machine (SVM) approach capable of accommodating complex or non-linear covariate effects. The second component addresses the survival distribution of the uncured individuals (latency) and employs a Cox proportional hazards structure to maintain the straightforward interpretation of covariate effects. We develop an expectation maximization algorithm, incorporating the Platt scaling method, to estimate the probability of being cured. Our simulation study demonstrates that our model outperforms both logit-based and spline-based models in capturing complex classification boundaries, leading to more accurate estimates of cured/uncured probabilities and enhanced predictive accuracy for cure. We emphasize that improving the estimation accuracy for incidence subsequently improves the estimation results for latency. Finally, we illustrate the efficacy of our methodology by applying it to NASA's Hypobaric Decompression Sickness Data.
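The Platt scaling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes SVM decision scores are already available and fits a sigmoid that maps each score to a probability. The function names `fit_platt` and `platt_prob`, the toy scores and labels, and the use of plain gradient descent (rather than the regularized targets and Newton-type solver of Platt's original procedure) are illustrative assumptions, not the authors' implementation.

```python
import math

def platt_prob(score, a, b):
    """Map a decision score to P(y = 1 | score) = 1 / (1 + exp(a * score + b))."""
    return 1.0 / (1.0 + math.exp(a * score + b))

def fit_platt(scores, labels, lr=0.1, epochs=2000):
    """Fit the sigmoid parameters (a, b) by gradient descent on the mean log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = platt_prob(s, a, b)
            # With z = a*s + b, the derivative of the log loss w.r.t. z is (y - p).
            grad_a += (y - p) * s
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

# Hypothetical decision scores; label 1 is taken here to mean "uncured".
scores = [-2.0, -1.0, -0.5, 0.2, 0.4, 1.0, 1.5, 2.5]
labels = [0, 0, 1, 0, 1, 1, 1, 1]
a, b = fit_platt(scores, labels)
probabilities = [platt_prob(s, a, b) for s in scores]
```

In an EM setting such as the one the abstract describes, calibrated probabilities of this kind would feed the E-step; how the authors couple the two is not specified here.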
- Research Article
23
- 10.1016/j.compag.2023.108545
- Dec 17, 2023
- Computers and Electronics in Agriculture
Towards site-specific management of soil organic carbon: Comparing support vector machine and ordinary kriging approaches based on pedo-geomorphometric factors
- Research Article
25
- 10.3390/en12142693
- Jul 13, 2019
- Energies
To reduce operation and maintenance costs and improve fault diagnosis and detection accuracy for wind turbines, a study on advanced methods has been carried out. The purpose of this paper is to present a new method, developed using a radar chart and support vector machine (SVM) approach, for fault diagnosis and prediction of the wind turbine pitch system, which usually has a higher failure rate. In the study, supervisory control and data acquisition (SCADA) system data are used as the source data for SVM prediction. First, the characteristics of the indicator variable data collected by the SCADA system are analyzed, and radar charts corresponding to normal and faulty operation of the wind turbine pitch system are constructed from the indicator variable data. Second, the SVM method is used to extract the gray-level co-occurrence matrix (GLCM) features and histogram of oriented gradients (HOG) features of the radar charts, and the SVM classifier is trained. Then, the operational status is predicted, the classification effect is evaluated with the confusion matrix, and the prediction evaluation index is calculated. Third, the support vector regression method is used to analyze the SCADA indicator variable data: the input and output of the regression model are determined, the prediction model is trained, and its prediction accuracy is analyzed using the test sample data. Finally, the forecasting evaluation indexes obtained by the two methods are compared. The comparison shows that the proposed method, which uses SVM to analyze the system radar charts, achieves a prediction accuracy of 91.24%, higher than that of the support vector regression method. The prediction accuracy is improved by 8.6%. Hence, the new method using a radar chart and SVM approach is shown to be superior to the support vector regression method.
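The confusion-matrix evaluation mentioned in this abstract can be sketched as follows. The labels, predictions, and function names are hypothetical and only show how a confusion matrix and the accuracy derived from it are computed; they do not reproduce the paper's data or results.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes, in the order of `labels`."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical operating states of a pitch system and a classifier's predictions.
y_true = ["normal", "normal", "fault", "fault", "fault", "normal"]
y_pred = ["normal", "fault", "fault", "fault", "normal", "normal"]
cm = confusion_matrix(y_true, y_pred, labels=["normal", "fault"])
# cm == [[2, 1], [1, 2]]: two normal and two fault samples are classified correctly.
acc = accuracy(cm)
```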
- Conference Article
3
- 10.1109/iwsca.2008.15
- Jul 1, 2008
Traditional classification methods, such as neural network approaches, have suffered from difficulties with generalization and with producing models. The support vector machine (SVM) approach is considered a good candidate because of its high generalization performance without the need to add a priori knowledge, even when the dimension of the input space is very high. In this paper, an SVM approach is proposed to segment images, and we thoroughly evaluate its segmentation performance. Experimental results show that: (1) the effect of the kernel function, model parameters, and input vectors on the segmentation performance is significant; (2) the SVM approach is suitable as a learning machine when sample sizes are small; (3) the SVM approach is less sensitive to noise in image segmentation.
- Research Article
1
- 10.1002/sim.10225
- Nov 14, 2024
- Statistics in medicine
Partially linear models provide a valuable tool for modeling failure time data with nonlinear covariate effects. Their applicability and importance in survival analysis have been widely acknowledged. To date, numerous inference methods for such models have been developed under traditional right censoring. However, the existing studies seldom target interval-censored data, which provide more coarse information and frequently occur in many scientific studies involving periodical follow-up. In this work, we propose a flexible class of partially linear transformation models to examine parametric and nonparametric covariate effects for interval-censored outcomes. We consider the sieve maximum likelihood estimation approach that approximates the cumulative baseline hazard function and nonparametric covariate effect with the monotone splines and B-splines, respectively. We develop an easy-to-implement expectation-maximization algorithm coupled with three-stage data augmentation to facilitate maximization. We establish the consistency of the proposed estimators and the asymptotic distribution of parametric components based on the empirical process techniques. Numerical results from extensive simulation studies indicate that our proposed method performs satisfactorily in finite samples. An application to a study on hypobaric decompression sickness suggests that the variable TR360 exhibits a significant dynamic and nonlinear effect on the risk of developing hypobaric decompression sickness.
- Conference Article
3
- 10.1109/ieem.2007.4419173
- Dec 1, 2007
Because of the high leverage of futures trading, generous returns can be earned with a small capital investment. Therefore, the analysis of futures prices has become one of the most interesting topics in financial markets. Recently, by applying the structural risk minimization principle, the support vector machine (SVM) approach has become one of the most powerful techniques for dealing with classification problems. In this investigation, trading information including technical indicators is used by an SVM model to predict the movement directions of Taiwan stock index futures prices. Because data preprocessing has an essential influence on the prediction accuracy of SVM models, data preprocessed by different methods are used to examine the impact on the prediction performance of SVM models. Experimental results reveal that the SVM approach performs best when data are processed by scaling and differencing operations.
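The two preprocessing operations named in this abstract, differencing and scaling, can be sketched on a toy price series. The abstract does not state the order of the operations, the scaling range, or the exact formulas, so differencing first and then min-max scaling to [-1, 1] is an assumption made for illustration; the prices and function names are hypothetical.

```python
def difference(series):
    """First-order differencing: change between consecutive observations."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

def min_max_scale(values, lo=-1.0, hi=1.0):
    """Linearly rescale values so the minimum maps to `lo` and the maximum to `hi`."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # A constant series carries no spread; place it at the middle of the range.
        return [(lo + hi) / 2.0 for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]

# Hypothetical closing prices of a futures contract.
prices = [100.0, 102.0, 101.0, 105.0, 104.0]
changes = difference(prices)                    # day-to-day price changes
features = min_max_scale(changes)               # scaled inputs for a classifier
directions = [1 if c > 0 else 0 for c in changes]  # 1 = up move, 0 = down or flat
```

Scaling matters for SVMs with distance-based kernels because features on large numeric ranges would otherwise dominate the kernel value.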
- Research Article
67
- 10.1016/j.conbuildmat.2021.123396
- Apr 25, 2021
- Construction and Building Materials
Predicting the mechanical properties of cement mortar using the support vector machine approach
- Research Article
507
- 10.1016/j.jhydrol.2006.04.030
- Jun 27, 2006
- Journal of Hydrology
Downscaling of precipitation for climate change scenarios: A support vector machine approach
- Conference Article
32
- 10.1109/icnc.2007.672
- Jan 1, 2007
Software project development has a high failure rate. Software project risk management may yield a high return on investment. Establishing an intelligent risk evaluation model for projects will be valuable in the analysis and control of project risks. In this paper, we employed neural network (NN) and support vector machine (SVM) approaches to establish a model for risk evaluation in project development. In the model, the input is a vector of software risk factors obtained through interviews with 30 experts, and the output is the final outcome of the project. The data for modeling were collected from 120 real software projects through questionnaires. The experiment shows the model is valid. Interestingly, SVM is a powerful supervised learning method, and some believe that it is a more promising classification method that may someday supersede NN. In our study, the standard neural network model had lower prediction accuracy than SVM due to its tendency to find local optima. However, after optimizing the neural network model with a genetic algorithm, the experimental results showed that our enhanced model surpassed SVM in performance.
- Research Article
13
- 10.14311/nnw.2011.21.009
- Jan 1, 2011
- Neural Network World
Bankruptcy has long been an important topic in finance and accounting research. Recent headline bankruptcies have included Enron, Fannie Mae, Freddie Mac, Washington Mutual, Merrill Lynch, and Lehman Brothers. These bankruptcies and their financial fallout have become a serious public concern due to huge influence these companies play in the real economy. Many researchers began investigating bankruptcy predictions back in the early 1970s. However, until recently, most research used prediction models based on traditional statistics. In recent years, however, newly-developed data mining techniques have been applied to various fields, including performance prediction systems. This research applies particle swarm optimization (PSO) to obtain suitable parameter settings for a support vector machine (SVM) model and to select a subset of beneficial features without reducing the classification accuracy rate. Experiments were conducted on an initial sample of 80 electronic companies listed on the Taiwan Stock Exchange Corporation (TSEC). This paper makes four critical contributions: (1) The results indicate the business cycle factor mainly affects financial prediction performance and has a greater influence than financial ratios. (2) The closer we get to the actual occurrence of financial distress, the higher the accuracy obtained both with and without feature selection under the business cycle approach. For example, PSO-SVM without feature selection provides 89.37% average correct cross-validation for two quarters prior to the occurrence of financial distress. (3) Our empirical results show that PSO integrated with SVM provides better classification accuracy than the Grid search, and genetic algorithm (GA) with SVM approaches for companies as normal or under threat. (4) The PSO-SVM model also provides better prediction accuracy than do the Grid-SVM, GA-SVM, SVM, SOM, and SVR-SOM approaches for seven well-known UCI datasets. Therefore, this paper proposes that the PSO-SVM approach could be a more suitable method for predicting potential financial distress.
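The idea of using particle swarm optimization to search for SVM parameter settings can be sketched with a small, generic PSO. Everything specific here is an assumption for illustration: the inertia and acceleration coefficients, the search bounds, and above all the objective `surrogate_cv_error`, which is a toy stand-in for a cross-validation error over (log10 C, log10 gamma) rather than an actual SVM evaluation. The paper's own encoding, which also covers feature selection, is not reproduced.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box using a basic global-best particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    best_i = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[best_i][:], pbest_val[best_i]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term plus pulls toward the personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def surrogate_cv_error(params):
    """Toy stand-in for a cross-validation error; minimum at log10 C = 1, log10 gamma = -2."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2

best_params, best_error = pso_minimize(surrogate_cv_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
```

In a real setting the objective would train and cross-validate an SVM for each candidate, which makes every evaluation far more expensive than in this toy example.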
- Conference Article
5
- 10.1109/cec.2008.4630802
- Jun 1, 2008
Classification of tumor types based on genomic information is essential for improving future cancer diagnosis and drug development. Since DNA microarray studies produce a large amount of data, effective analytical methods have to be developed to sort out whether specific cancer samples have distinctive features of gene expression over normal samples or other types of cancer samples. In this paper, an integrated approach of support vector machine (SVM) and genetic algorithm (GA) is proposed for this purpose. The proposed approach can simultaneously optimize the feature subset and the classifier through a common solution coding mechanism. As an illustration, the proposed approach is applied in searching the combinational gene signatures for predicting histologic response to chemotherapy of osteosarcoma patients, which is the most common malignant bone tumor in children. Cross-validation results show that the proposed approach outperforms other existing methods in terms of classification accuracy. Further validation using an independent dataset shows misclassification of only one of fourteen patient samples suggesting that the selected gene signatures can reflect the chemoresistance in osteosarcoma.
- Conference Article
6
- 10.1109/ijcnn.2009.5178827
- Jun 1, 2009
To improve cancer diagnosis and drug development, the classification of tumor types based on genomic information is important. As DNA microarray studies produce a large amount of data, expression data are highly redundant and noisy, and most genes are believed to be uninformative with respect to the studied classes. Only a fraction of genes may present distinct profiles for different classes of samples. Classification tools to deal with these issues are thus important. These tools should learn to robustly identify a subset of informative genes embedded in a large dataset that is contaminated with high dimensional noises. In this paper, an integrated approach of support vector machine (SVM) and particle swarm optimization (PSO) is proposed for this purpose. The proposed approach can simultaneously optimize the selection of feature subset and the classifier through a common solution coding mechanism. As an illustration, the proposed approach is applied to search the combinational gene signatures for predicting histologic response to chemotherapy of osteosarcoma patients. Cross-validation results show that the proposed approach outperforms other existing methods in terms of classification accuracy. Further validation using an independent dataset shows misclassification of only one out of fourteen patient samples, suggesting that the selected gene signatures can reflect the chemoresistance in osteosarcoma.
- Research Article
16
- 10.12989/cac.2008.5.5.461
- Oct 25, 2008
- Computers and Concrete
The paper explores the potential of the Support Vector Machines (SVM) approach in predicting the 28-day compressive strength and slump flow of self-compacting concrete. A total of 80 data points collected from the existing literature were used in the present work. To compare the performance of the technique, prediction was also done using a back propagation neural network model. For this data set, the RBF kernel worked well in comparison to polynomial kernel based support vector machines and provided a root mean square error of 4.688 MPa (correlation coefficient = 0.942) for 28-day compressive strength prediction and a root mean square error of 7.825 cm (correlation coefficient = 0.931) for slump flow. The RMSE and correlation coefficient results suggest that the Support Vector Machine approach performs comparably to the neural network approach for both 28-day compressive strength and slump flow prediction.
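The two evaluation measures reported in this abstract, root mean square error and the correlation coefficient, can be computed as in the sketch below. The measured and predicted strength values are made up for illustration, and the correlation coefficient is assumed to be the Pearson coefficient, which the abstract does not state explicitly.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical 28-day compressive strengths in MPa: measured vs. model output.
measured = [30.0, 40.0, 50.0]
predicted = [32.0, 38.0, 50.0]
error_mpa = rmse(measured, predicted)
correlation = pearson_r(measured, predicted)
```

RMSE carries the unit of the target (MPa or cm in the abstract), while the correlation coefficient is unitless, which is why the two are usually reported together.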
- Conference Article
2
- 10.1109/rivf.2009.5174613
- Jan 1, 2009
Automatic key phrase extraction is the task of automatically selecting a set of phrases that describe the content of a simple sentence. That a key phrase is extracted means that it is present verbatim in the sentence to which it is assigned. Accurate key phrase extraction is fundamental to the success of many recent digital library applications, clustering, and semantic information retrieval techniques. The present research discusses a support vector machines (SVMs) approach for Vietnamese key phrase extraction and presents a number of experiments in which performance is incrementally improved. In general, the Vietnamese key phrase extraction process consists of three steps: word segmentation for identifying lexical units in an input sentence, part-of-speech tagging for words, and key phrase extraction for phrases. The performance of Vietnamese key phrase extraction systems is generally measured by the precision rate attained. This depends strongly on the nature and the size of the training set of key phrases. Most results exceed 70.30% with a training set of 9,000 Vietnamese key phrases of 2,000 sentences, selected from the corpus of the Vietnamese Lexicography Center (www.vietlex.com.vn).
- Research Article
1
- 10.1016/j.insmatheco.2015.01.004
- Jan 20, 2015
- Insurance: Mathematics and Economics
Optimal consumption and investment problem with random horizon in a BMAP model
- Research Article
- 10.56294/dm2025568
- Jan 1, 2025
- Data and Metadata
Malaria remained a significant global health issue, particularly in tropical and subtropical regions. The disease resulted in a substantial number of clinical cases and deaths each year, with high-risk groups including infants, toddlers, and pregnant women. Accurate and prompt diagnosis was a key factor in managing the disease. To address this issue, the research aimed to develop an automated system for the classification of Plasmodium falciparum malaria parasites based on blood smear images. The methods employed included image feature selection using Principal Component Analysis (PCA) and the Support Vector Machine (SVM) approach for classification. The research findings indicated that in the image feature selection process, the category of normal malaria exhibited distinctive characteristics with PC1 and PC2 values that tended to be negative and dispersed, whereas the category of parasitic malaria displayed greater variability in both PC1 and PC2 components. Furthermore, the evaluation of the classification system's accuracy using SVM with three different kernel types showed promising results. The average accuracy through K-fold cross-validation for the polynomial, linear, and radial basis function kernels was 96.7%, 98.9%, and 94.4%, respectively. These results highlighted the significant potential of SVM utilization in the classification of Plasmodium falciparum malaria parasites based on blood smear images.
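The K-fold cross-validation used to average accuracy in this abstract can be sketched as an index-splitting routine. The value of K, the sample count, the absence of shuffling, and the per-fold accuracies below are assumptions for illustration; the abstract does not report them, and no classifier is trained here.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def mean_accuracy(fold_accuracies):
    """Average of the accuracies measured on each held-out fold."""
    return sum(fold_accuracies) / len(fold_accuracies)

# Hypothetical setup: 10 images split into 3 folds.
folds = list(k_fold_indices(10, 3))
# Hypothetical per-fold accuracies that a classifier might achieve.
average = mean_accuracy([0.9, 1.0, 0.8])
```

Each sample appears in exactly one test fold, so the averaged accuracy uses every sample for evaluation once while never testing on data the model was trained on in that fold.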