A Damped Hessian-Free Newton–Conjugate Gradient Method for Weighted Multiclass Neural Classification
This study presents a deterministic damped Hessian-free Newton–CG method for weighted multiclass neural classification. The method is built from a weighted categorical cross-entropy objective, a damped local quadratic model, and a matrix-free curvature representation through Hessian–vector products. The search direction is computed by an inexact conjugate gradient solve, while Armijo backtracking and adaptive damping are used to improve stability. The method is implemented for the classification of academic predicate categories using preprocessed student data with mixed categorical and numerical features. Its numerical behavior is compared with SGD with momentum, RMSProp, and Adam under the same loss, initialization, and network architecture. The proposed method is computationally feasible, attains the best overall weighted test-set performance among the compared methods, and exhibits a distinct optimization trajectory driven by curvature-informed updates. These results show that a damped Hessian-free formulation provides a mathematically transparent, reproducible, and practically competitive framework for second-order optimization in multiclass neural classification.
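The ingredients named in this abstract (Hessian-vector products, an inexact CG solve, Armijo backtracking, adaptive damping) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The logit-only model, the `CLASS_MASS` values, the finite-difference Hessian-vector product, the damping constants, and every function name are assumptions chosen to keep the example small.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def axpy(alpha, x, y):
    """Return alpha * x + y for plain Python lists."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

# Hypothetical weighted class masses: a_c = sum of sample weights in class c.
CLASS_MASS = [0.5, 0.3, 0.2]

def softmax(z):
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

def loss(z):
    """Weighted categorical cross-entropy of a logit-only model (assumed stand-in)."""
    m = max(z)
    lse = m + math.log(sum(math.exp(zi - m) for zi in z))
    return sum(CLASS_MASS) * lse - dot(CLASS_MASS, z)

def grad(z):
    total = sum(CLASS_MASS)
    return [total * p - a for p, a in zip(softmax(z), CLASS_MASS)]

def hess_vec(grad_fn, w, v, eps=1e-6):
    """Matrix-free Hessian-vector product via a central difference of gradients."""
    nv = math.sqrt(dot(v, v))
    if nv == 0.0:
        return [0.0] * len(v)
    h = eps / nv
    gp, gm = grad_fn(axpy(h, v, w)), grad_fn(axpy(-h, v, w))
    return [(a - b) / (2.0 * h) for a, b in zip(gp, gm)]

def cg_solve(apply_A, b, tol=1e-2, max_iter=50):
    """Inexact CG for A p = b; stops on a loose relative residual or non-positive curvature."""
    p, r = [0.0] * len(b), list(b)
    d, rr = list(r), dot(b, b)
    stop = tol * tol * rr
    for _ in range(max_iter):
        if rr <= stop:
            break
        Ad = apply_A(d)
        curv = dot(d, Ad)
        if curv <= 0.0:
            break
        alpha = rr / curv
        p, r = axpy(alpha, d, p), axpy(-alpha, Ad, r)
        rr_new = dot(r, r)
        d, rr = axpy(rr_new / rr, d, r), rr_new
    # Fall back to the right-hand side (steepest descent) if no progress was made.
    return p if dot(p, p) > 0.0 else list(b)

def newton_cg(f, grad_fn, w, lam=1.0, iters=100, c1=1e-4, gtol=1e-6):
    for _ in range(iters):
        g = grad_fn(w)
        if math.sqrt(dot(g, g)) < gtol:
            break
        def damped(v, w=w, lam=lam):
            # Apply (H + lam * I) to v without ever forming H.
            return axpy(lam, v, hess_vec(grad_fn, w, v))
        p = cg_solve(damped, [-gi for gi in g])
        # Armijo backtracking along the CG direction.
        fw, slope, t = f(w), dot(g, p), 1.0
        while f(axpy(t, p, w)) > fw + c1 * t * slope and t > 1e-10:
            t *= 0.5
        step = [t * pi for pi in p]
        w_new = axpy(1.0, step, w)
        # Adapt damping from the ratio of actual to model-predicted decrease.
        pred = -(dot(g, step) + 0.5 * dot(step, damped(step)))
        rho = (fw - f(w_new)) / pred if pred > 0.0 else 0.0
        if rho > 0.75:
            lam = max(lam * 0.5, 1e-8)
        elif rho < 0.25:
            lam *= 2.0
        w = w_new
    return w

z_opt = newton_cg(loss, grad, [0.0, 0.0, 0.0])
print(softmax(z_opt))  # should approach CLASS_MASS / sum(CLASS_MASS)
```

The Hessian of this toy loss is singular along the all-ones direction, which is one reason the damping term `lam` is useful even in such a small example. In a real network the finite-difference product would typically be replaced by an automatic-differentiation Hessian-vector product.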
- Research Article
17
- 10.1017/asb.2023.15
- Apr 24, 2023
- ASTIN Bulletin
We focus on modelling categorical features and improving predictive power of neural networks with mixed categorical and numerical features in supervised learning tasks. The goal of this paper is to challenge the current dominant approach in actuarial data science with a new architecture of a neural network and a new training algorithm. The key proposal is to use a joint embedding for all categorical features, instead of separate entity embeddings, to determine the numerical representation of the categorical features which is fed, together with all other numerical features, into hidden layers of a neural network with a target response. In addition, we postulate that we should initialize the numerical representation of the categorical features and other parameters of the hidden layers of the neural network with parameters trained with (denoising) autoencoders in unsupervised learning tasks, instead of using random initialization of parameters. Since autoencoders for categorical data play an important role in this research, they are investigated in more depth in the paper. We illustrate our ideas with experiments on a real data set with claim numbers, and we demonstrate that we can achieve a higher predictive power of the network.
- Research Article
- 10.36306/konjes.1003916
- Jun 1, 2022
- Konya Journal of Engineering Sciences
This study aims to determine the optimal conjugate gradient (CG) method for the geometry fitting of 2D measured profiles. To this end, three well-known CG methods were employed: Fletcher-Reeves, Polak-Ribiere and Hestenes-Stiefel. To test the performance of these methods, five primitive geometries (circle, square, triangle, ellipse and rectangle) were first built with a 3D printer and then scanned with a coordinate measuring machine (CMM) to obtain their 2D profiles. A nonlinear least squares procedure was implemented to minimize the error between the measured data and the modeled ones, using an iterative line search whose search direction was calculated with the above-mentioned CG methods. During the geometry fitting process, the number of function evaluations at each iteration was computed, and the total number of function evaluations at convergence was taken as the performance measure of the CG method in question. Using these performance measures, performance and data profiles were created to determine the optimal CG method efficiently. Based on the performance profiles, the Fletcher-Reeves and Polak-Ribiere methods are the fastest on three of the five test geometries. In addition, all the CG methods were able to complete the geometry fitting of 80% of the test geometries. The data profiles, on the other hand, show that the Polak-Ribiere and Hestenes-Stiefel methods reach their maximum geometry fitting capability (i.e., 80%) with far fewer function evaluations than the Fletcher-Reeves method. Moreover, for most geometries the Polak-Ribiere method outperformed the others, and it was therefore determined to be the optimal one for geometry fitting. In conclusion, the results reported in this work may help end-users who work on CMM data processing to conduct efficient geometry fitting.
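For reference, the three CG parameters named in this abstract are usually written as follows in the textbook setting. The notation is an assumption rather than taken from the paper, which may use variants: $g_k$ is the gradient at iterate $k$ and $d_k$ is the search direction.

```latex
\begin{aligned}
d_{k+1} &= -g_{k+1} + \beta_k d_k, \qquad y_k = g_{k+1} - g_k,\\
\beta_k^{\mathrm{FR}} &= \frac{\|g_{k+1}\|^2}{\|g_k\|^2},\\
\beta_k^{\mathrm{PR}} &= \frac{g_{k+1}^{\top} y_k}{\|g_k\|^2},\\
\beta_k^{\mathrm{HS}} &= \frac{g_{k+1}^{\top} y_k}{d_k^{\top} y_k}.
\end{aligned}
```

All three coincide on a strictly convex quadratic with exact line search. They differ on general nonlinear problems, which is why comparisons such as the one in this abstract are informative.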
- Research Article
4
- 10.1080/10739149.2022.2152459
- Dec 6, 2022
- Instrumentation Science & Technology
Electromagnetic tomography is a process detection technology based upon the principles of electromagnetic induction. The forward problem model and sensitivity distribution matrix of electromagnetic tomography are introduced as the basis of the inverse problem. The search direction and iterative parameters of the conjugate gradient algorithm are modified to improve the quality and convergence of image reconstruction. A new spectral parameter conjugate gradient algorithm is described that modifies the search direction by controlling the angle between the old and new search directions. The search direction is determined at each iteration in order to find the optimal solution. A new hybrid conjugate gradient algorithm is obtained by mixing the Fletcher-Reeves and Polak-Ribiere-Polyak nonlinear conjugate gradient algorithms in a specific proportion, combining the advantages of both. To verify the effectiveness of the modified conjugate gradient algorithms, three physical models of an electromagnetic tomography system are constructed, and the modified algorithms are compared with the traditional algorithm. The experimental results show that the modified spectral conjugate gradient algorithm yields higher reconstructed image quality and better numerical performance. The hybrid conjugate gradient algorithm highlights the advantages of the Fletcher-Reeves and Polak-Ribiere-Polyak algorithms. Its convergence speed is faster than that of the Polak-Ribiere-Polyak method, and its imaging quality is higher than that of the other algorithms.
- Research Article
1110
- 10.1137/1011036
- Apr 1, 1969
- SIAM Review
Convergence Conditions for Ascent Methods
- Research Article
64
- 10.1016/j.bspc.2022.103666
- Apr 5, 2022
- Biomedical Signal Processing and Control
Impact of categorical and numerical features in ensemble machine learning frameworks for heart disease prediction
- Research Article
37
- 10.1016/j.cam.2017.04.045
- May 5, 2017
- Journal of Computational and Applied Mathematics
Accelerated adaptive Perry conjugate gradient algorithms based on the self-scaling memoryless BFGS update
- Research Article
- 10.1142/s0219876225500100
- Mar 19, 2025
- International Journal of Computational Methods
This study introduces an improved three-term conjugate gradient (CG) algorithm which fulfills the descent conditions and exhibits good global convergence properties. The development of this new algorithm is based on the findings of the recently introduced generalized RMIL CG technique. The algorithm has been modified so that the search direction always meets the sufficient descent condition regardless of the line-search technique employed. Under some given assumptions, the algorithm’s global convergence result for general nonconvex functions is established. The numerical efficacy of the suggested three-term CG algorithm is assessed through comparisons with four other CG algorithms. The assessment is performed using an array of unconstrained optimization benchmark functions. The acquired findings indicate that all the considered algorithms exhibit similar performance, particularly the variants of the RMIL method. One algorithm is more robust and slightly faster than the rest. However, it is essential to highlight that the newly developed CG formula outperforms all the other algorithms, including a recently presented three-term PRP algorithm. The study further extends the proposed method to an image restoration problem. The computational experiments yielded promising results, indicating that the proposed CG algorithm surpasses the others: it produces higher-quality output images, requires less CPU time, and achieves higher PSNR values.
- Research Article
- 10.1080/0305215x.2026.2644585
- Apr 7, 2026
- Engineering Optimization
Conjugate Gradient (CG) methods have gained more traction for several classes of problem owing to characteristics such as low memory requirements and simple implementation. It is well known that two characteristics define CG methods: the step-length α(k) and the search direction d(k), whose formulation contains a parameter β(k) that is a crucial component of CG methods. Yet most researchers of CG methods focus only on developing the β(k) term, with little attention given to innovation on the α(k) front. Thus, they mainly adopt a classical monotone Line Search (LS), which has been shown to creep on problems with a curved narrow valley. To mitigate some of these shortcomings, this article introduces a non-monotone LS approach that is computationally efficient and therefore suitable for large-scale problems. Moreover, two variants of CG directions based on the Hestenes–Stiefel (HSCG) method are devised by incorporating a newly formulated diagonal-based Barzilai–Borwein (BB) spectral method. The diagonal-based BB approach may capture more information about the curvature of the objective function. Consequently, two new non-monotone CG methods are introduced and, under some mild assumptions, the methods are shown to have the sufficient descent property and to be globally convergent. Moreover, the convergence rate and complexity of the methods are analysed, and the methods are shown to achieve O(log(1/ϵ)) iteration complexity for functions satisfying the Polyak–Łojasiewicz condition. Finally, the applicability of the methods is investigated by solving a data-driven model on real-world datasets and signal processing problems.
- Conference Article
2
- 10.1109/nssmic.2010.5874221
- Oct 1, 2010
The Conjugate Gradient (CG) method is an optimization algorithm used to compute the numerical solution of systems of linear equations whose matrix is symmetric and positive definite. The CG method is iterative, so it can be applied to systems that are too large to be handled by direct methods. The CG method can also be used to solve unconstrained optimization problems such as PET reconstruction. In the Bayesian PET reconstruction problem, Preconditioned Conjugate Gradient (PCG) algorithms were previously shown to have more favorable convergence rates than expectation maximization (EM) type algorithms [1]. However, PCG fails to converge on partial datasets. Block iterative methods such as Ordered Subset Expectation Maximization (OSEM) have become the most commonly used methods in PET reconstruction, as they require fewer iterations than PCG. This work combines both algorithms into PCG-OSEM to reduce the number of iterations and speed up the convergence of OSEM. The proposed CG search direction is orthogonal to previous search directions and lies in the image space rather than the projection domain. Therefore, a single iteration can be performed to achieve an acceptable reconstructed PET image.
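The linear CG iteration described in the first sentences of this abstract can be sketched in a few lines of plain Python. This is the generic, unpreconditioned method for a small dense symmetric positive definite system, not the PCG-OSEM scheme of the paper. The function name and the 2×2 example system are illustrative assumptions.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive definite matrix A (list of rows)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual b - A x for the initial guess x = 0
    d = list(r)          # first search direction is the residual
    rr = sum(ri * ri for ri in r)
    for _ in range(max_iter or 10 * n):
        if rr ** 0.5 < tol:
            break
        Ad = [sum(A[i][j] * d[j] for j in range(n)) for i in range(n)]
        alpha = rr / sum(di * adi for di, adi in zip(d, Ad))
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        rr_new = sum(ri * ri for ri in r)
        # New direction: residual plus a multiple of the previous direction.
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
    return x

A_demo = [[4.0, 1.0], [1.0, 3.0]]
b_demo = [1.0, 2.0]
x_demo = conjugate_gradient(A_demo, b_demo)
print(x_demo)  # exact solution is [1/11, 7/11]
```

Only matrix-vector products with `A` are needed, which is why the method scales to systems too large for direct factorization.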
- Research Article
5
- 10.15587/1729-4061.2022.254017
- Apr 28, 2022
- Eastern-European Journal of Enterprise Technologies
Optimization is now considered a branch of computational science. It seeks to answer the question «what is best?» by looking at problems where the quality of any answer can be expressed numerically. One of the most well-known methods for solving nonlinear unconstrained optimization problems is the conjugate gradient (CG) method. The Hestenes and Stiefel (HS-CG) formula is one of the century’s oldest and most effective formulas. When using an exact line search, the HS method achieves global convergence; however, this is not guaranteed when using an inexact line search (ILS). Furthermore, the HS method does not always satisfy the descent property. The goal of this work is to create a new (modified) formula by reformulating the classic HS-CG parameter and adding a new term to it. It is critical that the proposed method generates search directions with the sufficient descent property (SDP) under the strong Wolfe-Powell line search (sWPLS) at every iteration, and that the global convergence property (GCP) can be guaranteed for general non-convex functions. Using the inexact sWPLS, the modified HS-CG (mHS-CG) method has the SDP regardless of line search type and guarantees the GCP. When using the sWPLS, the modified formula has the advantage of keeping the modified scalar non-negative. This paper is significant in that it quantifies how much better the new modification of the HS method performs compared with standard HS methods. Numerical experiments comparing the mHS-CG method under the sWPLS with the standard HS method on optimization problems show that the CG method with the mHS-CG conjugate parameter is more robust and effective than the CG method without it.
- Research Article
5
- 10.1186/s13660-024-03142-0
- May 28, 2024
- Journal of Inequalities and Applications
The stationary point of optimization problems can be obtained via conjugate gradient (CG) methods without the second derivative. Many researchers have used this method to solve applications in various fields, such as neural networks and image restoration. In this study, we construct a three-term CG method that satisfies a descent property and admits a convergence analysis. In the second term, we employ a Hestenes-Stiefel CG formula restricted to be positive. The third term includes a negative gradient used as a search direction multiplied by an accelerating expression. We also provide numerical results collected using a strong Wolfe line search with different sigma values over 166 optimization functions from the CUTEr library. The results show that the proposed approach is far more efficient than alternative prevalent CG methods regarding central processing unit (CPU) time, number of iterations, number of function evaluations, and gradient evaluations. Moreover, we present some applications of the proposed three-term search direction in image restoration, and we compare the results with well-known CG methods with respect to the number of iterations, CPU time, and root-mean-square error (RMSE). Finally, we present three applications in regression analysis, image restoration, and electrical engineering.
- Research Article
- 10.1016/j.knosys.2025.115049
- Feb 1, 2026
- Knowledge-Based Systems
• Neural network feature selection methodology for both numerical and categorical data.
• Embedded feature selection provides model and feature subset in a single training.
• Novel feature embedding for dealing with high-cardinality categorical features.
• Validation of the methodology on several open and real industry datasets.
• Comparison with several state-of-the-art feature selection methodologies.
In an era of effortless data collection, the impact of machine learning, especially neural networks (NNs), is undeniable. As datasets grow in size and complexity, efficiently handling mixed data types, including categorical and numerical features, becomes critical. Feature encoding and selection play a key role in improving NN performance, efficiency, interpretability, and generalisation. This paper presents GLEm-Net (Grouped Lasso with Embeddings Network), a novel NN-based approach that seamlessly integrates feature encoding and selection directly into the training process. GLEm-Net uses embedding layers to process categorical features with high cardinality, simplifying the model and improving generalisation. By extending the grouped Lasso regularisation to explicitly consider categorical features, GLEm-Net automatically identifies the most relevant features during training and returns them to the analyst. We evaluate GLEm-Net on open and proprietary industry datasets and compare it to state-of-the-art feature selection methodologies. Results show that GLEm-Net adapts to each dataset by allowing the NN to directly select subsets of the most important features, offering performance on par with the best state-of-the-art feature selection methods while eliminating the need for the external feature encoding and selection steps, which are now incorporated in the NN training stage.
- Research Article
19
- 10.1080/00423114.2023.2239391
- Jul 25, 2023
- Vehicle System Dynamics
Semi-active primary suspensions are an effective means of improving ride quality in high-speed railway vehicles in relation to the mitigation of car-body bending vibration. In this paper, prototype magnetorheological (MR) dampers are tested and the results are used to define a mathematical model of the dampers. Then, three control schemes for semi-active primary suspensions are proposed: Skyhook, LQG and Mix-1-Sensor, and their performance is assessed by means of Hardware-In-the-Loop (HIL) tests considering a simple quarter-vehicle model which is run on a real-time board and set in interaction with one physical MR damper. The results show that all three considered control strategies lead to a reduction of car-body vibration by around 30% and a very good agreement is found between HIL tests and numerical simulations in which the physical damper is replaced by the mathematical damper model. The damper model is finally interfaced with a flexible multi-body model of the complete vehicle to provide further assessment of semi-active control. The results of the latter simulations show that the semi-active suspension could provide an improvement of the N mvz ride quality index in the order of 40–45% with respect to the passive vehicle for all three control schemes.
- Conference Article
1
- 10.2991/iiicec-15.2015.178
- Jan 1, 2015
Solenoid valve CDC dampers are one of the mainstream commercial solutions for active suspension. In most active suspension control strategies, CDC dampers are treated as force generators [1]. To obtain the desired damper force, the performance of the CDC damper must be known. There are two ways to describe CDC damper performance: data lookup and model building. Because of its unpredictable extensibility, data lookup requires a large amount of experimental data to maintain accuracy, which affects the efficiency of the lookup algorithm. CDC damper model building involves the fields of mechanics, hydrodynamics and electronics. Although model building extends well, its complex algorithms make it hard to implement on a control unit. I propose a new approach: compressing CDC experimental data by fitting a basic valve model. Based on an understanding of the construction of one type of solenoid valve CDC damper, I build a basic damper model from valve theory and then use experimental data to tune the model parameters. The resulting basic damper model is simple and easy to implement on a controller unit.
- Conference Article
2
- 10.1063/1.4887759
- Jan 1, 2014
- AIP conference proceedings
This paper proposes an approach that combines the conjugate gradient and classic steepest descent search directions with the quasi-Newton search direction, called the 'scaled CGSD-QN' search direction. A new coefficient formula is constructed for use in the 'scaled CGSD-QN' search direction, and it is proven to be globally convergent to the minimizer. The Hessian update formula used in the quasi-Newton algorithm is the DFP update formula. The new search direction was tested on some standard unconstrained optimization test problems, and the results show that it positively affects the quasi-Newton method with the DFP update formula.