In-line NIR coupled with machine learning to predict mechanical properties and dissolution profile of PLA-Aspirin

Abstract

In the production of polymeric drug delivery devices, the dissolution profile and mechanical properties of the drug-loaded polymeric matrix are considered important Critical Quality Attributes (CQAs) for quality assurance. However, the industry currently relies on offline testing methods which are destructive, slow, labour intensive, and costly. In this work, a real-time method for predicting these CQAs in a Hot Melt Extrusion (HME) process is explored using in-line NIR and temperature sensors together with Machine Learning (ML) algorithms. The mechanical and drug dissolution properties were found to vary significantly with changes in processing conditions, highlighting that real-time methods to accurately predict product properties are highly desirable for process monitoring and optimisation. Nonlinear ML methods including Random Forest (RF), K-Nearest Neighbours (KNN) and Recursive Feature Elimination with RF (RFE-RF) outperformed commonly used linear machine learning methods. For the prediction of tensile strength, RFE-RF and KNN achieved R² values of 98% and 99%, respectively. For the prediction of drug dissolution, two time points were considered, with drug release at t = 6 h as a measure of the extent of burst release and at t = 96 h as a measure of sustained release. KNN and RFE-RF achieved R² values of 97% and 96%, respectively, in predicting the drug release at t = 96 h. This work reports for the first time the prediction of drug dissolution and mechanical properties of a drug-loaded polymer product from in-line data collected during the HME process.
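
To make the KNN and R² terminology in the abstract concrete, the following is a minimal sketch of k-nearest-neighbour regression scored with the coefficient of determination. It is not the authors' pipeline: the three-value "spectrum", the `strength` function, and the train/test split are all hypothetical stand-ins for in-line NIR features and measured tensile strength.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training samples closest to the query."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical data: a latent process setting s drives both the
# "spectral" features and the property to be predicted (nonlinearly).
def spectrum(s):
    return [s, s * s, math.sin(3 * s)]

def strength(s):
    return 40.0 + 10.0 * math.sin(3 * s)

grid = [i / 50 for i in range(51)]
train_s, test_s = grid[0::2], grid[1::2]   # alternate points: train / test

train_X = [spectrum(s) for s in train_s]
train_y = [strength(s) for s in train_s]
test_X = [spectrum(s) for s in test_s]
test_y = [strength(s) for s in test_s]

pred_y = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
r2 = r_squared(test_y, pred_y)
print(f"R^2 on held-out points: {r2:.3f}")
```

An R² reported as a percentage, as in the abstract, is simply this value multiplied by 100.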

Similar Papers
  • Research Article
  • Citations: 51
  • 10.1016/j.ijpharm.2018.07.029
Predicting physical stability of ternary amorphous solid dispersions using specific mechanical energy in a hot melt extrusion process
  • Jul 10, 2018
  • International Journal of Pharmaceutics
  • Masataka Hanada + 4 more

  • Research Article
  • Citations: 7
  • 10.1016/j.addma.2022.103196
Adjusting the melting point of an Active Pharmaceutical Ingredient (API) via cocrystal formation enables processing of high melting drugs via combined hot melt and materials extrusion (HME and ME)
  • Dec 1, 2022
  • Additive Manufacturing
  • Marta Kozakiewicz-Latała + 7 more

One of the biggest challenges for the application of materials extrusion (ME) technology in the pharmaceutical sector is the lack of ready-to-use polymeric materials (filaments) of pharmaceutical quality. To overcome this challenge, materials extrusion can be combined with the Hot Melt Extrusion (HME) process, enabling the production of filaments using pharmaceutically approved polymers with incorporated drugs. This manuscript presents a step-by-step approach for the formulation of additively manufactured tablets containing a high melting point API (hydrochlorothiazide, HCT, Tm = 266-268 °C) with two pharmaceutically approved polymers (polyvinyl alcohol, PVA and hydroxypropyl methylcellulose acetate succinate, HPMCAS) of different technological and pharmaceutical properties via combined HME and ME processing. The thermal properties of a model drug were adapted to the processing window of both HME and ME by obtaining a hydrochlorothiazide:nicotinamide cocrystal (HCT:NIC) with a lower melting point (Tm = 173.3 °C) than the starting material. Two plasticizers were used: triethyl citrate (TEC) for HPMCAS and sorbitol (SOR) for PVA. Blends containing 10-50 wt. % of plasticizer were prepared using the fusion method and thoroughly analysed using PXRD, DSC, and FTIR, enabling the selection of formulations for further HME processing. Placebo filaments with 10-30 wt. % of plasticizer were obtained and their mechanical properties and ME processability were assessed to select a polymer blend with desirable process parameters. Finally, the filaments containing 20 wt. % of plasticizer with HCT or the HCT:NIC cocrystal were produced and processed using ME to form tablets. The phase of the drug (crystalline or amorphous), mechanical properties of filaments and tablets as well as the drug content in the obtained materials were assessed using PXRD, DSC, and FTIR followed by materials imaging with polarising microscopy and SEM. Cocrystal formation not only brought the melting point of the drug within the temperature range of both processes but also improved the mechanical properties of the filaments, which is important for ME processing. In the case of PVA based formulations the cocrystal turned amorphous upon HME processing, forming flexible and printable filaments. In contrast, the drug embedded in HPMCAS based filaments formed crystals that affected the mechanical properties of the extrudates. The mechanical properties of the obtained tablets and the release profile of the drug from the AM tablets were enhanced as compared to the materials obtained using conventional methods, i.e. tableting and encapsulation.
  • High melting drugs can be incorporated into pharmaceutically approved polymers in the form of cocrystals via combined HME and ME processing.
  • Modification of physicochemical properties of the API via cocrystal formation enables successful processing of high melting point APIs into elastic filaments.
  • Low miscibility of the drug with polymers and the presence of crystalline drug in the filaments may result in brittle materials difficult to process via ME.
  • Openwork 3D tablet structure and cocrystal formation substantially improve drug dissolution.

  • Research Article
  • Citations: 88
  • 10.1081/ddc-100102169
Thermal Behavior and Dissolution Properties of Naproxen From Binary and Ternary Solid Dispersions
  • Jan 1, 1999
  • Drug Development and Industrial Pharmacy
  • P Mura + 4 more

Solid dispersions of 10% w/w naproxen (NAP) in poly(ethylene glycol) (PEG) (4000, 6000, or 20,000) as a carrier with or without incorporation of anionic (sodium dodecyl sulfate; SDS) or nonionic (Tween 80; Tw80) surfactant were prepared by the melting method. Physicochemical characteristics were determined by differential scanning calorimetry (DSC) and X-ray diffraction analysis. The results of dissolution studies showed that drug dissolution properties were better from ternary systems than from binary systems since in the former the wetting and solubilizing effects of surfactant and polymer were additive. No influence of the PEG molecular weight was found. The best performance given by anionic surfactant has been attributed to several factors, such as higher hydrophilicity, better solubilizing power, and most facile interaction with both drug and PEG. No important changes in solid-state characteristics or in drug dissolution properties were found after 30 months storage for dispersions with or without surfactant. Only a slight decrease in initial drug dissolution rate was observed at the highest concentration (10% w/w) of SDS.

  • Research Article
  • Citations: 5
  • 10.1016/j.jddst.2022.104075
Investigation of hot melt extrusion process parameters on solubility and tabletability of atorvastatin calcium in presence of Neusilin® US2
  • Dec 13, 2022
  • Journal of Drug Delivery Science and Technology
  • Ahmed Almotairy + 8 more

  • Dissertation
  • 10.25394/pgs.13333694.v1
Critical Quality Attributes of Hot Melt Extruded Amorphous Solid Dispersions
  • Dec 15, 2020
  • Dana E Moseson

The success of an amorphous solid dispersion (ASD) formulation, consisting of a homogeneous molecular dispersion of drug and polymer, relies on its ability to create and maintain a supersaturated solution. However, supersaturated solutions are metastable and prone to crystallization. In solution, crystals are expected to serve as a template for crystal growth, depleting achieved supersaturation. Thus, in an ASD product, ideally no crystallinity should be present. However, technical challenges exist in both processing and characterization to routinely ensure this is achieved. The presented studies follow the process design, characterization, and dissolution performance of hot melt extruded amorphous solid dispersions, seeking insight into the significance of critical quality attributes of resulting extrudates, namely residual crystallinity and thermal degradation. Selection of hot melt extrusion (HME) processing conditions to prepare ASDs is governed by thermodynamic and kinetic attributes of the drug and polymer system. Mapping the temperature-composition phase diagram to HME processing conditions provides a processing design strategy to prevent residual crystallinity while simultaneously avoiding thermal degradation. Through processing temperatures below the drug’s melting point (Tm) and above the formulation critical temperature (Tc), fully amorphous systems could be generated if sufficient kinetics were provided. The utility of thermogravimetric analysis was critically examined for prediction of the chemical stability processing window for HME formulations.
For characterization and product performance characterization, residual crystalline content in HME ASDs can be anticipated and tailored to various levels. Several HME ASDs were characterized by a range of analytical techniques, highlighting the sensitivity of available techniques to qualitatively or quantitatively detect crystalline content (depending on limitations which stem from properties of the instrument or sample). Transmission electron microscopy (TEM) was found to identify low levels of crystallinity not observed by other techniques and to provide insight into crystal dissolution mechanisms. A defect-site driven dissolution and fragmentation model was suggested, and supported by a Monte Carlo simulation, underscoring that crystal defect sites, either intrinsic to the crystals or formed during processing, expedite dissolution rates and generation of new surfaces for dissolution.
Non-sink dissolution was performed for indomethacin/PVPVA HME ASD samples with residual crystallinity ranging from 0-25% crystalline content. Due to effective crystal growth inhibition by the polymer, crystals had little impact on dissolution performance. Achieved supersaturation was reduced approximately by the level of crystallinity present, i.e. a lost solubility advantage. These studies have significance for HME processing design and risk assessment of crystallinity within ASD formulations.

  • Research Article
  • 10.3390/pharmaceutics17050568
Process Development for the Continuous Manufacturing of Carbamazepine-Nicotinamide Co-Crystals Utilizing Hot-Melt Extrusion Technology.
  • Apr 25, 2025
  • Pharmaceutics
  • Lianghao Huang + 6 more

Objectives: Hot-melt extrusion (HME) offers a solvent-free, scalable approach for manufacturing pharmaceutical co-crystals (CCs), aligning with the industry's shift to continuous manufacturing (CM). However, challenges like undefined yield optimization, insufficient risk management, and limited process analytical technology (PAT) integration hinder its industrial application. This study aimed to develop a proof-of-concept HME platform for CCs, assess process risks, and evaluate PAT-enabled monitoring to facilitate robust production. Methods: Using carbamazepine (CBZ) and nicotinamide (NIC) as model compounds, an HME platform compatible with PAT tools was established. A systematic risk assessment identified five key risk domains: materials, machinery, measurement, methods, and other factors. A Box-Behnken design of experiments (DoE) evaluated the impact of screw speed, temperature, and mixing sections on CC quality. Near-infrared (NIR) spectroscopy monitored CBZ-NIC co-crystal formation in real time during HME process. Results: DoE revealed temperature and number of mixing sections significantly influenced particle size (D50: 2.0-4.0 μm), while screw speed affected efficiency. NIR spectroscopy detected a unique CC absorption peak at 5008.3 cm⁻¹, enabling real-time structural monitoring with high accuracy (R² = 0.9999). Risk assessment highlighted material attributes, process parameters, and equipment design as critical factors affecting CC formation. All experimental batches yielded ≥ 94% pure CCs with no residual starting materials, demonstrating process reproducibility and robustness. Conclusions: Overall, this work successfully established a continuous hot-melt extrusion (HME) process for manufacturing CBZ-NIC co-crystals, offering critical insights into material, equipment, and process parameters while implementing robust in-line NIR monitoring for real-time quality control. 
Additionally, this work provides interpretable insights and serves as a basis for future machine learning (ML)-driven studies.
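
The Box-Behnken design mentioned in this abstract can be illustrated with a short sketch. The construction below (every pair of factors at ±1 with the remaining factor at its centre level, plus replicated centre points) is the usual three-factor layout; the factor names follow the abstract, but the numeric levels and the number of centre points are hypothetical, since the abstract does not state them.

```python
from itertools import combinations, product

def box_behnken(n_factors, n_center=3):
    """Coded Box-Behnken runs: each factor pair at +/-1, the rest at 0."""
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            row = [0] * n_factors
            row[i], row[j] = a, b
            runs.append(tuple(row))
    runs.extend([(0,) * n_factors] * n_center)   # replicated centre points
    return runs

# Hypothetical (low, centre, high) levels for the three factors named
# in the abstract; the real study's levels are not given there.
LEVELS = {
    "screw_speed_rpm": (50, 100, 150),
    "temperature_C": (120, 130, 140),
    "mixing_sections": (1, 2, 3),
}

def decode(run):
    """Map a coded run (-1, 0, +1 per factor) to actual factor settings."""
    return {name: levels[code + 1]
            for (name, levels), code in zip(LEVELS.items(), run)}

design = box_behnken(len(LEVELS), n_center=3)
for run in design:
    print(run, decode(run))
```

For three factors this gives 12 edge-midpoint runs plus the centre replicates, which is fewer than the 27 runs of a full three-level factorial.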

  • Research Article
  • Citations: 101
  • 10.1016/j.landusepol.2020.104537
Analyzing driving factors of land values in urban scale based on big data and non-linear machine learning techniques
  • Feb 27, 2020
  • Land Use Policy
  • Jun Ma + 4 more

  • Research Article
  • Citations: 8
  • 10.1080/03639045.2019.1585447
Spherical agglomeration to improve dissolution and micromeritic properties of an anticancer drug, Bicalutamide
  • Mar 7, 2019
  • Drug Development and Industrial Pharmacy
  • Hitesh Dalvadi + 2 more

Bicalutamide (BCT), an anticancer drug, suffers from dissolution rate limited bioavailability and poor micromeritic properties. Spherical crystallization involves the formation of spherical agglomerates with enhanced dissolution properties, obviating the need for a further granulation process. The present investigation was focused on spherical agglomeration of BCT by the quasi-emulsion solvent diffusion method. All the responses were subjected to principal component analysis to scrutinize the critical attributes. For optimization, the influence of phase ratio (X1), amount of PEG 6000 (X2), and stirring speed (X3) on critical dependent variables was studied by employing the Box-Behnken experimental design. The agglomerates exhibited better flow properties, higher bulk density, and improved compressibility compared to pure powder drug. In-vitro release studies revealed enhancement of dissolution properties of poorly soluble BCT. Characterization studies carried out by differential scanning calorimeter and powder X-ray diffractometer revealed crystallinity of drug with decreased intensity in the formulation. Scanning electron microscopy showed spherical shape agglomerates of BCT. The residual solvents were largely below the permitted limits. Spherical agglomerates demonstrated enhanced dissolution properties on account of reduced particle size and partial conversion into amorphous form. Thus, spherical agglomerates of BCT seem to be a promising approach to ameliorate the dissolution properties which might thereby improve its bioavailability.

  • Research Article
  • Citations: 16
  • 10.1093/bjs/znad181
Prediction of postoperative complications after oesophagectomy using machine-learning methods.
  • Jun 21, 2023
  • British Journal of Surgery
  • Jin-On Jung + 7 more

Oesophagectomy is an operation with a high risk of postoperative complications. The aim of this single-centre retrospective study was to apply machine-learning methods to predict complications (Clavien-Dindo grade IIIa or higher) and specific adverse events. Patients with resectable adenocarcinoma or squamous cell carcinoma of the oesophagus and gastro-oesophageal junction who underwent Ivor Lewis oesophagectomy between 2016 and 2021 were included. The tested algorithms were logistic regression after recursive feature elimination, random forest, k-nearest neighbour, support vector machine, and neural network. The algorithms were also compared with a current risk score (the Cologne risk score). 457 patients had Clavien-Dindo grade IIIa or higher complications (52.9 per cent) versus 407 patients with Clavien-Dindo grade 0, I, or II complications (47.1 per cent). After 3-fold imputation and 3-fold cross-validation, the overall accuracies were: logistic regression after recursive feature elimination, 0.528; random forest, 0.535; k-nearest neighbour, 0.491; support vector machine, 0.511; neural network, 0.688; and Cologne risk score, 0.510. For medical complications, the results were: logistic regression after recursive feature elimination, 0.688; random forest, 0.664; k-nearest neighbour, 0.673; support vector machine, 0.681; neural network, 0.692; and Cologne risk score, 0.650. For surgical complications, the results were: logistic regression after recursive feature elimination, 0.621; random forest, 0.617; k-nearest neighbour, 0.620; support vector machine, 0.634; neural network, 0.667; and Cologne risk score, 0.624. The calculated area under the curve of the neural network was 0.672 for Clavien-Dindo grade IIIa or higher, 0.695 for medical complications, and 0.653 for surgical complications. The neural network scored the highest accuracies compared with all of the other models for the prediction of postoperative complications after oesophagectomy.

  • Research Article
  • Citations: 6
  • 10.3109/03639049509065889
The Effect of Mixing Variables on the Dissolution Properties of Direct Compression Formulations of Furosemide
  • Jan 1, 1995
  • Drug Development and Industrial Pharmacy
  • J G Van Der Watt + 1 more

The particles of a number of poorly water soluble drugs, for instance furosemide, tend to agglomerate spontaneously and as a result decrease the drug's dissolution properties. This phenomenon is undesirable when the drug is to be formulated in a direct compressible formulation. Interactive or ordered mixing with a filler usually rectifies this problem but the drug load is limited to a maximum of ± 5% of the mixture. This is well below the formulation requirements of furosemide (25 %) and below the maximum drug load which can be handled in direct compression formulations (± 35 %). The effect of two types of mixers, the mixing time and drug load were investigated for a direct compression formulation of furosemide tablets. A Turbula and a V mixer, both with a volume of 720 ml, were used. The drug was formulated with Ludipress (a commercial direct compression filler, BASF, Germany) at two drug loadings of 20 and 25 %. Magnesium stearate (1 %) was added as a lubricant. A mixture was prepared for each experimental condition. After mixing the whole mixture (120 gram) was tabletted on a Korsch single punch machine producing ± 500 tablets. The crushing strength, mass and disintegration time of ten tablets and the dissolution of six tablets were measured. Dissolutions were done according to the USP XXII - method 21 - in 0.1 M HCl and a phosphate buffer with pH = 5.8. The intrinsic dissolution rates of some of the mixtures were also determined in the two dissolution media. The dissolution properties of the formulations were compared with the properties of Lasix®, a commercially available furosemide product, which is not manufactured by direct compression. The dissolution rates of the formulations mixed in the Turbula mixer were significantly higher than those mixed in the V mixer. The area under the dissolution curves increased as a function of mixing time for both mixers. The best dissolution results were obtained for formulations with a 20 % drug load and mixed for 120 minutes in the Turbula mixer. The dissolution curves for these formulations compared well with the curves for the commercial tablets. Intrinsic dissolution rates were also a function of mixing time, which indicates that the increase in dissolution properties is probably a result of the deagglomeration of the agglomerated furosemide particles. The Turbula mixer, which can develop more shear force, breaks the agglomerates quicker and to a larger extent than the V mixer. It can be concluded that the type of mixer, mixing time and drug load control the dissolution properties of direct compression formulations of poorly water soluble drugs in which the drug particles form agglomerates.

  • Research Article
  • Citations: 58
  • 10.1016/j.xphs.2017.09.004
Selection of Solid-State Plasticizers as Processing Aids for Hot-Melt Extrusion
  • Sep 17, 2017
  • Journal of Pharmaceutical Sciences
  • Dipen Desai + 6 more

  • Research Article
  • Citations: 20
  • 10.1186/s12885-021-08704-9
Identifying novel transcript biomarkers for hepatocellular carcinoma (HCC) using RNA-Seq datasets and machine learning
  • Aug 27, 2021
  • BMC Cancer
  • Rajinder Gupta + 2 more

Background: Hepatocellular carcinoma (HCC) is one of the leading causes of cancer death in the world owing to limitations in its prognosis. The current prognosis approaches include radiological examination and detection of serum biomarkers; however, both have limited efficiency and are ineffective in early prognosis. Due to such limitations, we propose to use RNA-Seq data for evaluating putative higher accuracy biomarkers at the transcript level that could help in early prognosis. Methods: To identify such potential transcript biomarkers, RNA-Seq data for healthy liver and various HCC cell models were subjected to five different machine learning algorithms: random forest, K-nearest neighbor, Naïve Bayes, support vector machine, and neural networks. Various metrics, namely sensitivity, specificity, MCC, informedness, and AUC-ROC (except for support vector machine) were evaluated. The algorithms that produced the highest values for all metrics were chosen to extract the top features that were subjected to recursive feature elimination. Through recursive feature elimination, the least number of features were obtained to differentiate between the healthy and HCC cell models. Results: From the metrics used, it is demonstrated that the efficiency of the known protein biomarkers for HCC is comparatively lower than complete transcriptomics data. Among the different machine learning algorithms, random forest and support vector machine demonstrated the best performance. Using recursive feature elimination on top features of random forest and support vector machine, three transcripts were selected that had an accuracy of 0.97 and kappa of 0.93. Of the three transcripts, two were protein coding (PARP2–202 and SPON2–203) and one was a non-coding transcript (CYREN-211). Lastly, we demonstrated that these three selected transcripts outperformed randomly taken three transcripts (15,000 combinations), hence were not chance findings, and could then be an interesting candidate for new HCC biomarker development. Conclusion: Using RNA-Seq data combined with machine learning approaches can aid in finding novel transcript biomarkers. The three biomarkers identified: PARP2–202, SPON2–203, and CYREN-211, presented the highest accuracy among all other transcripts in differentiating the healthy and HCC cell models. The machine learning pipeline developed in this study can be used for any RNA-Seq dataset to find novel transcript biomarkers. Code: www.github.com/rajinder4489/ML_biomarkers
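
The evaluation metrics named in this abstract (sensitivity, specificity, MCC, informedness, kappa) all derive from a binary confusion matrix. The sketch below shows the standard formulas; the function name and the example counts are hypothetical and are not taken from the study.

```python
import math

def binary_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / total
    informedness = sensitivity + specificity - 1   # Youden's J

    # Matthews correlation coefficient (0 when a margin is empty).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa: agreement beyond what the class margins give by chance.
    p_pos = ((tp + fp) / total) * ((tp + fn) / total)
    p_neg = ((tn + fn) / total) * ((tn + fp) / total)
    expected = p_pos + p_neg
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "informedness": informedness,
        "mcc": mcc,
        "kappa": kappa,
    }

# Hypothetical balanced example: 50 positives and 50 negatives.
example = binary_metrics(tp=45, fn=5, fp=5, tn=45)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

On this balanced example, accuracy is 0.90 while informedness, MCC, and kappa all equal 0.80, which shows how the chance-corrected measures sit below raw accuracy.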

  • Research Article
  • Citations: 3
  • 10.1128/spectrum.04689-22
A Machine Learning-Based Analytic Pipeline Applied to Clinical and Serum IgG Immunoproteome Data To Predict Chlamydia trachomatis Genital Tract Ascension and Incident Infection in Women
  • Jun 15, 2023
  • Microbiology Spectrum
  • Chuwen Liu + 7 more

ABSTRACT We developed a reusable and open-source machine learning (ML) pipeline that can provide an analytical framework for rigorous biomarker discovery. We implemented the ML pipeline to determine the predictive potential of clinical and immunoproteome antibody data for outcomes associated with Chlamydia trachomatis (Ct) infection collected from 222 cis-gender females with high Ct exposure. We compared the predictive performance of 4 ML algorithms (naive Bayes, random forest, extreme gradient boosting with linear booster [xgbLinear], and k-nearest neighbors [KNN]), screened from 215 ML methods, in combination with two different feature selection strategies, Boruta and recursive feature elimination. Recursive feature elimination performed better than Boruta in this study. In prediction of Ct ascending infection, naive Bayes yielded a slightly higher median value of area under the receiver operating characteristic curve (AUROC), 0.57 (95% confidence interval [CI], 0.54 to 0.59), than other methods and provided biological interpretability. For prediction of incident infection among women uninfected at enrollment, KNN performed slightly better than other algorithms, with a median AUROC of 0.61 (95% CI, 0.49 to 0.70). In contrast, xgbLinear and random forest had higher predictive performances, with median AUROC of 0.63 (95% CI, 0.58 to 0.67) and 0.62 (95% CI, 0.58 to 0.64), respectively, for women infected at enrollment. Our findings suggest that clinical factors and serum anti-Ct protein IgGs are inadequate biomarkers for ascension or incident Ct infection. Nevertheless, our analysis highlights the utility of a pipeline that searches for biomarkers and evaluates prediction performance and interpretability.
IMPORTANCE Biomarker discovery to aid early diagnosis and treatment using machine learning (ML) approaches is a rapidly developing area in host-microbe studies. However, lack of reproducibility and interpretability of ML-driven biomarker analysis hinders selection of robust biomarkers that can be applied in clinical practice. We thus developed a rigorous ML analytical framework and provide recommendations for enhancing reproducibility of biomarkers. We emphasize the importance of robustness in selection of ML methods, evaluation of performance, and interpretability of biomarkers. Our ML pipeline is reusable and open-source and can be used not only to identify host-pathogen interaction biomarkers but also in microbiome studies and ecological and environmental microbiology research.

  • Research Article
  • Citations: 24
  • 10.33480/jitk.v9i2.5015
A SYSTEMATIC LITERATURE REVIEW: RECURSIVE FEATURE ELIMINATION ALGORITHMS
  • Feb 1, 2024
  • JITK (Jurnal Ilmu Pengetahuan dan Teknologi Komputer)
  • Arif Mudi Priyatno + 1 more

Recursive feature elimination (RFE) is a feature selection algorithm that works by gradually eliminating unimportant features. RFE has become a popular method for feature selection in various machine learning applications, such as classification and prediction. However, there is no systematic literature review (SLR) that discusses recursive feature elimination algorithms. This article conducts a SLR on RFE algorithms. The goal is to provide an overview of the current state of the RFE algorithm. This SLR uses IEEE Xplore, ScienceDirect, Springer, and Scopus (publish and publish) databases from 2018 to 2023. This SLR received 76 relevant papers with 49% standard RFEs, 43% strategy RFEs, and 8% modified RFEs. Research using RFE continues to increase every year, from 2018 to 2023. The feature selection method used simultaneously or for comparison is based on a filter approach, namely Pearson correlation, and an embedded approach, namely random forest. The most widely used machine learning algorithms are support vector machines and random forests, with 19.5% and 16.7%, respectively. Strategy RFE and modified RFE can be referred to as hybrid RFEs. Based on relevant papers, it is found that the RFE strategy is broadly divided into two categories: using RFE after other feature selection methods and using RFE simultaneously with other methods. Modification of the RFE is done by modifying the flow of the RFE. The modification process is divided into two categories: before the process of calculating the smallest weight criteria and after calculating the smallest weight criteria. Calculating the smallest weight criteria in this RFE modification is still a challenge at this time to obtain optimal results.
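
The elimination loop described in this abstract (fit a model, find the feature with the smallest weight, drop it, repeat) can be sketched as follows. This is only an illustration under stated assumptions: it uses an ordinary least-squares model on standardized features as the ranking estimator, not the random forest or SVM estimators the review discusses, and the data set, helper names, and coefficients are all hypothetical.

```python
import math
import random

def standardize(col):
    """Scale a feature column to zero mean and unit variance."""
    mean = sum(col) / len(col)
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
    return [(v - mean) / sd for v in col]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def rfe(columns, y, n_keep):
    """Refit, drop the feature with the smallest |weight|, repeat."""
    active = list(range(len(columns)))
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    Z = {j: standardize(columns[j]) for j in active}
    while len(active) > n_keep:
        # Normal equations of least squares on the remaining features.
        A = [[dot(Z[a], Z[b]) for b in active] for a in active]
        rhs = [dot(Z[a], yc) for a in active]
        weights = solve(A, rhs)
        weakest = min(range(len(active)), key=lambda i: abs(weights[i]))
        del active[weakest]
    return active

# Hypothetical data: only features 1 and 3 influence the target.
rng = random.Random(0)
n_samples = 200
columns = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(4)]
y = [3 * columns[1][i] - 2 * columns[3][i] + 0.1 * rng.gauss(0, 1)
     for i in range(n_samples)]

selected = rfe(columns, y, n_keep=2)
print("Selected feature indices:", selected)
```

Refitting after every removal is what distinguishes RFE from a one-shot ranking: weights are recomputed on the reduced feature set, so a feature that only looked weak alongside a correlated one gets another chance.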

  • Research Article
  • Citations: 23
  • 10.3390/s23073470
A Study on ML-Based Software Defect Detection for Security Traceability in Smart Healthcare Applications
  • Mar 26, 2023
  • Sensors
  • Samuel Mcmurray + 1 more

Software Defect Prediction (SDP) is an integral aspect of the Software Development Life-Cycle (SDLC). As the prevalence of software systems increases and becomes more integrated into our daily lives, so the complexity of these systems increases the risks of widespread defects. With reliance on these systems increasing, the ability to accurately identify a defective model using Machine Learning (ML) has been overlooked and less addressed. Thus, this article contributes an investigation of various ML techniques for SDP. An investigation, comparative analysis and recommendation of appropriate Feature Extraction (FE) techniques, Principal Component Analysis (PCA), Partial Least Squares Regression (PLS), Feature Selection (FS) techniques, Fisher score, Recursive Feature Elimination (RFE), and Elastic Net are presented. Validation of the following techniques, both separately and in combination with ML algorithms, is performed: Support Vector Machine (SVM), Logistic Regression (LR), Naïve Bayes (NB), K-Nearest Neighbour (KNN), Multilayer Perceptron (MLP), Decision Tree (DT), and ensemble learning methods Bootstrap Aggregation (Bagging), Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), Random Forest (RF), and Generalized Stacking (Stacking). An extensive experimental setup was built, and the results of the experiments revealed that FE and FS can both positively and negatively affect performance over the base model or Baseline. PLS, both separately and in combination with FS techniques, provides impressive, and the most consistent, improvements, while PCA, in combination with Elastic-Net, shows acceptable improvement.
