Stock returns forecasting via a novel machine learning method
- Research Article
18
- 10.3390/cancers13143611
- Jul 19, 2021
- Cancers
Simple Summary: Radiogenomics enables prediction of the status and prognosis of patients using non-invasively obtained imaging data. Current machine learning (ML) methods used in radiogenomics require huge datasets, which involve the handling of large heterogeneous datasets from multiple cohorts/hospitals. In this study, two different glioma datasets were used to test various ML and image pre-processing methods to confirm whether the models trained on one dataset are universally applicable to other datasets. Our results suggested that the ML method that yielded the highest accuracy in a single dataset was likely to be overfitted. We demonstrated that implementation of standardization and dimension reduction procedures prior to classification enabled the development of ML methods that are less affected by differences between cohorts. We advocate caution in interpreting the results of radiogenomic studies in which the training and testing datasets are small or mixed, with a view to implementing practical ML methods in radiogenomics.
Radiogenomics uses non-invasively obtained imaging data, such as magnetic resonance imaging (MRI), to predict critical biomarkers of patients. Developing an accurate machine learning (ML) technique for MRI requires data from hundreds of patients, which cannot be gathered from any single local hospital. Hence, a model universally applicable to multiple cohorts/hospitals is required. We applied various ML and image pre-processing procedures to a glioma dataset from The Cancer Image Archive (TCIA, n = 159). The models that showed a high level of accuracy in predicting glioblastoma or WHO Grade II and III glioma using the TCIA dataset were then tested on data from the National Cancer Center Hospital, Japan (NCC, n = 166) to determine whether they could maintain similar levels of high accuracy.
Results: we confirmed that our ML procedure achieved a level of accuracy (AUROC = 0.904) comparable to that shown previously by the deep-learning methods using TCIA. However, when we directly applied the model to the NCC dataset, its AUROC dropped to 0.383. Introduction of standardization and dimension reduction procedures before classification without re-training improved the prediction accuracy obtained using NCC (0.804) without a loss in prediction accuracy for the TCIA dataset. Furthermore, we confirmed the same tendency in a model for IDH1/2 mutation prediction with standardization and application of dimension reduction that was also applicable to multiple hospitals. Our results demonstrated that overfitting may occur when an ML method providing the highest accuracy in a small training dataset is used for different heterogeneous data sets, and suggested a promising process for developing an ML method applicable to multiple cohorts.
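The standardization step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes per-cohort z-scoring of a single feature, which is one common form of standardization; the abstract does not specify the exact procedure, the dimension reduction step is omitted, and the function name and toy values are hypothetical.

```python
from statistics import mean, pstdev

def zscore(values):
    """Rescale one feature within one cohort to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical intensity feature measured at two sites; site B reports the
# same underlying signal on a different scale and offset (an assumption).
site_a = [1.0, 2.0, 3.0, 4.0]
site_b = [10.0 * v + 5.0 for v in site_a]

z_a = zscore(site_a)
z_b = zscore(site_b)
```

After z-scoring within each cohort, the two sites' features land on the same scale, so a decision boundary learned on one cohort is at least expressed in units that the other cohort shares.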
- Research Article
155
- 10.1016/j.engappai.2023.105961
- Feb 14, 2023
- Engineering Applications of Artificial Intelligence
Applications of machine learning in friction stir welding: Prediction of joint properties, real-time control and tool failure diagnosis
- Research Article
18
- 10.1016/j.conbuildmat.2022.129116
- Nov 1, 2022
- Construction and Building Materials
Inference of mechanical properties and structural grades of bamboo by machine learning methods
- Research Article
17
- 10.1016/j.petsci.2022.09.002
- Feb 1, 2023
- Petroleum Science
Reservoir identification and production prediction are two of the most important tasks in petroleum exploration and development. Machine learning (ML) methods are used for petroleum-related studies, but have not been applied to reservoir identification and production prediction based on reservoir identification. Production forecasting studies are typically based on overall reservoir thickness and lack accuracy when reservoirs contain a water or dry layer without oil production. In this paper, a systematic ML method was developed using classification models for reservoir identification, and regression models for production prediction. The production models are based on the reservoir identification results. To realize the reservoir identification, seven optimized ML methods were used: four typical single ML methods and three ensemble ML methods. These methods classify the reservoir into five types of layers: water, dry and three levels of oil (Ⅰ oil layer, Ⅱ oil layer, Ⅲ oil layer). The validation and test results of these seven optimized ML methods suggest the three ensemble methods perform better than the four single ML methods in reservoir identification. XGBoost produced the model with the highest accuracy, up to 99%. The effective thickness of Ⅰ and Ⅱ oil layers determined during the reservoir identification was fed into the models for predicting production. Effective thickness considers the distribution of the water and the oil, resulting in a more reasonable production prediction compared to predictions based on the overall reservoir thickness. To validate the superiority of the ML methods, reference models using overall reservoir thickness were built for comparison. The models based on effective thickness outperformed the reference models in every evaluation metric. The prediction accuracy of the ML models using effective thickness was 10% higher than that of the reference model.
Without the personal error or data distortion existing in traditional methods, this novel system realizes rapid analysis of data while reducing the time required to resolve reservoir classification and production prediction challenges. The ML models using the effective thickness obtained from reservoir identification were more accurate when predicting oil production compared to previous studies which use overall reservoir thickness.
- Research Article
68
- 10.1007/s12039-021-01995-2
- Dec 21, 2021
- Journal of chemical sciences (Bangalore, India)
Research in molecular sciences witnessed the rise and fall of Artificial Intelligence (AI)/Machine Learning (ML) methods, especially artificial neural networks, a few decades ago. However, we see a major resurgence in the use of modern ML methods in scientific research during the last few years. These methods have had phenomenal success in the areas of computer vision, speech recognition, natural language processing (NLP), etc. This has inspired chemists and biologists to apply these algorithms to problems in natural sciences. Availability of high performance Graphics Processing Unit (GPU) accelerators, large datasets, new algorithms, and libraries has enabled this surge. ML algorithms have successfully been applied to various domains in molecular sciences by providing much faster and sometimes more accurate solutions compared to traditional methods like Quantum Mechanical (QM) calculations, Density Functional Theory (DFT) or Molecular Mechanics (MM) based methods, etc. Some of the areas where the potential of ML methods is shown to be effective are drug design, prediction of high-level quantum mechanical energies, molecular design, molecular dynamics materials, and retrosynthesis of organic compounds, etc. This article intends to conceptually introduce various modern ML methods and their relevance and applications in computational natural sciences.
Graphical abstract synopsis: The recent surge in the application of machine learning (ML) methods in fundamental sciences has led to a perspective that these methods may become important tools in chemical science. This perspective provides an overview of the modern ML methods and their successful applications in chemistry during the last few years.
- Research Article
19
- 10.1007/s10916-019-1418-y
- Jul 19, 2019
- Journal of Medical Systems
Traditional methods have long been used for clinical demand forecasting. Machine learning methods represent the next evolution in forecasting, but model choice and optimization remain challenging for achieving optimal results. To determine the best method to predict demand for outpatient appointments comparing machine learning and traditional methods, this retrospective study analyzed "appointment requests" at a major outpatient department in a destination medical center. Two separate locations (A and B) were assessed with 20 traditional, hybrid (traditional + machine learning) and machine learning methods to determine the best forecasting outcome (lowest Forecast Standard Error, FSE). Data characteristics from both datasets were examined. 20 forecasting models were then assessed and compared for the best result. Location A's data displayed a cyclical and non-trending pattern while Location B's displayed a cyclical and trending pattern. Both Location A and B yielded the feature engineered XGBoost model (machine learning) with the lowest out-of-sample FSE. It is important to carefully analyze and understand the underlying data set pattern and then test a variety of traditional, machine learning, and hybrid prediction methods to achieve optimal predictive results. Additionally, the use of feature engineering or hybrid methods can augment the usefulness of machine learning methods.
- Research Article
45
- 10.1016/j.clon.2021.11.014
- Dec 3, 2021
- Clinical Oncology
Overall Survival Prognostic Modelling of Non-small Cell Lung Cancer Patients Using Positron Emission Tomography/Computed Tomography Harmonised Radiomics Features: The Quest for the Optimal Machine Learning Algorithm
- Research Article
1
- 10.1016/j.psj.2024.104489
- Nov 1, 2024
- Poultry Science
An investigation of machine learning methods applied to genomic prediction in yellow-feathered broilers
- Research Article
525
- 10.1139/er-2020-0019
- Jul 28, 2020
- Environmental Reviews
Artificial intelligence has been applied in wildfire science and management since the 1990s, with early applications including neural networks and expert systems. Since then, the field has rapidly progressed congruently with the wide adoption of machine learning (ML) methods in the environmental sciences. Here, we present a scoping review of ML applications in wildfire science and management. Our overall objective is to improve awareness of ML methods among wildfire researchers and managers, as well as illustrate the diverse and challenging range of problems in wildfire science available to ML data scientists. To that end, we first present an overview of popular ML approaches used in wildfire science to date and then review the use of ML in wildfire science as broadly categorized into six problem domains, including (i) fuels characterization, fire detection, and mapping; (ii) fire weather and climate change; (iii) fire occurrence, susceptibility, and risk; (iv) fire behavior prediction; (v) fire effects; and (vi) fire management. Furthermore, we discuss the advantages and limitations of various ML approaches relating to data size, computational requirements, generalizability, and interpretability, as well as identify opportunities for future advances in the science and management of wildfires within a data science context. In total, to the end of 2019, we identified 300 relevant publications in which the most frequently used ML methods across problem domains included random forests, MaxEnt, artificial neural networks, decision trees, support vector machines, and genetic algorithms. As such, there exist opportunities to apply more current ML methods, including deep learning and agent-based learning, in the wildfire sciences, especially in instances involving very large multivariate datasets.
We must recognize, however, that despite the ability of ML models to learn on their own, expertise in wildfire science is necessary to ensure realistic modelling of fire processes across multiple scales, while the complexity of some ML methods such as deep learning requires a dedicated and sophisticated knowledge of their application. Finally, we stress that the wildfire research and management communities play an active role in providing relevant, high-quality, and freely available wildfire data for use by practitioners of ML methods.
- Research Article
3
- 10.1259/bjr.20220373
- Mar 6, 2023
- The British Journal of Radiology
A dose deposition matrix (DDM) prediction method using several voxel features and a machine learning (ML) approach is proposed for plan optimization in radiation therapy. Head and lung cases with the inhomogeneous medium are used as training and testing data. The prediction model is a cascade forward backprop neural network where the input is the features of the voxel, including 1) voxel to body surface distance along the beamlet axis, 2) voxel to beamlet axis distance, 3) voxel density, 4) heterogeneity corrected voxel to body surface distance, 5) heterogeneity corrected voxel to beamlet axis, and 6) the dose of voxel obtained from the pencil beam (PB) algorithm. The output is the predicted voxel dose corresponding to a beamlet. The predicted DDM was used for plan optimization (ML method) and compared with the dose of MC-based plan optimization (MC method) and the dose of pencil beam-based plan optimization (PB method). The mean absolute error (MAE) value was calculated for full volume relative to the dose of the MC method to evaluate the overall dose performance of the final plan. For the patient with a head tumor, the ML method achieves an MAE of 0.49 × 10⁻⁴ and PB has an MAE of 1.86 × 10⁻⁴. For the patient with a lung tumor, the ML method has an MAE of 1.42 × 10⁻⁴ and PB has an MAE of 3.72 × 10⁻⁴. The maximum percentage difference in PTV dose coverage (D98) between ML and MC methods is no more than 1.2% for the patient with a head tumor, while the difference is larger than 10% using the PB method. For the patient with a lung tumor, the maximum percentage difference in PTV dose coverage (D98) between ML and MC methods is no more than 2.1%, while the difference is larger than 16% using the PB method. In this work, a reliable DDM prediction method is established for plan optimization by applying several voxel features and the ML approach.
The results show that the ML method based on voxel features can obtain plans comparable to the MC method and is better than the PB method in achieving accurate dose to the patient, which is helpful for rapid plan optimization and accurate dose calculation. Establishment of a new machine learning method based on the relationship between the voxel and beamlet features for dose deposition matrix prediction in radiation therapy.
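The mean absolute error used in the abstract above to compare plans against the MC reference can be sketched as follows. This is a generic per-voxel MAE, assuming both doses are given as flat lists over the same voxels; the abstract does not state how doses were normalized, and the function name and toy dose values are hypothetical.

```python
def mean_absolute_error(predicted, reference):
    """Average absolute per-voxel difference between two dose lists."""
    if len(predicted) != len(reference):
        raise ValueError("dose lists must cover the same voxels")
    if not predicted:
        raise ValueError("dose lists must not be empty")
    total = sum(abs(p - r) for p, r in zip(predicted, reference))
    return total / len(predicted)

# Hypothetical voxel doses (arbitrary units); the reference plays the role
# of the MC-based plan and the candidate that of an ML- or PB-based plan.
reference_dose = [1.0, 0.8, 0.5, 0.2]
candidate_dose = [1.1, 0.8, 0.4, 0.2]

mae = mean_absolute_error(candidate_dose, reference_dose)
```

A lower value means the candidate plan's dose is closer to the reference over the whole volume, which is how the abstract ranks the ML and PB methods.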
- Research Article
8
- 10.1016/j.imu.2022.100861
- Jan 1, 2022
- Informatics in Medicine Unlocked
The prediction power of machine learning on estimating the sepsis mortality in the intensive care unit
- Research Article
146
- 10.3390/rs11212575
- Nov 2, 2019
- Remote Sensing
Landslides represent a severe hazard in many areas of the world. Accurate landslide maps are needed to document the occurrence and extent of landslides and to investigate their distribution, types, and the pattern of slope failures. Landslide maps are also crucial for determining landslide susceptibility and risk. Satellite data have been widely used for such investigations—next to data from airborne or unmanned aerial vehicle (UAV)-borne campaigns and Digital Elevation Models (DEMs). We have developed a methodology that incorporates object-based image analysis (OBIA) with three machine learning (ML) methods, namely, the multilayer perceptron neural network (MLP-NN) and random forest (RF), for landslide detection. We identified the optimal scale parameters (SP) and used them for multi-scale segmentation and further analysis. We evaluated the resulting objects using the object pureness index (OPI), object matching index (OMI), and object fitness index (OFI) measures. We then applied two different methods to optimize the landslide detection task: (a) an ensemble method of stacking that combines the different ML methods for improving the performance, and (b) Dempster–Shafer theory (DST), to combine the multi-scale segmentation and classification results. Through the combination of three ML methods and the multi-scale approach, the framework enhanced landslide detection when it was tested for detecting earthquake-triggered landslides in Rasuwa district, Nepal. PlanetScope optical satellite images and a DEM were used, along with the derived landslide conditioning factors. Different accuracy assessment measures were used to compare the results against a field-based landslide inventory. All ML methods yielded the highest overall accuracies ranging from 83.3% to 87.2% when using objects with the optimal SP compared to other SPs. However, applying DST to combine the multi-scale results of each ML method significantly increased the overall accuracies to almost 90%. 
Overall, the integration of OBIA with ML methods resulted in appropriate landslide detections, but using the optimal SP and ML method is crucial for success.
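The Dempster-Shafer combination step mentioned in the abstract above can be sketched with Dempster's rule for two sources of evidence. This is a generic illustration, not the authors' implementation: the two-class frame, the mass values, and the function name are assumptions chosen for the example.

```python
from itertools import product

def combine_dempster(m1, m2):
    """Combine two mass functions (dicts: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            combined[overlap] = combined.get(overlap, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory hypotheses counts as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so the remaining masses sum to one.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

LANDSLIDE = frozenset({"landslide"})
STABLE = frozenset({"stable"})
UNKNOWN = LANDSLIDE | STABLE  # mass left on the whole frame (ignorance)

# Hypothetical beliefs for one image object from two segmentation scales.
scale_1 = {LANDSLIDE: 0.6, STABLE: 0.1, UNKNOWN: 0.3}
scale_2 = {LANDSLIDE: 0.5, STABLE: 0.2, UNKNOWN: 0.3}

fused = combine_dempster(scale_1, scale_2)
```

When both sources lean toward the same class, the fused mass on that class exceeds either input, which is the intuition behind combining multi-scale results.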
- Research Article
27
- 10.3390/ijms21030713
- Jan 22, 2020
- International Journal of Molecular Sciences
(1) Background: Machine learning (ML) methods are rarely used for an omics-based prescription of cancer drugs, due to shortage of case histories with clinical outcome supplemented by high-throughput molecular data. This causes overtraining and high vulnerability of most ML methods. Recently, we proposed a hybrid global-local approach to ML termed floating window projective separator (FloWPS) that avoids extrapolation in the feature space. Its core property is data trimming, i.e., sample-specific removal of irrelevant features. (2) Methods: Here, we applied FloWPS to seven popular ML methods, including linear SVM, k nearest neighbors (kNN), random forest (RF), Tikhonov (ridge) regression (RR), binomial naïve Bayes (BNB), adaptive boosting (ADA) and multi-layer perceptron (MLP). (3) Results: We performed computational experiments for 21 high throughput gene expression datasets (41–235 samples per dataset) totally representing 1778 cancer patients with known responses on chemotherapy treatments. FloWPS essentially improved the classifier quality for all global ML methods (SVM, RF, BNB, ADA, MLP), where the area under the receiver-operator curve (ROC AUC) for the treatment response classifiers increased from 0.61–0.88 range to 0.70–0.94. We tested FloWPS-empowered methods for overtraining by interrogating the importance of different features for different ML methods in the same model datasets. (4) Conclusions: We showed that FloWPS increases the correlation of feature importance between the different ML methods, which indicates its robustness to overtraining. For all the datasets tested, the best performance of FloWPS data trimming was observed for the BNB method, which can be valuable for further building of ML classifiers in personalized oncology.
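The ROC AUC figures quoted in the abstract above can be read as the probability that a randomly chosen responder receives a higher score than a randomly chosen non-responder. A minimal sketch of that pairwise definition follows; the scores and labels are made up for illustration, and the function name is hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical classifier scores; label 1 = responder, 0 = non-responder.
example_scores = [0.9, 0.3, 0.8, 0.1]
example_labels = [1, 1, 0, 0]

auc = roc_auc(example_scores, example_labels)
```

The quadratic loop is fine for datasets of the size mentioned in the abstract; a rank-based formula gives the same value more efficiently on large samples.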
- Research Article
23
- 10.3390/bioengineering10010025
- Dec 24, 2022
- Bioengineering
The eye is generally considered to be the most important sensory organ of humans. Diseases and other degenerative conditions of the eye are therefore of great concern as they affect the function of this vital organ. With proper early diagnosis by experts and with optimal use of medicines and surgical techniques, these diseases or conditions can in many cases be either cured or greatly mitigated. Experts that perform the diagnosis are in high demand and their services are expensive, hence the appropriate identification of the cause of vision problems is either postponed or not done at all such that corrective measures are either not done or done too late. An efficient model to predict eye diseases using machine learning (ML) and ranker-based feature selection (r-FS) methods is therefore proposed which will aid in obtaining a correct diagnosis. The aim of this model is to automatically predict one or more of five common eye diseases namely, Cataracts (CT), Acute Angle-Closure Glaucoma (AACG), Primary Congenital Glaucoma (PCG), Exophthalmos or Bulging Eyes (BE) and Ocular Hypertension (OH). We have used efficient data collection methods, data annotations by professional ophthalmologists, applied five different feature selection methods, two types of data splitting techniques (train-test and stratified k-fold cross validation), and applied nine ML methods for the overall prediction approach. While applying ML methods, we have chosen suitable classic ML methods, such as Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), AdaBoost (AB), Logistic Regression (LR), k-Nearest Neighbour (k-NN), Bagging (Bg), Boosting (BS) and Support Vector Machine (SVM). We have performed a symptomatic analysis of the prominent symptoms of each of the five eye diseases. The results of the analysis and comparison between methods are shown separately. While comparing the methods, we have adopted traditional performance indices, such as accuracy, precision, sensitivity, F1-Score, etc. 
Finally, SVM outperformed other models obtaining the highest accuracy of 99.11% for 10-fold cross-validation and LR obtained 98.58% for the split ratio of 80:20.
- Research Article
15
- 10.3390/rs13142848
- Jul 20, 2021
- Remote Sensing
Obtaining large-scale, long-term, and spatially continuous soil moisture (SM) data is crucial for climate change, hydrology, and water resource management, etc. ESA CCI SM is one such large-scale, long-term SM product (spanning more than 40 years to date). However, there exist data gaps, especially for the area of China, due to the limitations in remote sensing of SM such as complex topography, human-induced radio frequency interference (RFI), and vegetation disturbances, etc. The data gaps prevent the CCI SM data from achieving spatial continuity, which calls for the study of gap-filling methods. In order to develop suitable methods to fill the gaps of CCI SM in the whole area of China, we compared typical Machine Learning (ML) methods, including Random Forest method (RF), Feedforward Neural Network method (FNN), and Generalized Linear Model (GLM) with a geostatistical method, i.e., Ordinary Kriging (OK) in this study. More than 30 years of passive–active combined CCI SM from 1982 to 2018 and other biophysical variables such as Normalized Difference Vegetation Index (NDVI), precipitation, air temperature, Digital Elevation Model (DEM), soil type, and in situ SM from International Soil Moisture Network (ISMN) were utilized in this study. Results indicated that: (1) data gaps in CCI SM are frequent in China and are found not only in cold seasons and areas but also in warm seasons and areas. The ratio of gap pixels to all pixels can be greater than 80%, and its average is around 40%. (2) ML methods can fill all the gaps in CCI SM. Among the ML methods, RF had the best performance in fitting the relationship between CCI SM and biophysical variables. (3) Over simulated gap areas, RF had a comparable performance with OK, and they outperformed the FNN and GLM methods greatly. (4) Over in situ SM networks, RF achieved better performance than the OK method. (5) We also explored various strategies for gap-filling CCI SM.
Results demonstrated that the strategy of constructing a monthly model with one RF for simulating monthly average SM and another RF for simulating monthly SM disturbance achieved the best performance. This study therefore suggests combining such a strategy with an ML method such as RF for filling the gaps of CCI SM in China.