Semi-Supervised Skin Lesion Segmentation Using Structured Prediction-Based Deep Reinforcement Learning
- Research Article
18
- 10.1109/taslp.2020.3013392
- Jan 1, 2020
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
Traditional dialogue policies need to be trained independently for each dialogue task. In this work, we aim to solve a collection of independent dialogue tasks using a unified dialogue agent. The unified policy is trained in parallel on conversation data from all of the distributed dialogue tasks. However, there are two key challenges: (1) designing a unified dialogue model that can adapt to different dialogue tasks; and (2) finding a robust reinforcement learning method that keeps the training process efficient and stable. Here we propose a novel structured actor-critic approach to implement structured deep reinforcement learning (DRL), which not only learns in parallel from data of different dialogue tasks but also achieves stable and sample-efficient learning. We demonstrate the effectiveness of the proposed approach on 18 tasks of the PyDial benchmark. The results show that our method achieves state-of-the-art performance.
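The actor-critic idea underlying the paper can be illustrated, in heavily simplified form, by a tabular advantage actor-critic on a toy two-state MDP. Everything below (the toy dynamics, learning rates, and state space) is an illustrative assumption, not the paper's structured implementation:

```python
import math
import random

random.seed(0)

# Toy 2-state MDP standing in for one dialogue task:
# action 1 in state 0 moves to state 1; action 1 in state 1 pays reward 1.
def step(state, action):
    if state == 0:
        return (1, 0.0) if action == 1 else (0, 0.0)
    return (0, 1.0) if action == 1 else (0, 0.0)

theta = [[0.0, 0.0], [0.0, 0.0]]  # actor: policy logits per state
V = [0.0, 0.0]                    # critic: state-value estimates
alpha, beta, gamma = 0.1, 0.1, 0.9

def policy(state):
    z = [math.exp(t) for t in theta[state]]
    total = sum(z)
    return [p / total for p in z]

state = 0
for _ in range(5000):
    probs = policy(state)
    action = 0 if random.random() < probs[0] else 1
    nxt, reward = step(state, action)
    td_error = reward + gamma * V[nxt] - V[state]  # critic's TD error
    V[state] += beta * td_error                    # critic update
    for a in range(2):                             # actor update (policy gradient)
        grad = (1.0 if a == action else 0.0) - probs[a]
        theta[state][a] += alpha * td_error * grad
    state = nxt

# Action 1 should come to dominate in both states.
print(round(policy(0)[1], 2), round(policy(1)[1], 2))
```

The critic's TD error serves as the advantage signal that scales the actor's policy-gradient step; the paper's contribution is structuring this scheme so one policy can be trained in parallel across many tasks.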
- Research Article
34
- 10.1007/s10489-021-02218-4
- Feb 5, 2021
- Applied Intelligence
Deep learning-based financial approaches have received attention from both investors and researchers. This study demonstrates how to optimize portfolios, asset allocation, and trading systems based on deep reinforcement learning using three frameworks. In the proposed deep learning structure, the input data are first decomposed through wavelet transformation (WT) to remove noise from stock price time-series data. Then, only the mother wavelet (high-frequency) data are used as input. Second, reinforcement learning is performed using the high-frequency data; the reinforcement learning network employs long short-term memory (LSTM), and actions are determined by the LSTM network or randomly. Third, the system learns an optimal investment trading policy from the actions taken in each transaction and the corresponding rewards. The optimal investment trading system obtained by the proposed deep reinforcement learning structure improves trading performance without requiring the construction of a predictive model. To investigate the performance of the proposed structure, we applied the S&P500, DJI, and KOSPI200 indices to the proposed structure (HW_LSTM_RL) and to other reinforcement learning structures for comparison. We evaluated the difference in Sharpe ratio for various test periods (one to three years) and for different rewards. Using the decomposed high-frequency data as input, a portfolio of investment transactions was improved for highly volatile markets. In deep reinforcement learning, we found that network composition and appropriate rewards have significant influence on learning transactions in financial time-series data. Thus, the proposed HW_LSTM_RL structure demonstrates the importance of input data composition, learning network settings, and rewards.
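The first stage of the pipeline, wavelet decomposition of the price series, can be sketched with a single-level Haar transform in plain Python. The toy price series and function name are assumptions for illustration; the paper's preprocessing and wavelet choice may differ:

```python
import math

def haar_dwt(series):
    """Single-level Haar decomposition of an even-length series.

    Returns (approximation, detail): the low-frequency trend and the
    high-frequency fluctuations of the input signal.
    """
    s = math.sqrt(2.0)
    approx = [(series[i] + series[i + 1]) / s for i in range(0, len(series), 2)]
    detail = [(series[i] - series[i + 1]) / s for i in range(0, len(series), 2)]
    return approx, detail

# Toy "price" series: a linear trend plus alternating noise.
prices = [100 + t + (1 if t % 2 else -1) for t in range(8)]
approx, detail = haar_dwt(prices)
print(approx)  # smooth trend component
print(detail)  # high-frequency component
```

In practice a library such as PyWavelets would perform multi-level decomposition; the point here is only that the transform splits the series into a trend and a high-frequency part, the latter being what HW_LSTM_RL feeds to the learning network.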
- Research Article
20
- 10.1109/tr.2022.3197322
- Sep 1, 2023
- IEEE Transactions on Reliability
Opportunistic maintenance (OM), which reduces maintenance costs for complex multi-component systems by integrating the maintenance activities of multiple components, has been widely studied over the past decade. To our knowledge, most existing OM works are developed based on fixed maintenance thresholds without fully utilizing the health state of the multi-component system. This article presents an OM optimization problem for multi-component systems with load sharing, solved by a modified proximal policy optimization approach based on a deep reinforcement learning algorithm. The load sharing effect is reflected in the hazard rate function, which in turn changes the failure probability of the components. Meanwhile, the health states can be recovered by executing imperfect maintenance and corrective maintenance. The optimization problem is formulated as an infinite-horizon Markov decision process (MDP) with mixed discrete and continuous state and action spaces to maximize the total discounted reward, taking into account the system reliability and the maintenance cost. The difficulty caused by the mixed action space is addressed by designing a parameterized action space structure and a multi-task reinforcement learning framework. The effectiveness of the proposed algorithm is tested on a four-component system and a real-world scenario configured with the high-pressure feedwater heater system of a nuclear power plant. The results show that the performance of the algorithm is stable when facing large-scale problems. The algorithm proposed in this study also contributes to imperfect maintenance optimization with state-of-the-art optimization techniques.
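The mixed discrete and continuous action space can be illustrated with a minimal parameterized-action structure: one discrete maintenance decision per component, accompanied by a continuous restoration level only when imperfect maintenance is chosen. The action names and ranges below are illustrative assumptions, not the paper's exact design:

```python
import random

random.seed(1)

# Discrete choices per component; the continuous parameter only
# applies when imperfect maintenance is selected.
ACTIONS = ("do_nothing", "imperfect_maintenance", "corrective_maintenance")

def sample_parameterized_action(n_components):
    """Sample one (discrete choice, continuous parameter) pair per component."""
    joint = []
    for _ in range(n_components):
        choice = random.choice(ACTIONS)
        # Continuous restoration level in (0, 1) accompanies imperfect maintenance only.
        level = random.random() if choice == "imperfect_maintenance" else None
        joint.append((choice, level))
    return joint

action = sample_parameterized_action(4)  # four-component system, as in the paper's test case
print(action)
```

A policy over such actions typically has one head for the discrete choice and one for the continuous parameter, which is the kind of structure a parameterized-action PPO variant must learn.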
- Research Article
42
- 10.3390/diagnostics13193147
- Oct 7, 2023
- Diagnostics
Skin lesions are essential for the early detection and management of a number of dermatological disorders. Learning-based methods for skin lesion analysis have drawn much attention lately because of improvements in computer vision and machine learning techniques. This survey paper presents a review of the most recent methods for skin lesion classification, segmentation, and detection, and discusses the significance of skin lesion analysis in healthcare and the difficulties of physical inspection. The review of state-of-the-art papers targeting skin lesion classification is covered in depth, with the goal of correctly identifying the type of skin lesion from dermoscopic, macroscopic, and other lesion image formats. The contributions and limitations of the various techniques used in the selected studies, including deep learning architectures and conventional machine learning methods, are examined. The survey then looks into papers focused on skin lesion segmentation and detection techniques that aim to identify the precise borders of skin lesions and classify them accordingly. These techniques facilitate subsequent analyses and allow for precise measurements and quantitative evaluations. The paper discusses well-known segmentation algorithms, including deep-learning-based, graph-based, and region-based ones, along with the difficulties, datasets, and evaluation metrics particular to skin lesion segmentation. Throughout the survey, notable datasets, benchmark challenges, and evaluation metrics relevant to skin lesion analysis are highlighted, providing a comprehensive overview of the field. The paper concludes with a summary of the major trends, challenges, and potential future directions in skin lesion classification, segmentation, and detection, aiming to inspire further advancements in this critical domain of dermatological research.
- Research Article
68
- 10.1109/tvt.2022.3168870
- Jul 1, 2022
- IEEE Transactions on Vehicular Technology
For a fuel cell hybrid electric vehicle equipped with a battery (BAT) and an ultracapacitor (UC), the dynamic topology is complex, and the different characteristics of the three power sources pose challenges for energy management in terms of fuel economy, power source lifespan, and vehicle dynamic performance. In this paper, an energy management strategy (EMS) based on a hierarchical power-splitting structure and deep reinforcement learning (DRL) is proposed. In the higher layer of the proposed EMS, the UC supplies peak power and recovers braking energy through an adaptive filter based on fuzzy control. An integrated DRL and equivalent consumption minimization strategy framework is then proposed to optimize the power allocation between the fuel cell (FC) and the BAT in the lower layer, ensuring highly efficient FC operation and reducing hydrogen consumption. In addition, action trimming based on a heuristic technique is proposed to further restrain the adverse effect of sudden peak power on FC lifespan. Simulation results show that the proposed EMS makes the FC output smoother, improves its working efficiency to alleviate the stress on the BAT, and improves fuel economy by 14.8% compared with the Q-learning strategy under the WLTP driving cycle. Meanwhile, results under UDDSHDV show that the fuel economy of the proposed EMS can reach 89.7% of the dynamic programming (DP) benchmark level.
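The higher-layer idea, a filter that routes high-frequency power demand to the UC, can be sketched with a plain first-order low-pass filter (the paper's filter is additionally adapted by fuzzy control; the demand profile and smoothing constant here are illustrative assumptions):

```python
# First-order low-pass split of demanded power: the smooth component
# goes to the fuel cell / battery layer, the residual peaks go to the UC.
def split_power(demand, alpha=0.2):
    low_pass, fc_bat, uc = 0.0, [], []
    for p in demand:
        low_pass += alpha * (p - low_pass)  # exponential smoothing
        fc_bat.append(low_pass)
        uc.append(p - low_pass)             # peak power (and braking) absorbed by UC
    return fc_bat, uc

demand = [10, 10, 60, 10, 10, -30, 10, 10]  # kW, with a peak and a braking event
fc_bat, uc = split_power(demand)
print([round(x, 1) for x in fc_bat])
print([round(x, 1) for x in uc])
```

By construction the two streams sum back to the demand at every step, so the split only redistributes transients: the UC sees the spike and the negative (braking) power, while the FC/BAT layer sees a smoothed load.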
- Research Article
9
- 10.1155/2021/5562801
- Jan 1, 2021
- BioMed Research International
The segmentation of a skin lesion is regarded as very challenging because of the low contrast between the lesion and the surrounding skin, the existence of various artifacts, and different imaging acquisition conditions. The purpose of this study is to segment melanocytic skin lesions in dermoscopic and standard images by using a hybrid model combining a new hierarchical K-means and level set approach, called HK-LS. Although the level set method is usually sensitive to the initial estimate, it is widely used in biomedical image segmentation because it can segment more complex images and does not require a large number of manually labelled images. A preprocessing step makes the proposed model less sensitive to intensity inhomogeneity. The proposed method was evaluated on medical skin images from two publicly available datasets, the PH2 database and the Dermofit database. All skin lesions were segmented with high accuracy (>94%) and Dice coefficients (>0.91) relative to the ground truth on both databases. The quantitative experimental results reveal that the proposed method yielded significantly better results compared to other traditional level set models and has a certain advantage over the segmentation results of U-net in standard images. The proposed method has high clinical applicability for the segmentation of melanocytic skin lesions in dermoscopic and standard images.
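The K-means component that initializes the segmentation can be sketched as a plain two-cluster 1-D K-means on pixel intensities; the hierarchical variant and its coupling to the level set are the paper's contribution and are not reproduced here:

```python
def kmeans_1d(values, iters=20):
    """Two-cluster K-means on intensities: lesion vs. surrounding skin."""
    c0, c1 = min(values), max(values)  # initial centroids at the extremes
    for _ in range(iters):
        g0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        c0 = sum(g0) / len(g0)  # recompute centroids as group means
        c1 = sum(g1) / len(g1)
    return c0, c1

# Toy intensities: dark lesion pixels (~40) against brighter skin (~200).
pixels = [38, 42, 40, 44, 198, 202, 205, 199, 41, 201]
c0, c1 = kmeans_1d(pixels)
print(round(c0), round(c1))  # → 41 201
```

The two centroids define a threshold between lesion and skin, which can serve as the initial contour estimate that a level set evolution then refines.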
- Research Article
1
- 10.3390/aerospace11090774
- Sep 20, 2024
- Aerospace
Reusable launch vehicles must cope with complex and diverse environments during flight. With rocket recovery control laws designed via traditional deep reinforcement learning (DRL), it is difficult to obtain a network architecture that adapts to multiple scenarios and multi-parameter uncertainties, and the performance of DRL algorithms depends on manual trial-and-error tuning of hyperparameters. To solve this problem, this paper proposes a self-learning control method for launch vehicle recovery based on neural architecture search (NAS), which decouples deep network structure search from reinforcement learning hyperparameter optimization. First, using network architecture search based on a multi-objective hybrid particle swarm optimization algorithm, the deep network architecture of the proximal policy optimization algorithm is designed automatically, with the search space given a lightweight design in the process. Second, to further improve the landing accuracy of the launch vehicle, the Bayesian optimization (BO) method is used to automatically optimize the reinforcement learning hyperparameters, and the control law of the landing phase in the recovery process is obtained through training. Finally, the algorithm is ported to a rocket intelligent-learning embedded platform for comparative testing to verify its online deployment capability. Simulation results show that the proposed method satisfies the landing accuracy requirements of the launch vehicle recovery mission, and that under untrained conditions with model parameter deviations and wind field disturbances the control performance remains essentially the same as the landing accuracy of the trained rocket model, which verifies the generalization of the proposed method.
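The architecture-search layer builds on particle swarm optimization; the canonical single-objective velocity and position update it extends can be sketched as follows (the objective, bounds, and coefficients are illustrative assumptions, and the paper's version is multi-objective and hybrid):

```python
import random

random.seed(2)

def pso_minimize(f, dim=2, n=15, iters=60, w=0.7, c1=1.5, c2=1.5):
    """Plain single-objective PSO: each particle tracks its personal best,
    and the swarm tracks a global best."""
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]
    gbest = min(pbest, key=f)[:]
    for _ in range(iters):
        for i in range(n):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # cognitive pull
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # social pull
                pos[i][d] += vel[i][d]
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i][:]
                if f(pbest[i]) < f(gbest):
                    gbest = pbest[i][:]
    return gbest

# Toy objective standing in for "architecture quality": the sphere function.
best = pso_minimize(lambda x: sum(v * v for v in x))
print(best)  # should land near the optimum at the origin
```

In the NAS setting, each particle would instead encode an architecture description and the objective would be a (much more expensive) training-based evaluation, which is why a lightweight search space matters.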
- Book Chapter
11
- 10.1007/978-3-030-27272-2_20
- Jan 1, 2019
Segmentation of skin lesions is a crucial task in detecting and diagnosing melanoma cancer. The incidence of melanoma, the most deadly form of skin cancer, has been steadily increasing, and early detection is necessary to improve patient survival rates. Segmentation is an important task in analysing skin lesion images, but it comes with challenges such as low contrast and the fine-grained nature of skin lesions. This has necessitated automated analysis and segmentation of skin lesions using state-of-the-art techniques. In this paper, a deep learning model has been adapted for the segmentation of skin lesions. This work demonstrates the segmentation of skin lesions using fully convolutional networks (FCNs) that are trained end-to-end on skin lesion images using only the image pixels and disease ground-truth labels as inputs. The fully convolutional network adapted is based on the U-Net architecture. The model is enhanced by employing a multi-stage segmentation approach with batch normalisation and data augmentation. Performance metrics such as dice coefficient, accuracy, sensitivity, and specificity were used for evaluating the model. Experimental results show that the proposed model achieved better performance compared with other state-of-the-art methods for skin lesion image segmentation, with a dice coefficient of 90% and a sensitivity of 96%.
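The dice coefficient used for evaluation has a simple closed form, Dice = 2|P ∩ T| / (|P| + |T|), sketched here on toy binary masks:

```python
def dice_coefficient(pred, truth):
    """Dice = 2|P ∩ T| / (|P| + |T|) on flat binary masks."""
    intersection = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2.0 * intersection / total if total else 1.0

# Toy one-pixel-off segmentation of an 8-pixel row.
pred  = [0, 1, 1, 1, 1, 0, 0, 0]
truth = [0, 0, 1, 1, 1, 1, 0, 0]
print(dice_coefficient(pred, truth))  # → 0.75
```

Unlike plain pixel accuracy, Dice ignores the (usually dominant) background class, which is why it is the standard headline metric for lesion segmentation.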
- Research Article
39
- 10.1016/j.eswa.2016.02.044
- Mar 2, 2016
- Expert Systems with Applications
Segmentation of melanocytic skin lesions using feature learning and dictionaries
- Research Article
- 10.4028/www.scientific.net/amr.139-141.1763
- Oct 1, 2010
- Advanced Materials Research
A self-constructing fuzzy neural network (SCFNN) based on reinforcement learning is proposed in this study. In the SCFNN, structure and parameter learning are implemented simultaneously. Structure learning is based on uniform division of the input space and the distribution of membership functions. The structure and membership parameters are organized as real-valued chromosomes, and the chromosomes are trained by reinforcement learning based on a genetic algorithm. This paper uses Matlab/Simulink to establish a simulation platform, and several simulations are provided to demonstrate the effectiveness of the proposed SCFNN control strategy applied to an AC motor speed drive. The simulation results show that the AC drive system with SCFNN has good anti-disturbance performance when the load changes randomly.
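The chromosome encoding and reinforcement-style evolution can be sketched with a minimal elitist genetic algorithm over real-valued genes. The fitness function below is a stand-in for the reinforcement signal from the drive simulation, and all names and constants are illustrative assumptions:

```python
import random

random.seed(3)

# Each chromosome encodes membership-function parameters, here
# (center, width) pairs for 3 fuzzy sets on one input dimension.
def random_chromosome():
    return [random.uniform(-1, 1) for _ in range(6)]

def mutate(chrom, sigma=0.1):
    return [g + random.gauss(0.0, sigma) for g in chrom]

def fitness(chrom):
    # Stand-in for the reinforcement signal from the drive simulation:
    # reward chromosomes whose genes are near a hypothetical target tuning.
    target = [0.5, 0.1, -0.5, 0.1, 0.0, 0.2]
    return -sum((g - t) ** 2 for g, t in zip(chrom, target))

population = [random_chromosome() for _ in range(20)]
for _ in range(100):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]                              # selection
    population = survivors + [mutate(c) for c in survivors]  # reproduction

best = max(population, key=fitness)
print(round(-fitness(best), 3))  # squared error of the best chromosome
```

Because the top half of each generation survives unchanged, the best fitness improves monotonically, mirroring how the GA steers membership parameters toward settings that earn higher reinforcement.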
- Conference Article
- 10.1109/iciea.2011.5975717
- Jun 1, 2011
A self-constructing fuzzy neural network (SCFNN) based on reinforcement learning is proposed in this study. In the SCFNN, structure and parameter learning are implemented simultaneously. Structure learning is based on uniform division of the input space and the distribution of membership functions. The parameters are trained by reinforcement learning based on a genetic algorithm. Several simulations are provided to demonstrate the effectiveness of the proposed SCFNN control strategy applied to an AC motor speed drive. The simulation results show that the AC drive system with SCFNN has good anti-disturbance performance when the load changes randomly.
- Research Article
- 10.3389/conf.neuro.06.2009.03.238
- Jan 1, 2009
- Frontiers in Systems Neuroscience
Humans daily perform sequential decision-making under uncertainty to choose products, services, careers, and jobs. Studies of sequential decision-making in humans frequently find suboptimal performance relative to an ideal actor that knows the task that generates reward in the environment. This has led to conclusions about how we explore new courses of action and exploit what we have learned. We argue, however, that humans have uncertainty about both the task and environmental structure, and that task and structure learning can potentially explain much better how people schedule actions, including behaviors previously deemed suboptimal. We illustrate the task structure learning problem with an important special case that controls optimal exploration/exploitation. In particular, we formulate the structure learning problem using mixtures of two reward models (two-arm and one-arm bandit models) and solve the optimal action selection using Bayesian reinforcement learning. These two reward models represent extremes in both the exploration-exploitation tradeoff and computational difficulty: one model needs to balance exploration and exploitation and use long future horizons to compute actions, while the other needs no look-ahead and its action selection is greedy. In simulations, we show that optimal learning with uncertainty about the task structure can produce a range of qualitative behaviors deemed suboptimal in previous studies of sequential binary choice. In our experiments, each of 16 subjects (8 females) ran on 32 bandit tasks: a block of 16 two-arm bandits and a block of 16 one-arm bandits. Within blocks, the presentation order was randomized, and the order of the one-arm bandits was randomized across subjects. On average, each task required 48 choices. For two-arm bandits, the subjects made 1194 choices across the 16 tasks, and 925 for the one-arm bandits.
Our results show that humans rapidly learn and exploit new reward structure: human behavior tracks the behavior of our structure learning model but is not explained by models that assume the task is known. Other kinds of reward structure learning may account for a broad variety of human decision-making performance. In particular, allowing dependence between the probability of reward at a site and previous actions can produce large changes in decision-making behavior. For instance, in a "foraging" model where reward is collected from a site and probabilistically replenished, optimal strategies will produce choice sequences that alternate between reward sites. Thus, uncertainty about the independence of reward on previous actions can produce a continuum of behavior, from maximization to probability matching. Instead of explaining behavior in terms of the idiosyncrasies of a learning rule, structure learning constitutes a fully rational response to uncertainty about the causal structure of rewards in the environment. Our hope is that, by expanding the range of normative hypotheses for human decision-making, we can begin to develop more principled accounts of human sequential decision-making behavior. Conference: Computational and Systems Neuroscience 2009, Salt Lake City, UT, United States, 26 Feb - 3 Mar, 2009. Presentation Type: Poster. Citation: (2009). Structure learning in human sequential decision-making. Front. Syst. Neurosci. Conference Abstract: Computational and systems neuroscience 2009. doi: 10.3389/conf.neuro.06.2009.03.238
Received: 03 Feb 2009; Published Online: 03 Feb 2009.
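The abstract solves optimal action selection with full Bayesian reinforcement learning over a mixture of reward models; a far simpler Bayesian bandit heuristic, Thompson sampling on a two-arm Bernoulli bandit, illustrates how posterior uncertainty can drive exploration (an illustrative stand-in, not the authors' method):

```python
import random

random.seed(4)

# Thompson sampling on a two-arm Bernoulli bandit: the agent keeps a
# Beta posterior over each arm's reward probability and, at each step,
# pulls the arm whose posterior sample is largest.
true_p = [0.3, 0.7]        # unknown to the agent
a = [1, 1]                 # Beta posterior: successes + 1 per arm
b = [1, 1]                 # Beta posterior: failures + 1 per arm
pulls = [0, 0]

for _ in range(2000):
    samples = [random.betavariate(a[i], b[i]) for i in (0, 1)]
    arm = samples.index(max(samples))
    reward = 1 if random.random() < true_p[arm] else 0
    a[arm] += reward
    b[arm] += 1 - reward
    pulls[arm] += 1

print(pulls)  # the better arm (index 1) should dominate
```

Exploration here falls out of the posterior itself: while the agent is uncertain, both arms get sampled, and as evidence accumulates the posterior concentrates and play becomes greedy.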
- Research Article
60
- 10.1126/sciadv.abk2607
- May 6, 2022
- Science Advances
Artificial intelligence (AI) and reinforcement learning (RL) have improved many areas but are not yet widely adopted in economic policy design, mechanism design, or economics at large. The AI Economist is a two-level, deep RL framework for policy design in which agents and a social planner coadapt. In particular, the AI Economist uses structured curriculum learning to stabilize the challenging two-level, coadaptive learning problem. We validate this framework in the domain of taxation. In one-step economies, the AI Economist recovers the optimal tax policy of economic theory. In spatiotemporal economies, the AI Economist substantially improves both utilitarian social welfare and the trade-off between equality and productivity over baselines. It does so despite emergent tax-gaming strategies while accounting for emergent labor specialization, agent interactions, and behavioral change. These results demonstrate that two-level, deep RL complements economic theory and unlocks an AI-based approach to designing and understanding economic policy.
- Research Article
3
- 10.1080/13682199.2023.2187518
- Mar 28, 2023
- The Imaging Science Journal
Skin cancer is the irregular growth of skin cells, most often caused by exposure to ultraviolet rays from the sun. In this research paper, deep-learning-enabled hybrid optimization is applied to skin cancer detection and lesion segmentation, using two optimization algorithms. Pre-processing is performed by anisotropic diffusion, followed by skin lesion segmentation. A Multi-Scale Residual Fusion Network (MSRFNet) is used for skin lesion segmentation, trained by the proposed Average Subtraction Student Psychology Based Optimization (ASSPBO). After segmentation, the necessary features are extracted, followed by skin cancer detection using a Deep Residual Network (DRN) trained by the proposed Fractional ASSPBO (FrASSPBO). The performance of the proposed FrASSPBO-DRN is analysed with three metrics, testing accuracy, True Positive Rate (TPR), and False Positive Rate (FPR), achieving values of 93.4%, 94%, and 8.2%, respectively.
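The reported metrics, TPR and FPR, follow directly from the confusion matrix; a minimal sketch on toy labels (the toy data are assumptions for illustration):

```python
def confusion_rates(pred, truth):
    """True Positive Rate and False Positive Rate from binary labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    return tp / (tp + fn), fp / (fp + tn)

# Toy predictions: 10 malignant and 10 benign samples, one error each way.
truth = [1] * 10 + [0] * 10
pred  = [1] * 9 + [0] + [0] * 9 + [1]
tpr, fpr = confusion_rates(pred, truth)
print(tpr, fpr)  # → 0.9 0.1
```

A high TPR with a low FPR, as the paper reports, means most cancers are caught while few benign lesions are flagged.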
- Research Article
93
- 10.1109/access.2020.2970433
- Jan 1, 2020
- IEEE Access
Autonomous underwater vehicles (AUVs) play an increasingly important role in ocean exploration. Existing AUVs are usually not fully autonomous and are generally limited to pre-planned or pre-programmed tasks. Reinforcement learning (RL) and deep reinforcement learning have been introduced into AUV design and research to improve autonomy. However, these methods are still difficult to apply directly to actual AUV systems because of sparse rewards and low learning efficiency. In this paper, we propose a deep interactive reinforcement learning method for AUV path following that combines the advantages of deep reinforcement learning and interactive RL. In addition, since a human trainer cannot provide rewards for an AUV running in the ocean, and the AUV needs to adapt to a changing environment, we further propose a deep reinforcement learning method that learns from both human rewards and environmental rewards at the same time. We test our methods on two path-following tasks, straight-line and sinusoidal-curve following, by simulating the AUV in the Gazebo platform. Our experimental results show that with the proposed deep interactive RL method, the AUV converges faster than a DQN learner trained only on environmental reward. Moreover, an AUV learning with our deep RL from both human and environmental rewards achieves similar or even better performance than with deep interactive RL alone, and can adapt to the actual environment by further learning from environmental rewards.
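The idea of learning simultaneously from human and environmental rewards can be sketched with tabular Q-learning on a toy path task, where a simulated trainer signal is blended with the environmental reward and its weight decays over time. All details below (the toy task, the decay schedule, the reward shapes) are illustrative assumptions, not the paper's method:

```python
import random

random.seed(5)

# Tabular Q-learning on a 1-D "path": 5 cells, goal at the right end.
Q = [[0.0, 0.0] for _ in range(5)]  # actions: 0 = left, 1 = right
alpha, gamma = 0.3, 0.9

def human_reward(state, action):
    # Simulated trainer: approves moving toward the goal.
    return 1.0 if action == 1 else -1.0

for episode in range(300):
    w = max(0.0, 1.0 - episode / 150)  # human influence fades out
    state = 0
    for _ in range(20):
        if random.random() < 0.1:                         # epsilon-greedy exploration
            action = random.randrange(2)
        else:
            action = 0 if Q[state][0] > Q[state][1] else 1
        nxt = min(4, state + 1) if action == 1 else max(0, state - 1)
        env_r = 1.0 if nxt == 4 else 0.0
        r = env_r + w * human_reward(state, action)       # blended reward signal
        Q[state][action] += alpha * (r + gamma * max(Q[nxt]) - Q[state][action])
        state = nxt
        if nxt == 4:
            break

policy = [0 if q[0] > q[1] else 1 for q in Q]
print(policy)  # should prefer moving right toward the goal
```

Early on the dense human signal shapes the policy quickly; as its weight decays, the sparse environmental reward takes over, mirroring the paper's motivation for combining the two signals.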