Cross-Fusion of Band-Specific Spectral Features For Multi-Band NIR Colorization
Near-infrared (NIR) colorization has the potential to enrich human interpretation and to integrate seamlessly with existing visual frameworks, especially in low-light environments. Multi-band NIR colorization has been explored to efficiently uncover correlations between individual NIR bands and RGB, leveraging more abundant spectral information than single-band NIR. However, previous methods simply concatenated the multi-band NIR images. In this paper, we study a deep learning network that effectively exploits band-specific spectral features for colorization. We propose a network with private and cross-fusion architectures that extracts and reconstructs the private and cross features separately. Private and cross-fusion modules are utilized to preserve the band-specific characteristics of the private features while updating any cross features lost during private convolutions. Experimental results show that the proposed network excels at producing the overall colors of objects and high-frequency details.
- Research Article
25
- 10.1016/j.cj.2021.12.011
- Feb 3, 2022
- The Crop Journal
Stacked spectral feature space patch: An advanced spectral representation for precise crop classification based on convolutional neural network
- Conference Article
93
- 10.1109/iciecs.2009.5362730
- Dec 1, 2009
In this paper, we propose a speech emotion recognition system using both spectral and prosodic features. Most traditional systems have focused on either spectral features or prosodic features. Since both the spectral and the prosodic features contain emotion information, combining them is expected to improve the performance of the emotion recognition system. Therefore, we propose to use both spectral and prosodic features. For spectral features, a GMM supervector-based SVM is applied. For prosodic features, a set of prosodic features that are clearly correlated with speech emotional states is used, again with an SVM, for emotion recognition. The combination of spectral and prosodic features is posed as a data fusion problem to obtain the final decision. Experimental results show that combining spectral and prosodic features yields emotion error reduction rates of 18.0% and 52.8% over using only spectral features and only prosodic features, respectively.
- Research Article
3
- 10.1186/s13636-024-00386-y
- Dec 21, 2024
- EURASIP Journal on Audio, Speech, and Music Processing
In this study, we investigate the effectiveness of spatial features in acoustic scene classification using distributed microphone arrays. Under the assumption that multiple subarrays, each equipped with microphones, are synchronized, we investigate two types of spatial feature: intra- and inter-generalized cross-correlation phase transforms (GCC-PHATs). These are derived from channels within the same subarray and between different subarrays, respectively. Our approach treats the log-Mel spectrogram as a spectral feature and intra- and/or inter-GCC-PHAT as a spatial feature. We propose two integration methods for spectral and spatial features: (a) middle integration, which fuses embeddings obtained by spectral and spatial features, and (b) late integration, which fuses decisions estimated using spectral and spatial features. The evaluation experiments showed that, when using only spectral features, employing all channels did not markedly improve the F1-score compared with the single-channel case. In contrast, integrating both spectral and spatial features improved the F1-score compared with using only spectral features. Additionally, we confirmed that the F1-score for late integration was slightly higher than that for middle integration.
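The GCC-PHAT spatial feature named in this abstract can be illustrated with a minimal sketch. It assumes two equal-length channels and a circular delay, and it uses a naive DFT from the standard library only to stay self-contained (a real system would use an FFT library). The function names `dft`, `gcc_phat`, and `estimate_delay` are hypothetical and are not taken from the paper.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(ref, sig, eps=1e-12):
    """Generalized cross-correlation with phase transform (PHAT) weighting.

    Returns the circular cross-correlation of `sig` against `ref` after the
    cross-spectrum has been normalized to unit magnitude.
    """
    spec_ref = dft(ref)
    spec_sig = dft(sig)
    cross = [s * r.conjugate() for s, r in zip(spec_sig, spec_ref)]
    # PHAT weighting: keep only the phase of the cross-spectrum.
    whitened = [c / (abs(c) + eps) for c in cross]
    return [v.real for v in dft(whitened, inverse=True)]


def estimate_delay(ref, sig):
    """Lag (in samples) at which the GCC-PHAT curve peaks."""
    cc = gcc_phat(ref, sig)
    n = len(cc)
    peak = max(range(n), key=lambda i: cc[i])
    # Map circular indices above n/2 to negative lags.
    return peak if peak <= n // 2 else peak - n


# Hypothetical example: the second channel is the first one delayed by 5 samples.
rng = random.Random(0)
channel_a = [rng.gauss(0.0, 1.0) for _ in range(64)]
channel_b = channel_a[-5:] + channel_a[:-5]
print(estimate_delay(channel_a, channel_b))
```

In this reading, an intra-GCC-PHAT feature would pair channels from the same subarray and an inter-GCC-PHAT feature would pair channels from different subarrays. How the curves are cropped or stacked before being fed to a network is not specified in the abstract.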
- Conference Article
3
- 10.1109/agro-geoinformatics.2015.7248093
- Jul 1, 2015
In the past decades, the area of plastic-mulched cultivation has grown rapidly at both global and regional scales because of its remarkable efficiency in increasing crop production. However, the rapid expansion has brought a series of ecological and environmental problems in many regions: for example, the vast amount of residue from plastic-mulched farmland causes aesthetic pollution and adverse environmental impacts on the water, air and soil of agricultural and terrestrial ecosystems. It is a serious problem that will do great damage to the sustainability of agriculture. To alleviate these problems and find solutions, developing efficient methods to map plastic-mulched farmland as accurately as possible is of great significance to policy-makers and scientists. In this paper, optical remotely sensed data from Landsat-8 OLI were used to monitor plastic-mulched farmland in a study area located in Jizhou city, Hebei Province, China, where plastic mulching is extensively practiced and interlaced with uncovered farmland. The spectral and texture features of different ground objects were statistically analyzed on the OLI imagery using the training samples. Then the spectral and texture features that carry a certain degree of explanation for plastic-mulched farmland were selected for classification. The training and testing samples (larger than 4 pixels × 4 pixels of panchromatic-fused OLI imagery) were collected from Google Earth images and amended by comparison with false color composites of Landsat bands 7 (SWIR), 5 (NIR) and 4 (RED). The samples were also purified according to the J-M distance method. A Support Vector Machine (SVM) was used as the classifier to extract the plastic-mulched farmland based on the spectral features alone, on the texture features alone, and on the combined spectral and texture features, respectively.
The classification result was validated using a confusion matrix. The overall classification accuracies for plastic-mulched farmland based on spectral features alone, on texture features alone and on combined spectral and texture features are 89.45%, 94.68% and 94.73%, respectively. Their respective Kappa coefficients are 0.85, 0.92 and 0.92. The producer's accuracy and user's accuracy of plastic-mulched farmland are 87.30% and 63.96% on spectral features, 83.03% and 81.37% on texture features, and 83.39% and 81.37% on the combined features, respectively. The commission and omission errors for plastic-mulched farmland are 36.04% and 12.70% on spectral features, 18.63% and 16.97% on texture features, and 18.63% and 16.61% on the combined features. This study shows that plastic-mulched farmland can be extracted effectively using an SVM classifier based on Landsat-8 imagery.
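The accuracy figures in this abstract are linked by simple identities: omission error is the complement of producer's accuracy, and commission error is the complement of user's accuracy. The sketch below checks those identities against the reported percentages and shows how the two accuracies fall out of a confusion matrix. The matrix values and the row/column convention (rows are mapped classes, columns are reference classes) are assumptions made for illustration, not data from the study.

```python
def omission_error(producers_accuracy_pct):
    """Omission error (%) is the complement of producer's accuracy (%)."""
    return 100.0 - producers_accuracy_pct


def commission_error(users_accuracy_pct):
    """Commission error (%) is the complement of user's accuracy (%)."""
    return 100.0 - users_accuracy_pct


def class_accuracies(matrix, class_index):
    """Producer's and user's accuracy (as fractions) for one class.

    Assumed layout: matrix[i][j] counts samples mapped as class i whose
    reference label is class j.
    """
    correct = matrix[class_index][class_index]
    mapped_total = sum(matrix[class_index])                    # row sum
    reference_total = sum(row[class_index] for row in matrix)  # column sum
    return correct / reference_total, correct / mapped_total


# Reported values for spectral features alone: PA 87.30%, UA 63.96%.
print(round(omission_error(87.30), 2), round(commission_error(63.96), 2))

# Hypothetical 2-class confusion matrix, for illustration only.
hypothetical = [[80, 20],
                [10, 90]]
print(class_accuracies(hypothetical, 0))
```

The same complements reproduce the texture-only pair (16.97% omission, 18.63% commission) and the combined-feature pair (16.61% and 18.63%).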
- Research Article
7
- 10.1587/transinf.e93.d.2813
- Jan 1, 2010
- IEICE Transactions on Information and Systems
In this paper, we present a hybrid speech emotion recognition system exploiting both spectral and prosodic features in speech. For capturing the emotional information in the spectral domain, we propose a new spectral feature extraction method by applying a novel non-uniform subband processing, instead of the mel-frequency subbands used in Mel-Frequency Cepstral Coefficients (MFCC). For prosodic features, a set of features that are closely correlated with speech emotional states are selected. In the proposed hybrid emotion recognition system, due to the inherently different characteristics of these two kinds of features (e.g., data size), the newly extracted spectral features are modeled by Gaussian Mixture Model (GMM) and the selected prosodic features are modeled by Support Vector Machine (SVM). The final result of the proposed emotion recognition system is obtained by combining the results from these two subsystems. Experimental results show that (1) the proposed non-uniform spectral features are more effective than the traditional MFCC features for emotion recognition; (2) the proposed hybrid emotion recognition system using both spectral and prosodic features yields the relative recognition error reduction rate of 17.0% over the traditional recognition systems using only the spectral features, and 62.3% over those using only the prosodic features.
- Research Article
42
- 10.3389/fpls.2022.1004427
- Sep 21, 2022
- Frontiers in Plant Science
Infection caused by Fusarium head blight (FHB) has severely damaged the quality and yield of wheat in China and threatened the health of humans and livestock. Inaccurate disease detection increases the use cost of pesticide and pollutes farmland, highlighting the need for FHB detection in wheat fields. The combination of spectral and spatial information provided by image analysis facilitates the detection of infection-related damage in crops. In this study, an effective detection method for wheat FHB based on unmanned aerial vehicle (UAV) hyperspectral images was explored by fusing spectral features and image features. Spectral features mainly refer to band features, and image features mainly include texture and color features. Our aim was to explain all aspects of wheat infection through multi-class feature fusion and to find the best FHB detection method for field wheat combining current advanced algorithms. We first evaluated the quality of the two acquired UAV images and eliminated the excessively noisy bands in the images. Then, the spectral features, texture features, and color features in the images were extracted. The random forest (RF) algorithm was used to optimize features, and the importance value of the features determined whether the features were retained. Feature combinations included spectral features, spectral and texture features fusion, and the fusion of spectral, texture, and color features to combine support vector machine, RF, and back propagation neural network in constructing wheat FHB detection models. The results showed that the model based on the fusion of spectral, texture, and color features using the RF algorithm achieved the best performance, with a prediction accuracy of 85%. The method proposed in this study may provide an effective way of FHB detection in field wheat.
- Research Article
6
- 10.3390/agronomy14112542
- Oct 28, 2024
- Agronomy
Accurately identifying the distribution of vineyard cultivation is of great significance for the development of the grape industry and the optimization of planting structures. Traditional remote sensing techniques for vineyard identification primarily depend on machine learning algorithms based on spectral features. However, the spectral reflectance similarities between grapevines and other orchard vegetation lead to persistent misclassification and omission errors across various machine learning algorithms. As a perennial vine plant, grapes are cultivated using trellis systems, displaying regular row spacing and distinctive strip-like texture patterns in high-resolution satellite imagery. This study selected the main oasis area of Turpan City in Xinjiang, China, as the research area. First, this study extracted both spectral and texture features based on GF-6 satellite imagery, subsequently employing the Boruta algorithm to discern the relative significance of these remote sensing features. Then, this study constructed vineyard information extraction models by integrating spectral and texture features, using machine learning algorithms including Naive Bayes (NB), Support Vector Machines (SVMs), and Random Forests (RFs). The efficacy of various machine learning algorithms and remote sensing features in extracting vineyard information was subsequently evaluated and compared. The results indicate that three spectral features and five texture features under a 7 × 7 window have significant sensitivity to vineyard recognition. These spectral features include the Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), and Normalized Difference Water Index (NDWI), while texture features include contrast statistics in the near-infrared band (B4_CO) and the variance statistic, contrast statistic, heterogeneity statistic, and correlation statistic derived from NDVI images (NDVI_VA, NDVI_CO, NDVI_DI, and NDVI_COR). 
The RF algorithm significantly outperforms both the NB and SVM models in extracting vineyard information, boasting an impressive accuracy of 93.89% and a Kappa coefficient of 0.89. This marks a 12.25% increase in accuracy and a 0.11 increment in the Kappa coefficient over the NB model, as well as an 8.02% enhancement in accuracy and a 0.06 rise in the Kappa coefficient compared to the SVM model. Moreover, the RF model, which amalgamates spectral and texture features, exhibits a notable 13.59% increase in accuracy versus the spectral-only model and a 14.92% improvement over the texture-only model. This underscores the efficacy of the RF model in harnessing the spectral and textural attributes of GF-6 imagery for the precise extraction of vineyard data, offering valuable theoretical and methodological insights for future vineyard identification and information retrieval efforts.
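The three spectral indices named in this abstract have widely used closed forms, sketched below for per-pixel surface reflectance values. The abstract does not say which NDWI variant or which EVI coefficients were used, so the green/NIR form of NDWI and the common EVI coefficients (2.5, 6, 7.5, 1) are assumptions, as are the function names and the sample reflectances.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def ndwi(green, nir):
    """Normalized Difference Water Index (green/NIR form, assumed here)."""
    return (green - nir) / (green + nir)


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, soil=1.0):
    """Enhanced Vegetation Index with the commonly used coefficients."""
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + soil)


# Hypothetical reflectances for a vegetated pixel.
pixel = {"blue": 0.05, "green": 0.08, "red": 0.10, "nir": 0.50}
print(ndvi(pixel["nir"], pixel["red"]))
print(ndwi(pixel["green"], pixel["nir"]))
print(evi(pixel["nir"], pixel["red"], pixel["blue"]))
```

Texture statistics such as NDVI_CO would then be computed over a moving window (7 × 7 in the abstract) on the resulting NDVI image.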
- Research Article
8
- 10.1109/jstars.2021.3115129
- Jan 1, 2021
- IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Abundant spectral features are the precious wealth of hyperspectral images (HSI). Nevertheless, designing good spectral features is still a challenge that affects the performance of the classifier, especially with an insufficient number of training samples. To make up for the poor discriminability of spectral features, double-branch methods have been proposed that fuse parallel spectral and spatial branches. However, this structure does nothing to improve the quality of the spectral features, which are regarded as the most valuable information in HSI. In this article, we propose a siamese spectral attention network with channel consistency (SSACC) that focuses on obtaining discriminative spectral features, thus improving the generalization ability of the classifier. Two kinds of HSI cubes with different patch sizes are generated as the input of SSACC. The two cubes are assigned to the top and bottom branches and then fed into the siamese network to obtain refined spectral features. Then, self-attention is applied across channels to enhance the spectral features. Meanwhile, two attention maps are obtained to display the spectral structures of each branch. A channel consistency regularization is performed on the two attention maps by enforcing the two branches to possess similar spectral patterns when identifying the same centric pixel. Extensive experiments conducted on three HSI datasets verify the superiority of the obtained spectral features. Furthermore, the proposed method, which applies convolution only in the spectral domain, outperforms state-of-the-art double-branch methods that integrate spectral and spatial features simultaneously.
- Research Article
37
- 10.3390/rs9030261
- Mar 12, 2017
- Remote Sensing
Due to the advances in hyperspectral sensor technology, hyperspectral images have gained great attention in precision agriculture. In practical applications, vegetation classification is usually required to be conducted first and then the vegetation of interest is discriminated from the others. This study proposes an integrated scheme (SpeSpaVS_ClassPair_ScatterMatrix) for vegetation classification by simultaneously exploiting image spectral and spatial information to improve vegetation classification accuracy. In the scheme, spectral features are selected by the proposed scatter-matrix-based feature selection method (ClassPair_ScatterMatrix). In this method, the scatter-matrix-based class separability measure is calculated for each pair of classes and then averaged as final selection criterion to alleviate the problem of mutual redundancy among the selected features, based on the conventional scatter-matrix-based class separability measure (AllClass_ScatterMatrix). The feature subset search is performed by the sequential floating forward search method. Considering the high spectral similarity among different green vegetation types, Gabor features are extracted from the top two principal components to provide complementary spatial features for spectral features. The spectral features and Gabor features are stacked into a feature vector and then the ClassPair_ScatterMatrix method is used on the formed vector to overcome the over-dimensionality problem and select discriminative features for vegetation classification. The final features are fed into support vector machine classifier for classification. To verify whether the ClassPair_ScatterMatrix method could well avoid selecting mutually redundant features, the mean square correlation coefficients were calculated for the ClassPair_ScatterMatrix method and AllClass_ScatterMatrix method. The experiments were conducted on a widely used agricultural hyperspectral image. 
The experimental results showed that (1) the proposed ClassPair_ScatterMatrix method could better alleviate the problem of selecting mutually redundant features, compared to the AllClass_ScatterMatrix method; (2) compared with the representative mutual information-based feature selection methods, the scatter-matrix-based feature selection methods generally achieved higher classification accuracies, and the ClassPair_ScatterMatrix method in particular produced the highest classification accuracies on both data sets (87.2% and 90.1%); and (3) the proposed integrated scheme produced higher classification accuracy, compared with the decision fusion of spectral and spatial features and the methods only involving spectral features or spatial features. The comparative experiments demonstrate the effectiveness of the proposed scheme.
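The idea of computing a scatter-matrix separability measure for each pair of classes and then averaging it can be sketched as follows. The abstract does not give the exact measure, so this sketch assumes the classical ratio tr(S_b) / tr(S_w) of between-class to within-class scatter; the function names and the toy data are hypothetical.

```python
from itertools import combinations


def _mean(rows):
    """Column-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pair_separability(class_a, class_b):
    """tr(S_b) / tr(S_w) for two classes given as lists of feature vectors."""
    groups = [(class_a, _mean(class_a)), (class_b, _mean(class_b))]
    overall = _mean(class_a + class_b)
    n = len(class_a) + len(class_b)
    # Trace of the between-class scatter: prior-weighted squared distance
    # of each class mean from the overall mean.
    between = sum(len(rows) / n * sum((m - o) ** 2 for m, o in zip(mu, overall))
                  for rows, mu in groups)
    # Trace of the within-class scatter: mean squared distance of samples
    # from their own class mean.
    within = sum(sum((x - m) ** 2 for x, m in zip(row, mu))
                 for rows, mu in groups for row in rows) / n
    return between / within


def class_pair_criterion(classes):
    """Average the pairwise separability over every pair of classes."""
    pairs = list(combinations(classes.values(), 2))
    return sum(pair_separability(a, b) for a, b in pairs) / len(pairs)


# Hypothetical one-feature data for three vegetation classes.
toy = {
    "wheat": [[0.0], [2.0]],
    "maize": [[10.0], [12.0]],
    "grass": [[20.0], [22.0]],
}
print(class_pair_criterion(toy))
```

A search such as sequential floating forward selection would call a criterion like this on each candidate feature subset and keep the subset with the highest score.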
- Conference Article
2
- 10.1117/12.262853
- Dec 18, 1996
- Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE
Spectral and morphological features were used to detect temperature induced stress on tomato plants. Top projected canopy area (TPCA) and profile were selected as morphological features and the reflectance of plant under side canopy (USC) and its average gray level were chosen as spectral features. Temperature regimes (day/night, 18/6 hours) 24/21 degrees Celsius, 21/18 degrees Celsius, and 19.5/16.5 degrees Celsius were used. Both spectral and morphological features were capable of detecting temperature stresses. Reflectance and gray level of plant USC correlated with average environment temperatures. The stress was detected after one week from occurrence based on both morphological and spectral features. However, stress was detected more clearly based on spectral features.
- Research Article
5
- 10.5194/isprs-annals-v-1-2020-25-2020
- Aug 3, 2020
- ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. Classification of urban materials using remote sensing data, in particular hyperspectral data, is common practice. Spectral libraries can be utilized to train a classifier since they provide spectral features about selected urban materials. However, urban materials can have similar spectral characteristic features due to high inter-class correlation which can lead to misclassification. Spectral libraries rarely provide imagery of their samples, which disables the possibility of classifying urban materials with additional textural information. Thus, this paper conducts material classification comparing the benefits of using close-range acquired spectral and textural features. The spectral features consist of either the original spectra, a PCA-based encoding or the compressed spectral representation of the original spectra retrieved using a deep autoencoder. The textural features are generated using a deep denoising convolutional autoencoder. The spectral and textural features are gathered from the recently published spectral library KLUM. Three classifiers are used, the two well-established Random Forest and Support Vector Machine classifiers in addition to a Histogram-based Gradient Boosting Classification Tree. The achieved overall accuracy was within the range of 70–80% with a standard deviation between 2–10% across all classification approaches. This indicates that the amount of samples still is insufficient for some of the material classes for this classification task. Nonetheless, the classification results indicate that the spectral features are more important for assigning material labels than the textural features.
- Research Article
102
- 10.3390/rs8040353
- Apr 22, 2016
- Remote Sensing
In recent decades, plastic-mulched farmland has expanded rapidly in China as well as in the rest of the world because it results in marked increases in crop production. However, plastic-mulched farmland significantly influences the environment and has so far been inadequately investigated. Accurately monitoring and mapping plastic-mulched farmland is crucial for agricultural production, environmental protection, resource management, and so on. Monitoring plastic-mulched farmland using moderate-resolution remote sensing data is technically challenging because of spatial mixing and spectral confusion with other ground objects. This paper proposes a new scheme that combines spectral and textural features for monitoring plastic-mulched farmland and evaluates the performance of a Support Vector Machine (SVM) classifier with different kernel functions using Landsat-8 Operational Land Imager (OLI) imagery. The textural features were extracted from multi-band OLI data using a Grey Level Co-occurrence Matrix (GLCM) algorithm. Then, six combined feature sets were developed for classification. The results indicated that Landsat-8 OLI data are well suited for monitoring plastic-mulched farmland; the SVM classifier with a linear kernel function is superior both to other kernel functions and to two other widely used supervised classifiers: Maximum Likelihood Classifier (MLC) and Minimum Distance Classifier (MDC). For the SVM classifier with a linear kernel function, the highest overall accuracy was derived from combined spectral and textural features in the 90° direction (94.14%, kappa 0.92), followed by the combined spectral and textural features in the 45° (93.84%, kappa 0.92), 135° (93.73%, kappa 0.92), 0° (93.71%, kappa 0.92) directions, and the spectral features alone (93.57%, kappa 0.91). 
Spectral features make a more significant contribution to monitoring plastic-mulched farmland; adding textural features from medium-resolution imagery provides only limited improvement in accuracy.
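The directional GLCM textures mentioned in this abstract can be illustrated with a small sketch that builds a symmetric grey-level co-occurrence matrix for one direction and computes its contrast statistic. The pixel offsets assigned to 0°, 45°, 90°, and 135° follow one common convention and are an assumption here, as are the one-pixel distance, the function name, and the toy image.

```python
# Assumed (row, column) offsets for a one-pixel distance in each direction.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm_contrast(image, angle, levels):
    """Contrast of a symmetric, normalized GLCM for one direction.

    `image` is a list of rows of integer grey levels in [0, levels).
    """
    d_row, d_col = OFFSETS[angle]
    n_rows, n_cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # symmetric: count the pair both ways
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("image too small for the requested offset")
    # Contrast = sum over (i, j) of p(i, j) * (i - j)^2.
    return sum(counts[i][j] * (i - j) ** 2
               for i in range(levels) for j in range(levels)) / total


# Hypothetical striped patch: rows alternate, like regularly spaced mulch strips.
stripes = [[0, 0, 0],
           [1, 1, 1],
           [0, 0, 0]]
for angle in (0, 45, 90, 135):
    print(angle, glcm_contrast(stripes, angle, levels=2))
```

On this striped patch the contrast is zero along the stripes and nonzero across them, which is the kind of direction dependence that makes per-direction feature sets worth comparing.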
- Book Chapter
5
- 10.1007/978-3-319-19387-8_298
- Jan 1, 2015
Biometrics using electroencephalography (EEG) has received attention as a strong security method and has been investigated by many researchers. Studies have applied spectral and connectivity features to identify individuals. However, spectral and connectivity features have not yet been compared in terms of stability. In this paper, we contrast spectral and connectivity features for EEG-based authentication with signals measured on different days. Spectral features are represented as the power spectral density (PSD) over 2-40 Hz with 1 Hz resolution for each channel. Connectivity features are represented as the coherence (COH) between pairs of channels over the same 2-40 Hz range with 1 Hz resolution. A total of 20 subjects participated, and 32 channels of EEG were measured for 10 seconds in an eyes-closed resting state on three different days. We evaluated the false authentication rate (FAR), false rejection rate (FRR) and half total error rate (HTER) of the authentication system, using data measured on the first day as training data (600 trials) and the rest as test data (1,173 trials). The similarity of data is measured using correlation modified Euclidean distance. During the decision-making process, two threshold values were set. The minimum HTER was 10.45% when using PSD and 17.45% when using COH. It is well known that PSD features are relatively stable over time; thus we post-analyzed the coherence characteristics of EEG measured over three different days to evaluate stability. To assure stability, those that failed to reject ANOVA and were highly correlated (over 0.8) were filtered in each subject in the alpha band (8-13 Hz), and a coherence map was composed for each participant. We concluded that, considering both PSD and COH, feature filtering is necessary in order to guarantee efficient EEG-based authentication.
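The three error rates evaluated in this abstract are related by HTER = (FAR + FRR) / 2. The sketch below computes them from distance scores under a single acceptance threshold. The abstract mentions two threshold values without describing how they are combined, so the single-threshold rule, the function name, and the sample distances are simplifying assumptions.

```python
def error_rates(genuine_distances, impostor_distances, threshold):
    """Return (FAR, FRR, HTER) for a distance-based accept/reject rule.

    A trial is accepted when its distance to the enrolled template is at
    most `threshold` (assumed decision rule).
    """
    false_rejects = sum(d > threshold for d in genuine_distances)
    false_accepts = sum(d <= threshold for d in impostor_distances)
    frr = false_rejects / len(genuine_distances)
    far = false_accepts / len(impostor_distances)
    return far, frr, (far + frr) / 2.0


# Hypothetical distances between test trials and an enrolled template.
genuine = [0.1, 0.2, 0.6]
impostor = [0.3, 0.7, 0.9]
far, frr, hter = error_rates(genuine, impostor, threshold=0.5)
print(far, frr, hter)
```

Sweeping the threshold and keeping the lowest HTER is one common way to arrive at a single summary figure of the kind reported for PSD and COH.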
- Research Article
20
- 10.1109/access.2021.3084200
- Jan 1, 2021
- IEEE Access
In this research, four unique nonlinear speech features are extracted and analyzed to study the dissimilarity between deceitful and truthful speech, based on how human speech is perceived. The speaker was under stress in a police interrogation in which two ground truth and two deceitful responses were recorded at three different times of the day. Using the audio recordings from all three sessions, the cepstral features and spectral energy features are extracted. The cepstral features are the Mel frequency cepstrum coefficients, from which the delta cepstrum and the time-difference cepstrum features are derived. The spectral energy features are the Bark band energies, from which the delta energy and the time-difference energy features are derived. The Levenberg-Marquardt classification method and the long short-term memory classification method are then applied to evaluate the accuracy of detecting deception based on the nine unique training and testing combinations of the three different sessions and their extracted cepstrum and spectral energy features. In addition, principal component analysis is applied to reduce the dimensionality of the extracted features for further improvement. The projected principal components of the four types of features showed improved accuracy in distinguishing between truthful and deceptive speech patterns. After incorporating principal component analysis, the long short-term memory classification method with the time-difference spectral energy feature shows the highest recognition rate compared to the Levenberg-Marquardt algorithm with other cepstral and spectral features.
- Research Article
41
- 10.3390/rs15082152
- Apr 19, 2023
- Remote Sensing
Timely and accurate monitoring of the nitrogen levels in winter wheat can reveal its nutritional status and facilitate informed field management decisions. Machine learning methods can improve total nitrogen content (TNC) prediction accuracy by fusing spectral and texture features from UAV-based image data. This study used four machine learning models, namely Gaussian Process Regression (GPR), Random Forest Regression (RFR), Ridge Regression (RR), and Elastic Network Regression (ENR), to fuse data and the stacking ensemble learning method to predict TNC during the winter wheat heading period. Thirty wheat varieties were grown under three nitrogen treatments to evaluate the predictive ability of multi-sensor (RGB and multispectral) spectral and texture features. Results showed that adding texture features improved the accuracy of TNC prediction models constructed based on spectral features, with higher accuracy observed with more features input into the model. The GPR, RFR, RR, and ENR models yielded coefficient of determination (R2) values ranging from 0.382 to 0.697 for TNC prediction accuracy. Among these models, the ensemble learning approach produced the best TNC prediction performance (R2 = 0.726, RMSE = 3.203 mg·g−1, MSE = 10.259 mg·g−1, RPD = 1.867, RPIQ = 2.827). Our findings suggest that accurate TNC prediction based on UAV multi-sensor spectral and texture features can be achieved through data fusion and ensemble learning, offering a high-throughput phenotyping approach valuable for future precision agriculture research.
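The evaluation metrics reported in this abstract can be computed with the short sketch below. It assumes RPD is the sample standard deviation of the observations divided by RMSE and RPIQ is their interquartile range divided by RMSE; conventions for the standard deviation and for quartiles vary, and the abstract does not state which were used. The function name and the sample values are hypothetical.

```python
import math
import statistics


def regression_metrics(observed, predicted):
    """R^2, MSE, RMSE, RPD and RPIQ for paired observations and predictions."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    mse = ss_res / len(residuals)
    rmse = math.sqrt(mse)
    mean_obs = statistics.fmean(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    q1, _, q3 = statistics.quantiles(observed, n=4)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mse": mse,
        "rmse": rmse,
        "rpd": statistics.stdev(observed) / rmse,  # sample SD (assumed)
        "rpiq": (q3 - q1) / rmse,                  # interquartile range / RMSE
    }


# Hypothetical nitrogen contents, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
print(regression_metrics(observed, predicted))

# The reported MSE is the square of the reported RMSE.
print(round(3.203 ** 2, 3))
```

Because MSE is the square of RMSE, it is normally expressed in squared units of the target variable.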