• Volume 33, Issue 9, 2024 Table of Contents
    • Self-optimizing Single-cell Clustering with Contrastive Learning and Graph Neural Network

      2024, 33(9):1-13. DOI: 10.15888/j.cnki.csa.009638

      Abstract:Single-cell RNA sequencing (scRNA-seq) performs high-throughput sequencing analysis of transcriptomes at the level of individual cells. Its primary application is to identify cell subpopulations with distinct functions, usually based on cell clustering. However, the high dimensionality, noise, and sparsity of scRNA-seq data make clustering challenging. Traditional clustering methods are inadequate, and most existing single-cell clustering approaches only consider gene expression patterns while ignoring relationships between cells. To address these issues, a self-optimizing single-cell clustering method with contrastive learning and graph neural network (scCLG) is proposed. This method employs an autoencoder to learn the cellular feature distribution. First, it constructs a cell-gene graph, which is encoded using a graph neural network to effectively harness information on intercellular relationships. Next, subgraph sampling and feature masking create augmented views for contrastive learning, further optimizing feature representation. Finally, a self-optimizing strategy is utilized to jointly train the clustering and feature modules, continually refining feature representation and clustering centers for more accurate clustering. Experiments on 10 real scRNA-seq datasets demonstrate that scCLG can learn robust representations of cell features, significantly surpassing other methods in clustering accuracy.

    • Enhanced Locality Preserving Projection with Latent Sparse Representation Learning

      2024, 33(9):14-27. DOI: 10.15888/j.cnki.csa.009609

      Abstract:Dimensionality reduction plays a crucial role in machine learning and pattern recognition. The existing projection-based methods tend to solely utilize distance information or representation relationships among data points to maintain the data structure, which makes it difficult to effectively capture the nonlinear features and complex correlations of data manifolds in high-dimensional space. To address this issue, this study proposes a method: enhanced locality preserving projection with latent sparse representation learning (LPP_SRL). The method not only utilizes distance information to preserve the local structure of the data but also leverages multiple local linear representations to unveil the global nonlinear structure of the data. Moreover, to establish a connection between projection learning and sparse self-representation, this study employs a novel strategy by replacing the dictionary in sparse self-representation with reconstructed samples from the low-dimensional representation. This approach effectively filters out irrelevant features and noise, thereby better preserving the principal components in the original feature space. Extensive experiments conducted on multiple publicly available benchmark datasets have demonstrated the effectiveness and superiority of the proposed method.

    • Physical Factor Fusion for Tropical Cyclone Intensity Estimation

      2024, 33(9):28-37. DOI: 10.15888/j.cnki.csa.009623

      Abstract:Accurate estimation of tropical cyclone intensity is the basis of effective intensity prediction and is crucial for disaster forecasting. Current tropical cyclone intensity estimation technology based on deep learning shows superior performance, but there is still a problem of insufficient physical information fusion. Therefore, based on the deep learning framework, this study proposes a physical factor fusion for tropical cyclone intensity estimation model (PF-TCIE) to estimate the intensity of tropical cyclones in the northwest Pacific. PF-TCIE consists of a multi-channel satellite cloud image learning branch and a physical information extraction branch. The multi-channel satellite cloud image learning branch is used to extract tropical cyclone cloud system features, and the physical information extraction branch is used to extract physical factor features to constrain the learning of cloud system features. The data used in this article include Himawari-8 satellite data and ERA-5 reanalysis data. Experimental results show that after introducing multiple channels, the root mean squared error (RMSE) of the model is reduced by 3.7% compared with a single channel. At the same time, the introduction of physical information further reduces the model error by 8.5%. The RMSE of PF-TCIE finally reaches 4.83 m/s, which is better than most deep learning methods.

    • Entity Recognition for Interpretation of Bone-sign Integrated with Multiple Features

      2024, 33(9):38-47. DOI: 10.15888/j.cnki.csa.009605

      Abstract:This study constructs a named entity recognition (NER) model suitable for the bone-sign interpretations of Han Chang’an City to solve the problem of the inability to classify some bone-sign interpretations due to the lack of key content. The original text of the bone-sign interpretations of Han Chang’an City is used as the dataset, and the begin, inside, outside, end (BIOE) annotation method is utilized to annotate the bone-sign interpretation entities. A multi-feature fusion network (MFFN) model is proposed, which not only considers the structural features of individual characters but also integrates the structural features of character-word combinations to enhance the model’s comprehension of the bone-sign interpretations. The experimental results demonstrate that the MFFN model can better identify the named entities of the bone-sign interpretations of Han Chang’an City and classify the bone-sign interpretations, outperforming existing NER models. This model provides historians and researchers with richer and more precise data support.
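
The BIOE annotation mentioned above can be illustrated with a minimal sketch. The function name `bioe_tags`, the example tokens, and the `ORG` label are hypothetical and are not taken from the paper; tagging a single-token entity with `B-` is also an assumption, since schemes differ on that point.

```python
from typing import List, Tuple


def bioe_tags(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Assign begin/inside/end tags to entity spans and O to everything else.

    Each span is (start, end, label) with `end` exclusive.
    Single-token entities receive only a B- tag (an assumption of this sketch).
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        # Tokens strictly between the first and last token are "inside".
        for i in range(start + 1, end - 1):
            tags[i] = f"I-{label}"
        # The last token of a multi-token entity is marked as the end.
        if end - start > 1:
            tags[end - 1] = f"E-{label}"
    return tags


# Hypothetical example: tokens 1..3 form one entity labelled ORG.
example = bioe_tags(["a", "b", "c", "d", "e"], [(1, 4, "ORG")])
```

Marking the end of each entity explicitly gives a sequence model a boundary signal on both sides of a span, which is the usual motivation for choosing BIOE over plain BIO.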

    • Few-shot Open-set Recognition with Feature Decoupling and Openness Learning

      2024, 33(9):48-57. DOI: 10.15888/j.cnki.csa.009612

      Abstract:In the task of few-shot open-set recognition (FSOSR), effectively distinguishing closed-set from open-set samples presents a notable challenge, especially in cases of sample scarcity. Current approaches exhibit uncertainty in describing boundaries for known class distributions, leading to insufficient discrimination between closed-set and open-set spaces. To tackle this issue, this study introduces a novel method for FSOSR leveraging feature decoupling and openness learning. The primary objective is to employ a feature decoupling module to compel the model to decouple class-specific features and open-set features, thereby accentuating the disparity between unknown and known classes. To achieve effective feature decoupling, an openness learning loss is introduced to facilitate the acquisition of open-set features. By integrating similarity metric values and anti-openness scores as the optimization target, the model is steered towards learning more discriminative feature representations. Experimental results on the public datasets miniImageNet and tieredImageNet demonstrate that the proposed method substantially enhances the detection rate of unknown class samples while accurately classifying known classes.

    • Decoupled Knowledge Distillation Based on Diffusion Model

      2024, 33(9):58-64. DOI: 10.15888/j.cnki.csa.009615

      Abstract:Knowledge distillation (KD) is a technique that transfers knowledge from a complex model (teacher model) to a simpler model (student model). While many popular distillation methods currently focus on intermediate feature layers, response-based knowledge distillation (RKD) has regained its position among the SOTA models after decoupled knowledge distillation (DKD) was introduced. RKD leverages strong consistency constraints to split classic knowledge distillation into two parts, addressing the issue of high coupling. However, this approach overlooks the significant representation gap caused by the disparity in teacher-student network architectures, leading to the problem where smaller student models cannot effectively learn knowledge from teacher models. To solve this problem, this study proposes a diffusion model to narrow the representation gap between teacher and student models. This model transfers teacher features to train a lightweight diffusion model, which is then used to denoise the student model, thus reducing the representation gap between teacher and student models. Extensive experiments demonstrate that the proposed model achieves significant improvements over baseline models on CIFAR-100 and ImageNet datasets, maintaining good performance even when there is a large gap in teacher-student network architectures.
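
For readers unfamiliar with response-based distillation, the sketch below shows the classic temperature-scaled formulation in plain Python. It is not the decoupled (DKD) loss and not the diffusion-based method of this paper; the function names and the default temperature of 4.0 are assumptions chosen for illustration.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits: List[float], student_logits: List[float],
            temperature: float = 4.0) -> float:
    """KL(teacher || student) on softened outputs, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge.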

    • Dynamic Super-resolution Reconstruction of Images with Shift Convolution and Edge Detection

      2024, 33(9):65-76. DOI: 10.15888/j.cnki.csa.009618

      Abstract:To address the challenges posed by fixed network architectures and deep network layers, such as incomplete expression of complex scene predictions, high computational costs, and deployment difficulties, this study proposes a new network called wide structure dynamic super-resolution network (W-SDNet). Initially, a residual enhancement block, consisting of shift convolution residual structures, is designed to enhance the capability of extracting hierarchical features for image super-resolution and to reduce computational costs. Next, a wide enhancement module is introduced, employing a dual-branch four-layer parallel structure to extract deep information while using a dynamic network’s gating mechanism to selectively enhance feature expression. This module also utilizes an attention mechanism that integrates edge detection operators to improve the expressiveness of edge details. To prevent interference among components within the wide enhancement block, a refinement block utilizing group convolution and channel splitting is employed. Ultimately, high-quality image reconstruction is achieved through a construction block. Experimental results show that W-SDNet outperforms the existing mainstream algorithms in peak signal-to-noise ratio (PSNR) metrics when zoomed in 4 times on five publicly available test datasets, and the number of parameters in the model is significantly reduced. The results demonstrate the advantages of W-SDNet in terms of complexity, performance, and recovery time of super-resolution reconstruction.

    • Concrete Crack Detection Based on ST-UNet and Target Features

      2024, 33(9):77-84. DOI: 10.15888/j.cnki.csa.009632

      Abstract:Concrete cracks have negative impacts on the structural load-bearing capacity, durability, and waterproofing. Therefore, early crack detection is of paramount importance. The rapid development of big data and deep learning provides effective methods for intelligent crack detection. To address the issues of imbalanced positive and negative samples, as well as the challenges posed by deep colors and low luminance in crack areas during the crack detection process, this study proposes a crack detection method based on Swin Transformer U-Net (ST-UNet) and target features. This algorithm introduces the CBAM attention mechanism into the network, enabling the network to focus more on the pixel regions in the image that are crucial for crack detection, thereby enhancing the feature representation capability of crack images. The Focal+Dice mixed loss function replaces the single cross-entropy loss function to address the problem of uneven distribution of positive and negative sample images. Additionally, the design of the APSD regularization term optimizes the loss function, addressing the issues of deep colors and low luminance in crack areas and reducing both missed rates and false rates in detection. The results of crack detection show a 22% improvement in IoU and a 17% increase in the Dice index, indicating the effectiveness and feasibility of the algorithm.
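
A mixed Focal+Dice loss for binary segmentation can be sketched as follows. The abstract does not give the mixing weight, the focal parameters, or the APSD regularization term, so `weight`, `gamma`, `alpha`, and `smooth` below are assumed values and the regularizer is omitted.

```python
import math
from typing import Sequence


def focal_loss(probs: Sequence[float], targets: Sequence[int],
               gamma: float = 2.0, alpha: float = 0.25, eps: float = 1e-7) -> float:
    """Binary focal loss: down-weights pixels that are already well classified."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)       # clamp to avoid log(0)
        p_t = p if t == 1 else 1.0 - p        # probability of the true class
        a_t = alpha if t == 1 else 1.0 - alpha
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def dice_loss(probs: Sequence[float], targets: Sequence[int], smooth: float = 1.0) -> float:
    """Soft Dice loss: one minus the overlap between prediction and mask."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def focal_dice_loss(probs: Sequence[float], targets: Sequence[int],
                    weight: float = 0.5) -> float:
    """Weighted sum of the two losses; the 0.5 weight is an assumption."""
    return weight * focal_loss(probs, targets) + (1.0 - weight) * dice_loss(probs, targets)
```

The focal term addresses the imbalance between easy background pixels and hard crack pixels, while the Dice term depends on overlap rather than per-pixel counts, which is why the two are commonly combined when positives are rare.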

    • Super-resolution Reconstruction of Remote Sensing Image Based on Swin Transformer

      2024, 33(9):85-94. DOI: 10.15888/j.cnki.csa.009600

      Abstract:Due to the uncertainty of objects in remote sensing images and significant differences in feature information between different images, existing super-resolution methods yield poor reconstruction results. Therefore, this study proposes an NG-MAT model that combines the Swin Transformer and the N-gram model to achieve super-resolution of remote sensing images. Firstly, multiple attention modules are connected in parallel on the branch of the original Transformer to extract global feature information for activating more pixels. Secondly, the N-gram model from natural language processing is applied to the field of image processing, utilizing a trigram N-gram model to enhance information interaction between windows. The proposed method achieves peak signal-to-noise ratios of 34.68 dB, 31.03 dB, and 28.99 dB at amplification factors of 2, 3, and 4, respectively, and structural similarity indices of 0.9266, 0.8444, and 0.7734 at the same amplification factors on the selected dataset. Experimental results demonstrate that the proposed method outperforms other similar methods in various metrics.
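
The peak signal-to-noise ratio reported above is conventionally computed as 10·log10(MAX² / MSE). A minimal sketch for images stored as nested lists follows; the 8-bit peak value of 255 is an assumption, since the abstract does not state the pixel range or the color channel on which the metric is evaluated.

```python
import math
from typing import List


def psnr(reference: List[List[float]], reconstructed: List[List[float]],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    ref = [v for row in reference for v in row]
    rec = [v for row in reconstructed for v in row]
    mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
    if mse == 0:
        return math.inf  # identical images have no noise
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error.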

    • Road Extraction from Remote Sensing Image Based on Multi-scale Difference Aggregation Mechanism

      2024, 33(9):95-104. DOI: 10.15888/j.cnki.csa.009603

      Abstract:In the extraction of roads from high-resolution remote sensing images, problems such as local disconnections and the loss of details are common due to complex backgrounds and the presence of trees and buildings covering the roads during the image formation process. To solve these problems, this study proposes a road extraction model called MSDANet, based on a multi-scale difference aggregation mechanism. The model has an encoder-decoder structure, using the Res2Net module as the backbone network of the encoder to obtain information with fine-grained and multi-scale features from the images and to expand the receptive field for feature extraction. Additionally, a gated axial guidance module, in conjunction with road morphological features, is applied to highlight the representation of road features and improve the connectivity of long-distance roads in road extraction. Furthermore, a multi-scale difference aggregation module is used between the encoder and decoder to extract and aggregate the difference information between shallow and deep features. The aggregated features are then fused with the decoded features through a feature fusion module to help the decoder accurately restore road features. The proposed method has been evaluated on two high-resolution remote sensing datasets: DeepGlobe and CHN6-CUG. The results show that the F1 scores of the MSDANet model are 80.37% and 78.17%, respectively, and the IoU values are 67.18% and 64.17%, respectively, indicating that the proposed model outperforms the comparison models.
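
The two metrics reported above are closely related. When F1 and IoU are computed from the same pooled true-positive, false-positive, and false-negative counts, IoU = F1 / (2 − F1). The sketch below assumes that pooling; the abstract does not say how the paper aggregates over images, and the function names are illustrative.

```python
from typing import Tuple


def f1_and_iou(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """F1 score and intersection over union from pixel-level counts."""
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou


def iou_from_f1(f1: float) -> float:
    """Convert F1 to IoU, valid when both come from the same counts."""
    return f1 / (2.0 - f1)


# An F1 of 0.8037 maps to an IoU of about 0.6718 under this identity.
converted = iou_from_f1(0.8037)
```

Under this assumption, the reported pairs agree with the identity to within rounding.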

    • Privacy Protection Based on Homomorphic Encryption for Cross-chain Transaction Data

      2024, 33(9):105-113. DOI: 10.15888/j.cnki.csa.009608

      Abstract:To protect data privacy in blockchain cross-chain transactions, this study proposes a privacy protection scheme based on homomorphic encryption. The scheme improves the homomorphic encryption algorithm to support floating-point operations while retaining the additive homomorphic property of the original algorithm, and it supports any number of addition operations to realize the privacy protection of cross-chain transaction amounts. To prevent security threats to transactions posed by mismanagement or loss of the private key with homomorphic encryption, a private key sharing mechanism based on Shamir’s secret sharing algorithm is introduced into the scheme. This mechanism prevents untrustworthy nodes from sending malicious values to recover the private key by adding ECDSA digital signatures to verify the private key share. It also considers the dynamic update of the private key share after a node drops or leaves to prevent node collusion. Security analysis and experimental verification show that the proposed scheme can effectively protect privacy in cross-chain transactions.
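
The threshold property of Shamir's secret sharing, on which the private key sharing mechanism builds, can be sketched with the standard library alone. This is a textbook version under stated assumptions: the field prime 2^127 − 1 and the function names are illustrative choices, and the ECDSA share verification and dynamic share update described in the abstract are not modelled.

```python
import secrets
from typing import List, Tuple

PRIME = 2 ** 127 - 1  # a Mersenne prime; the field size is an illustrative choice


def split_secret(secret: int, threshold: int, num_shares: int,
                 prime: int = PRIME) -> List[Tuple[int, int]]:
    """Split `secret` so that any `threshold` shares can reconstruct it."""
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo the prime
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares


def recover_secret(shares: List[Tuple[int, int]], prime: int = PRIME) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret


demo_shares = split_secret(123456789, threshold=3, num_shares=5)
```

Any three of the five shares in `demo_shares` recover the secret, while fewer than three reveal nothing about it. This lets a key survive the loss of some nodes without any single node holding the whole key.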

    • Remote Sensing Object Detection Based on Global Context Attentional Feature Fusion Pyramid Network

      2024, 33(9):114-122. DOI: 10.15888/j.cnki.csa.009631

      Abstract:Remote sensing object detection usually faces challenges such as large variations in image scale, small and densely arranged targets, and high aspect ratios, which make it difficult to achieve high-precision oriented object detection. This study proposes a global context attentional feature fusion pyramid network. First, a triple attentional feature fusion module is designed, which can better fuse features with semantic and scale inconsistencies. Then, an intra-layer conditioning method is introduced to improve the module and a global context enhancement network is proposed, which refines deep features containing high-level semantic information to improve the characterization ability. On this basis, a global context attentional feature fusion pyramid network is designed with the idea of global centralized regulation to modulate shallow multi-scale features by using attention-modulated features. Experiments have been conducted on multiple public data sets, and results show that the high-precision evaluation indicators of the proposed network are better than those of the current advanced models.

    • Urban Traffic Estimation Based on Graph Convolution Spatiotemporal GAN

      2024, 33(9):123-131. DOI: 10.15888/j.cnki.csa.009634

      Abstract:It is very challenging to estimate the traffic flow before urban road deployment. To solve this problem, this study proposes a new conditional urban traffic generative adversarial network (Curb-GAN) model, which utilizes a conditional generative adversarial network (CGAN) to generate urban traffic flow data. Firstly, the distance relationship and external feature information of each node of the road network are treated as conditions to control the generated results. Secondly, the spatial autocorrelation of the road network is captured by the graph convolutional network (GCN), and the temporal dependence of traffic across time slots is captured by the self-attention (SA) mechanism and gated recurrent unit (GRU). Finally, the trained generator generates traffic flow data. Extensive experiments on two real spatiotemporal datasets show that the estimation accuracy of the Curb-GAN model is superior to the main baseline methods and that the model can produce more meaningful estimates.

    • Automatic Sleep Staging Model Based on Multi-head Self-attention

      2024, 33(9):132-139. DOI: 10.15888/j.cnki.csa.009624

      Abstract:Sleep staging is highly important for sleep monitoring and sleep quality assessment. High-precision sleep staging can assist physicians in correctly evaluating sleep quality during clinical diagnosis. Although existing studies on automatic sleep staging have achieved relatively reliable accuracy, there are still problems that need to be solved: (1) How can sleep features be extracted from patients more comprehensively? (2) How can effective rules for sleep state transition be obtained from the captured sleep features? (3) How can multimodal data be effectively utilized to improve classification accuracy? To solve the above problems, this study proposes an automatic sleep staging network based on multi-head self-attention. To extract the modal characteristics of EEG and EOG in sleep stages separately, this network uses a parallel two-stream convolutional neural network structure to process the original EEG and EOG data separately. In addition, the model uses a contextual learning module, which consists of a multi-head self-attention module and a residual network, to capture the multifaceted features of the sequences and to learn the correlation and significance between the sequences. Finally, the model utilizes unidirectional LSTM to learn the transition rules for sleep stages. The results of the sleep staging experiments show that the model proposed in this study achieves an overall accuracy of 85.7% on the Sleep-EDF dataset, with an MF1 score of 80.6%. Moreover, its accuracy and robustness are better than those of the existing automatic sleep staging methods. This indicates that the proposed model is valuable for automatic sleep staging research.

    • Service Quality Optimization Strategy Based on Microservice Distributed Link

      2024, 33(9):140-152. DOI: 10.15888/j.cnki.csa.009628

      Abstract:Microservices architecture, as an agile and resilient software design paradigm, has been widely applied in the field of software development. However, with the increasing number of microservices, the complexity of the systems rises, and the service quality of the system decreases. Enhancing the quality of online business service under the microservices architecture is a critical challenge. Optimization of service links is a key aspect in addressing this challenge. This work conducts an in-depth study of service links under the microservices architecture and proposes various link analysis methods, including link sampling, link topology generation, strong and weak dependency determination, identification of cyclic calls, and recognition of redundant and ineffective calls. Building upon these methods, the study implements a series of effective optimization strategies, such as robust testing, disassembling cyclic calls, reducing and merging redundant calls, fault self-healing, and link tracing. These strategies effectively improve the service quality of production and operation system services under the microservices architecture.

    • Multi-modal Deep-level High-confidence Fusion Tracking Algorithm

      2024, 33(9):153-163. DOI: 10.15888/j.cnki.csa.009633

      Abstract:This study proposes a multi-modal deep-level high-confidence fusion tracking algorithm in response to the tracking failure issues caused by changes in target appearance and environment in single-target tracking applications. First, a high-dimensional multi-modal model is constructed utilizing the target’s color model combined with a shape model based on bilinear interpolation HOG features. Then, candidate targets are searched using particle filtering. The challenge posed by model fusion is addressed by scrupulously quantifying a range of confidences in shape and color models. This is followed by the introduction of a high-confidence fusion criterion, which enables a deeply-adaptive, weighted, and balanced fusion with different confidence levels in the multi-modal model. To counter the issue of static model update parameters, a nonlinear, graded balanced update strategy is designed. Upon testing on the OTB-2015 dataset, this algorithm’s average CLE and OS metrics demonstrated superior performance compared to all reference algorithms, with values of 30.57 and 0.609, respectively. Moreover, with an FPS of 15.67, the algorithm fulfills the real-time operation requirements inherent in tracking algorithms under most conditions. Notably, in some common specific scenarios, the accuracy and success rate of the algorithm also outperform the top-tier algorithms in most cases.

    • Adaptive Laser SLAM Algorithm Combining CPD for Complex Scenes

      2024, 33(9):164-173. DOI: 10.15888/j.cnki.csa.009644

      Abstract:Laser point cloud matching is a key factor affecting the accuracy and efficiency of laser SLAM systems. Traditional laser SLAM algorithms cannot effectively distinguish scene structures and result in performance degradation due to poor feature extraction in unstructured scenes. To address this issue, a joint coherent point drift (CPD) adaptive laser SLAM algorithm for complex scenes is proposed, called CPD-LOAM. First, a scene structure identification method combining prejudgment and verification is proposed, in which scene feature variables are introduced to make preliminary judgments on the scene structure. Then, surface curvature is further used to verify the preliminary judgments from the perspective of geometric features, enhancing the accuracy of scene structure identification. In addition, the CPD algorithm is combined for point cloud pre-registration in unstructured scenes, and then the ICP algorithm is used for re-registration to solve the problem of feature degradation in this scene, thereby improving the accuracy and efficiency of point cloud registration. The experimental results show that the proposed scene feature variables and surface curvature can effectively distinguish structure scenes based on the set threshold. The validation results on the public dataset KITTI show that CPD-LOAM reduces the positioning error by 84.47% compared to the LOAM algorithm, and improves the positioning accuracy by 55.88% and 30.52% respectively, compared to the LEGO-LOAM and LIO-SAM algorithms, with higher efficiency and robustness.

    • Stroke Extraction for Chinese Handwriting Character Based on Multi-label Semantic Segmentation

      2024, 33(9):174-182. DOI: 10.15888/j.cnki.csa.009620

      Abstract:As the carrier of Chinese culture, Chinese characters are distinguished from other scripts by their complex structure. As the basic unit of Chinese characters, strokes play a vital role in the evaluation of Chinese handwriting characters. The correct extraction of strokes is the primary step in evaluating Chinese handwriting characters. Most existing stroke extraction methods are based on specific rules, and due to the complexity of Chinese characters, these rules usually cannot take into account all the features, and cannot match the strokes of template characters based on stroke order and other information during evaluation. To address these issues, this study transforms stroke extraction into a multi-label semantic segmentation problem and proposes a multi-label semantic segmentation model (M-TransUNet), which utilizes a deep convolutional model to train with Chinese characters as a unit task, retaining the original structure of the strokes and avoiding ambiguity in stroke segment combinations. At the same time, the stroke order of the Chinese handwriting characters is obtained, which is conducive to downstream tasks, such as stroke evaluations. Since the handwriting images are only divided into foreground and background without additional color information, they are more prone to generating FP segmentation noise. To solve this problem, this study also proposes a local smooth strategy on strokes (LSSS) for the stroke segmentation results to dilute the impact of noise. Finally, this study conducted experiments on the segmentation performance and efficiency of M-TransUNet, demonstrating that the algorithm significantly enhances efficiency with minimal performance loss. Additionally, experiments were carried out on the LSSS algorithm to demonstrate its effectiveness in eliminating FP noise.

    • Reliability Analysis of Linear Wireless Sensor Network Based on Probabilistic Sensing Model

      2024, 33(9):183-191. DOI: 10.15888/j.cnki.csa.009625

      Abstract:The linear wireless sensor network (LWSN) is widely used to monitor key infrastructure with a linear topology, such as railways and natural gas pipelines. Its reliability is very important, and coverage is an important indicator for measuring reliability. Currently, most methods for evaluating LWSN coverage are based on a 0/1 disk sensing model, but in practice, the monitoring reliability of sensors follows a probability distribution as the coverage radius increases. Therefore, a reliability analysis method based on a probabilistic sensing model is proposed, which can calculate the effective sensing range based on the physical parameters of sensors, thereby improving the accuracy of evaluation. To reduce the size of the system state space, a binary decision tree is used to construct the LWSN system state set. In this study, the failure probability of nodes is assumed to follow a Weibull distribution, and simulation experiments are conducted for different communication radii and sensing ranges. The results show that this method can effectively evaluate the reliability of LWSN and that its evaluation is more accurate than that of the 0/1 disk sensing model.
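
The Weibull assumption for node failures can be made concrete with the two-parameter form, in which a node survives to time t with probability R(t) = exp(−(t/η)^β). The parameter names `shape` (β) and `scale` (η) and the values used in the checks are illustrative; the abstract does not report the paper's parameters, and the probabilistic sensing model itself is not reproduced here.

```python
import math


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """Probability that a node is still working at time t."""
    return math.exp(-((t / scale) ** shape))


def weibull_failure_probability(t: float, shape: float, scale: float) -> float:
    """Cumulative probability that a node has failed by time t."""
    return 1.0 - weibull_reliability(t, shape, scale)
```

A shape above 1 models wear-out, where the failure rate rises with age. A shape of exactly 1 reduces to the exponential distribution with a constant failure rate.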

    • Application of Variable Neighborhood Simulated Annealing Algorithm in Rural Household Garbage Collection and Transportation

      2024, 33(9):192-200. DOI: 10.15888/j.cnki.csa.009622

      Abstract:According to the characteristics of rural household garbage generation, a multi-objective garbage collection and transportation path optimization model is constructed to minimize transportation cost, vehicle delay penalty cost, and environmental penalty cost, considering the variable collection and transportation cycle of domestic waste classification. The solution space is reconstructed with the combination of random choice method and nearest neighbor method, and the simulated annealing algorithm with variable neighborhood is used to solve the model. Through case simulation and comparative analysis, it can be seen that the proposed model and algorithm have good optimization results in terms of total collection and transportation cost and total distance. Based on the analysis, the results in this study are also superior to the optimal solutions of the classical simulated annealing algorithm and variable neighborhood search algorithm. Compared with the traditional fixed cycle collection and transportation scheme, the model established in this study subtracts the environmental pollution cost and modifies the total cost by more than 110.4%, which can effectively solve the problem of garbage collection and transportation path optimization in rural areas.
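
The general idea of simulated annealing with variable neighborhoods can be sketched on a toy single-vehicle tour. This is not the paper's multi-objective model: the cost here is plain Euclidean tour length, and the three move operators, the cooling schedule, and all parameter values are assumptions made for illustration.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def route_length(route: List[int], points: List[Point]) -> float:
    """Length of the closed tour visiting the points in the given order."""
    n = len(route)
    return sum(math.dist(points[route[i]], points[route[(i + 1) % n]]) for i in range(n))


def swap_move(route: List[int], rng: random.Random) -> List[int]:
    i, j = rng.sample(range(len(route)), 2)
    new = route[:]
    new[i], new[j] = new[j], new[i]
    return new


def reverse_move(route: List[int], rng: random.Random) -> List[int]:
    i, j = sorted(rng.sample(range(len(route)), 2))
    return route[:i] + route[i:j + 1][::-1] + route[j + 1:]


def insert_move(route: List[int], rng: random.Random) -> List[int]:
    i, j = rng.sample(range(len(route)), 2)
    new = route[:]
    new.insert(j, new.pop(i))
    return new


def vn_simulated_annealing(points: List[Point], t_start: float = 100.0,
                           t_end: float = 1e-3, cooling: float = 0.995,
                           seed: int = 0) -> Tuple[List[int], float]:
    """Anneal a tour, switching neighborhood whenever a move fails to improve."""
    rng = random.Random(seed)
    moves = [swap_move, reverse_move, insert_move]
    current = list(range(len(points)))
    current_cost = route_length(current, points)
    best, best_cost = current[:], current_cost
    temperature, k = t_start, 0
    while temperature > t_end:
        candidate = moves[k](current, rng)
        cost = route_length(candidate, points)
        improved = cost < current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse tours.
        if improved or rng.random() < math.exp(-(cost - current_cost) / temperature):
            current, current_cost = candidate, cost
        if current_cost < best_cost:
            best, best_cost = current[:], current_cost
        # Return to the first neighborhood after an improvement, else try the next one.
        k = 0 if improved else (k + 1) % len(moves)
        temperature *= cooling
    return best, best_cost
```

Cycling through several move types lets the search escape configurations that are locally optimal for one operator but not for another, while the temperature schedule controls how often worsening moves are accepted.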

    • Degradation-aware Underwater Image Enhancement Network Based on Wavelet Transform

      2024, 33(9):201-207. DOI: 10.15888/j.cnki.csa.009616

      Abstract:To solve the problems of poor degradation awareness, easy detail loss, and ineffective color cast correction caused by existing underwater image enhancement algorithms, this study proposes a degradation-aware underwater image enhancement network based on wavelet transform. This model mainly contains a degradation representation extraction network based on contrastive learning and an underwater image enhancement network based on multiple-level wavelet transform. Firstly, the degradation representation extraction network uses an encoder and contrastive learning method to extract unique degradation representations from each underwater image. Secondly, a three-level wavelet transform module is built under the principle of multi-level wavelet transform enhancement algorithm, aiming to conduct multi-scale detail and color enhancement in the frequency domain. Lastly, a multiple-level wavelet transform enhancement network is built with three-level wavelet transform modules, and the extracted degradation representations are introduced into this network for better implementing multiple-level wavelet transform enhancement with perceived degradation information. Experimental results show that the proposed algorithm outperforms existing algorithms in color correction and detail enhancement: structural similarity is improved by 16%, peak signal-to-noise ratio is improved by 9%, and underwater image quality is improved by 14%, making it suitable for underwater image enhancement tasks.

    • ResNet Few-shot Crop Pest and Disease Recognition Incorporating Attention Mechanism and Secondary Feature Extraction

      2024, 33(9):208-215. DOI: 10.15888/j.cnki.csa.009619

      Abstract (232) HTML (662) PDF 3.51 M (1019) Comment (0) Favorites

      Abstract:Traditional machine learning methods perform poorly, in both accuracy and time, when identifying crop leaf pests and diseases with small samples and multiple categories. To address this problem, this study uses an improved ResNet model to recognize crop pests and diseases. By adding dropout layers, an activation function, a maximum pooling layer, and an attention mechanism, the robustness and feature-capturing ability of the model are improved, and high recognition accuracy is achieved with fewer model parameters. Firstly, the images obtained from the public dataset Plant Village are preprocessed and enhanced, and the ReLU activation function is replaced by PReLU to solve the problem of neuron necrosis where the ReLU input is less than 0. Then, a dropout layer is added before the global average pooling layer, and a reasonable threshold value is set to avoid overfitting and enhance the robustness of the model. In addition, a maximum pooling layer is added between the dropout and global average pooling layers, which not only expands the receptive field of neurons but also helps the model obtain the most significant features of local pests and diseases, reduce the noise effect of the image background, and realize secondary feature extraction. Finally, the CBAM attention mechanism is embedded, which makes the model automatically learn the most important channel information in the input feature maps and weight it across the channel and spatial dimensions to better capture the semantic information in the images. Experimental results show that the improved model recognizes the test set with an accuracy of 99.15% with a model parameter count of only 9.13M, which exceeds the accuracy of Xception, InceptionV3, and the original ResNet by 1.01, 0.68, and 0.59 percentage points, respectively, and reduces the model parameter count. This provides a state-of-the-art deep learning method for crop disease recognition.
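
      The motivation for replacing ReLU with PReLU can be seen from the two functions side by side. This is a scalar sketch only; the slope `alpha = 0.25` is an assumed starting value (in PReLU it is a learned parameter), and the abstract does not state how it was initialized.

```python
def relu(x):
    """ReLU: negative inputs are clamped to zero, so their gradient is zero."""
    return x if x > 0 else 0.0

def prelu(x, alpha=0.25):
    """PReLU: negative inputs are scaled by a learnable slope alpha instead of zeroed."""
    return x if x > 0 else alpha * x

def relu_grad(x):
    """Derivative of ReLU with respect to its input."""
    return 1.0 if x > 0 else 0.0

def prelu_grad(x, alpha=0.25):
    """Derivative of PReLU with respect to its input."""
    return 1.0 if x > 0 else alpha
```

      For a negative input such as `-2.0`, `relu_grad` is zero, so no error signal flows back through that unit, whereas `prelu_grad` returns `alpha` and the unit can keep learning.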

    • High Expressiveness Voice Conversion Based on Multiple Mutual Information Constraints

      2024, 33(9):216-225. DOI: 10.15888/j.cnki.csa.009637

      Abstract (139) HTML (666) PDF 1.76 M (934) Comment (0) Favorites

      Abstract:As voice conversion technology becomes increasingly prevalent in human-computer interaction, the need for highly expressive speech continues to grow. Currently, voice conversion primarily relies on decoupling acoustic features, emphasizing the decoupling of content and timbre features, but often neglects the emotional features in speech, resulting in insufficient emotional expressiveness in converted audio. To address this problem, this study introduces a novel model for highly expressive voice conversion with multiple mutual information constraints (MMIC-EVC). On top of decoupling content and timbre features, the model incorporates an expressiveness module to capture discourse-level prosody and rhythm features, enabling the conveyance of emotional features. It constrains every encoder to focus on its acoustic embedding by minimizing the variational upper bounds of multiple mutual information between features. Experiments on the CSTR-VCTK and ESD speech datasets indicate that the converted audio of the proposed model achieves a mean opinion score of 3.78 for naturalness and a Mel cepstral distortion of 5.39 dB, significantly outperforming baseline models in the best-worst sensitivity test. The MMIC-EVC model effectively decouples rhythmic and prosodic features, facilitating high expressiveness in voice conversion, and thereby providing a more natural and better user experience in human-computer interaction.
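
      The Mel cepstral distortion reported above is an objective distance between reference and converted speech. The sketch below uses the commonly cited definition, averaged over frames; it assumes the two utterances are already time-aligned and that the caller has dropped the energy (0th) coefficient. The abstract does not say which variant or alignment method the authors used.

```python
import math

def mel_cepstral_distortion(ref_frames, conv_frames):
    """Mean MCD in dB over time-aligned frames of mel-cepstral coefficients.

    Per frame: (10 / ln 10) * sqrt(2 * sum_d (ref_d - conv_d)^2).
    """
    k = 10.0 / math.log(10.0) * math.sqrt(2.0)
    per_frame = [
        k * math.sqrt(sum((r - c) ** 2 for r, c in zip(rf, cf)))
        for rf, cf in zip(ref_frames, conv_frames)
    ]
    return sum(per_frame) / len(per_frame)
```

      Lower values mean the converted spectrum is closer to the reference; identical inputs give 0 dB.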

    • OFDM Specific Emitter Identification Using FFB-EWT

      2024, 33(9):226-234. DOI: 10.15888/j.cnki.csa.009610

      Abstract (147) HTML (672) PDF 1.77 M (984) Comment (0) Favorites

      Abstract:This study proposes a novel identification method for OFDM emitters to address the issue of low classification accuracy in traditional methods for specific emitter identification, where subtle fingerprint features of OFDM emitters are affected by data signal components and channel noise. Considering the subcarrier spectrum of the short preamble, this method utilizes the fixed frequency boundary-based empirical wavelet transform (FFB-EWT) and a deep residual network. Initially, the short preamble of OFDM signals is extracted to define fixed boundary conditions based on the frequency intervals of the subcarriers in the short preamble. The boundary values in the frequency domain are then applied to FFB-EWT for signal decomposition to remove the subcarrier components containing preamble information. Subsequently, the signal-to-noise ratio of fingerprint features is enhanced by accumulating the null subcarrier components of adjacent frames. Next, a dual-channel residual network called ResNet18, integrated with a non-local attention module and a channel attention module, is used for feature extraction from IQ data inputs, with classification performed via the Softmax function. Finally, the Oracle public dataset is chosen to validate the feasibility of the method. Experimental results demonstrate that the FFB-EWT method achieves accuracy rates of 98.17% and 89.33% for identifying six different emitters under 6 dB and 0 dB conditions, respectively, proving the effectiveness of the method in environments with low signal-to-noise ratios.
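
      The signal-to-noise gain from accumulating adjacent frames can be illustrated with plain averaging: a component that repeats in every frame survives, while independent noise shrinks roughly in proportion to the number of frames. This sketch assumes real-valued, time-aligned frames and additive Gaussian noise; the helper names and the synthetic template are hypothetical and do not reproduce the FFB-EWT pipeline.

```python
import random
import statistics

def accumulate_frames(frames):
    """Average time-aligned frames sample by sample."""
    n = len(frames)
    return [sum(samples) / n for samples in zip(*frames)]

def noisy_frame(template, sigma, rng):
    """One observation: the repeated component plus independent Gaussian noise."""
    return [s + rng.gauss(0.0, sigma) for s in template]

def residual_variance(frame, template):
    """Variance of whatever remains after subtracting the repeated component."""
    return statistics.pvariance([f - s for f, s in zip(frame, template)])

def demo(n_frames=16, n_samples=256, sigma=1.0, seed=1):
    """Return the noise variance of a single frame and of the accumulated frame."""
    rng = random.Random(seed)
    template = [1.0 if i % 2 == 0 else -1.0 for i in range(n_samples)]
    frames = [noisy_frame(template, sigma, rng) for _ in range(n_frames)]
    single = residual_variance(frames[0], template)
    averaged = residual_variance(accumulate_frames(frames), template)
    return single, averaged
```

      With 16 frames the residual noise variance is expected to drop to roughly one sixteenth of the single-frame value, which corresponds to about 12 dB of SNR gain under these idealized assumptions.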

    • Byzantine Node Detection of Federated Learning for Transient Stability Analysis of Power System

      2024, 33(9):235-244. DOI: 10.15888/j.cnki.csa.009578

      Abstract (149) HTML (652) PDF 2.09 M (1076) Comment (0) Favorites

      Abstract:This study proposes a federated learning algorithm for transient stability in a distributed power system and a Byzantine node detection algorithm to assess the transient stability of various regions in a distributed smart grid and address potential network attacks. In the federated learning framework, each regional power grid independently uses neural networks to assess its transient stability, while the central server integrates the training gradients, provides feedback, and updates them. To improve the security of the framework, the model constructed in this study clusters the updated gradients of each regional power grid to identify outliers, which correspond to regional power grids that are under attack, so as to detect Byzantine nodes. Because the gradients are high-dimensional, direct clustering leads to inaccurate distance measurement. Therefore, an autoencoder is trained online to reduce the dimension of the gradients. Density clustering is then performed on the lower-dimensional gradients to select a small number of nodes as the set of Byzantine nodes, and the gradients provided by these nodes are permanently eliminated. An example of electromechanical transient simulation for angle stability is used for verification. The results show that this method addresses network attacks while assessing the transient stability of the power system. Compared with other methods, this method significantly improves the average accuracy and stability, effectively preventing fluctuations in assessment accuracy.
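
      The detection step described above can be sketched as a density test on the dimension-reduced gradients. The code below is a simplification under stated assumptions: the autoencoder is omitted and the inputs are taken to be already low-dimensional; every point with fewer than `min_pts` neighbors within `eps` is flagged, which is stricter than full density clustering (it would also flag border points). The names `eps`, `min_pts`, and both functions are illustrative.

```python
import math

def density_outliers(points, eps, min_pts):
    """Return indices of points with fewer than min_pts neighbors (self included) within eps."""
    flagged = []
    for i, p in enumerate(points):
        neighbours = sum(1 for q in points if math.dist(p, q) <= eps)
        if neighbours < min_pts:
            flagged.append(i)
    return flagged

def aggregate(gradients, keep):
    """Average the gradients of the kept nodes coordinate by coordinate."""
    kept = [gradients[i] for i in keep]
    return [sum(col) / len(kept) for col in zip(*kept)]
```

      A server could call `density_outliers` on the encoded gradients of one round, exclude the flagged indices, and pass the remaining indices to `aggregate`.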

    • Hierarchical Adaptive PID Control Algorithm Based on Deep Reinforcement Learning

      2024, 33(9):245-252. DOI: 10.15888/j.cnki.csa.009598

      Abstract (399) HTML (690) PDF 2.14 M (1496) Comment (0) Favorites

      Abstract:Proportional integral derivative (PID) control is widely used in the fields of industrial and robot control. However, it faces challenges such as complex parameter setting, difficulty in accurately modeling the system, and sensitivity to changes in the controlled object. To address these challenges, this study proposes a hierarchical adaptive PID control algorithm based on a deep reinforcement learning algorithm, named TD3-PID, for the automatic control of mobile robots. In this algorithm, the upper-layer controller adjusts the parameters and output compensation of the lower-layer PID controller by observing the current environmental and system status in real time to compensate for errors in real time and optimize system performance. This study applies the proposed TD3-PID controller to a trajectory tracking task of a four-wheel mobile robot and conducts real-scenario experimental comparisons with other control methods. The results show that the TD3-PID controller exhibits superior dynamic response performance and anti-interference ability. The overall response error is significantly reduced and significant advantages are seen in improving the performance of the control system.
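
      The two-layer structure can be pictured as an ordinary discrete PID controller whose gains and an additive output term are set from outside at every step. The sketch below shows only that lower layer and its interface; the TD3 agent that would choose the gains and the compensation is not implemented, and the class and method names are hypothetical.

```python
class PID:
    """Discrete PID controller with externally adjustable gains."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def set_gains(self, kp, ki, kd):
        """Hook for an upper-layer controller to retune the gains online."""
        self.kp, self.ki, self.kd = kp, ki, kd

    def step(self, error, compensation=0.0):
        """Return the control output for one sample; compensation is added as-is."""
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0  # no previous sample yet
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error + self.ki * self.integral
                + self.kd * derivative + compensation)
```

      In a hierarchical setup, an upper-layer policy would observe the tracking error and system state, call `set_gains`, and supply `compensation` before each call to `step`.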

    • Pavement Disease Detection Based on Improved YOLOv5s

      2024, 33(9):253-260. DOI: 10.15888/j.cnki.csa.009611

      Abstract (248) HTML (761) PDF 3.39 M (1543) Comment (0) Favorites

      Abstract:This study proposes an improved lightweight pavement disease detection model called pavement disease-YOLOv5s (PD-YOLOv5s) to address the problem of low detection accuracy in pavement disease detection due to diverse disease forms, large-scale differences, and similar background grayscale values. Firstly, the model applies a three-dimensional parameter-free attention mechanism called SimAM to effectively enhance the feature extraction ability of the model in complex environments without increasing the number of model parameters. Secondly, the model integrates the residual block Res2NetBlock to expand its receptive field and improve its feature fusion at a finer granularity level. Finally, the SPD-GSConv module is constructed for downsampling to effectively capture target features of different scales and integrate the extracted features into the model to perform pavement disease classification detection. Experimental results on real pavement disease datasets show that the mean average precision (mAP) of the PD-YOLOv5s model is improved by 4.7% compared to that of the original YOLOv5s model. The parameters of the proposed model are reduced to 6.78M, and the detection speed reaches 53.97 f/s. The PD-YOLOv5s model has superior detection performance while reducing network computing costs, making it valuable for engineering applications in pavement disease detection.
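
      The downsampling idea behind an SPD-style module is space-to-depth: instead of discarding pixels with a strided convolution or pooling, the feature map is split into interleaved sub-maps that are stacked as extra channels. The sketch below shows only this rearrangement on nested lists; the channel ordering is an assumption, and the GSConv part of the authors' SPD-GSConv module is not reproduced.

```python
def space_to_depth(fmap, scale=2):
    """Rearrange a C x H x W nested list into (C * scale**2) x (H/scale) x (W/scale).

    H and W must be divisible by scale. No value is discarded: each output
    channel is one interleaved sub-grid of an input channel.
    """
    out = []
    for dy in range(scale):
        for dx in range(scale):
            for channel in fmap:
                out.append([row[dx::scale] for row in channel[dy::scale]])
    return out
```

      A 1 x 4 x 4 input becomes a 4 x 2 x 2 output, halving the spatial resolution while keeping all sixteen values for the following convolution to use.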

    • Few-shot Semantic Segmentation Based on Contrastive Learning and Background Mining

      2024, 33(9):261-268. DOI: 10.15888/j.cnki.csa.009617

      Abstract (145) HTML (686) PDF 3.76 M (1022) Comment (0) Favorites

      Abstract:Few-shot semantic segmentation is a computer vision task that involves segmenting potential object categories in query images with a small number of annotated samples. However, existing methods still face two challenges. Firstly, there is a prototype bias problem, resulting in prototypes having less foreground object information and making it difficult to simulate real category statistics. The other issue is feature degradation, which means that the model only focuses on the current category rather than potential categories. This study proposes a new network based on contrastive prototypes and background mining. The main idea of the network is to enable the model to learn more representative prototypes and identify potential categories from the background. Specifically, a specific class learning branch constructs a large and consistent prototype dictionary and then uses InfoNCE loss to make the prototypes more discriminative. On the other hand, the background mining branch initializes background prototypes and uses an attention mechanism between the constructed background prototypes and the dictionary to mine potential categories. Experimental results on the PASCAL-5i and COCO-20i datasets demonstrate excellent performance of the model. Under the 1-shot setting using the ResNet-50 network, 64.9% and 44.2% are achieved, an improvement of 4.0% and 1.9%, respectively, compared to the baseline model.
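
      The InfoNCE loss mentioned above pulls a query prototype toward its positive and away from the other entries in the dictionary. The sketch below is the generic form of the loss with cosine similarity; the temperature of 0.1 and the use of cosine similarity are assumptions, and how the authors build queries, positives, and the prototype dictionary is not specified in the abstract.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(query, positive, negatives, temperature=0.1):
    """-log( exp(s_pos / t) / sum_i exp(s_i / t) ) over the positive and all negatives."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract the maximum for a numerically stable log-sum-exp
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]
```

      The loss is close to zero when the query matches its positive far better than any negative and grows as a negative becomes more similar to the query than the positive.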

    • Cross-modality Person Re-identification Based on Attention Feature Fusion

      2024, 33(9):269-275. DOI: 10.15888/j.cnki.csa.009604

      Abstract (198) HTML (719) PDF 2.26 M (1086) Comment (0) Favorites

      Abstract:Cross-modality person re-identification is widely used in intelligent safety monitoring systems, aiming to match visible light images and infrared images of the same person. Due to the inherent modality differences between visible and infrared modalities, cross-modality person re-identification poses significant challenges in practical applications. To alleviate modality differences, researchers have proposed many effective solutions. However, existing methods extract different modality features without corresponding modality information, resulting in insufficient discriminability of the features. To improve the discriminability of the features extracted from models, this study proposes a cross-modality person re-identification method based on attention feature fusion. By designing an efficient feature extraction network and attention feature fusion module, and optimizing multiple loss functions, the fusion and alignment of different modality information can be achieved, thereby promoting the model matching accuracy for persons. Experimental results show that this method achieves great performance on multiple datasets.


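Contact Information
  • Computer Systems & Applications (《计算机系统应用》)
  • Founded in 1992
  • Sponsor: Institute of Software, Chinese Academy of Sciences
  • Postal code: 100190
  • Phone: 010-62661041
  • Email: csa (a) iscas.ac.cn
  • Website: http://www.c-s-a.org.cn
  • Serial number: ISSN 1003-3254
  • CN 11-2854/TP
  • Domestic price: 50 yuan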
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3
Address: 4# South Fourth Street, Zhongguancun, Haidian, Beijing, Postal Code: 100190
Phone: 010-62661041 Email: csa (a) iscas.ac.cn
Technical Support: Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063