• Volume 33, Issue 10, 2024 Table of Contents
    • Human Action Recognition Based on 3D Skeleton

      2024, 33(10):1-12. DOI: 10.15888/j.cnki.csa.009659 CSTR: 32024.14.csa.009659

      Abstract:Action recognition is an important technology in computer vision and can be divided into video-based and skeleton-based recognition according to the input data. 3D skeleton data avoids the influence of illumination, occlusion, and other factors, yielding more accurate action descriptions, so human action recognition based on 3D skeletons has attracted increasing attention. Methods for human action recognition based on 3D skeletons can be divided into end-to-end black-box methods and pattern recognition-based white-box methods. Black-box deep learning methods involve a large number of parameters and can learn classification knowledge from large amounts of data. However, deep learning is difficult to explain and provides only an overall recognition result. By contrast, white-box methods have an explainable recognition process and an adjustable algorithm. Nevertheless, some white-box methods focus only on algorithmic improvements, using formulas to represent and classify actions without reflecting the differences and connections among actions. Therefore, this study designs a white-box method with a visible classification process. The method uses a tree structure to organize action data hierarchically, constructing an individual classification hierarchy according to the differences among instances of the same action and an action classification hierarchy according to the discrepancies among different actions. Various measurement algorithms are also incorporated into the system; this study selects the nearest neighbor and dynamic time warping algorithms for experiments. The advantage of the hierarchical structure is that a variety of knowledge can be embedded into it according to different requirements, so that actions can be classified from different perspectives. In the experiments, key posture knowledge and human body structure knowledge are embedded into the hierarchy, and the hierarchy changes dynamically as knowledge is embedded.
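
The abstract names nearest neighbor and dynamic time warping (DTW) as the measurement algorithms used in the experiments. As a rough illustration only, and not the paper's implementation, the sketch below classifies a sequence of pose vectors by 1-nearest-neighbor search under the textbook DTW distance. The function names, the Euclidean frame distance, and the `(label, sequence)` template format are assumptions made for this example.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Classic DTW distance between two sequences of equal-dimension vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of seq_a[:i] against seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_dist = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean (assumed)
            cost[i][j] = frame_dist + min(
                cost[i - 1][j],      # frame of seq_a repeated against seq_b[j-1]
                cost[i][j - 1],      # frame of seq_b repeated against seq_a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def nearest_neighbor_label(query, templates):
    """Return the label of the template closest to `query` under DTW.

    `templates` is a list of (label, sequence) pairs (hypothetical format).
    """
    best_label, _ = min(templates, key=lambda item: dtw_distance(query, item[1]))
    return best_label
```

Because DTW allows frames to be repeated during alignment, a slower performance of the same action can still match its template at low cost.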

    • Cognitive Abilities Research in Alzheimer’s Disease Based on Sparse Quantile Regression

      2024, 33(10):13-25. DOI: 10.15888/j.cnki.csa.009662 CSTR: 32024.14.csa.009662

      Abstract:Alzheimer’s disease poses a significant public health challenge in the global aging society. One of its main clinical symptoms is the gradual decline in cognitive abilities. A crucial topic in Alzheimer’s disease research is to establish models that link cognitive performance with neuroimaging data to identify neuroimaging biomarkers associated with cognitive abilities. However, neuroimaging data often exhibit high dimensions, heavy-tailed distributions, and outliers. These characteristics not only reduce the accuracy and stability of models but also pose challenges for result explanations. To address these issues, this study uses sparse quantile regression to model and perform feature selection on data from the Alzheimer’s disease neuroimaging initiative (ADNI). This study also explores the distribution characteristics of cognitive scores at different quantiles and identifies specific brain regions associated with cognitive abilities. Experimental results demonstrate that sparse quantile regression successfully identifies the brain regions relevant to cognitive abilities at different quantiles. This research shows the potential of applying sparse quantile regression in neuroimaging data analysis and provides a novel perspective and approach for neuroimaging research.
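
For readers unfamiliar with sparse quantile regression, the sketch below writes out the objective that is commonly meant by the term: the mean quantile (pinball) loss of a linear model plus an L1 penalty that drives coefficients to zero. The abstract does not state the exact penalty or solver used in the study, so the L1 form, the function names, and the parameters `tau` and `lam` are assumptions for illustration.

```python
def pinball_loss(residual, tau):
    """Quantile (pinball) loss of one residual at quantile level tau in (0, 1)."""
    # Positive residuals are weighted by tau, negative ones by (1 - tau).
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def sparse_qr_objective(beta, X, y, tau, lam):
    """Mean pinball loss of the linear model X @ beta plus an L1 penalty.

    beta: list of coefficients; X: list of feature rows; y: list of targets;
    lam: penalty weight (hypothetical name) controlling sparsity.
    """
    total = 0.0
    for row, target in zip(X, y):
        prediction = sum(b * x for b, x in zip(beta, row))
        total += pinball_loss(target - prediction, tau)
    return total / len(y) + lam * sum(abs(b) for b in beta)
```

Minimizing this objective at several values of `tau` gives one sparse coefficient vector per quantile, which is what allows different features to be selected at different quantiles of the response.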

    • Survey
    • Review on Non-contact Detection Methods of Heart Rate and SpO2 Based on Video Signal

      2024, 33(10):26-36. DOI: 10.15888/j.cnki.csa.009646 CSTR: 32024.14.csa.009646

      Abstract:Heart rate and peripheral capillary oxygen saturation (SpO2) are important physiological indicators of human health. In recent years, non-contact heart rate and SpO2 detection methods based on imaging photoplethysmography (IPPG) have gradually become a research focus because they are convenient and flexible to apply. This paper reviews these methods as follows. First, it introduces the background and research significance of non-contact detection methods. Second, it summarizes the research status and future improvement directions of two aspects: target region detection and region of interest (ROI) selection. Third, it summarizes heart rate and SpO2 detection methods in three categories: traditional methods, methods combining signal processing with deep learning, and end-to-end methods. It also collates the datasets used by the deep learning methods and the detection performance reported on each dataset. Finally, the paper points out the problems that remain to be solved and future research directions in this field.

    • Risk Prediction of Child Vaccination Based on Knowledge Graph and Pre-trained Language Model

      2024, 33(10):37-46. DOI: 10.15888/j.cnki.csa.009635 CSTR: 32024.14.csa.009635

      Abstract:Primary healthcare providers lack the ability to assess the risk of vaccination for children with certain illnesses. A viable solution is to develop a risk prediction model for pediatric vaccination that leverages the experience of healthcare professionals in tertiary hospitals to help primary healthcare providers swiftly identify high-risk pediatric patients. This study proposes an intelligent method for vaccine recommendations based on a knowledge graph. Firstly, a method for medical named entity recognition called ELECTRA-BiGRU-CRF, based on pre-trained language models, is proposed for named entity extraction from outpatient electronic medical records. Secondly, a vaccination ontology is designed, with relationships and attributes defined, to construct a Chinese childhood vaccination knowledge graph based on Neo4j. Finally, a method for vaccine recommendations guided by significant categories using pre-trained language models is proposed based on the constructed knowledge graph. Experimental results indicate that the proposed methods can provide diagnostic assistance to physicians and offer support for deciding whether vaccines can be administered to children with certain illnesses.

    • Non-cooperative Spacecraft Pose Recognition Network Based on ISAR Imaging

      2024, 33(10):47-55. DOI: 10.15888/j.cnki.csa.009670 CSTR: 32024.14.csa.009670

      Abstract:Due to the lack of cooperative information, pose data of non-cooperative spacecraft cannot be obtained directly from sensors. Therefore, a pose recognition network based on inverse synthetic aperture radar (ISAR) images is proposed. Compared with images taken by space photography satellites and simulation data, ISAR images are easier and cheaper to obtain, but they suffer from problems such as low resolution and incomplete panel images. In image preprocessing, an adjusted YOLOX-tiny is used as a spacecraft cropping network so that the annotations marked in the image do not affect subsequent network training and the network focuses only on the region where the spacecraft is located. The enhanced Lee filter is used to remove image noise and improve image quality. In the backbone network, an STN module is added so that the network attends to the most relevant region, and the U-Net is redesigned with a dense residual block structure and combined with the CBAM module to reduce feature loss during sampling and improve model accuracy. In addition, multi-head self-attention is introduced to capture more global information. The experimental results show that the minimum, maximum, and average errors of this model are lower than those of some mainstream models, with reductions of 0.5–0.6, indicating that the network has better pose recognition ability.

    • Multimodal Remote Sensing Image Matching Combining Phase Symmetry and Rank-based LSS

      2024, 33(10):56-65. DOI: 10.15888/j.cnki.csa.009668 CSTR: 32024.14.csa.009668

      Abstract:To address the nonlinear radiometric distortion present in multimodal remote sensing images, this study proposes a matching method for multimodal remote sensing images that integrates phase symmetry features with rank-based local self-similarity. First, the local phase information of the images is used to construct a phase symmetry map, on which feature points are extracted using the features from accelerated segment test (FAST) algorithm. Subsequently, a new feature descriptor named RPCLSS is constructed, which combines rank-based local self-similarity and phase congruency. Finally, the fast sample consensus (FSC) algorithm is employed to eliminate mismatched points. Comparative experiments are conducted on publicly available multi-source remote sensing image datasets, comparing the proposed method against five existing advanced matching methods. The results reveal that the proposed method outperforms these state-of-the-art methods in terms of the number of correct matching points, matching precision, and matching correctness.

    • 3D Gaze Estimation by Bidirectional Fusion of CNN and Transformer

      2024, 33(10):66-74. DOI: 10.15888/j.cnki.csa.009649 CSTR: 32024.14.csa.009649

      Abstract:To address the low accuracy of gaze estimation and its susceptibility to interference from external factors in unconstrained environments, a gaze estimation method with cross-fusion of features from parallel convolution and attention branches is proposed to enhance feature fusion effectiveness and network performance. First, the Mobile-Former network is enhanced by introducing a linear attention mechanism and partial convolution, which improves the feature extraction capability while reducing computing costs. Additionally, a ResNet50 head pose estimation branch, pre-trained on the 300W-LP dataset, is added to enhance gaze estimation accuracy, and a Sigmoid function is used as a gating unit to screen effective features. Finally, facial images are fed into the neural network for feature extraction and fusion, and the 3D gaze direction is output. The model is evaluated on the MPIIFaceGaze and Gaze360 datasets, where the average angular error of the proposed method is 3.70° and 10.82°, respectively. Compared with other mainstream 3D gaze estimation methods, the network model accurately estimates the 3D gaze direction while reducing computational complexity.

    • Predictive Maintenance System for Mineral Processing Equipment

      2024, 33(10):75-86. DOI: 10.15888/j.cnki.csa.009663 CSTR: 32024.14.csa.009663

      Abstract:Ensuring the precise maintenance and stable operation of mineral processing equipment has always been an important challenge for mining-related enterprises, and developing predictive maintenance systems for equipment has become a crucial means of reducing maintenance costs and improving production efficiency. This study analyzes the functional requirements of predictive maintenance systems, designs the architecture and overall functional structure of a predictive maintenance system based on a micro-service architecture, and elaborates on the key technologies of the system. Moreover, the study proposes an evaluation model for equipment health status based on a multi-scale CNN fused with an attention mechanism, as well as a current trend prediction model that fuses CNN and BiLSTM, to support the construction of the predictive maintenance system. The completed system has been applied at Ansteel Group Guanbaoshan Mining Co. Ltd., where the proposed model is tested. The results show that the proposed model outperforms existing models in accuracy and robustness. The developed system can provide precise equipment maintenance plans, reduce equipment maintenance costs, and improve enterprise production efficiency.

    • Identification System of Cancer Driver Genes Based on Graph Autoencoder and LightGBM

      2024, 33(10):87-96. DOI: 10.15888/j.cnki.csa.009647 CSTR: 32024.14.csa.009647

      Abstract:Cancer driver genes play a crucial role in the formation and progression of cancer. Accurate identification of cancer driver genes contributes to a deeper understanding of the mechanisms underlying cancer development and advances precision medicine. To address the heterogeneity and complexity challenges in the current field of cancer driver gene identification, this study presents the design and implementation of a cancer driver gene identification system, ACGAI, based on graph autoencoder and LightGBM. The system initially employs unsupervised learning with a graph autoencoder to grasp the complex topological structure of the biomolecular network. Subsequently, the generated embedding representations are concatenated with original gene features, forming gene-enhanced features input into LightGBM. After training, the system outputs predictive scores for each gene on the biomolecular network, achieving accurate identification of cancer driver genes. Finally, the system utilizes Web technology to create a user-friendly and highly interactive visualization interface, enabling cancer driver gene identification in the context of gene set analysis and providing biological interpretation for the identification results. Through rigorous testing, the system exhibits superior identification performance compared to other methods, demonstrating its effectiveness in identifying cancer driver genes.

    • Scheme of Cross-border Trade Data Sharing and Access Control Based on Blockchain

      2024, 33(10):97-105. DOI: 10.15888/j.cnki.csa.009648 CSTR: 32024.14.csa.009648

      Abstract:With the development of global economic integration, cross-border trade has become an important driving force for global economic development. However, it faces issues such as data security, information silos, and information asymmetry. To address these issues, this study proposes a blockchain-based scheme for data sharing and access control in cross-border trade. The scheme uses a collaborative storage mechanism of blockchain and the InterPlanetary File System (IPFS) to effectively reduce the storage load of the blockchain. In addition, a dual key regression model combined with a time dimension is adopted to encrypt and store data and to assign access permissions by time period, which prevents data users from accessing data outside their permitted time span. Finally, corresponding smart contracts are designed to manage the entire data life cycle efficiently, improving sharing efficiency. The experimental results show that the proposed scheme can achieve secure data sharing in cross-border trade and fine-grained access control for users.

    • Deep Reinforcement Learning for Object Detection Based on Improved Reward Mechanism

      2024, 33(10):106-114. DOI: 10.15888/j.cnki.csa.009639 CSTR: 32024.14.csa.009639

      Abstract:To improve the detection accuracy and speed of deep reinforcement learning object detection models, modifications are made to traditional models. To address inadequate feature extraction, a VGG16 feature extraction module integrated with a channel attention mechanism is introduced to produce the state input for reinforcement learning, enabling a more comprehensive capture of key information in images. To address inaccurate evaluation caused by relying solely on the intersection over union as a reward, an improved reward mechanism is employed that also considers the distance between the center points of the ground truth box and the predicted box, as well as their aspect ratios, making the reward more reasonable. To accelerate the convergence of the training process and enhance the objectivity of the agent's evaluation of current states and actions, the Dueling DQN algorithm is used for training. Experiments on the PASCAL VOC2007 and PASCAL VOC2012 datasets show that the detection model needs only 4–10 candidate boxes to detect the target. Compared with Caicedo-RL, the accuracy is improved by 9.8%, and the mean intersection over union between the predicted and ground truth boxes is increased by 5.6%.
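
The abstract does not give the reward formula, so the sketch below is only one plausible reading of "IoU plus center-point distance plus aspect ratio". It borrows the general shape of the well-known Complete-IoU measure, using unit weights on the penalty terms, and turns the change in that score into a signed step reward. The function names, the weights, and the sign-based reward are assumptions, not the paper's definition.

```python
import math


def shaped_iou(pred, gt):
    """IoU minus center-distance and aspect-ratio penalties (assumed form).

    Boxes are (x1, y1, x2, y2) tuples with positive width and height.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt

    # Plain intersection over union.
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union

    # Squared center distance, normalized by the squared diagonal of the
    # smallest box enclosing both boxes.
    center_dist2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 \
        + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    diag2 = (max(px2, gx2) - min(px1, gx1)) ** 2 \
        + (max(py2, gy2) - min(py1, gy1)) ** 2

    # Aspect-ratio mismatch, zero when both boxes have the same width/height ratio.
    aspect = (4 / math.pi ** 2) * (
        math.atan((gx2 - gx1) / (gy2 - gy1)) - math.atan((px2 - px1) / (py2 - py1))
    ) ** 2

    return iou - center_dist2 / diag2 - aspect


def step_reward(prev_box, new_box, gt):
    """+1 if the action improved the shaped score, otherwise -1 (assumed scheme)."""
    return 1.0 if shaped_iou(new_box, gt) > shaped_iou(prev_box, gt) else -1.0
```

Unlike plain IoU, this score still changes when two boxes do not overlap at all, so the agent receives a signal for moving toward the target.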

    • Lightweight Sleep Analysis System Based on Single-channel EEG Signals

      2024, 33(10):115-123. DOI: 10.15888/j.cnki.csa.009650 CSTR: 32024.14.csa.009650

      Abstract:Traditional sleep staging models are difficult to deploy on devices with limited computing power because of their high computational resource requirements. In this study, a lightweight sleep analysis system based on single-channel EEG signals is developed, which deploys a GhostNet-optimized neural network model named GhostSleepNet to perform sleep staging and assess sleep quality. Users only need to wear an EEG headband and connect it to the system to achieve high-accuracy sleep staging at home. In this system, a convolutional neural network (CNN) extracts higher-order features, GhostNet maintains the accuracy of the CNN-extracted features while reducing the model parameters to improve computational efficiency, and a gated recurrent unit (GRU) captures long-term dependencies and cyclic changes in sleep data. Validation on the five-class classification task on the Sleep-EDF dataset shows that the sleep staging accuracy of GhostSleepNet reaches 84.17%, which is 3%–5% lower than that of traditional sleep staging models. However, the number of FLOPs is only 5 041 111 040, and the computational complexity decreases by 20%–45%, contributing to the development of sleep staging for mobile devices.

    • Short Term Power Load Forecasting Based on CEEMDAN-SBiGRU-OMHA

      2024, 33(10):124-132. DOI: 10.15888/j.cnki.csa.009661 CSTR: 32024.14.csa.009661

      Abstract:This study proposes a CEEMDAN-SBiGRU combined prediction model with an optimized multi-head attention mechanism to enhance the precision of short-term power load forecasting and fully explore the complex correlations in power load data. The model improves two modules: feature extraction and feature fusion. Firstly, complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is used to decompose the power load data into multiple intrinsic mode functions (IMFs) and a residual signal (RES), and a denoising autoencoder (DAE) is introduced to extract latent features from the data affected by meteorological factors, workday types, and temperature changes. Secondly, the extracted features are fed into the stacked bidirectional gated recurrent unit (SBiGRU) module to obtain hidden states. Finally, the hidden states are input into the optimized multi-head attention (OMHA) module, which incorporates a residual mechanism and layer normalization, to assign higher weights to important features and mitigate noise interference. The experimental results indicate that the CEEMDAN-SBiGRU-OMHA combined model achieves higher accuracy.

    • Modeling and Simulation of Light-weight Fusion Network MSDNet Guided by VSM

      2024, 33(10):133-139. DOI: 10.15888/j.cnki.csa.009630 CSTR: 32024.14.csa.009630

      Abstract:Light-weight image fusion algorithms are very important for human eye observation and machine recognition. By studying the importance of visual saliency in infrared and visible image fusion, a visual saliency map (VSM)-guided MSDNet fusion network is optimized and designed based on the SDNet fusion network. Firstly, the structure and channel numbers of SDNet are reduced to accelerate training and inference, and the learning ability of the light-weight model is enhanced by structural parameterization and reverse parameterization techniques. Then, for model training, a loss function guided by VSM is used to achieve self-supervised training. Finally, at the end of training, the image reconstruction branch is deleted, and the final light-weight model is obtained by fusing the convolution parameters. Experiments show that the light-weight network not only ensures image fusion quality but also greatly improves speed, making it possible to port the model to mobile terminals.

    • Counterfactual Explanation of Anomalous Objects Considering Causal Constraints

      2024, 33(10):140-151. DOI: 10.15888/j.cnki.csa.009651 CSTR: 32024.14.csa.009651

      Abstract:Most existing anomaly detection methods focus on algorithm efficiency and accuracy while overlooking the interpretability of detected anomalous objects. Counterfactual explanation, a research hot spot in interpretable machine learning, aims to explain model decisions by perturbing the features of the instances under study and generating counterfactual examples. In practical applications, there may be causal relationships among features. However, most existing counterfactual-based interpretability methods concentrate on how to generate more diverse counterfactual examples, overlooking the causal relationships among features. Consequently, unreasonable counterfactual explanations may be produced. To address this issue, this study proposes an algorithm to interpret anomaly via reasonable counterfactuals (IARC) that consider causal constraints. In the process of generating counterfactual explanations, the proposed method incorporates the causality between features into the objective function to evaluate the feasibility of each perturbation, and employs an improved genetic algorithm for solution optimization, thereby generating rational counterfactual explanations. Additionally, a novel measurement metric is introduced to quantify the degree of contradiction in the generated counterfactual explanations. Comparative experiments and detailed case studies are conducted on multiple real-world datasets, benchmarking the proposed method against several state-of-the-art methods. The experimental results demonstrate that the proposed method can generate highly rational counterfactual explanations for anomalous objects.

    • Online Estimation for Partially Linear Model in Data Streams

      2024, 33(10):152-162. DOI: 10.15888/j.cnki.csa.009658 CSTR: 32024.14.csa.009658

      Abstract:The partially linear model, an important type of semiparametric regression model, is widely used across various fields because it adapts flexibly to complex data structures. However, in the era of big data, the research and application of this model face multiple challenges, the most critical being computing speed and data storage. This study considers data streams that are observed continuously in the form of data blocks and proposes an online estimation method for the parameters of the linear part and the unknown function of the nonlinear part in the partially linear model. The method enables real-time estimation using only the current data block and previously computed summary statistics. To verify its effectiveness, the data block size and the total sample size of the data streams are varied in numerical simulations, and the bias, standard error, and mean squared error of the online estimation method are compared with those of the traditional method. The experiments demonstrate that, compared with the traditional method, the proposed approach computes quickly and does not need to revisit historical data, while remaining close to the traditional method in terms of mean squared error. Finally, based on data from the Chinese General Social Survey (CGSS), this study applies the online estimation method to analyze the factors influencing the quality of life of the working-age population in China. The results indicate that full-time work of 30 to 60 hours per week contributes positively to quality of life, providing a valuable reference for relevant policy formulation.
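
The idea of estimating from "the current data block and previously computed summary statistics" can be illustrated on the simplest case, ordinary least squares. The sketch below is a deliberately reduced example and not the paper's estimator: it ignores the nonparametric component entirely and only shows how accumulating the sums XᵀX and Xᵀy lets each block be discarded after use. The class and attribute names are hypothetical, and the example assumes NumPy is available.

```python
import numpy as np


class OnlineLeastSquares:
    """Least-squares coefficients updated block by block from summary statistics."""

    def __init__(self, n_features):
        # Running sums over all blocks seen so far; raw data is never stored.
        self.xtx = np.zeros((n_features, n_features))
        self.xty = np.zeros(n_features)

    def update(self, X_block, y_block):
        """Fold in one data block and return the current coefficient estimate."""
        self.xtx += X_block.T @ X_block
        self.xty += X_block.T @ y_block
        # Assumes enough rows have been seen for xtx to be invertible.
        return np.linalg.solve(self.xtx, self.xty)
```

For this linear special case the online estimate after the last block equals the full-data least-squares fit exactly, while memory use depends only on the number of features and not on the length of the stream.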

    • Model Deobfuscation Method Based on Neural Machine Translation

      2024, 33(10):163-173. DOI: 10.15888/j.cnki.csa.009666 CSTR: 32024.14.csa.009666

      Abstract:Model obfuscation refers to the equivalent transformation of neural networks into another form, which is an efficient and low-cost technique for protecting neural networks. To detect the flaws of model obfuscation, researchers have proposed model deobfuscation techniques in the hope of improving model obfuscation methods. However, model deobfuscation techniques are not fully explored, with limited applicability and effectiveness. Therefore, this study proposes a model deobfuscation method based on neural machine translation (NMT). This method models a deobfuscation task as a seq2seq task. It provides a more detailed sequential representation of the obfuscated model, identifies and processes the obfuscated information in the weight parameters, and utilizes an NMT-based model for deobfuscation translation. The experimental results demonstrate that this method addresses the shortcomings of existing methods, effectively capturing the obfuscation features and restoring the architectures of models. It can serve as a general solution to model deobfuscation.

    • Facial Image Generation Based on Collaborative Control of Text and Key Points

      2024, 33(10):174-182. DOI: 10.15888/j.cnki.csa.009652 CSTR: 32024.14.csa.009652

      Abstract:Face image generation requires high realism and controllability. This study proposes an algorithm for face image generation that is jointly controlled by text and facial key points. The text constrains the generation of faces at a semantic level, while facial key points enable the model to control the generation of facial features, expressions, and details based on given facial information. The proposed algorithm improves the existing diffusion model and introduces additional components: text processing models (CM), keypoint control networks (KCN), and autoencoder networks (ACN). Specifically, the diffusion model is a noise inference algorithm based on the diffusion theory; CM is designed based on an attention mechanism to encode and store text information; KCN receives the location information of key points, enhancing the controllability of face generation; ACN alleviates the generation pressure of the diffusion model and reduces the time required to generate samples. In addition, to adapt to generating face images, this research constructs a dataset containing 30000 face images. In the proposed algorithm, given prerequisite text and a facial keypoint image, the model extracts feature information and keypoint information from the text, generating a highly realistic and controllable target face image. Compared with mainstream methods, the proposed algorithm improves the FID index by about 5%–23% and the IS index by about 3%–14%, which proves its superiority.

    • Traffic Signal Control Algorithm Based on Contextual Multi-armed Bandit

      2024, 33(10):183-189. DOI: 10.15888/j.cnki.csa.009645 CSTR: 32024.14.csa.009645

      Abstract:In recent years, the exacerbation of traffic congestion has sparked widespread interest in the research on traffic signal control algorithms. Current studies indicate that methods based on deep reinforcement learning (DRL) exhibit promising performance in simulated environments. However, challenges persist in their practical application, including substantial requirements for data and computational resources, as well as difficulties in achieving coordination between intersections. To address these challenges, this study proposes a novel traffic signal control algorithm based on a contextual multi-armed bandit model. In contrast to conventional algorithms, the proposed algorithm achieves efficient coordination between intersections by extracting the main arteries from the road network. Moreover, it employs a contextual multi-armed bandit model to facilitate rapid and effective traffic signal control. Finally, through extensive experimentation on both real and synthetic datasets, the superiority of the proposed algorithm over previous algorithms is empirically demonstrated.
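
The abstract does not say which contextual bandit algorithm is used, so the sketch below shows LinUCB, a standard one, purely to make the model concrete. In a signal-control setting one might treat each candidate signal phase as an arm and a vector of lane queue lengths as the context; that mapping, the class name, and the `alpha` exploration weight are assumptions for illustration. The example assumes NumPy is available.

```python
import numpy as np


class LinUCB:
    """Disjoint LinUCB: one ridge-regression reward model per arm."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha  # exploration weight
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward sums

    def select(self, context):
        """Pick the arm with the highest upper confidence bound for this context."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b  # estimated reward weights for this arm
            bonus = self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(theta @ context + bonus)
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        """Record the observed reward for the arm that was played."""
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```

Each decision costs a few small matrix operations and needs no replay buffer or neural network training, which is the kind of saving in data and computation that the abstract contrasts with DRL methods.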

    • Efficient Vertical Federated Learning Based on Embedding and Gradient Bidirectional Compression

      2024, 33(10):190-197. DOI: 10.15888/j.cnki.csa.009656 CSTR: 32024.14.csa.009656

      Abstract:Vertical federated learning improves the value of data utilization by combining local data features from multiple parties and jointly training the target model without leaking data privacy. It has received widespread attention from companies and institutions in the industry. During the training process, the intermediate embeddings uploaded by clients and the gradients returned by the server require a huge amount of communication, and thus the communication cost becomes a key bottleneck limiting the practical application of vertical federated learning. Consequently, current research focuses on designing effective algorithms to reduce the communication amount and improve communication efficiency. To improve the communication efficiency of vertical federated learning, this study proposes an efficient compression algorithm based on embedding and gradient bidirectional compression. For the embedding representation uploaded by the client, an improved sparsification method combined with a cache reuse mechanism is employed. For the gradient information distributed by the server, a mechanism combining discrete quantization and Huffman coding is used. Experimental results show that the proposed algorithm can reduce the communication volume by about 85%, improve communication efficiency, and reduce the overall training time while maintaining almost the same accuracy as the uncompressed scenario.
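
The two compression directions described above can be sketched with their textbook building blocks: top-k sparsification for the uploaded embeddings and uniform discrete quantization for the returned gradients. This is an illustration under stated assumptions, not the paper's algorithm. The cache reuse mechanism and the Huffman coding stage are omitted, the inputs are assumed to be 1-D arrays, and all function names are hypothetical. The example assumes NumPy is available.

```python
import numpy as np


def top_k_sparsify(values, k):
    """Keep the k largest-magnitude entries of a 1-D array and zero the rest."""
    values = np.asarray(values, dtype=float)
    kept = np.zeros_like(values)
    idx = np.argpartition(np.abs(values), -k)[-k:]  # indices of the top-k magnitudes
    kept[idx] = values[idx]
    return kept


def quantize(values, levels):
    """Map values to integer codes on a uniform grid between their min and max."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:  # constant input: every entry gets code 0
        return np.zeros(values.shape, dtype=np.int64), lo, hi
    codes = np.round((values - lo) / (hi - lo) * (levels - 1)).astype(np.int64)
    return codes, lo, hi


def dequantize(codes, lo, hi, levels):
    """Invert `quantize` up to rounding error."""
    if hi == lo:
        return np.full(codes.shape, lo, dtype=float)
    return lo + codes / (levels - 1) * (hi - lo)
```

After quantization the gradient is a sequence of small integers with a skewed frequency distribution, which is the kind of input on which an entropy coder such as Huffman coding saves further bits.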

    • Transmission Line Defect Detection Based on Transfer Federated Learning

      2024, 33(10):198-204. DOI: 10.15888/j.cnki.csa.009660 CSTR: 32024.14.csa.009660

      Abstract:Effective detection of damage and foreign matter on transmission lines is very important for intelligent line inspection. However, it is difficult to collect data from different power companies to train a unified detection model because of the data island problem. Therefore, this study proposes a transmission line defect detection method that combines federated transfer learning with object detection algorithms. Specifically, a high-performance detection model is selected as the base detection model, and its initial weights are frozen. The model adaptively learns from the data of different clients by using a low-rank decomposition of the weight matrix and inserting an adapter layer, which greatly reduces the number of trainable parameters. An adaptive weight screening method is also proposed to determine which weight layers receive the low-rank decomposition and where the adapter layer is inserted. Through simple adaptive learning, the model can effectively adapt to the data distributions of different power companies. Experimental verification on a power dataset that closely resembles real-world conditions shows that the proposed model can adapt to different distributed detection scenarios while ensuring the security and privacy of client data.
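
Freezing a weight matrix and learning only a low-rank correction to it is the idea popularized by LoRA-style adapters. The sketch below shows that general pattern for a single linear layer. It is an assumed reading of "low-rank decomposition of the weight matrix", not the paper's architecture, and it leaves out the adapter layer and the adaptive weight screening. The class and attribute names are hypothetical, and the example assumes NumPy is available.

```python
import numpy as np


class LowRankAdaptedLinear:
    """Frozen weight W plus a trainable rank-r update B @ A (LoRA-style sketch)."""

    def __init__(self, frozen_weight, rank, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = frozen_weight.shape
        self.W = frozen_weight                               # frozen, never updated
        self.A = rng.normal(scale=0.01, size=(rank, d_in))   # trainable
        self.B = np.zeros((d_out, rank))                     # trainable, starts at zero

    def forward(self, x):
        # Effective weight is W + B @ A, computed without forming the full matrix.
        return self.W @ x + self.B @ (self.A @ x)

    def trainable_parameters(self):
        """Number of values a client would need to train and communicate."""
        return self.A.size + self.B.size
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen model exactly, and a client only has to train and exchange `rank * (d_in + d_out)` values instead of `d_in * d_out`.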

    • Verifiable Progressive Secret Image Sharing to Resist Dishonest Participant Cheating Attacks

      2024, 33(10):205-216. DOI: 10.15888/j.cnki.csa.009657 CSTR: 32024.14.csa.009657

      Abstract (165) HTML (859) PDF 3.17 M (1359) Comment (0) Favorites

      Abstract:Current progressive secret image sharing schemes do not consider cheating by dishonest participants, who can therefore submit false shadow images during reconstruction. To address this issue while ensuring successful progressive reconstruction, this study divides the bit plane of pixels into two parts and uses the Lagrange interpolation algorithm along with visual cryptography schemes. The sliding window of the pixel bit plane is determined by a pseudo-random number, and authentication information is embedded into the sliding window through a filtering operation to achieve authentication capability. Additionally, different strategies for bit plane division produce different progressive reconstruction effects, enabling more flexible progressive reconstruction. Theoretical analysis and experimental results both demonstrate the effectiveness of the proposed scheme.
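
As background for the Lagrange interpolation step, the following is a minimal sketch of classic (k, n) threshold sharing of a single pixel value. It is not the paper's scheme: bit-plane division, the sliding-window authentication, and visual cryptography are all omitted, and the prime modulus 257 and the function names are assumptions for illustration.

```python
import random

P = 257  # smallest prime above 255, so any 8-bit pixel value fits (assumed modulus)

def make_shares(secret, k, n, rng=random):
    """Split `secret` into n shares; any k of them recover it."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [rng.randrange(P) for _ in range(k - 1)]
    return [
        (x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
        for x in range(1, n + 1)
    ]

def reconstruct(shares):
    """Evaluate the Lagrange interpolation polynomial at x = 0."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

pixel = 200
shares = make_shares(pixel, k=3, n=5)
recovered = reconstruct(shares[:3])
```

Plain interpolation of this kind accepts any shares it is given, which is why a forged share silently corrupts the result and a verification mechanism is needed.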

    • High-frequency Enhanced Dual-branch Hyperspectral Image Super-resolution Network

      2024, 33(10):217-227. DOI: 10.15888/j.cnki.csa.009643 CSTR: 32024.14.csa.009643

      Abstract (227) HTML (831) PDF 3.74 M (1575) Comment (0) Favorites

      Abstract:The narrow spectral bands of hyperspectral images (HSI) provide rich information for many visual tasks, but also pose challenges for feature extraction. Despite various deep learning methods proposed by researchers, the advantages of these architectures are not fully combined. Therefore, this study proposes a high-frequency enhanced dual-branch hyperspectral image super-resolution network (HFEDB-Net) that effectively extracts spatial and spectral information of HSI by integrating the image spatial feature extraction advantage of convolutional neural network (CNN) with the adaptive capability and long-distance dependency extraction advantage of Transformers. HFEDB-Net consists of a high-frequency information enhancement branch and a backbone branch. In the high-frequency information enhancement branch, the high-frequency information of low-resolution and high-resolution HSI is extracted by using Laplacian pyramids, and the results serve as the input and label for the high-frequency branch. A spectral-enhanced Transformer is employed as the feature extraction method for this branch. In the backbone branch, a CNN with channel attention is utilized to extract spatial features and spectral information comprehensively. Finally, the results from both branches are combined through CNN to obtain the final reconstructed image. Additionally, the attention mechanism and encoder layers of the Transformer are respectively improved by using multi-head attention and multi-scale strategies to better extract spatial and spectral information from HSI. Experimental results demonstrate that HFEDB-Net outperforms current state-of-the-art methods in terms of quantitative evaluation metrics and visual effects on two public datasets.

    • Tongue Image Syndrome Classification Integrated with Multiple Attention

      2024, 33(10):228-235. DOI: 10.15888/j.cnki.csa.009665 CSTR: 32024.14.csa.009665

      Abstract (170) HTML (767) PDF 1.38 M (1498) Comment (0) Favorites

      Abstract:Intelligent tongue diagnosis is of great significance in assisting doctors in medical treatment. At present, intelligent tongue diagnosis mainly focuses on the prediction and classification of single tongue image features, making it difficult to provide substantial help in the diagnostic process. To make up for this deficiency, this study carries out accurate prediction and classification at the level of tongue image syndromes to assist doctors in diagnosing diseases. TUNet is used to segment the tongue, and a parallel residual network, PMANet, integrated with a multi-attention mechanism is proposed to classify tongue image syndromes. The pixel accuracy (PA), mean intersection over union (MIoU), and Dice coefficient of TUNet reach 99.7%, 98.4%, and 99.2%, respectively, improvements of 3.2%, 9.0%, and 4.8% over the baseline U-Net. In tongue image syndrome classification, PMANet’s total number of parameters is 12.34M, slightly higher than that of EfficientNet, and its total amount of floating-point calculations is 1.021G, significantly lower than that of all compared networks. With a low number of parameters and floating-point calculations, PMANet reaches a classification accuracy of 95.7%, achieving a balance among accuracy, parameter count, and computational cost. This method provides support for research on intelligent tongue diagnosis and is expected to promote the modernization of TCM tongue diagnosis.

    • Point Cloud Segmentation for 3D Visual Guidance

      2024, 33(10):236-244. DOI: 10.15888/j.cnki.csa.009664 CSTR: 32024.14.csa.009664

      Abstract (180) HTML (862) PDF 5.12 M (1302) Comment (0) Favorites

      Abstract:Point cloud segmentation is a crucial step in 3D visual guidance and scene understanding, whose quality directly affects the quality of 3D measurement or imaging. To improve the segmentation accuracy and solve the out-of-bounds problem, this study proposes a point cloud segmentation algorithm for 3D visual guidance. This algorithm generates initial supervoxel data and extracts boundary points based on the spatial position, curvature, and normal vectors of the point cloud. Boundary refinement is then performed: boundary points are redistributed to optimize the supervoxels according to a similarity measure between each boundary point and its neighboring supervoxels. Finally, candidate fragments are obtained by region growing and merged according to their concavity and convexity to achieve object-level segmentation. Visualization and quantitative comparison show that this algorithm effectively solves the out-of-bounds problem and accurately segments complex point cloud models. The segmentation accuracy is 89.04% and the recall rate is 87.38%.

    • Oversampling Method Based on Shared Nearest Neighbors for Density Peak Clustering

      2024, 33(10):245-254. DOI: 10.15888/j.cnki.csa.009607 CSTR: 32024.14.csa.009607

      Abstract (220) HTML (801) PDF 1.01 M (1433) Comment (0) Favorites

      Abstract:In imbalanced datasets, the presence of noise and class overlapping often leads to poor performance of traditional classifiers, resulting in minority class samples being difficult to classify accurately. To improve classification performance, a method for handling imbalanced data based on shared nearest neighbor density peak clustering and ensemble filtering mechanism is proposed. This method first uses the shared nearest neighbor density peak clustering algorithm to adaptively divide the minority class samples into multiple clusters. Then, based on the density and size within the clusters, oversampling weights are allocated to each cluster. During the synthesis within clusters, the local sparsity and clustering coefficient of the samples are considered to select neighboring samples and determine the weight range of linear interpolation, thus avoiding the generation of new samples in the majority class aggregation area. Finally, an ensemble filtering mechanism is introduced to eliminate noise and hard-to-learn boundary samples to regulate the decision boundary and improve the quality of generated samples. Compared with 5 oversampling methods, this algorithm performs better overall on 8 public datasets.
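
The linear interpolation step used to synthesize minority samples can be sketched as follows. This is a generic SMOTE-style interpolation, not the paper's method: the function name `interpolate` and the `max_weight` parameter are assumptions, standing in for the weight range that the paper derives from local sparsity and the clustering coefficient.

```python
import random

def interpolate(sample, neighbor, max_weight=1.0, rng=random):
    """Create a synthetic point on the segment from `sample` toward `neighbor`.

    `max_weight` caps how far the new point may move toward the neighbor
    (hypothetical stand-in for the paper's weight range).
    """
    w = rng.uniform(0.0, max_weight)
    return [s + w * (n - s) for s, n in zip(sample, neighbor)]

# Made-up minority sample and neighbor; the cap keeps the new point near the sample.
sample = [0.0, 0.0]
neighbor = [2.0, 4.0]
synthetic = interpolate(sample, neighbor, max_weight=0.5)
```

Restricting the weight range keeps synthetic points close to the chosen sample, which is one way to avoid placing them inside a region dominated by the majority class.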

    • Multimodal Sentiment Analysis Using Interpolation Optimization Features

      2024, 33(10):255-262. DOI: 10.15888/j.cnki.csa.009614 CSTR: 32024.14.csa.009614

      Abstract (211) HTML (841) PDF 695.98 K (1535) Comment (0) Favorites

      Abstract:Currently, multimodal sentiment analysis tasks suffer from problems such as insufficient single-modal feature extraction and unstable data fusion methods. To solve these problems, this study proposes a method that uses interpolation to optimize modal features. Firstly, interpolation-optimized BERT and GRU models are applied to extract features, and both models are used to mine text, audio, and video information. Secondly, an improved attention mechanism is used to fuse text, audio, and video information, thus achieving more stable modal fusion. This method is tested on the MOSI and MOSEI datasets. The experimental results show that optimizing modal features with interpolation improves the accuracy of multimodal sentiment analysis tasks, which verifies the effectiveness of interpolation.

    • RelightGAN: Generative Adversarial Network for Dark Image Enhancement

      2024, 33(10):263-269. DOI: 10.15888/j.cnki.csa.009654 CSTR: 32024.14.csa.009654

      Abstract (230) HTML (793) PDF 1.57 M (1439) Comment (0) Favorites

      Abstract:To address insufficient light, low contrast, and information loss in images taken by imaging devices at night or in low-light environments, an improved dark image enhancement network named RelightGAN is designed based on the generative adversarial network (GAN). It contains two discriminators and one generator, and the generator is jointly constrained by two sets of adversarial losses and cyclic losses to generate a better illumination layer. To enhance the recovery of image details during network training, a residual network is introduced to solve the gradient vanishing problem. At the same time, the hybrid attention mechanism CBAM is introduced to increase the generator’s attention to important information and structures in the image, enhancing the network's expression capability. Compared with images enhanced by other dark image enhancement networks, images enhanced by RelightGAN show a 12.81% improvement in peak signal-to-noise ratio (PSNR) and a 5.95% improvement in structural similarity (SSIM). Experimental results show that RelightGAN combines the advantages of traditional algorithms and neural networks to improve dark scene images and image visibility.


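Contact Information
  • Computer Systems & Applications (《计算机系统应用》)
  • Founded in 1992
  • Sponsor: Institute of Software, Chinese Academy of Sciences
  • Postal code: 100190
  • Phone: 010-62661041
  • Email: csa (a) iscas.ac.cn
  • Website: http://www.c-s-a.org.cn
  • Serial numbers: ISSN 1003-3254
  • CN 11-2854/TP
  • Domestic price: 50 yuan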
Copyright: Institute of Software, Chinese Academy of Sciences. Beijing ICP No. 05046678-3
Address: 4# South Fourth Street, Zhongguancun, Haidian, Beijing. Postal Code: 100190
Phone: 010-62661041. Email: csa (a) iscas.ac.cn