ZHANG Sheng-Yao , PAN Xu-Dong , ZHANG Mi
2024, 33(4):1-12. DOI: 10.15888/j.cnki.csa.009459 CSTR: 32024.14.csa.009459
Abstract:Training deep neural networks (DNNs) in mission-critical scenarios consumes ever more resources, which motivates attackers to steal models through cloud prediction APIs and violates the intellectual property rights of model owners. To trace illegal model copies in public, DNN model fingerprinting offers a promising copyright verification option for model owners who want to preserve model integrity. However, existing fingerprinting schemes rely mainly on output-level traces (e.g., mis-prediction behavior on special inputs), which limits stealthiness during fingerprint verification. This study proposes a novel task-agnostic fingerprinting scheme based on saliency map traces of model predictions. The scheme formulates a constrained manipulation objective on saliency maps to construct clean-label, natural fingerprint samples, thus significantly improving the stealthiness of model fingerprints. Extensive evaluation on three typical tasks shows that the scheme substantially improves on the fingerprint effectiveness of existing schemes while keeping the fingerprints highly stealthy.
XIAN Guang-Ming , YANG Xian-Ping , ZHAO Zhi-Feng
2024, 33(4):13-25. DOI: 10.15888/j.cnki.csa.009461 CSTR: 32024.14.csa.009461
Abstract:Multimodal sentiment analysis aims to assess users’ sentiment by analyzing the videos they upload on social platforms. The current research on multimodal sentiment analysis primarily focuses on designing complex multimodal fusion networks to learn the consistency information among modalities, which enhances the model’s performance to some extent. However, most of the research overlooks the complementary role played by the difference information among modalities, resulting in sentiment analysis biases. This study proposes a multimodal sentiment analysis model called DERL (dual encoder representation learning) based on dual encoder representation learning. This model learns modality-invariant representations and modality-specific representations by a dual encoder structure. Specifically, a cross-modal interaction encoder based on a hierarchical attention mechanism is employed to learn the modality-invariant representations of all modalities to obtain consistency information. Additionally, an intra-modal encoder based on a self-attention mechanism is adopted to learn the modality-specific representations within each modality and thus capture difference information. Furthermore, two gate network units are designed to enhance and filter the encoded features and enable a better combination of modality-invariant and modality-specific representations. Finally, during fusion, potential similar sentiment between different multimodal representations is captured for sentiment prediction by reducing the L2 distance among them. Experimental results on two publicly available datasets CMU-MOSI and CMU-MOSEI show that this model outperforms a range of baselines.
SHAO Run-Hua , LIU Jing , MA Jin-Gang , WANG Yi-Fan , CHEN Tian-Zhen , LI Ming
2024, 33(4):26-38. DOI: 10.15888/j.cnki.csa.009466 CSTR: 32024.14.csa.009466
Abstract:Liver cancer is a malignant liver tumor that originates from liver cells, and its diagnosis has always been a difficult medical problem and a research hotspot in various fields. Early diagnosis can reduce the mortality rate of liver cancer. Histopathological image examination is the gold standard for oncology diagnosis because the images display the cells and tissue structures of tissue slices, which can be used to determine cell types, tissue structures, and the number and morphology of abnormal cells, and to evaluate the specific condition of the tumor. This study focuses on the application of convolutional neural networks in liver cancer diagnosis algorithms for pathological images, including liver tumor detection, image segmentation, and preoperative prediction. The design ideas, improvement goals, and methods of each convolutional neural network algorithm are elaborated in detail to provide clearer reference ideas for researchers. Additionally, the advantages and disadvantages of convolutional neural network algorithms in diagnosis are summarized and analyzed, and potential future research hotspots and related difficulties are discussed.
WU Bo , SHI Dong-Hui , LYU Dong-Lai , HU Tao
2024, 33(4):39-49. DOI: 10.15888/j.cnki.csa.009469 CSTR: 32024.14.csa.009469
Abstract:The multi-client brain tumor classification method based on the convolutional block attention module (CBAM) extracts tumor region details from MRI images inadequately, and channel attention and spatial attention interfere with each other under the federated learning framework. In addition, the accuracy in classifying medical tumor data from multiple sources is low. To address these problems, this study proposes a brain tumor classification method that combines the federated learning framework with an enhanced CBAM-ResNet18 network. The method leverages federated learning to work collaboratively with brain tumor data from multiple sources. It replaces the ReLU activation function with Leaky ReLU to mitigate neuron death. The channel attention module within CBAM is changed from dimension reduction followed by dimension increase to dimension increase followed by dimension reduction, which significantly enhances the network’s ability to extract image details. Furthermore, the channel attention module and the spatial attention module in CBAM are rearranged from a cascade structure to a parallel structure, so that the network’s feature extraction capability is unaffected by the order of processing. A publicly available brain tumor MRI dataset from Kaggle is used in the study. The results demonstrate that FL-CBAM-DIPC-ResNet achieves accuracy, precision, recall, and F1 score of 97.78%, 97.68%, 97.61%, and 97.63%, respectively, which are 6.54%, 4.78%, 6.80%, and 7.00% higher than those of the baseline model. These experimental findings validate that the proposed method not only overcomes data islands and enables data fusion from multiple sources but also outperforms the majority of existing mainstream models.
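As a minimal illustration of the ReLU to Leaky ReLU replacement mentioned in this abstract, the sketch below compares the two activations on a scalar. The slope value 0.01 is an assumption chosen for illustration; the abstract does not state which slope the authors use.

```python
def relu(x: float) -> float:
    """Standard ReLU: negative inputs are clamped to zero."""
    return x if x > 0.0 else 0.0


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """Leaky ReLU: negative inputs keep a small, non-zero response.

    negative_slope=0.01 is an assumed default, not a value from the paper.
    """
    return x if x > 0.0 else negative_slope * x


if __name__ == "__main__":
    for value in (-2.0, 0.0, 3.0):
        print(value, relu(value), leaky_relu(value))
```

Because the negative branch is not flat, a unit that receives only negative inputs still passes a gradient, which is the usual argument for why Leaky ReLU mitigates neuron death.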
FENG Jin-Yu , ZHANG Kui-Xing , ZHANG Tie-Lin , LI Yan-Jun
2024, 33(4):50-59. DOI: 10.15888/j.cnki.csa.009486 CSTR: 32024.14.csa.009486
Abstract:The visually impaired are a vulnerable group in society and face many obstacles when traveling independently. Providing safe and reliable assistive equipment for the visually impaired reflects the progress of social civilization. This study introduces the key technologies for obstacle detection and identification and the path planning algorithms used to assist visually impaired travel. The study mainly analyzes path planning algorithms applied after obstacle detection, comprehensively compares the application characteristics and scenarios of various technologies, and discusses the research progress of related methods in assistive devices for the visually impaired. In addition, it summarizes the current application status of multi-technology integration in intelligent assistive equipment. On this basis, and in light of advances in technologies such as artificial intelligence and embedded devices, the future development direction of travel assistance equipment for the visually impaired is discussed.
CAO Jie , WANG Qiao , LIANG Hao-Peng , WANG Chen-Zhang , LI Xiao-Xu , YU Hong
2024, 33(4):60-68. DOI: 10.15888/j.cnki.csa.009458 CSTR: 32024.14.csa.009458
Abstract:Inaccurate phase estimation in single-channel speech enhancement tasks degrades the quality of the enhanced speech. To this end, this study proposes a speech enhancement method based on a deep complex axial self-attention convolutional recurrent network (DCACRN), which enhances speech amplitude information and phase information in the complex domain simultaneously. Firstly, a complex convolutional network-based encoder is employed to extract complex features from the input speech signal, and a convolutional hopping module is introduced to map the features into a high-dimensional space for feature fusion, which enhances the information interaction and the gradient flow. Then an encoder-decoder structure based on the axial self-attention mechanism is designed to enhance the model’s temporal modeling ability and feature extraction ability. Finally, the speech signals are reconstructed by the decoder, and a hybrid loss function is adopted to optimize the network model and improve the quality of the enhanced speech signals. The experiments are conducted on the public datasets Valentini and DNS Challenge, and the results show that the proposed method improves both the perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) metrics compared to other models. On the non-reverberant dataset, PESQ is improved by 12.8% over DCTCRN and 3.9% over DCCRN, which validates the effectiveness of the proposed model in speech enhancement tasks.
ZHENG Jia-Cheng , HE Heng , CHEN Yue-Jia , XIAO Tian-Zhe
2024, 33(4):69-81. DOI: 10.15888/j.cnki.csa.009464 CSTR: 32024.14.csa.009464
Abstract:Ciphertext-policy attribute-based encryption (CP-ABE) can provide fine-grained access control while guaranteeing data privacy. Considering that the existing CP-ABE-based access control schemes cannot effectively address critical data security in edge computing, this study proposes a blockchain-based lightweight access control scheme over ciphertext (BLAC) in edge computing. In BLAC, a lightweight CP-ABE algorithm based on elliptic curve cryptography is designed, and fast elliptic curve scalar multiplication is adopted to implement encryption and decryption. Additionally, most of the encryption and decryption operations are securely offloaded, so that user devices with limited computing power can efficiently complete the fine-grained access control process for ciphertext data with the assistance of edge servers. Meanwhile, a distributed key management method based on blockchain is designed, which enables multiple edge servers to collaboratively distribute private keys to users through the blockchain. Security analysis and performance evaluation show that BLAC can guarantee data confidentiality, resist collusion attacks, and support forward security. Additionally, it has high user-side computational efficiency and low server-side decryption overhead and storage overhead.
CAO Wei , WANG Xing , ZOU Fu-Min , JIN Biao , WANG Xiao-Jun
2024, 33(4):82-92. DOI: 10.15888/j.cnki.csa.009451 CSTR: 32024.14.csa.009451
Abstract:Traffic flow prediction is an important method for achieving urban traffic optimization in intelligent transportation systems. Accurate traffic flow prediction holds significant importance for traffic management and guidance. However, due to the high spatiotemporal dependence, the traffic flow exhibits complex nonlinear characteristics. Existing methods mainly consider the local spatiotemporal features of nodes in the road network, overlooking the long-term spatiotemporal characteristics of all nodes in the network. To fully explore the complex spatiotemporal dependencies in traffic flow data, this study proposes a Transformer-based traffic flow prediction model called multi-spatiotemporal self-attention Transformer (MSTTF). This model embeds temporal and spatial information through position encoding in the embedding layer and integrates various self-attention mechanisms, including adjacent spatial self-attention, similar spatial self-attention, temporal self-attention, and spatiotemporal self-attention, to uncover potential spatiotemporal dependencies in the data. The predictions are made in the output layer. The results demonstrate that the MSTTF model achieves an average reduction of 10.36% in MAE compared to the traditional spatiotemporal Transformer model. Particularly, when compared to the state-of-the-art PDFormer model, the MSTTF model achieves an average MAE reduction of 1.24%, indicating superior predictive performance.
LIU Xin-Yao , LIANG Jun , YU Jia-Lin
2024, 33(4):93-102. DOI: 10.15888/j.cnki.csa.009460 CSTR: 32024.14.csa.009460
Abstract:Neural networks struggle to obtain enough information to classify images correctly from a small amount of labeled data. To solve this problem, this study proposes a new relational network, SDM-RNET, which combines a stochastic deep network with multi-scale convolution. First, a stochastic deep network is introduced into the model embedding module to deepen the model. Then, in the feature extraction stage, multi-scale depth-separable convolution is adopted to replace ordinary convolution for feature fusion. After the backbone network, deep and shallow layer feature fusion is applied to obtain richer image features, and the model finally learns to predict the categories of images. In comparisons with other few-shot image classification methods on the mini-ImageNet, RP2K, and Omniglot datasets, the proposed method has the highest accuracy on 5-way 1-shot and 5-way 5-shot classification tasks.
2024, 33(4):103-112. DOI: 10.15888/j.cnki.csa.009479 CSTR: 32024.14.csa.009479
Abstract:Existing scene text recognizers are prone to be troubled by blurred text images, leading to poor performance in practical applications. Therefore, several scene text image super-resolution models have been proposed as the pre-processor for text recognizers to improve the quality of input images. However, real-world training samples for the scene text image super-resolution task are difficult to collect. In addition, existing STISR models only learn to transform low-resolution (LR) text images into high-resolution (HR) text images while ignoring blurring patterns from HR to LR images. This study proposes a blurring pattern aware module (BPAM), which learns blurring patterns from existing real-world HR-LR pairs and transfers them to other HR images for generating LR images with different degrees of degradation. Therefore, the proposed BPAM can produce massive HR-LR pairs for STISR models to compensate for the deficiency of training data, significantly improving performance. The experimental results show that when equipped with the proposed BPAM, the performance of SOTA STISR methods can be further improved. For instance, the SOTA method TG achieves a 5.8% improvement in recognition accuracy with CRNN for evaluation.
2024, 33(4):113-122. DOI: 10.15888/j.cnki.csa.009450 CSTR: 32024.14.csa.009450
Abstract:At present, most recognition of students’ classroom behavior is based on a single frame image and ignores behavior coherence, so video information cannot be fully used to depict students’ classroom behavior accurately. Therefore, this study proposes an improved YOWO algorithm model that effectively employs video information to identify students’ classroom behavior. First, the study collects teaching videos from real classroom teaching in a university and produces an AVA format video dataset containing five types of students’ classroom behavior. Second, the temporal shift module (TSM) is adopted to enhance the model’s ability to obtain temporal context information. Finally, a non-local operation module is utilized to improve the model’s ability to extract key location information. The experimental results show that the optimized YOWO model has better recognition performance. On the classroom behavior dataset, the mAP value of the improved algorithm is 95.7%, 4.6% higher than that of the original YOWO algorithm. The number of model parameters is reduced by 32.3% to 81.97×10^6, and the calculation amount is decreased by 9.6% to 22.6 GFLOPs. The detection speed is 24.03 f/s, an increase of about 3 f/s.
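The temporal shift idea used in this entry can be sketched in a few lines. The code below follows the commonly described TSM operation (one slice of channels takes its value from the next frame, another slice from the previous frame, the rest stay in place, with zero padding at the clip boundaries). The `[T][C]` list layout and `fold_div=8` are assumptions for illustration and are not taken from the paper.

```python
from typing import List


def temporal_shift(frames: List[List[float]], fold_div: int = 8) -> List[List[float]]:
    """Shift part of the channels along the time axis.

    frames is a [T][C] nested list (T time steps, C channels); this layout
    and fold_div=8 are illustrative assumptions.
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    fold = num_channels // fold_div
    shifted = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for ch in range(num_channels):
            if ch < fold:
                source = t + 1      # first slice: read from the next frame
            elif ch < 2 * fold:
                source = t - 1      # second slice: read from the previous frame
            else:
                source = t          # remaining channels are left unshifted
            if 0 <= source < num_frames:
                shifted[t][ch] = frames[source][ch]
            # out-of-range sources keep the zero padding
    return shifted


if __name__ == "__main__":
    clip = [[float(10 * t + ch) for ch in range(8)] for t in range(3)]
    for row in temporal_shift(clip):
        print(row)
```

The shift itself adds no parameters or multiplications, which is why it is attractive as a way to mix information across neighbouring frames.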
ZHAO Zhen-Bo , REN Xue-Rong , FU Qing-Kun
2024, 33(4):123-132. DOI: 10.15888/j.cnki.csa.009478 CSTR: 32024.14.csa.009478
Abstract:As the resources of edge servers are limited, designing a reasonable resource management and task scheduling scheme is an important research problem. To improve the utility of system services, this study proposes a strategy of joint resource allocation and computing offloading. Firstly, the optimal matching of communication and computing resources is obtained by binary search and the Lagrange multiplier method. Then, the offloading decision is made by a whale optimization algorithm that integrates multiple strategies, including a nonlinear exponential-power strategy for adjusting the convergence factor, an adaptive weight strategy for balancing the exploration and exploitation stages, and a wandering strategy that combines triangle walk and Levy flight. Besides, the study introduces a penalty function in fitness evaluation to satisfy the constraint of user access. Finally, it formulates a V-shaped transfer function to make binary offloading decisions. The simulation results show that, across various evaluation indicators and in comparison with other benchmark schemes, the proposed strategy can effectively increase network throughput and significantly improve system utility.
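The abstract does not give the form of the V-shaped transfer function, so the following is only a sketch under stated assumptions: it uses the common choice |tanh(x)| and the usual rule that a bit is flipped with probability equal to the transfer value. The names `v_transfer` and `binarize` are hypothetical.

```python
import math
import random
from typing import List


def v_transfer(x: float) -> float:
    """A common V-shaped transfer function, |tanh(x)|, with values in [0, 1].

    This particular form is an assumption; the paper formulates its own.
    """
    return abs(math.tanh(x))


def binarize(position: List[float], current_bits: List[int],
             rng: random.Random) -> List[int]:
    """Turn a continuous position into binary offloading decisions.

    Each bit (1 = offload, 0 = local, an assumed encoding) is flipped with
    probability v_transfer(x); otherwise it keeps its current value.
    """
    new_bits = []
    for x, bit in zip(position, current_bits):
        if rng.random() < v_transfer(x):
            new_bits.append(1 - bit)
        else:
            new_bits.append(bit)
    return new_bits


if __name__ == "__main__":
    rng = random.Random(42)
    print(binarize([0.1, -1.5, 3.0], [0, 1, 0], rng))
```

With a V-shaped function, a position component near zero leaves the decision unchanged, while a large magnitude in either direction makes a flip likely.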
XU Kang-Ye , CHEN Jian-Ping , CHEN Ping-Hua
2024, 33(4):133-142. DOI: 10.15888/j.cnki.csa.009465 CSTR: 32024.14.csa.009465
Abstract:The variability in size, shape, color, and texture, along with the blurring demarcation of the bowel wall, presents a significant challenge in colon polyp segmentation. The detail information loss and lack of interaction between different feature levels due to continuous sampling in single-branch networks lead to poor segmentation results. To address this problem, this study proposes a two-branch colon polyp segmentation network based on local-global feature interaction. The network utilizes a dual branch structure consisting of CNN and Transformer, systematically capturing the precise local details and the global semantic features of the polyp in each layer. To make full use of the complementary nature of feature information at different levels and scales, and to utilize the guidance and enhancement of shallow detailed features by deep semantic features, the paper designs the feature cooperative interaction module to dynamically sense and aggregate cross-level feature interaction information. To enhance the feature of the polyp lesion region while reducing background noise, the feature enhancement module utilizes spatial and channel attention mechanisms. Additionally, the skip-connection mechanism in conjunction with the attention gate further highlights boundary information, resulting in improved edge region segmentation accuracy. Experiments show that the proposed network achieves better mDice and mIoU scores than the baseline network on multiple polyp segmentation datasets, with higher segmentation accuracy and stability.
YU Meng-Fei , YANG Hai-Bo , LU Xin , JIA Jun-Ying
2024, 33(4):143-151. DOI: 10.15888/j.cnki.csa.009474 CSTR: 32024.14.csa.009474
Abstract:Crowd density detection algorithms based on deep learning have made great progress, but there is still considerable room for improvement in their detection accuracy and robustness in complex real-world scenes. Factors such as inconsistent object scales and background interference in complex scenes make crowd density detection a challenging task. To address this problem, this study proposes a crowd density detection network based on multi-scale feature fusion. The network first uses images of different resolutions to interactively extract coarse-grained and fine-grained features of the crowd and introduces a multi-level feature fusion mechanism to make full use of multi-level scale information. Secondly, the study utilizes spatial and channel attention mechanisms to increase the weight of crowd features, focus on crowds of interest, reduce background interference, and generate high-quality density maps. Experimental results show that the crowd density detection network with multi-scale feature fusion has better accuracy and robustness than representative crowd density detection methods on multiple typical public datasets.
2024, 33(4):152-161. DOI: 10.15888/j.cnki.csa.009445 CSTR: 32024.14.csa.009445
Abstract:To address the problems of noise interference and missed detection of small objects in water surface object detection, this study proposes an improved You Only Look Once version 8 (YOLOv8) algorithm for water surface small object detection, namely, YOLOv8-WSSOD. Specifically, to reduce the noise interference caused by the complex water surface environment during the downsampling in the backbone network, the study proposes the C2f-BiFormer (C2fBF) module constructed based on BiFormer’s bi-level routing attention mechanism to retain fine-grained contextual feature information during feature extraction. Then, as to the missed detection of small objects on the water surface, a smaller detection head is added to enhance the network’s sensitivity to small objects. At the Neck end, the ghost-shuffle convolution (GSConv) and Slim-neck structures are used to reduce the model’s complexity and maintain precision. Finally, the limitations of the complete intersection over union (CIoU) loss function are overcome by the minimum point distance-based IoU (MPDIoU) loss function to improve the model’s detection precision. The experimental results show that compared with the original YOLOv8 algorithm, the proposed algorithm increases the mean average precision mAP@0.5 and mAP@0.5:0.95 on small objects on the water surface by 4.6% and 2.2%, respectively. Furthermore, the modified algorithm, achieving a detection speed of 86 f/s, is readily available for fast and accurate detection of small objects on the water surface.
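For readers unfamiliar with the MPDIoU loss named in this abstract, a rough sketch follows. It uses the commonly cited definition: IoU minus the squared distances between the two boxes' top-left corners and between their bottom-right corners, each normalized by the squared diagonal of the input image. The `(x1, y1, x2, y2)` box format and the function name are assumptions for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def mpdiou_loss(pred: Box, target: Box, img_w: float, img_h: float) -> float:
    """Return 1 - MPDIoU for two axis-aligned boxes."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Plain IoU from the intersection and union areas
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared corner distances, normalized by the squared image diagonal
    diag_sq = img_w ** 2 + img_h ** 2
    d1_sq = (px1 - tx1) ** 2 + (py1 - ty1) ** 2   # top-left corners
    d2_sq = (px2 - tx2) ** 2 + (py2 - ty2) ** 2   # bottom-right corners

    mpdiou = iou - d1_sq / diag_sq - d2_sq / diag_sq
    return 1.0 - mpdiou


if __name__ == "__main__":
    print(mpdiou_loss((10, 10, 50, 50), (12, 12, 52, 52), 640, 640))
```

Unlike plain IoU, the corner-distance terms still provide a signal when the boxes do not overlap at all, since the loss keeps growing as the corners move apart.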
XIE Chang-Zuo , LI Zi-Yang , DONG Yu-Min , LI Xue-Song , SHU Zhan , YANG Guang
2024, 33(4):162-170. DOI: 10.15888/j.cnki.csa.009462 CSTR: 32024.14.csa.009462
Abstract:In the era of big data, the number of algorithms used for data processing is exploding. The current management method for a large number of algorithms is usually to classify and label the algorithms, or store task flows composed of algorithms on a task-by-task basis, while insufficient attention has been paid to the topological relationships between algorithms in the task set. With the accumulation of domain knowledge and task flows, the dependency between algorithms becomes increasingly important. Based on the requirement of massive algorithm management, this study proposes a management method for splitting branched dependencies into unbranched dependencies. By searching for topological relationships through pointers in an index-free adjacency graph database, it avoids Join operations and has innate advantages in managing algorithm dependencies. In addition, this study proposes connection points to highlight the reusability of algorithm modules, which are utilized to represent dependency edges in the graph model. The position of algorithm modules in different task flows can be distinguished, so that algorithm modules reused by multiple tasks only need to be represented by one algorithm module node in the graph. Finally, based on specific projects, the algorithm relationship management method proposed in this study is validated. It is proved that the algorithm relationship management method has significant advantages in scenarios where the number of algorithms is large and the algorithm modules are highly reusable.
2024, 33(4):171-178. DOI: 10.15888/j.cnki.csa.009448 CSTR: 32024.14.csa.009448
Abstract:Multi-view subspace clustering algorithms typically obtain latent subspace representation matrices by directly processing each view of the original data. However, these methods often underestimate the influence of redundant data, making it difficult to capture accurate clustering results in the latent subspace representation. Furthermore, the K-means algorithm used to produce the clustering results easily neglects the local structure of the data within the subspaces, leading to unstable results. To address these problems, this study proposes a multi-view subspace method to acquire high-quality subspace representations. Specifically, the study first obtains a robust representation through a feature decomposition method. Then, it constructs a joint latent subspace representation for multiple views. Next, it uses spectral rotation to obtain clustering results and employs orthogonal constraints on the partition matrix to reconstruct the subspaces, thereby enhancing clustering performance. Finally, an iterative optimization algorithm is applied to solve the resulting optimization problems. Experiments are conducted on five benchmark datasets, and the results demonstrate that the proposed algorithm is more effective than recent multi-view clustering algorithms.
YE Bo-Wen , JIA Xiao-Lin , GU Ya-Jun
2024, 33(4):179-186. DOI: 10.15888/j.cnki.csa.009455 CSTR: 32024.14.csa.009455
Abstract:With the development of the Internet of Things (IoT), efficient consensus algorithms are the key to applying blockchain technology to the IoT. To address the large number of communication rounds, the lack of consideration for consensus power consumption, and the high consensus latency in IoT scenarios, this study proposes an improved practical Byzantine fault tolerance (PBFT) consensus algorithm based on binary K-means clustering (BK-PBFT). Firstly, it obtains the geographic coordinates of the nodes, calculates the comprehensive evaluation values of the nodes, and divides the nodes into two-layer, multi-center clusters by the binary K-means algorithm. Then, PBFT consensus is performed on the blocks in the lower-level clusters and then in the upper-level cluster. Finally, the clusters validate and store the blocks to complete the consensus. Additionally, this study proves that the algorithm achieves the minimum number of communications when nodes are evenly distributed across the clusters, and it derives the optimal number of clusters under that minimum. The analysis and simulation results show that the proposed algorithm can effectively reduce the number of communications, consensus power consumption, and consensus latency.
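The claims about even distribution and an optimal cluster count can be illustrated with a toy cost model. The model below is an assumption made only for this sketch and is not the paper's analysis: a PBFT round among m nodes is charged m² messages, each lower-level cluster runs one round, and the k cluster leaders run one upper-level round. All function names are hypothetical.

```python
from typing import List


def two_layer_messages(cluster_sizes: List[int]) -> int:
    """Toy message count: sum of m^2 per cluster plus k^2 for the leaders.

    The quadratic cost per PBFT round is a simplifying assumption.
    """
    k = len(cluster_sizes)
    return sum(m * m for m in cluster_sizes) + k * k


def even_split(n: int, k: int) -> List[int]:
    """Split n nodes into k clusters whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    return [base + 1] * extra + [base] * (k - extra)


def best_cluster_count(n: int) -> int:
    """Brute-force the k that minimizes the toy cost for an even split."""
    return min(range(1, n + 1),
               key=lambda k: two_layer_messages(even_split(n, k)))


if __name__ == "__main__":
    print(two_layer_messages([50, 50]), two_layer_messages([80, 20]))
    print(best_cluster_count(100))
```

Under this model an uneven split always costs more than an even one for the same k, because the sum of squares is smallest when the sizes are equal; the best k then balances the falling lower-level cost against the growing upper-level cost.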
2024, 33(4):187-193. DOI: 10.15888/j.cnki.csa.009467 CSTR: 32024.14.csa.009467
Abstract:Aiming at the current inaccurate predictions in 3D human pose due to factors such as occlusion and complexity of poses, this paper proposes an improved 3D human pose estimation algorithm to obtain accurate 3D human pose and enhance the performance of human pose estimation. Meanwhile, it adopts the graph attention block from the spatio-temporal graph attention convolutional network to construct the entire network. On this basis, the network structure of the global multi-head graph attention part is improved to facilitate better information propagation and fusion among nodes and capture semantic information not explicitly represented in the graph. Kinematic constraints are introduced as well, and a bone length loss is added based on the MPJPE loss. By the modeling of local and global spatial node information, the learning of kinematic constraints of human skeletal movements is achieved, including local kinematic connections, symmetry, and global poses. Empirical results show that the improved model effectively enhances the performance of human pose estimation. Compared to the original model on the Human3.6M dataset, a 1.8% improvement in mean per joint position error (MPJPE) and a 1.3% improvement in the Procrustes aligned MPJPE (P-MPJPE) after rigid alignment of predicted and true joints have been realized.
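The combination of an MPJPE term with a bone length term can be sketched as below. The bone list, the absolute-difference form of the bone length loss, and the weight `bone_weight=0.1` are assumptions for illustration; the abstract only states that a bone length loss is added to the MPJPE loss.

```python
import math
from typing import List, Sequence, Tuple

Joint = Sequence[float]          # (x, y, z)
Bone = Tuple[int, int]           # (parent index, child index), assumed encoding


def mpjpe(pred: List[Joint], gt: List[Joint]) -> float:
    """Mean per joint position error: average Euclidean distance per joint."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def bone_length_loss(pred: List[Joint], gt: List[Joint],
                     bones: List[Bone]) -> float:
    """Mean absolute difference between predicted and true bone lengths."""
    total = 0.0
    for parent, child in bones:
        pred_len = math.dist(pred[parent], pred[child])
        gt_len = math.dist(gt[parent], gt[child])
        total += abs(pred_len - gt_len)
    return total / len(bones)


def total_loss(pred: List[Joint], gt: List[Joint], bones: List[Bone],
               bone_weight: float = 0.1) -> float:
    """MPJPE plus a weighted bone length term (the weight is an assumed value)."""
    return mpjpe(pred, gt) + bone_weight * bone_length_loss(pred, gt, bones)


if __name__ == "__main__":
    gt = [(0, 0, 0), (0, 1, 0), (0, 2, 0)]
    pred = [(0, 0, 0), (0, 1, 0), (0, 3, 0)]
    print(total_loss(pred, gt, bones=[(0, 1), (1, 2)]))
```

The bone term penalizes skeletons whose limb lengths are implausible even when the per-joint error is small, which is one way to encode a kinematic constraint.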
WAN Jia-Long , KUANG Li-Qun , CAO Ya-Ming , GUO Lei , XIONG Feng-Guang
2024, 33(4):194-201. DOI: 10.15888/j.cnki.csa.009481 CSTR: 32024.14.csa.009481
Abstract:The images generated by low-light image enhancement algorithms based on deep learning generally have problems such as noise highlighting and detail loss. However, the performance of end-to-end deep learning algorithms largely depends on the extraction ability of the backbone network. Therefore, exploring more effective backbone network structures can improve the performance benefits of low-light enhancement tasks. This study proposes an image enhancement algorithm based on a composite backbone network fusion strategy, which integrates backbone networks from different image enhancement algorithms to improve the overall network’s feature extraction ability. The algorithm integrates feature information from different backbone networks layer by layer and guides composite features into the decoder. It then fully utilizes different upsampling methods to stack the fused features of the backbone network, ultimately generating images under normal lighting conditions. Through quantitative and qualitative comparative experiments with existing mainstream algorithms, the results show that our method significantly improves the brightness of low-light images while preserving the detailed features of the images. In terms of objective indicators such as peak signal-to-noise ratio and structural similarity, it achieves 24.35 dB and 0.871 in the LOL-V2 dataset, effectively solving the problems of noise highlighting and detail loss after image enhancement.
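The peak signal-to-noise ratio reported here follows the standard definition, PSNR = 10·log10(MAX² / MSE). A minimal sketch on flattened pixel lists is given below; the 8-bit peak value of 255 is an assumption about the image format.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are passed as flat pixel sequences; max_value=255 assumes
    8-bit data.
    """
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    print(round(psnr([100, 100, 100, 100], [110, 110, 110, 110]), 2))
```

Higher values mean the enhanced image is closer to the reference; a uniform error of 10 gray levels on 8-bit data corresponds to roughly 28 dB.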
WANG Wen , LIU Yuan-Xing , WU Xiang-Ning , LI Wen-Chi , TU Yu , ZHANG Feng , FANG Heng , CAI Ze-Yu
2024, 33(4):202-208. DOI: 10.15888/j.cnki.csa.009442 CSTR: 32024.14.csa.009442
Abstract:Relation extraction based on distant supervision can cut the cost of manually annotated datasets and has been widely used in the construction of domain knowledge graphs. However, existing distantly supervised relation extraction methods are not domain-specific and neglect the use of domain entity feature information. To solve these problems, this study proposes a relation extraction model, PCNN-EFMA, that integrates entity features and multiple types of attention mechanisms. The model adopts distant supervision and multi-instance learning, so it is no longer limited by manual annotation. At the same time, to reduce the impact of noise in distant supervision, the model uses two types of attention: sentence attention and inter-bag attention. In addition, it integrates entity feature information in the word embedding layer and in sentence attention, enhancing the model’s feature selection ability. Experiments show that the PR curve of this model is better on the domain dataset, and its average accuracy on P@N is better than that of the PCNN-ATT model.
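The P@N metric mentioned at the end of this abstract is the precision among the N highest-scored predictions. A small sketch follows; the `(confidence, is_correct)` pair layout and the function name are assumptions for illustration.

```python
from typing import List, Tuple


def precision_at_n(scored_predictions: List[Tuple[float, bool]], n: int) -> float:
    """Fraction of correct predictions among the n highest-confidence ones.

    Each item is (confidence, is_correct); this layout is assumed.
    """
    ranked = sorted(scored_predictions, key=lambda item: item[0], reverse=True)
    top = ranked[:n]
    return sum(1 for _, correct in top if correct) / len(top)


if __name__ == "__main__":
    preds = [(0.9, True), (0.8, False), (0.7, True), (0.1, False)]
    print(precision_at_n(preds, 2), precision_at_n(preds, 3))
```

Averaging this value over several cut-offs (for example N = 100, 200, 300) gives the kind of mean P@N figure that is often reported for this task.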
2024, 33(4):209-214. DOI: 10.15888/j.cnki.csa.009472 CSTR: 32024.14.csa.009472
Abstract:Currently, the application of blockchain in the supply chain is receiving increasing attention from the industry. However, due to the large number of complex transactions in the supply chain, selecting trustworthy primary nodes poses a challenge. Therefore, based on machine learning classification algorithms and PBFT (practical Byzantine fault tolerance), this study proposes a blockchain PBFT optimization method applied to the supply chain. The integrated framework for the supply chain and blockchain is analyzed, and K-nearest neighbors (K-NN) is applied to optimize the primary node selection rules of the PBFT consensus algorithm based on the features of participating nodes in the supply chain consensus. Experimental results show that trust evaluation classification of consensus nodes can effectively address the efficiency issues caused by view switching, thereby improving the consensus performance of blockchain in terms of throughput, latency, fault tolerance, and other aspects. The proposed method is practical and provides ideas for the application of blockchain in other industries.
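A K-NN trust classifier of the kind described can be sketched as follows. The two node features (historical success rate, normalized response latency), the labels, and `k=3` are hypothetical choices for illustration; the abstract does not list the features the authors use.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, trust label)


def knn_classify(train: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Label a node by majority vote among its k nearest labelled nodes."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical features: (historical success rate, normalized latency)
    history = [
        ((0.95, 0.10), "trusted"), ((0.90, 0.20), "trusted"),
        ((0.85, 0.15), "trusted"), ((0.30, 0.90), "untrusted"),
        ((0.20, 0.80), "untrusted"), ((0.40, 0.70), "untrusted"),
    ]
    print(knn_classify(history, (0.90, 0.10)))
```

In such a scheme, only nodes classified as trusted would be eligible to become the primary, which is one way to reduce the view changes triggered by faulty primaries.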
HUANG Jian , WANG Hu , ZHAO Xiao-Fei
2024, 33(4):215-225. DOI: 10.15888/j.cnki.csa.009456 CSTR: 32024.14.csa.009456
Abstract:In the face of large-scale image defects and irregular damage areas, existing image restoration methods often produce results with structural inconsistencies and blurry texture details. This study proposes an image restoration algorithm using the generated edge map and multi-scale feature fusion—MSFGAN (multi-scale feature network model based on edge condition). The model adopts a two-stage network design, using the edge map as a restoration condition to constrain the structural aspects of the restoration results. Firstly, the Canny operator is used to extract the edge map of the image to be restored, generating a complete edge map. Then, the complete edge map is combined with the image to be restored for image restoration. To address common issues in image restoration algorithms, an Attention Mechanism Multi-Fusion convolution block (AM block) is proposed, integrating an attention mechanism for feature extraction and fusion of damaged images. Skip connections are introduced in the decoder part of the image restoration network to fuse high-level semantics and low-level features, achieving high-quality detail and texture restoration. Test results on the CelebA and Places2 datasets show that MSFGAN has improved restoration quality compared to current methods. In the 20%–30% mask ratio, the average improvement of SSIM is 0.0291, and PSNR improvement is 1.535 dB. Ablation experiments validate the effectiveness of the proposed optimization and innovations in image restoration tasks.
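For readers unfamiliar with the PSNR figure quoted above, the standard definition can be sketched as follows. Images are represented here as flat lists of pixel intensities, and the 8-bit peak value of 255 is an assumption; the abstract does not describe the authors' evaluation code.

```python
import math


def psnr(original, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel is off by 25.5 intensity levels
value = psnr([0.0, 0.0], [25.5, 25.5])
```

Because PSNR is logarithmic, a gain of about 1.5 dB corresponds to a reduction in mean squared error of roughly 30%.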
2024, 33(4):226-234. DOI: 10.15888/j.cnki.csa.009452 CSTR: 32024.14.csa.009452
Abstract:This study proposes a multi-hierarchical classification method for marine organisms. Marine organisms are diverse: organisms of the same phylum show strong inter-class similarity, while organisms of different phyla differ greatly. The multi-hierarchical classification method exploits this similarity among species to help the network learn biological prior knowledge. Additionally, this study designs a C-MBConv module and improves the EfficientNetV2 network architecture by combining it with the multi-hierarchical classification method; the improved architecture is called CM-EfficientNetV2. Experiments show that CM-EfficientNetV2 achieves higher accuracy than the original EfficientNetV2, with an improvement of 1.5% on the inter-tidal marine biology dataset of the Nanji Islands and 2% on CIFAR-100.
LIU Jia-Lin , HE Ze-Yu , LI Jun
2024, 33(4):235-245. DOI: 10.15888/j.cnki.csa.009470 CSTR: 32024.14.csa.009470
Abstract:Recently, reinforcement learning techniques have achieved success in sequence recommendation systems, as they can learn effective recommendation strategies from long-term user feedback signals. However, the design of the model’s reward function faces the challenge of low discriminability. This limits the model’s ability to learn the value differences between different user feedback signals, leading to suboptimal recommendation strategies. Existing studies mainly ensure discriminability of the reward function by adjusting decay factors, but this relies on expert prior knowledge and lacks a theoretical foundation. In order to more reasonably design the reward function and enhance its discriminability, this study analyzes the recommendation system based on counterfactual reasoning and proposes a sequence recommendation algorithm CAL4Rec based on counterfactual discriminability enhancement. Firstly, the proposed method uses structural causal graphs to describe the sequence recommendation process and creatively defines causally identifiable value reward discriminability using causal graphs. Secondly, this method uses a counterfactual generative adversarial self-supervised learning process to optimize the recommendation strategy network and learn the user’s true preferences. Extensive comparative and ablation experiments were conducted on a series of sequence recommendation benchmark datasets for CAL4Rec, and the experimental results show that CAL4Rec’s improvement is effective for various network implementation structures (average 2.34%).
XIAN Guang-Ming , LI Fan-Long , ZHENG Zhao-Ming
2024, 33(4):246-253. DOI: 10.15888/j.cnki.csa.009457 CSTR: 32024.14.csa.009457
Abstract:Controllable text summary models can generate summaries that conform to user preferences. Previous summary models focus on controlling a single attribute rather than a combination of multiple attributes. When multiple control attributes must be satisfied, the traditional Seq2Seq multi-attribute controllable text summary model cannot integrate all control attributes, accurately reproduce key information in the texts, or handle out-of-vocabulary words. Therefore, this study proposes a model based on the extended Transformer and pointer generator network (PGN). The extended Transformer extends the single-encoder, single-decoder Transformer into a form with dual encoders, which extract semantic information from two texts, and a single decoder, which can fuse guidance signal features. The PGN model is then employed to either copy words from the source text or generate new words from the vocabulary, which solves the OOV (out of vocabulary) problem that often occurs in summary tasks. Additionally, to encode position information efficiently, the model uses relative position representation in the attention layer to introduce sequence information of the texts. The model can control many important summary attributes, including length, topic, and specificity. Experiments on the public dataset MACSum show that, compared with previous methods, the proposed model performs better at ensuring summary quality while conforming more closely to the attribute requirements given by users.
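The copy-or-generate step of a pointer generator network can be sketched as a mixture of two distributions. This follows the commonly cited PGN formulation, P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source positions holding w); the toy vocabulary, attention weights, and p_gen value are illustrative assumptions, not values from the paper.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix the generation distribution with copy probabilities from attention."""
    # Generation part: scale the vocabulary distribution by p_gen
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    # Copy part: add (1 - p_gen) * attention weight for each source token,
    # which also gives probability mass to out-of-vocabulary source words
    for token, weight in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


vocab_dist = {"the": 0.5, "model": 0.3, "<unk>": 0.2}  # toy fixed vocabulary
source_tokens = ["MACSum", "model"]                    # "MACSum" is out of vocabulary
attention = [0.75, 0.25]                               # attention over source positions

dist = final_distribution(0.5, vocab_dist, attention, source_tokens)
best = max(dist, key=dist.get)
```

In this toy case the out-of-vocabulary source word receives the highest probability, which is how copying lets the decoder emit words missing from the fixed vocabulary.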
CHEN Ye , YANG Chang-Chun , YANG Sen , WANG Yu-Peng , WANG Peng
2024, 33(4):254-262. DOI: 10.15888/j.cnki.csa.009475 CSTR: 32024.14.csa.009475
Abstract:In recent years, unstructured road segmentation has become one of the important research directions in the field of computer vision. Most existing methods are suitable for structured road segmentation and cannot meet the accuracy and real-time requirements of unstructured road segmentation. To address the above issues, this study improves the short-term dense concatenate (STDC) network by introducing residual connections to better integrate multi-scale semantic information. Additionally, it proposes a position attention-aware spatial pyramid pooling (PA-ASPP) module to enhance the network’s position awareness ability for specific regions such as roads. Experiments are conducted on two datasets, RUGD and RELLIS-3D, and the proposed method achieves a mean intersection over union (MIoU) of 50.78% and 49.96% on the test sets of the two datasets, respectively.
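The MIoU metric reported above averages the per-class intersection over union. A minimal sketch on flattened label lists follows; skipping classes absent from both prediction and ground truth is a common convention assumed here, and the benchmark's official evaluation script may handle that case differently.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes for flat label sequences."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Toy example with three classes and six pixels
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, which average to 0.5.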
WANG Yue , LI Zuo-Yong , YAN Jia-Quan , HU Rong
2024, 33(4):263-270. DOI: 10.15888/j.cnki.csa.009477 CSTR: 32024.14.csa.009477
Abstract:In recent years, underwater acoustic target recognition has received considerable attention. However, due to the time-varying and space-varying nature of the underwater acoustic channel, as well as the complex and variable characteristics of the underwater target sound sources, water sound signal recognition tasks face significant challenges. Traditional methods for water sound signal recognition struggle to capture sufficient representation information of the targets and lack robustness against noise, resulting in suboptimal recognition performance. To address these issues, this study proposes a water sound signal recognition method based on the multi-branch external attention network (MEANet), which can effectively extract features and perform recognition in complex marine environments. MEANet consists of multiple branches for the backbone network, channel and spatial attention modules, and external attention modules. Firstly, the study feeds the input data through multiple parallel branches of the backbone network to extract features at different levels from the water sound signals. Secondly, it employs the channel and spatial attention modules to weight the channels and spatial dimensions of the water sound signals. Finally, the external attention module integrates external memory units and additional computations to guide feature extraction and prediction, significantly improving the recognition rate and robustness of the model. Experimental results demonstrate that the proposed MEANet achieves a recognition rate of 98.84% on the ShipsEar dataset, outperforming other comparative algorithms.
2024, 33(4):271-278. DOI: 10.15888/j.cnki.csa.009463 CSTR: 32024.14.csa.009463
Abstract:Mobile edge computing and ultra-dense network technologies have obvious advantages in improving the computing power of mobile devices and enhancing network capacity. However, under the scenario of convergence between the two, how to effectively reduce the co-channel interference among base stations and reduce the delay and energy consumption of task transmission is an important research topic. Therefore, this study designs a distributed wireless resource management algorithm based on multi-base station game equilibrium. The wireless resource management problem among small base stations is transformed into a game one to propose a reward-driven strategy selection algorithm. The base stations continuously update the selection probability of their strategies by iterations, which finally optimizes the sub-channel allocation and transmission power regulation. Simulation results show that the proposed algorithm has advantages in improving channel utilization and reducing latency and energy consumption for task transmission.
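The iterative update of strategy selection probabilities can be illustrated with a linear reward-inaction rule from learning automata. This rule is an assumption chosen for illustration, since the abstract does not give the authors' update formula; the step size and the reading of a strategy as a (sub-channel, power level) pair are likewise hypothetical.

```python
def update_probabilities(probs, chosen, reward, step=0.1):
    """Shift probability toward the chosen strategy in proportion to its reward.

    reward is assumed to be normalized to [0, 1]; the result still sums to 1.
    """
    return [
        p + step * reward * ((1.0 if i == chosen else 0.0) - p)
        for i, p in enumerate(probs)
    ]


# Two hypothetical strategies, e.g. (sub-channel, power level) pairs
probs = [0.5, 0.5]
one_step = update_probabilities(probs, chosen=0, reward=1.0)

# If strategy 0 keeps earning the full reward, its probability approaches 1
for _ in range(50):
    probs = update_probabilities(probs, chosen=0, reward=1.0)
```

Each base station can run such an update using only its own observed reward, which is what makes this kind of scheme distributed.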
2024, 33(4):279-287. DOI: 10.15888/j.cnki.csa.009468 CSTR: 32024.14.csa.009468
Abstract:Gait recognition is the process of identifying individuals based on their walking patterns. Currently, most gait recognition methods employ shallow neural networks for feature extraction, which perform well on indoor gait datasets but poorly on the newly released outdoor gait datasets. To address the challenges posed by outdoor gait datasets, this study proposes a deep gait recognition model based on video residual neural networks. In the feature extraction phase, a deep 3D convolutional neural network (3D CNN) is constructed from the proposed video residual blocks to extract the spatio-temporal dynamic features of the entire gait sequence. Subsequently, temporal pooling and horizontal pyramid mapping are introduced to reduce the feature resolution of the sampled data and extract local gait features. The training process is driven by a joint loss function, and BNNeck is used to balance the loss functions and adjust the feature space. The experiments are conducted on three publicly available gait datasets, including both indoor (CASIA-B) and outdoor (GREW, Gait3D) gait datasets. The experimental results verify that the model outperforms other models in accuracy and convergence speed on outdoor gait datasets.
LIU Zhi , LI Tao , YUAN Chong
2024, 33(4):288-295. DOI: 10.15888/j.cnki.csa.009480 CSTR: 32024.14.csa.009480
Abstract:Missing data affects the quality of the data, which may lead to inaccurate results and reduce the reliability of the model. Missing value filling reduces the bias and facilitates subsequent analysis. Most missing value filling algorithms assume a weak correlation or even no correlation between multiple missing values, with little consideration of the correlation between missing values and the order of filling. Independent filling of missing values in the sales domain reduces the utilization of missing value information, which has a greater impact on the accuracy of missing value filling. To address the above problems, this study takes the sales field as the research object and explores the updating mechanism of multiple missing values based on the multidimensional characteristics of sales behavior and the spatial distribution characteristics of output values of different models. In addition, the work studies an incremental filling method for multiple missing values of sales data, which orders the missing features based on the correlation of features and fuses the already-filled data as an information element to incrementally fill in the following missing values. The algorithm takes into account the generalization of the model and the information correlation between the missing data and combines with multi-model fusion to effectively fill multiple missing values. Finally, the effectiveness of the proposed algorithm is verified by a large number of experimental comparisons based on a real chain drugstore sales dataset.
ZUO Li-Ming , ZHOU Ting , LIU Chen-Ning
2024, 33(4):296-301. DOI: 10.15888/j.cnki.csa.009473 CSTR: 32024.14.csa.009473
Abstract:Leakage tolerance refers to allowing the scheme to leak some secret information to enhance the robustness of the signature scheme, which is suitable for most occasions where the equipment and communication lines cannot be perfectly protected. The length of the short signature is generally only half that of the ordinary signature, which can greatly reduce the communication data volume of the narrowband real-time interactive system. This study proposes a short signature scheme for the signature key associated with the information to be signed, and the scheme is tolerant to partial leakage. The efficiency and security of the scheme are analyzed, and the security of the scheme is proved under the tolerant leak oracle. The experimental results show that the scheme has good performance and is suitable for applications with limited transmission bandwidth.
HUANG Xu-Dong , DI Xiao-Tao , SHEN Ming-Wei
2024, 33(4):302-307. DOI: 10.15888/j.cnki.csa.009403 CSTR: 32024.14.csa.009403
Abstract:Residential demand forecasting is affected by multiple factors and is non-linear. To address this issue, the study modifies the original neighborhood rough set (NRS) and then combines it with extreme learning machines (ELMs) to forecast residential demands. Specifically, the modified NRS (MNRS) algorithm constructs a neighborhood relationship matrix based on the neighborhood radii and standard deviations of different conditional attributes, thereby overcoming the failure of the original NRS algorithm to set the optimal neighborhood value for different conditional attributes. Then, the Pearson correlation coefficient is introduced into output attribute importance ranking to overcome the influence among conditional attributes, and the minimal redundant attribute-based reduction set is obtained to serve as the indicator system for residential demand forecasting. Finally, the residential demand indicator system is input into the ELM model to output an accurate forecasted value. Experimental results show that the MNRS-ELM forecasting model not only effectively reduces the operational complexity but also achieves higher prediction accuracy.
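The use of the Pearson correlation coefficient to limit redundancy among conditional attributes can be sketched as a greedy filter over an importance-ranked attribute list. The attribute names, the ranking, and the 0.9 threshold are hypothetical; the abstract does not specify how the coefficient enters the MNRS reduction.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


# Hypothetical attributes, listed from most to least important
attributes = {
    "income": [1, 2, 3, 4],
    "income_scaled": [10, 20, 30, 40],  # redundant copy of "income"
    "household_size": [3, 1, 4, 2],
}

# Keep an attribute only if it is not strongly correlated with any kept one
kept = []
for name, values in attributes.items():
    if all(abs(pearson(values, attributes[k])) < 0.9 for k in kept):
        kept.append(name)
```

The filter drops the rescaled duplicate and keeps the two attributes that carry distinct information.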