XING Yan , CHEN Xiao-Lu , XU Qi-Ao , HUANG Rui
2024, 33(2):1-12. DOI: 10.15888/j.cnki.csa.009402 CSTR: 32024.14.csa.009402
Abstract:The existing image deblurring methods typically directly use spatial or frequency domain information to restore clear images, ignoring the complementarity of spatial and frequency domain information. Utilizing the spatial domain information of images can effectively restore object structures while utilizing the frequency domain information of images can effectively restore texture details. This study proposes a simple and effective image deblurring framework that can fully utilize both the spatial and frequency domain information of images to produce high-quality and clear images. Firstly, two independent networks with the same structure are employed to learn the mapping relationship from the blurred images to the clear images in the spatial and frequency domain, respectively. Then a separate fusion network is adopted to further elevate the quality of clear images by fully integrating image information from both spatial and frequency domains. The three networks can be linked to form an end-to-end trainable large network, where they interact with each other to obtain high-quality images by joint optimization. The proposed method surpasses 9 state-of-the-art image deblurring methods in terms of peak signal-to-noise ratio, structural similarity index metric, and mean absolute error on the public image deblurring datasets including GoPro, Kohler, and RWBI. The effectiveness of the proposed image deblurring method which integrates both spatial and frequency domain information is verified by a large number of experiments.
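The complementarity between spatial and frequency information described above can be illustrated with a small NumPy sketch. It is not the authors' network: it only splits an image into a low-frequency part (coarse structure) and a high-frequency part (fine texture) with a circular mask in the Fourier domain. The function name `split_frequency_bands` and the `radius` parameter are hypothetical.

```python
import numpy as np

def split_frequency_bands(image, radius):
    """Split a 2-D grayscale image into low- and high-frequency parts."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    rows, cols = image.shape
    y, x = np.ogrid[:rows, :cols]
    # Circular low-pass mask centred on the zero-frequency bin (assumed cutoff).
    mask = (y - rows // 2) ** 2 + (x - cols // 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~mask)).real
    return low, high

rng = np.random.default_rng(0)
image = rng.random((32, 32))
low, high = split_frequency_bands(image, radius=4)
```

The two parts sum back to the original image, so a restoration model can in principle work on either view without losing information.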
YAN Yuan-Xiang , CAO Guo , ZHANG You-Qiang
2024, 33(2):13-22. DOI: 10.15888/j.cnki.csa.009388 CSTR: 32024.14.csa.009388
Abstract:In recent years, significant progress has been made in the classification of hyperspectral images (HSI) based on generative adversarial networks (GAN). Although GANs can alleviate the problem of limited training sample size, they are easily affected by imbalanced training data and suffer from mode collapse. To this end, a SPCA-AD-WGAN model for HSI classification is proposed. Firstly, to address the reduced classification accuracy caused by imbalanced training data, the study adds a separate classifier and trains it separately from the discriminator. Secondly, it introduces the Wasserstein distance into the network to alleviate mode collapse. The experimental results on two HSI datasets indicate that SPCA-AD-WGAN has better classification performance.
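As a rough illustration of the Wasserstein distance mentioned above, the following sketch computes the empirical 1-Wasserstein distance between two equally sized one-dimensional samples, where it reduces to the mean absolute difference of the sorted values. This is not the SPCA-AD-WGAN critic, which the abstract does not specify; the function name `wasserstein_1d` is hypothetical.

```python
import numpy as np

def wasserstein_1d(samples_a, samples_b):
    """Empirical 1-Wasserstein distance between two 1-D samples of equal size."""
    a = np.sort(np.asarray(samples_a, dtype=float))
    b = np.sort(np.asarray(samples_b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equally sized samples")
    # In 1-D with equal sample sizes, the optimal transport plan
    # simply pairs up the sorted values.
    return float(np.mean(np.abs(a - b)))
```

Unlike divergences that saturate when two distributions do not overlap, this distance keeps growing with the separation between them, which is the usual motivation for using it in GAN training.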
DING Mei-Rong , WANG Yu-Hang , ZENG Bi-Qing
2024, 33(2):23-32. DOI: 10.15888/j.cnki.csa.009390 CSTR: 32024.14.csa.009390
Abstract:Session-based recommendation aims to predict the next interaction item for anonymous users based on short-term interaction data. Most of the existing graph neural network session recommendation models treat all neighboring nodes equally during information propagation without distinguishing their importance to the central node, which introduces noise into the model training. In addition, the problem of over-smoothing arises as the number of layers of graph neural networks increases. To address these issues, a model named multi-layer graph attention network with skip connection for session-based recommendation (MGATSC) is proposed. Firstly, the graph attention network is used to learn the importance of neighboring nodes to the central node, and multiple networks are stacked to obtain high-order neighbor information. Then, to alleviate the over-smoothing problem, a skip connection based on the residual attention mechanism is used to update the node embeddings of each network layer, and the final node embedding is obtained through average pooling. Finally, the reverse positional embedding is fused into the node embedding, and recommendations are generated through the prediction layer. Experimental results on three public datasets, Tmall, Diginetica, and Retailrocket, demonstrate that the proposed model outperforms all baseline models, which validates the effectiveness and rationality of the model.
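The attention-weighted propagation with a skip connection described above might look roughly like the following NumPy sketch. It makes several assumptions that are not in the abstract: neighbors are scored by a plain dot product rather than a learned attention vector, every node has a self-loop in `adjacency`, and the final embedding is the average over all layers. All function names are hypothetical.

```python
import numpy as np

def softmax(scores):
    shifted = scores - scores.max()
    weights = np.exp(shifted)
    return weights / weights.sum()

def attention_layer_with_skip(embeddings, adjacency):
    """One propagation step: attention over neighbors plus a residual term."""
    updated = np.zeros_like(embeddings)
    for node in range(len(embeddings)):
        neighbors = np.flatnonzero(adjacency[node])
        # Assumed scoring: dot product between the central node and each neighbor.
        scores = embeddings[neighbors] @ embeddings[node]
        weights = softmax(scores)
        aggregated = weights @ embeddings[neighbors]
        # Skip connection: keep the node's own embedding alongside the aggregate.
        updated[node] = embeddings[node] + aggregated
    return updated

def stacked_embeddings(embeddings, adjacency, num_layers):
    """Stack several layers and average-pool the per-layer embeddings."""
    layers = [embeddings]
    for _ in range(num_layers):
        layers.append(attention_layer_with_skip(layers[-1], adjacency))
    return np.mean(layers, axis=0)
```

Because unimportant neighbors receive small softmax weights, they contribute less to the aggregate than they would under uniform averaging.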
FU Jun-Peng , SUN Qi-Feng , CHEN Pei-Pei , WANG Ya-Ning
2024, 33(2):33-42. DOI: 10.15888/j.cnki.csa.009391 CSTR: 32024.14.csa.009391
Abstract:To address the problem that traditional inversion methods rely excessively on the initial model, which leads to unstable results and low computational efficiency, a real-time inversion method for logging while drilling (LWD) is proposed that integrates an independent recurrent neural network with a particle swarm optimization algorithm. First, an independent recurrent neural network model is built from sequence data generated by stratigraphic forward modeling, and an attention mechanism is introduced to emphasize the role of key features in the LWD inversion. Next, stochastic inertia weights are introduced into the particle swarm optimization algorithm to improve its global and local search capabilities, and the hyperparameters of the neural network model are optimized using this algorithm. Finally, ablation experiments and comparison experiments are conducted on the test set of forward simulation and the test set of logging data from
WANG Xi-Bo , QI Cheng-Ye , JIA Zheng-Feng
2024, 33(2):43-53. DOI: 10.15888/j.cnki.csa.009389 CSTR: 32024.14.csa.009389
Abstract:With the development of technologies such as computer networks and wireless communication, copyright protection and information security for video media files have attracted increasing attention, and video media file encryption is an effective way to protect information security. Traditional video file encryption methods encrypt all video frame data in a video media file, so encryption is inefficient and time-consuming. Therefore, a method for improving the efficiency of video media file encryption based on the Chinese SM2 algorithm is proposed according to the structural characteristics of H.264/AVC video frames. During encryption, this method encrypts only the NALU header information of the key frames in the video media file. When H.264 shards are detected, the non-IDR header information must also be encrypted. The experimental results show that the method can effectively encrypt video media files while reducing the time required for encryption, thus significantly improving the encryption efficiency of video media files.
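To make the NALU header concrete, here is a small sketch that locates NAL units in an H.264 Annex B byte stream and decodes the one-byte header (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit nal_unit_type, where type 5 is an IDR slice and type 1 a non-IDR slice). It only finds the bytes that a selective scheme would target. It performs no SM2 encryption and ignores emulation-prevention bytes. The function names are hypothetical.

```python
def parse_nal_header(header_byte):
    """Decode the one-byte H.264 NAL unit header."""
    return {
        "forbidden_zero_bit": (header_byte >> 7) & 0x1,
        "nal_ref_idc": (header_byte >> 5) & 0x3,
        "nal_unit_type": header_byte & 0x1F,
    }

def iter_nal_header_offsets(stream):
    """Yield the offset of each NAL header byte after a 00 00 01 start code."""
    pos = stream.find(b"\x00\x00\x01")
    while pos != -1 and pos + 3 < len(stream):
        yield pos + 3
        pos = stream.find(b"\x00\x00\x01", pos + 3)

def idr_header_offsets(stream):
    """Offsets of header bytes that belong to IDR (key frame) slices."""
    return [
        offset
        for offset in iter_nal_header_offsets(stream)
        if parse_nal_header(stream[offset])["nal_unit_type"] == 5
    ]
```

A four-byte start code (00 00 00 01) is also handled, since it contains the three-byte pattern.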
2024, 33(2):54-61. DOI: 10.15888/j.cnki.csa.009396 CSTR: 32024.14.csa.009396
Abstract:Label noise is widely present and unavoidable, and it affects the performance of deep network models. Sample selection methods based on the principle of small loss can easily and effectively handle label noise by the “memory effect” of neural networks. This study proposes a new sample selection principle and a two-stage weighted sample selection and relabeling method (WSSR-2s) based on the principle that a closer sample distance in the feature space results in more similarity, combined with the assumption of high and low confidence of the samples. In the early training stage, for high-confidence samples, their voting rights are weighted in the feature space to better guide training. In the middle and later stages of training, for low-confidence samples, their voting rights are transferred to their most similar feature samples for more accurate training. The experimental results on synthetic noise datasets CIFAR-10 and CIFAR-100, as well as real noise datasets ANIMAL-10N and WebVision, show that the proposed method achieves higher accuracy and can better handle label noise problems.
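The idea that nearby samples in feature space should share a label can be sketched as a distance-weighted nearest-neighbor vote. This is only an illustration, not the WSSR-2s procedure: the weighting `1 / (1 + distance)`, the choice of `k`, and the function name `weighted_knn_vote` are all assumptions.

```python
import numpy as np

def weighted_knn_vote(features, labels, query, k, num_classes):
    """Return the class with the largest distance-weighted vote among k neighbors."""
    distances = np.linalg.norm(features - query, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = np.zeros(num_classes)
    for idx in nearest:
        # Closer neighbors get a larger say (assumed weighting scheme).
        votes[labels[idx]] += 1.0 / (1.0 + distances[idx])
    return int(np.argmax(votes)), votes
```

A sample whose given label disagrees with a strongly supported vote is a natural candidate for relabeling.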
QIAN Hong , WANG Fei , LIU Sha , ZHENG Tian-Yu , SONG Jia-Wei , AN Hong
2024, 33(2):62-71. DOI: 10.15888/j.cnki.csa.009393 CSTR: 32024.14.csa.009393
Abstract:On Shenwei heterogeneous many-core processors, the latency of computing-core accesses to the main memory is very high, so programs should avoid main memory accesses from computing-core code as much as possible. The global offset table stores the addresses of global variables and functions in the program. It is not suitable for the scarce local storage space of the computing core, and its discrete access pattern makes it unsuitable for cache prefetching. Therefore, the main memory accesses introduced by global offset table lookups have a great influence on program performance. For the static linking and dynamic linking scenarios of heterogeneous many-core programs, the usage limitations of linker relaxation optimization are analyzed, and a global symbol relocation optimization method is designed based on “gp address base+extended offset” to avoid accessing the main memory. Experimental results show that, at the cost of adding a small amount of code, the relocation optimization method effectively avoids the main memory accesses introduced by global offset table lookups when computing-core code calls functions and accesses global variables, which improves the running performance of many-core programs.
WANG Yan , TA Xue , LU Peng-Yi
2024, 33(2):72-82. DOI: 10.15888/j.cnki.csa.009407 CSTR: 32024.14.csa.009407
Abstract:At present, most image dehazing algorithms ignore the local details of the image and fail to make full use of features at different levels, resulting in color distortion, contrast reduction, and haze residual phenomena in the restored image without fog. To solve this problem, this study proposes an adaptive feature fusion image dehazing network combined with dense attention. The network takes the encoder-decoder structure as the basic framework, and the feature enhancement part and the feature fusion part are embedded in the middle. The dense feature attention block composed of the dense residual network and the Channel-Spatial attention combination module is superimposed on the feature enhancement part. In this way, the network can pay attention to the local details of the image, enhance the reuse of features, and effectively prevent the disappearance of gradients. In the feature fusion part, an adaptive feature fusion module is constructed to fuse low-level and high-level features to prevent shallow feature degradation caused by the deepening of the network. The experimental results show that the proposed algorithm performs well on both synthetic and real fog image datasets. The peak signal-to-noise ratio and structural similarity on SOTS indoor synthetic datasets reach 35.81 dB and
DING Mei-Rong , WANG Zhao-Hong , ZHENG Xin-Ru , ZHANG Ying-Chun
2024, 33(2):83-93. DOI: 10.15888/j.cnki.csa.009426 CSTR: 32024.14.csa.009426
Abstract:In the anomaly detection of time series data, a single model often only extracts temporal features related to its model structure and thus tends to ignore other features. At the same time, facing large-scale temporal data, it is difficult for models to model local trends in temporal data. To address these two issues, this study proposes an anomaly detection model called PEAD based on particle swarm optimization (PSO) and external knowledge. The PEAD model uses a deep learning model as the base model and introduces external knowledge generated by the fast Fourier transform to improve the modeling ability of the base model for local trends. Subsequently, the PEAD model trains the base model through Stacking ensemble learning and then uses the PSO algorithm to sum the weighted output of the base model. The weighted sum of the reconstructed data is used for anomaly detection. The PSO algorithm enables the final output of the model to focus on the global and temporal features of the temporal data and enriches the temporal features extracted by the model, thereby improving its anomaly detection ability. By testing six publicly available datasets, the research results show that the PEAD model performs well on most of the datasets.
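The step in which PSO weights the outputs of the base models can be sketched as follows. This is a plain global-best PSO under assumed hyperparameters, applied to a synthetic problem where the target is a known mixture of three hypothetical base-model outputs. The objective, the data, and all names are illustrative and are not taken from the PEAD model.

```python
import numpy as np

def pso_minimize(objective, dim, num_particles=20, iterations=200, seed=0):
    """Plain global-best PSO; all hyperparameters here are illustrative."""
    rng = np.random.default_rng(seed)
    positions = rng.uniform(0.0, 1.0, size=(num_particles, dim))
    velocities = np.zeros_like(positions)
    personal_best = positions.copy()
    personal_value = np.array([objective(p) for p in positions])
    best_index = int(np.argmin(personal_value))
    global_best = personal_best[best_index].copy()
    global_value = float(personal_value[best_index])
    inertia, c1, c2 = 0.7, 1.5, 1.5
    for _ in range(iterations):
        r1 = rng.random(positions.shape)
        r2 = rng.random(positions.shape)
        velocities = (inertia * velocities
                      + c1 * r1 * (personal_best - positions)
                      + c2 * r2 * (global_best - positions))
        positions = positions + velocities
        values = np.array([objective(p) for p in positions])
        improved = values < personal_value
        personal_best[improved] = positions[improved]
        personal_value[improved] = values[improved]
        best_index = int(np.argmin(personal_value))
        if personal_value[best_index] < global_value:
            global_best = personal_best[best_index].copy()
            global_value = float(personal_value[best_index])
    return global_best, global_value

# Hypothetical reconstructions from three base models, shape (3, T).
data_rng = np.random.default_rng(1)
base_outputs = data_rng.normal(size=(3, 200))
target = np.array([0.2, 0.5, 0.3]) @ base_outputs

def ensemble_error(weights):
    """Mean squared error of the weighted sum against the target series."""
    return float(np.mean((weights @ base_outputs - target) ** 2))

weights, error = pso_minimize(ensemble_error, dim=3)
```

In an anomaly detector, the weighted reconstruction would then be compared against the observed series, and large residuals would be flagged.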
XIAN Guang-Ming , ZHAO Zhi-Feng , YANG Xian-Ping
2024, 33(2):94-104. DOI: 10.15888/j.cnki.csa.009385 CSTR: 32024.14.csa.009385
Abstract:One of the key tasks of aspect-level multimodal sentiment classification is to accurately extract and fuse complementary information from the text and visual modalities, so as to detect the sentiment orientation of the aspect words mentioned in the text. Most existing methods combine only a single type of context information with image information for analysis. As a result, they are insensitive to the correlations among aspect, context, and visual information, and they extract aspect-related local information from images imprecisely. In addition, when features are fused, insufficient information from some modalities leads to a mediocre fusion effect. To solve these problems, this study proposes an attention fusion network, AF-Net, for aspect-level multimodal sentiment classification. A spatial transformer network (STN) is used to learn the location information of objects in images to help extract important local features. A Transformer-based interaction network is used to model the relationship between aspects, texts, and images and to realize multimodal interaction. At the same time, similar information between different modal features is supplemented, and the multi-feature information is fused by a multi-attention mechanism to represent the multimodal information. Finally, the sentiment classification result is obtained through a Softmax layer. Experiments and comparisons on two benchmark datasets show that AF-Net achieves better performance and improves aspect-level multimodal sentiment classification.
HAN Chun-Rong , YANG Zi-Qiang , GUO Jun-Wen , WANG Peng-Fei , WU Xiao-Long , SUN Chen-Xuan
2024, 33(2):105-114. DOI: 10.15888/j.cnki.csa.009386 CSTR: 32024.14.csa.009386
Abstract:The bearing temperature of the blower is an important indicator to evaluate its stable operation. However, since bearings are usually installed in a relatively closed environment, it is difficult to achieve real-time and accurate detection of bearing temperature. To address this issue, a knowledge graph-based intelligent prediction of the bearing temperature of blowers is presented. First, a statistical method is applied to analyze the operational system of blowers, and the influencing factors related to bearing temperature are obtained. Second, a knowledge graph is constructed by combining mechanism and domain knowledge. In addition, the direct and indirect feature variables that affect the bearing temperature are extracted. Third, a dual modular fuzzy neural network is designed to deduce the knowledge graph, and the real-time and accurate prediction of the bearing temperature of blowers is realized. Finally, the results show that the intelligent prediction method of bearing temperatures of blowers based on a knowledge graph can accurately model the blower system and has good temperature prediction ability. This research can provide support for real-time monitoring and change trend prediction of bearing temperatures.
LI Jian-Dong , WANG Yan , QU Hai-Cheng
2024, 33(2):115-124. DOI: 10.15888/j.cnki.csa.009395 CSTR: 32024.14.csa.009395
Abstract:Camouflage object detection (COD) aims to accurately and efficiently detect camouflaged objects that are highly similar to the background. Its method can assist in species protection, medical patient detection, and military monitoring, possessing high practical value. In recent years, using deep learning methods to detect camouflaged objects has become an emerging research direction. However, most existing COD algorithms apply a convolutional neural network (CNN) as the feature extraction network and ignore the influence of feature representation and fusion methods on detection performance when combining multi-level features. As the camouflage object detection model based on CNN has a weak ability to extract the global features of the detected object, this study proposes a cross scale interactive learning method for camouflage object detection based on Transformer. The model first puts forward a dual branch feature fusion module, which fuses features that have undergone iterative attention to better fuse high- and low-level features. Secondly, a multi-scale global context information module is introduced to fully integrate context information to enhance features. Finally, a multi-channel pooling module is proposed, which can focus on the local information of the detected object and improve the accuracy of camouflage target detection. The experimental results on the CHAMELEON, CAMO, and COD10K datasets show that this method generates clearer prediction maps and can achieve higher accuracy in camouflage object detection models than current mainstream camouflage object detection algorithms.
2024, 33(2):125-133. DOI: 10.15888/j.cnki.csa.009401 CSTR: 32024.14.csa.009401
Abstract:In medical image registration based on deep learning, when the medical image contains multiple tissue types, the structural differences between tissues may reduce the registration accuracy of the network. Accurate registration is especially difficult in complex deformation regions, such as tissue junctions and lesion regions, and existing registration algorithms have low accuracy in these regions. At the same time, existing registration networks cannot capture the local and global spatial information of the image simultaneously, resulting in insufficient robustness and low accuracy when the network is transferred to other organs for registration. To solve these problems, this study creates a cascaded block registration model based on multi-spatial information extraction. This model can effectively use the local and spatial information of input images, divide medical images into blocks through block fusion technology, and perform fine registration for each image in turn to generate corresponding deformation field blocks. In the last stage of the model, the generated deformation field blocks are fused and restored to enhance the registration strength of the network for the local complex deformation region. The experimental results show that the proposed method not only improves the registration of the brain but also performs well in the registration of other human body parts, which improves the accuracy and reliability of medical image registration and provides better diagnosis and treatment support for clinicians.
2024, 33(2):134-142. DOI: 10.15888/j.cnki.csa.009404 CSTR: 32024.14.csa.009404
Abstract:High-resolution remote sensing images have rich spatial features. To solve the problems of complex models, blurred boundaries, and multi-scale segmentation in remote sensing land cover methods, this study proposes a lightweight semantic segmentation network based on boundary and multi-scale information. First, the method uses a lightweight MobileNetV3 classifier and depthwise separable convolutions to reduce computation. Second, the method adopts top-down and bottom-up feature pyramid structures for multi-scale segmentation. Next, a boundary enhancement module is designed to provide rich boundary detail information for the segmentation task. Then, the method designs a feature fusion module to fuse boundary and multi-scale semantic features. Finally, the method applies cross-entropy and Dice loss functions to deal with the sample imbalance. The mean intersection over union of the WHDLD dataset reaches 59.64%, and the overall accuracy reaches 87.68%. The mean intersection over union of the DeepGlobe dataset reaches 70.42%, and the overall accuracy reaches 88.81%. The experimental results show that the model can quickly and effectively realize the land cover classification of remote sensing images.
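The loss and metric named above can be written down compactly. The following NumPy sketch shows a cross-entropy plus Dice loss on per-pixel class probabilities of shape (H, W, C) and a mean intersection over union on label maps. The equal weighting of the two loss terms and all function names are assumptions, since the abstract does not give the exact formulation.

```python
import numpy as np

def dice_loss(probs, target_one_hot, eps=1e-6):
    """1 minus the mean per-class Dice coefficient; arrays have shape (H, W, C)."""
    intersection = np.sum(probs * target_one_hot, axis=(0, 1))
    denominator = np.sum(probs, axis=(0, 1)) + np.sum(target_one_hot, axis=(0, 1))
    dice_per_class = (2.0 * intersection + eps) / (denominator + eps)
    return float(1.0 - dice_per_class.mean())

def cross_entropy_loss(probs, target_one_hot, eps=1e-12):
    """Mean per-pixel cross-entropy."""
    return float(-np.mean(np.sum(target_one_hot * np.log(probs + eps), axis=-1)))

def combined_loss(probs, target_one_hot, dice_weight=1.0):
    """Cross-entropy plus weighted Dice (the weighting is an assumption)."""
    return cross_entropy_loss(probs, target_one_hot) + dice_weight * dice_loss(probs, target_one_hot)

def mean_iou(pred_labels, true_labels, num_classes):
    """Mean intersection over union across classes present in either map."""
    ious = []
    for c in range(num_classes):
        pred_c = pred_labels == c
        true_c = true_labels == c
        union = np.logical_or(pred_c, true_c).sum()
        if union == 0:
            continue  # class absent from both maps
        ious.append(np.logical_and(pred_c, true_c).sum() / union)
    return float(np.mean(ious))
```

The Dice term is computed per class and then averaged, so a rare class counts as much as a frequent one, which is why it is commonly paired with cross-entropy on imbalanced data.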
ZHANG Jian-Xin , LIU Dong-Wei , ZHANG Mu-Qing , HAN Yu-Tong , ZHANG Jun-Xing
2024, 33(2):143-150. DOI: 10.15888/j.cnki.csa.009394 CSTR: 32024.14.csa.009394
Abstract:To address the limited receptive field and insufficient global information of the U-Net model in MRI brain tumor segmentation, this study proposes an improved U-Net model, PyCSAU-Net, by introducing a non-local self-attention mechanism and multi-scale pyramidal convolution. The model uses the three-dimensional U-Net as the baseline and introduces extended three-dimensional non-local attention into the horizontal connection of the fourth layer, which alleviates to a certain extent the insufficient long-range modeling ability caused by the limited convolution kernel size, thus improving the segmentation performance. Moreover, it replaces the normal convolution with three-dimensional pyramidal convolution with multi-scale characteristics to capture more discriminative deep features of brain tumors at multiple levels and resolutions. Segmentation results of 0.904/0.901, 0.781/0.774, and 0.825/0.824 are achieved on the public BraTS 2019 and BraTS 2020 validation datasets for the whole tumor, enhanced tumor, and tumor core, respectively. These results demonstrate the effectiveness and competitiveness of PyCSAU-Net for the brain tumor segmentation task.
FU Qiang , JIANG Xue-Wei , CHENG Peng
2024, 33(2):151-158. DOI: 10.15888/j.cnki.csa.009400 CSTR: 32024.14.csa.009400
Abstract:Unmanned aerial vehicles (UAVs) cannot identify and locate foreign objects in the scene during the inspection in low-light environments, resulting in the subsequent intelligent algorithms failing to obtain the environmental semantic information. To this end, this study proposes a method to fuse information from the ORB-SLAM2 algorithm with the YOLOv5 model, which is applicable to the improvement of low-light object detection. First, deep learning training and fusion algorithm validation are performed by self-collecting low-light datasets from RGB-D cameras. Then, the target pixel coordinates are extracted by combining the keyframe information, the output of the object detection module, and the inherent information of the camera. Finally, the position of the target object is solved relative to the world coordinate system by keyframe information and pixel coordinates. The study achieves more accurate recognition of target objects in low-light environments and localization of target objects in the world coordinate system at the sub-meter level, which provides an effective solution for intelligent inspection of UAVs in low-light environments.
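The last two steps above, going from a detected pixel to a position in the world coordinate system, follow the standard pinhole back-projection. The sketch below assumes an RGB-D depth value at the pixel, camera intrinsics `fx, fy, cx, cy`, and a camera-to-world pose (`rotation_wc`, `translation_wc`) taken from the keyframe. These names and the pose convention are assumptions and are not taken from ORB-SLAM2's API.

```python
import numpy as np

def pixel_to_world(u, v, depth, fx, fy, cx, cy, rotation_wc, translation_wc):
    """Back-project pixel (u, v) with metric depth into world coordinates."""
    # Pinhole model: pixel -> point in the camera frame.
    x_cam = (u - cx) * depth / fx
    y_cam = (v - cy) * depth / fy
    point_cam = np.array([x_cam, y_cam, depth])
    # Camera-to-world rigid transform from the keyframe pose (assumed convention).
    return rotation_wc @ point_cam + translation_wc
```

In practice the pixel would typically be the center of a detection box, and the depth would be read from the aligned depth image at that location.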
SHU Xiao-Feng , WU Xiao-Hong , QING Lin-Bo , TENG Qi-Zhi , LUO Bin-Bin
2024, 33(2):159-165. DOI: 10.15888/j.cnki.csa.009379 CSTR: 32024.14.csa.009379
Abstract:It is important to understand the characteristics of rock porosity, pore size distribution, and pore connectivity for oil and gas exploration and exploitation, and the analysis and judgment of these characteristics need to rely on the image segmentation technology of rock thin sections. There are a large number of fine particles in the images of rock thin sections, and the edge features among these particles are very similar, which cannot be accurately distinguished. Meanwhile, uneven staining during section manufacturing will cause unbalanced color characteristics of the pores of the thin sections, resulting in the inability to segment. Therefore, to improve the segmentation effect of rock thin sections, this study proposes an improved segmentation algorithm based on U2Net. The main contents are as follows. (1) The U2Net network is adopted as the backbone to improve the model’s ability to express image features, and coordinate attention is combined to enhance the ability to express image features. (2) The introduction of a multi-scale feature extraction module enlarges the receptive field of the convolutional layers and enables the utilization of multi-scale feature information from the feature map. Empirical evaluations demonstrate that the proposed method outperforms conventional segmentation techniques and other state-of-the-art segmentation networks in small particle segmentation. Additionally, the proposed algorithm exhibits superior segmentation accuracy and robustness.
ZHAO Kui , LI Qi , GAO Yan-Jun , MA Hui-Min
2024, 33(2):166-175. DOI: 10.15888/j.cnki.csa.009422 CSTR: 32024.14.csa.009422
Abstract:In the field of medicine, there are differences between patients with the same disease, and seemingly simple diseases may show different levels of complexity, which brings great challenges to patient identification, treatment, and prognosis. In this study, the electronic medical history stored in vertically unstructured time sequence is used to solve the heterogeneity of patients, enhance the acquisition of hidden information by seizing the characteristics of irregular medical treatment intervals, and capture the connection between current medical records and past and future information through forward and backward bidirectional learning, so as to deepen the level of feature extraction of original sequences and make the model make more accurate decisions. The BT-DST model proposed in this study uses a time-aware LSTM unit to construct a bidirectional autoencoder to learn a strong single representation of a patient, which is then used in patient clustering to obtain the subtype of the patient for the current disease through statistical analysis. In addition, different types of therapeutic interventions can be applied to different populations, which provides precise medicine for different types of patients according to their health conditions.
QIN Yun-Fei , CUI Xiao-Long , CHENG Lin , FAN Ji-Dong
2024, 33(2):176-187. DOI: 10.15888/j.cnki.csa.009387 CSTR: 32024.14.csa.009387
Abstract:To solve the problems of small target detection and target occlusion, this study constructs corresponding traffic scenes based on the VisDrone2019 dataset and proposes a small target detection algorithm. First, the shallow features of the backbone network are fully used to reduce missed detections of small targets. The small target detection layer P2 is added to the original network structure of the YOLOv7 algorithm, and a multi-level shallow information fusion module is added to the feature fusion network of the small target detection layer P2, so as to improve the small target detection effect of the algorithm. Secondly, the global context module is used to build the connection between the target and the global context, enhance the ability of the model to distinguish between the target and the background, and improve the detection effect when the target is missing features due to occlusion. Finally, the CIoU loss function in the baseline model is replaced by NWD, a loss function specially designed for small targets in this study, so as to solve the problem that IoU itself and its extension are highly sensitive to the position deviation of small targets. Experiments show that the improved YOLOv7 model improves mAP@.5:.95 by 2.3% and 2.8% on the test set and validation set, respectively, of the small-target aerial photography dataset VisDrone2019, achieving excellent detection results.
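The sensitivity of IoU to small position shifts, and how a Wasserstein-based similarity softens it, can be seen in a few lines. The sketch below uses the normalized Wasserstein distance as it is commonly defined in the tiny-object detection literature, with boxes given as (cx, cy, w, h) and modeled as Gaussians. The abstract does not give the study's exact formulation, and the normalization constant `c` is an arbitrary assumed value.

```python
import math

def iou(box_a, box_b):
    """Intersection over union for boxes given as (cx, cy, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw / 2, bx + bw / 2) - max(ax - aw / 2, bx - bw / 2))
    inter_h = max(0.0, min(ay + ah / 2, by + bh / 2) - max(ay - ah / 2, by - bh / 2))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between boxes modeled as 2-D Gaussians.

    `c` is a dataset-dependent constant; the default here is only an assumption.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    w2 = math.sqrt((ax - bx) ** 2 + (ay - by) ** 2
                   + ((aw - bw) / 2) ** 2 + ((ah - bh) / 2) ** 2)
    return math.exp(-w2 / c)

def nwd_loss(box_a, box_b, c=12.8):
    """A loss that still gives a useful signal when the boxes do not overlap."""
    return 1.0 - nwd(box_a, box_b, c)
```

For a 6×6 box shifted by 6 pixels, IoU drops to zero while the NWD similarity stays strictly positive and decreases smoothly with the offset.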
DONG Ci-Hao , CHEN Lei-Ming , HUANG Zi-Ling , ZHU Yi-Chang , QIU Jia-Kang , LIU Shang-Ru
2024, 33(2):188-197. DOI: 10.15888/j.cnki.csa.009408 CSTR: 32024.14.csa.009408
Abstract:With wearable devices entering daily life on a large scale, human behavior recognition based on temporal data generated by motion sensors has become a research hotspot in this field. However, current methods cannot capture the temporal and spatial relationships among data from multiple sensors. In addition, when a traditional neural network learns a new task, the new task parameters overwrite the old task parameters, causing catastrophic forgetting. To this end, this study proposes a human behavior recognition algorithm that fuses a graph attention network with a generative replay continual learning mechanism. The algorithm extracts temporal features through a convolutional neural network and a graph attention network, enabling the model to focus on temporal and spatial features at the same time. In addition, the algorithm adopts an episodic memory continual learning method based on a generative data replay strategy, which remembers historical data distributions with conditional variational autoencoders, to address the catastrophic forgetting problem. Finally, in comparisons with different baseline algorithms on multiple public datasets, the experimental results show that the proposed algorithm achieves higher accuracy while mitigating catastrophic forgetting more effectively.
ZHANG Tao-Jie , ZHOU Di-Bin , LI Jin-Di , YU Chen
2024, 33(2):198-206. DOI: 10.15888/j.cnki.csa.009384 CSTR: 32024.14.csa.009384
Abstract:Considering that traditional edge detection algorithms have difficulty handling blurred medical images, this study proposes an edge detection network ECENet based on deep learning. First, the network is based on the CHRNet model, and its last two layers are pruned to make the model more efficient and lightweight. Secondly, the attention module SKSAM is added to the feature extraction stage of the network to optimize the adaptive extraction of image features and reduce the impact of noise. Finally, context-aware fusion blocks are applied to connect multi-scale network outputs to help the model better understand the structure and semantic information of the image. In addition, considering the pixel-level accuracy and the smoothness of the boundary, the loss function is optimized to provide better gradient signals for model training. Experimental results show that the proposed algorithm increases the optimal dataset scale (ODS) and optimal image scale (OIS) indicators to 0.816 and 0.823, respectively; the relevant edge indicators are also significantly improved, with PSNR increased by 16.8% and SSIM by 37.6%.
LI Jin-Di , ZHANG Tao-Jie , ZHOU Di-Bin , LIU Wen-Hao
2024, 33(2):207-215. DOI: 10.15888/j.cnki.csa.009398 CSTR: 32024.14.csa.009398
Abstract:Traditional edge detection algorithms have difficulty dealing with complex images, and existing deep learning-based edge detection models often have edge positioning errors and information loss in the detection results. To address these problems, this study proposes a high-precision edge detection algorithm RCF-CLF based on RCF. First, the HDC structure is introduced to avoid the grid effect caused by superimposing the same dilated convolution. Second, a feature enhancement structure is designed to fuse multi-scale information and expand the receptive field. Then, a cross-layer fusion structure is designed, which integrates high-level and low-level information to extract accurate edge information. Finally, the attention mechanism CBAM is introduced to focus on the edge area of the object and suppress the non-edge area, thereby improving the ability of the network to extract edge information. This study evaluates the proposed method on the BSDS500 and BIPED datasets. Compared with the RCF algorithm, the main indicators ODS, OIS, and AP reached 0.893, 0.901, and 0.945, respectively, with an increase of nearly 5 percentage points on the BIPED dataset. On the BSDS500 dataset, the main indicators have also improved. In addition, compared with other similar algorithms, the proposed algorithm also has certain advantages, which can achieve more accurate edge positioning.
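The grid effect mentioned above can be checked with simple arithmetic in one dimension. The sketch below enumerates which input offsets a stack of kernel-size-3 dilated convolutions can reach. Repeating the same dilation rate leaves holes, while a mixed schedule covers every offset. The rates `[2, 2, 2]` and `[1, 2, 5]` are illustrative choices and are not taken from the RCF-CLF architecture.

```python
def reachable_offsets(dilations):
    """Input offsets seen by one output position after stacked 3-tap dilated convs."""
    offsets = {0}
    for d in dilations:
        # Each layer taps positions -d, 0, and +d relative to its own output.
        offsets = {o + step * d for o in offsets for step in (-1, 0, 1)}
    return offsets

def coverage(dilations):
    """Fraction of positions inside the receptive field that are actually used."""
    reach = sum(dilations)
    return len(reachable_offsets(dilations)) / (2 * reach + 1)
```

With `[2, 2, 2]` only the even offsets are reached, so 7 of the 13 positions in the receptive field are used. With `[1, 2, 5]` all 17 positions are covered.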
CHEN Lin-Guo , XIONG Ling , DAI Qi-Liang , WANG Dong-Mei , LI Shu-Fan
2024, 33(2):216-223. DOI: 10.15888/j.cnki.csa.009427 CSTR: 32024.14.csa.009427
Abstract:The composite piece is the core cutting unit of the PDC drill bit, and its automatic detection technology is the basis of the automatic repair technology of the composite piece. This paper proposes a PDC drill bit composite piece detection method based on the improved YOLOv7. Based on YOLOv7, the conventional convolution is replaced with depthwise separable convolution, which reduces the number of parameters and the computing cost. As the SimAM attention mechanism is introduced, the method can derive 3D attention weights from neurons without additional parameters and also improve the expressive ability of convolutional neural networks. SPPCSPC is replaced with SPPFCSPC, which improves the speed while ensuring that the receptive field remains unchanged. Prior boxes clustered by the K-means++ algorithm are adopted, and a heuristic algorithm is applied to locate defective composite pieces. Experimental results show that, compared with the original YOLOv7 model, the mAP of the proposed algorithm is increased by 2.75%, the number of parameters is reduced by about 80%, and the inference speed is increased by 9.12 f/s. It also has greater advantages than other algorithms and can realize industrial applications of composite piece detection.
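The parameter saving from depthwise separable convolution follows from a simple count, sketched below for a single layer with bias terms ignored. The layer sizes in the usage note are illustrative and are not taken from the modified YOLOv7, so the ratio for one layer should not be read as the overall reduction reported above.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels

def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise

def parameter_ratio(kernel_size, in_channels, out_channels):
    """Separable / standard; equals 1/out_channels + 1/kernel_size**2."""
    return (depthwise_separable_params(kernel_size, in_channels, out_channels)
            / standard_conv_params(kernel_size, in_channels, out_channels))
```

For a 3×3 layer mapping 64 channels to 128, the standard convolution has 73,728 weights and the separable version has 8,768, roughly 12% as many.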
2024, 33(2):224-231. DOI: 10.15888/j.cnki.csa.009397 CSTR: 32024.14.csa.009397
Abstract:An improved dung beetle optimization algorithm integrating multiple strategies (MSDBO) is proposed to address the weak global exploration ability, low convergence accuracy, and susceptibility to local optima of the dung beetle optimization algorithm. Firstly, this study introduces a social learning strategy to guide the dung beetles in updating their positions, which improves the global exploration ability of the algorithm and keeps it from falling into local optima. Secondly, the study proposes a direction-following strategy to establish interaction between the thief and the ball-rolling dung beetle, which improves optimization accuracy. Finally, to balance performance and time consumption, it introduces an environment-aware probability that guides the thief to adopt the direction-following strategy only when appropriate. Several optimization algorithms are selected and compared with MSDBO. Results on 12 benchmark test functions show that the optimization performance of MSDBO is significantly better than that of the comparison algorithms. The results of pressure vessel design optimization verify the effectiveness of MSDBO in solving practical constrained engineering optimization problems.
LI Wei-Xiang , LI Wu-Jin , CHEN Si-Yuan
2024, 33(2):232-238. DOI: 10.15888/j.cnki.csa.009415 CSTR: 32024.14.csa.009415
Abstract:In UAV photogrammetry, traditional ground point cloud extraction methods adapt poorly when extracting roads from image point cloud data. Therefore, this study proposes an adaptive road extraction method for UAV photogrammetric point clouds. Firstly, the point cloud is divided into three categories based on its spatial geometric characteristics. Then, corresponding methods are applied to remove the non-road point cloud categories. Finally, the point cloud data obtained through the adaptive extraction method is filtered for smoothing and subjected to color-based region growing segmentation. Experimental results show that the Type I error of the road point cloud extracted by this method is 4.97% and the Type II error is 1.14%. The method effectively extracts target road surfaces, improving the efficiency of point cloud data processing in UAV photogrammetric applications.
LIU Yu , CHEN Yong-Can , ZHOU Yan-Ping
2024, 33(2):239-245. DOI: 10.15888/j.cnki.csa.009428 CSTR: 32024.14.csa.009428
Abstract:To find Pareto optimal solutions for multi-objective flow shop scheduling, this study builds a multi-objective flow shop scheduling model with maximum completion time and maximum delay time as the optimization objectives. The study also designs a genetic reinforcement learning algorithm based on Q-learning to obtain the Pareto optimal solutions of the problem. The algorithm introduces state variables and action variables and generates the initial population with the Q-learning algorithm to improve the quality of the initial solutions. During evolution, the Q-table guides the mutation operation to expand the local search range. Fast non-dominated sorting and crowding distance calculation are adopted to improve solution quality and diversity, and the Pareto optimal solutions are obtained step by step. The effectiveness of the proposed genetic reinforcement learning algorithm for the multi-objective flow shop scheduling problem is verified by comparing it with the genetic algorithm, the NSGA-II algorithm, and the Q-learning algorithm.
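The notion of Pareto optimality used in the abstract above can be sketched for two minimization objectives. The code below is a simple quadratic-time filter that keeps only the non-dominated points (the first front). It is not the full ranked non-dominated sort used in NSGA-II, nor the authors' implementation, and the objective values in `schedules` are made-up illustrative data.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (maximum completion time, maximum delay time) pairs.
schedules = [(20, 5), (18, 7), (22, 4), (21, 6), (18, 9)]
front = pareto_front(schedules)
print(front)
```

Here `(21, 6)` is dropped because `(20, 5)` is better in both objectives, and `(18, 9)` is dropped because `(18, 7)` matches its completion time with a smaller delay. The remaining three schedules are mutually non-dominated trade-offs.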
MA Li , WANG Jun , LIANG Xian-He , HAO Jin-Hua
2024, 33(2):246-252. DOI: 10.15888/j.cnki.csa.009414 CSTR: 32024.14.csa.009414
Abstract:Standardized fat quantification in liver MRI images often requires manually sampling the liver region of interest, but manual sampling is time-consuming and its results vary. Compared with manually sketched regions of interest, whole liver segmentation based on deep learning has lower variability error and uncertainty and performs better in quantitative fat analysis. To improve performance on the whole liver segmentation task, this study improves the UNETR++ model. The method combines the advantages of convolutional neural networks and the Transformer structure and adds convolutional branches to supplement local features. Meanwhile, it introduces a gated attention mechanism that suppresses irrelevant background information so that the model highlights the features of the segmented region. The improved method achieves better DCS and HD95 indexes than UNETR++ and other segmentation models.
WANG Hua-Yi , HUANG Yao-Cheng , CAI Bo
2024, 33(2):253-264. DOI: 10.15888/j.cnki.csa.009413 CSTR: 32024.14.csa.009413
Abstract:As a research direction of computer vision, image similarity comparison has a wide range of applications, such as face recognition, person re-identification, and target tracking. However, image similarity algorithms have rarely been surveyed and summarized, and applying them to actual industrial production remains challenging. This study summarizes the principles and performance of traditional image processing algorithms and deep learning image processing algorithms in image similarity comparison, aiming to select the best algorithm for drug image similarity comparison. Among the traditional image processing algorithms, the ORB algorithm performs best on the test set, with an accuracy of 93.09%. Among the deep learning algorithms, the study adopts an improved Siamese network structure, devises a label generation method, sets a specific data augmentation strategy, and adds a feature surface classification network to improve training efficiency and accuracy. The final test results show that the improved Siamese network performs best, achieving an accuracy of 98.56% and an inference speed of 27.80 times/s. In summary, the improved Siamese network algorithm is better suited to the fast comparison of drug images and is expected to be widely used in the pharmaceutical industry.
LOU Yao-Di , YUE Jun-Feng , ZHOU Di-Bin , LIU Wen-Hao
2024, 33(2):265-275. DOI: 10.15888/j.cnki.csa.009406 CSTR: 32024.14.csa.009406
Abstract:Existing deep models for industrial bearing appearance defect detection suffer from a large number of parameters, insufficient feature fusion, and low detection accuracy for small targets. To address these problems, a lightweight adaptive feature fusion detection network (Efficient-YOLO) is proposed. First, the network uses the EfficientNetV2 structure with an embedded CBAM attention mechanism for basic feature extraction, which preserves model accuracy while significantly reducing the number of parameters. Second, an adaptive feature fusion network (CBAM-BiFPN) is designed to strengthen the network’s extraction of effective feature information. Then, the Swin Transformer mechanism is introduced in the downstream feature fusion network and Ghost convolution in the upstream network, greatly improving the model’s global perception of bearing appearance defects. Finally, an improved non-maximum suppression method (Soft-CIoU-NMS) with added distance-related weight evaluation factors is applied in the inference phase to reduce missed detections of overlapping boxes. The experimental results show that, in comparison with existing mainstream detection models, the method achieves a mAP of 90.1% on the bearing surface defect dataset. The number of parameters is reduced to 1.99M, and the computational cost is 7 GFLOPs. The recognition rate of small bearing defect targets is significantly improved, which meets the needs of industrial bearing appearance defect detection.
2024, 33(2):276-283. DOI: 10.15888/j.cnki.csa.009405 CSTR: 32024.14.csa.009405
Abstract:To improve the realism of Chinese lip-synchronized facial animation videos, this study proposes a text- and audio-driven facial animation generation technology based on an improved Wav2Lip model. Firstly, a Chinese lip-synchronized dataset is constructed and used to pre-train the lip discriminator so that it discriminates Chinese lip-synchronized facial animations more accurately. Then, text features are introduced into the Wav2Lip model to improve the temporal synchronization of the lips and thus the realism of the facial animation videos. The model combines the extracted text information, audio information, and speaker facial information and generates a highly realistic lip-synchronized facial animation video under the supervision of a pre-trained lip discriminator and a video quality discriminator. Comparative experiments with the ATVGnet and Wav2Lip models show that the video generated by the proposed model improves the synchronization between lip shape and audio and enhances the overall realism of the facial animation. The paper provides a solution for current facial animation generation.
SONG Ying , LU Yu-Hang , CHEN Yi-Fei
2024, 33(2):284-290. DOI: 10.15888/j.cnki.csa.009392 CSTR: 32024.14.csa.009392
Abstract:Obstacle detection and tracking is an important technology for mobile robot navigation and helps improve the movement safety of mobile robots. To improve the accuracy of obstacle detection, two improvements are made to overcome the over-segmentation and under-segmentation of Euclidean clustering. A dynamic search radius method for Euclidean clustering is proposed to handle the sparsity of distant point clouds, and the radius search is changed to an extended search in the depth direction to solve incomplete detection and trailing in the depth direction of point cloud data. To improve the accuracy of dynamic obstacle tracking, a new formula for the association matrix is designed for associating obstacle data between two frames, and the six-degree-of-freedom information and size information of the obstacle are added, which improves the success rate of dynamic matching. Simulation experiments show that the improved obstacle detection accuracy reaches 95.2%, and the multi-target tracking accuracy reaches 13.2 mm.
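The idea of a dynamic search radius can be sketched as region-growing Euclidean clustering in which the neighbor radius grows with a point's distance from the sensor. The linear rule in `search_radius`, its parameters `base_radius` and `gain`, and the sample points are assumptions made for illustration. The abstract does not give the authors' formula, and the extended search in the depth direction is not modeled here.

```python
import math


def search_radius(point, base_radius=0.2, gain=0.02):
    """Hypothetical rule: radius grows linearly with range from the origin."""
    return base_radius + gain * math.hypot(*point)


def euclidean_cluster(points, base_radius=0.2, gain=0.02):
    """Group point indices by region growing with a range-dependent radius."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = min(unvisited)  # deterministic seed order
        unvisited.remove(seed)
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            r = search_radius(points[i], base_radius, gain)
            # Brute-force neighbor search; a k-d tree would be used at scale.
            near = [j for j in unvisited if math.dist(points[i], points[j]) <= r]
            for j in near:
                unvisited.remove(j)
                cluster.append(j)
                frontier.append(j)
        clusters.append(sorted(cluster))
    return clusters


# Two nearby points close to the sensor and two sparser points far away.
points = [(1.0, 0.0), (1.15, 0.0), (30.0, 0.0), (30.6, 0.0)]
print(euclidean_cluster(points))            # range-dependent radius
print(euclidean_cluster(points, gain=0.0))  # fixed radius
```

With the range-dependent radius the two distant points are merged into one obstacle. With a fixed radius (`gain=0.0`) they split into separate clusters, which is the over-segmentation of sparse distant points that the abstract describes.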
HUANG Wei-Cong , ZHOU Zhuo-Yi , LI Xiong-Bin , LIANG Yan
2024, 33(2):291-298. DOI: 10.15888/j.cnki.csa.009425 CSTR: 32024.14.csa.009425
Abstract:In clinical practice, accurate pain assessment is crucial for pain management and diagnosis. However, traditional assessment methods are highly subjective and rely on the expertise of medical professionals, which highlights the urgent need for more reliable and objective alternatives. Deep learning-based pain detection from facial expressions has made remarkable progress in recent years, but complex model structures and high computational cost restrict its practical application. Therefore, this study proposes an improved 3D convolutional neural network (CNN) that uses a lightweight 3D CNN named L3D as the backbone network. It also incorporates an enhanced SE attention mechanism to fuse multiple features of different scales, capturing highly discriminative spatiotemporal characteristics in pain sequences. The proposed method is evaluated on the UNBC-McMaster and BioVid datasets. Compared with state-of-the-art methods, it achieves superior performance in both pain detection and computational complexity.
HUANG Jian , ZHAO Xiao-Fei , WANG Hu , HU Qi-Sheng
2024, 33(2):299-307. DOI: 10.15888/j.cnki.csa.009399 CSTR: 32024.14.csa.009399
Abstract:The areas surrounding underground infrastructure such as optical cables and high-pressure oil and gas pipelines are vulnerable to destructive intrusion by excavators. This study proposes an excavator detection and working state discrimination method that combines Yolopose with a multilayer perceptron. First, a Yolopose-based network, Yolopose-ex, is designed to extract the six-point posture of the excavator. Second, the Yolopose-ex model is used to extract the changes in the excavator’s working posture in the video, and the working state feature vector (MSV) of the excavator is constructed. Finally, a multilayer perceptron (MLP) is adopted to analyze the working state of the excavator in the video. The experimental results show that the proposed method overcomes the difficulty of discrimination against complex backgrounds, and the accuracy of working state identification reaches 96.6%, with high inference speed and good generalization ability.