Volume 33, Issue 11, 2024 Table of Contents

Pilot’s Gaze Zone Classification Based on Multi-modal Data Fusion

DUAN Gao-Le , WANG Chang-Yuan , WU Gong-Pu , WANG Hong-Yan

2024, 33(11):1-14. DOI: 10.15888/j.cnki.csa.009677

Abstract:To avoid eye image disappearance and inaccurate head pose estimation during image capture, a non-contact method for acquiring eye information is employed to collect facial images, determining the pilot’s current gaze direction from a single image frame. Because current networks classify poorly when they neglect the visual obstruction caused by head movements, a multi-modal data fusion network for the pilot’s gaze region classification is proposed, which combines facial images with head poses and is built on an improved MobileViT model. Firstly, a multi-modal data fusion module is introduced to address the problem of overfitting resulting from size imbalances during feature concatenation. Additionally, an inverted residual block based on a parallel branch SE mechanism is proposed to fully leverage spatial and channel feature information in the shallow layers of the network. Moreover, multi-scale features are captured by integrating the global attention mechanism from the Transformer. Finally, the Mobile Block structure is redesigned and depthwise separable convolution is utilized to reduce model complexity. Experimental comparisons with mainstream baseline models are conducted using a self-built dataset, FlyGaze. The results demonstrate that the PilotT model achieves classification accuracies exceeding 92% for gaze regions 0, 3, 4, and 5, with robust adaptability to facial deflection. These findings hold practical significance for enhancing flight training quality and facilitating pilot intention recognition and fatigue assessment.
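
The abstract above credits depthwise separable convolution with reducing model complexity. As a rough illustration only (not the authors' network), the following sketch compares weight counts for a standard convolution and a depthwise separable one; the function names and the 64-to-128-channel, 3×3 example are assumptions chosen for illustration, and bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, used only to show the ratio.
    std = standard_conv_params(64, 128, 3)
    sep = depthwise_separable_params(64, 128, 3)
    print(std, sep, round(sep / std, 3))
```

For these assumed sizes the separable form needs roughly one eighth of the weights, which is the kind of saving the abstract refers to.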

Pedestrian Multi-object Tracking Combining Discriminative Appearance and Motion Cues

WANG Jun , LI Ying-Chun , CHENG Yong

2024, 33(11):15-26. DOI: 10.15888/j.cnki.csa.009681

Abstract:In multi-object tracking tasks, external noise can make the system modeling of traditional methods unreliable, reducing the accuracy of object position prediction. Moreover, the congestion and occlusion caused by dense crowds seriously affect the reliability of object appearance, resulting in incorrect identity association. To address these issues, this study proposes a multi-object tracking algorithm, Ecsort. First, the algorithm improves position prediction accuracy by adding a noise compensation module to traditional motion prediction, reducing errors caused by noise interference. Second, it introduces a feature similarity matching module, which achieves accurate identity association by learning discriminative appearance features of objects and combining them with motion cues. Extensive experimental results on multi-object tracking benchmark datasets demonstrate that, compared to the baseline model, this method improves ID F1 score (IDF1), higher order tracking accuracy (HOTA), association accuracy (AssA), and detection accuracy (DetA) by 1.1%, 0.5%, 0.6%, and 0.3% respectively on the MOT17 test set, and by 2.3%, 1.9%, 3.4%, and 0.2% respectively on the MOT20 test set.

Facial Expression Recognition via Optimized Bilinear ResNet34

LYU Jun , CHANG Wan-Ting , CHEN Fu-Long , WANG Zhi-Wei

2024, 33(11):27-37. DOI: 10.15888/j.cnki.csa.009682

Abstract:An optimized bilinear structure based on ResNet34, termed OBSR-Net, is proposed for faster and more accurate facial expression recognition. OBSR-Net adopts a bilinear network structure as its overall framework and incorporates ResNet34 as the backbone network to model local pairwise feature interactions in a translation-invariant manner and thereby extract more complete and effective features. At the same time, transfer learning mitigates the limitations that small facial expression image datasets impose on deep learning. In addition, gradient centralization, a new general optimization technique, is utilized during the training process. This technique operates directly on gradients by centralizing gradient vectors to have zero mean, which can be regarded as a projected gradient descent method with a constrained loss function. Experiments on two public datasets, namely Fer2013 and CK+, reveal that OBSR-Net achieves recognition accuracy of 77.65% and 98.82%, respectively. The experimental results show that OBSR-Net is more competitive than other advanced facial expression recognition methods.
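
The gradient technique described above (shifting each gradient vector to zero mean, commonly known as gradient centralization) can be sketched in a few lines. This is a minimal pure-Python illustration under the assumption that a layer's gradient is a matrix with one row per output unit; `centralize_gradient` and `sgd_step` are hypothetical helper names, not part of OBSR-Net or any library.

```python
def centralize_gradient(grad):
    """Subtract the row mean from each row so every gradient vector has zero mean."""
    centered = []
    for row in grad:
        mean = sum(row) / len(row)
        centered.append([g - mean for g in row])
    return centered


def sgd_step(weights, grad, lr=0.1):
    """One plain SGD update that uses the centralized gradient."""
    gc = centralize_gradient(grad)
    return [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, gc)
    ]


if __name__ == "__main__":
    w = [[0.5, -0.2, 0.1], [1.0, 1.0, 1.0]]
    g = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]
    new_w = sgd_step(w, g)
    # Each row's sum is unchanged by the update, which is the constraint
    # behind the "projected gradient descent" reading.
    print([sum(r) for r in w], [sum(r) for r in new_w])
```

Because each centralized row sums to zero, the update leaves the sum of every weight vector unchanged, which is one way to read the constrained-loss interpretation mentioned in the abstract.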

Review on Security Challenges and Solutions to Edge Computing

WEN Mu-Qi , WEN Wu-Shao

2024, 33(11):38-47. DOI: 10.15888/j.cnki.csa.009702

Abstract:Compared with centralized cloud computing frameworks, edge computing deploys additional “edge servers” between a cloud center and on-site intelligent devices to help those devices complete computing tasks and event processing quickly and efficiently. An edge computing system contains a large number of on-site intelligent devices and heterogeneous edge computing servers, and the data it stores is sensitive and requires strong privacy protection. These characteristics make network security difficult to ensure. Ensuring the information and network security of edge computing systems is the key to the large-scale industrialization of edge computing technology. However, because edge servers and on-site intelligent devices are limited in computing, network, and storage capacity, traditional computer network security technology may not fully meet the requirements. Analyzing sensitive data protection technologies suitable for edge computing systems, such as federated learning, lightweight encryption, obfuscated and virtual location information, and anonymous identity authentication, and exploring new technologies such as artificial intelligence and blockchain to prevent malicious attacks in edge computing will greatly promote the industrial development of edge computing.

S-UNet: Short-term Precipitation Forecasting Network Based on U-Net and LSTM

XU Meng , DU Jing-Lin , LIU Rui

2024, 33(11):48-57. DOI: 10.15888/j.cnki.csa.009683

Abstract:With the development of deep learning technology, most research treats short-term precipitation nowcasting as a prediction task on radar echo sequences. Because precipitation involves complex nonlinear spatiotemporal transformations, existing short-term nowcasting methods suffer from low accuracy and short extrapolation time. To address these issues, this study proposes an S-UNet short-term precipitation forecasting network based on U-Net and LSTM. Firstly, the study introduces the S-UNet layer (SL) module to help the network better extract radar sequence features and construct the overall trend of spatiotemporal changes, thereby improving the network efficiency and increasing the extrapolation duration. Secondly, to better address the complexity of radar echo deformation, accumulation, and dissipation, and to enhance the network’s ability to capture complex spatial relationships and simulate movement trajectories, this study constructs the radar feature (RF) module based on LSTM. Finally, by combining the SL module and the RF module with the U-Net framework, the S-UNet short-term precipitation nowcasting network is proposed, achieving remarkable performance on the KNMI dataset. Experimental results show that, compared with the mainstream methods, on the KNMI’s NL-50 and NL-20 datasets, the proposed method improves the Heidke skill score (HSS) and critical success index (CSI) by 5.25% (6.57%) and 2.17% (4.75%) respectively, reaching 0.30 (0.29) and 0.72 (0.58); the accuracy increases by 2.10% (1.35%), reaching 0.80 (0.80); and the false alarm rate decreases by 4.27% (1.80%), dropping to 0.24 (0.38). Additionally, the effectiveness of the proposed modules and their combinations is verified through ablation experiments.

Lightweight Steel Surface Defect Detection with Multiscale Fusion

YANG Ben-Chen , LI Shi-Xi , JIN Hai-Bo , KANG Jie

2024, 33(11):58-67. DOI: 10.15888/j.cnki.csa.009678

Abstract:The quality of steel surface defect inspection directly affects industrial production safety and machine performance. However, in real factories, steel quality control is limited by equipment conditions, making it challenging to achieve high-precision and real-time inspection. To solve this problem, a lightweight YOLOv8n detection algorithm with multi-scale fusion is proposed. Firstly, a lightweight multi-scale fusion backbone network (RepHGnetv2) is introduced, combining HGnetv2 and RepConv to improve the feature extraction and generalization capabilities of Backbone and reduce the complexity of the model. In the Head part, the ordinary convolution of the original algorithm is replaced with the ADown downsampling module, which reduces computational complexity and improves semantic retention. Finally, the loss function of the original algorithm is replaced by SlideLoss to address sample imbalance. Ablation and comparison experiments are conducted on the NEU-DET dataset. Compared with the original algorithm, the improved algorithm increases precision by 9.3%, reduces the model size by 25.5%, decreases computational complexity by 17.2%, and improves FPS to a certain extent. Comparative experiments are conducted on the VOC2012 dataset to evaluate the generalizability of the improved algorithm, and the results show that the improved algorithm exhibits strong generalizability and effectively improves the accuracy and efficiency of defect detection.

Precipitation Nowcasting Based on MCGAN Model

LIU Rui , DU Jing-Lin , XU Meng

2024, 33(11):68-78. DOI: 10.15888/j.cnki.csa.009673

Abstract:In the field of precipitation nowcasting, the existing radar echo extrapolation methods based on deep learning have some shortcomings. In terms of image quality, the predicted images are blurry and lack small-scale details, while in terms of prediction accuracy, the precipitation results are not accurate enough. This study proposes a multi-scale generative adversarial (MCGAN) model, which consists of a multi-scale convolutional generator and a fully convolutional discriminator. The generator adopts an encoder-decoder architecture, which mainly includes multi-scale convolutional blocks and downsampling gating units. Using the dynamic spatiotemporal variability loss function, the MCGAN model is trained under the generative adversarial network (GAN) framework to achieve more accurate and clearer predictions of echo intensity and distribution. On the Shanghai public radar dataset, in the image quality evaluation, the proposed model reduces MSE by 11.15% and increases SSIM and PSNR by 8.99% and 2.95%, respectively, compared with the mainstream deep learning models. In the evaluation of prediction accuracy, the CSI, POD, and HSS indexes increase by 11.92%, 15.89%, and 9.01% on average, and the FAR index decreases by 14.81% on average. In addition, the role of each component of the MCGAN model is demonstrated by ablation experiments.
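
The CSI, POD, FAR, and HSS indexes cited in the precipitation abstracts are conventionally derived from a 2×2 contingency table of forecast versus observed events. The sketch below uses the standard textbook definitions; the thresholding helper, function names, and toy data are assumptions for illustration and are not taken from the papers.

```python
def contingency(pred, obs, threshold):
    """Count hits, misses, false alarms, and correct negatives at a threshold."""
    hits = misses = false_alarms = correct_neg = 0
    for p, o in zip(pred, obs):
        p_evt, o_evt = p >= threshold, o >= threshold
        if p_evt and o_evt:
            hits += 1
        elif not p_evt and o_evt:
            misses += 1
        elif p_evt and not o_evt:
            false_alarms += 1
        else:
            correct_neg += 1
    return hits, misses, false_alarms, correct_neg


def _ratio(num, den):
    """Safe division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def skill_scores(hits, misses, false_alarms, correct_neg):
    """Return POD, FAR, CSI, and HSS from contingency counts."""
    pod = _ratio(hits, hits + misses)
    far = _ratio(false_alarms, hits + false_alarms)
    csi = _ratio(hits, hits + misses + false_alarms)
    hss_num = 2 * (hits * correct_neg - false_alarms * misses)
    hss_den = ((hits + misses) * (misses + correct_neg)
               + (hits + false_alarms) * (false_alarms + correct_neg))
    hss = _ratio(hss_num, hss_den)
    return {"POD": pod, "FAR": far, "CSI": csi, "HSS": hss}


if __name__ == "__main__":
    # Toy binary fields flattened to lists (hypothetical values).
    pred = [1, 1, 0, 0, 1, 0]
    obs = [1, 0, 0, 1, 1, 0]
    print(skill_scores(*contingency(pred, obs, threshold=0.5)))
```

Higher POD, CSI, and HSS and lower FAR indicate better forecasts, which matches the direction of the changes reported above.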

Optimized Architecture for Cooperative Multi-agent Reinforcement Learning

LIU Wei , CHENG Xu , LI Hao-Yuan

2024, 33(11):79-89. DOI: 10.15888/j.cnki.csa.009636

Abstract:Numerous real-world tasks require the collaboration of multiple agents, often with limited communication and incomplete observations. Deep multi-agent reinforcement learning (Deep-MARL) algorithms show remarkable effectiveness in tackling such challenging scenarios. Among these algorithms, QTRAN and QTRAN++ are representative approaches capable of learning a broad class of joint-action value functions with strong theoretical guarantees. However, the performance of QTRAN and QTRAN++ is hindered by their reliance on a single joint action-value estimator and their neglect of preprocessing agent observations. This study introduces a novel algorithm called OPTQTRAN, which significantly improves upon the performance of QTRAN and QTRAN++. Firstly, the study proposes a dual joint action-value estimator structure that leverages a decomposition network module to compute additional joint action-values. To ensure accurate computation of joint action-value estimators, it designs an adaptive network that facilitates efficient value function learning. Additionally, it introduces a multi-unit network that groups agent observations into different units for effective estimation of utility functions. Extensive experiments conducted on the widely-used StarCraft benchmark across diverse scenarios demonstrate that the proposed approach outperforms state-of-the-art MARL methods.

Time Series Imputation Method Based on Diffusion and Temporal-frequency Attention

WANG Pan , ZENG Qian-Xin , YANG Huan

2024, 33(11):90-100. DOI: 10.15888/j.cnki.csa.009669

Abstract:Time series imputation aims to restore data integrity by filling in missing values based on existing data. Currently, RNN-based imputation methods suffer from large errors, and increasing the number of network layers often leads to exploding and vanishing gradients. Additionally, GAN-based and VAE-based imputation methods frequently encounter challenges such as training difficulties and mode collapse. To address these challenges, this study proposes a time series imputation model named diffusion model and time-frequency attention (DTFA), which reconstructs missing data from Gaussian noise through reverse diffusion. Specifically, the model utilizes multi-scale convolutional modules and two-dimensional attention mechanisms to capture temporal dependencies in time-domain data and employs MLPs and two-dimensional attention mechanisms to learn the real and imaginary parts of frequency-domain data. A linear imputation module is also implemented to augment the existing observed data, thereby providing better guidance for model imputation. Finally, a noise estimation network is trained by minimizing the Euclidean distance between real noise and estimated noise, and reverse diffusion is then used to fill in the missing values in time series data. The experimental results demonstrate that DTFA outperforms mainstream baseline models in terms of imputation effectiveness on three public datasets: ETTm1, WindPower, and Electricity.
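
The training objective described above (minimizing the distance between real and estimated noise) can be sketched as follows. This is a minimal illustration that assumes a DDPM-style forward process with a cumulative noise level `alpha_bar` and a loss restricted to positions marked as imputation targets; both assumptions, and all names, are illustrative and are not confirmed details of DTFA.

```python
import math
import random


def forward_noise(x0, alpha_bar, rng):
    """Noise a clean series: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    scale_x = math.sqrt(alpha_bar)
    scale_e = math.sqrt(1.0 - alpha_bar)
    xt = [scale_x * x + scale_e * e for x, e in zip(x0, eps)]
    return xt, eps


def noise_loss(eps_true, eps_pred, target_mask):
    """Mean squared distance between true and predicted noise on masked-in positions."""
    sq = [(a - b) ** 2 for a, b, m in zip(eps_true, eps_pred, target_mask) if m]
    return sum(sq) / len(sq) if sq else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [0.3, -1.2, 0.8, 0.0]
    xt, eps = forward_noise(x0, alpha_bar=0.5, rng=rng)
    # A real model would predict eps from (xt, t, observed values);
    # an all-zero prediction stands in for it here.
    eps_hat = [0.0 for _ in eps]
    print(noise_loss(eps, eps_hat, target_mask=[1, 1, 0, 0]))
```

At inference time, a trained noise estimator would be applied step by step in the reverse direction, starting from Gaussian noise at the missing positions.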

Retail Commodity Detection Based on Deformable Convolution and Multiple Attention

WANG Tian , LIU Li-Bo

2024, 33(11):101-110. DOI: 10.15888/j.cnki.csa.009695

Abstract:A retail commodity detection algorithm based on improved YOLOv8s is proposed in response to the difficulty in accurately extracting global features and irrelevant feature interference caused by retail commodity rotation and deformation. Firstly, using normalized deformable convolutions to replace some standard convolutions enhances the ability to extract global features by fully capturing long-range dependencies and highlighting key channel features. Secondly, using an improved dynamic detection head and a multi-attention mechanism based on spatial perception, scale perception, and task perception captures more discriminative local features of goods to suppress irrelevant feature interference. Finally, the InnerEIoU loss function is used to replace CIoU to reduce the missed detection rate of goods. Experimental results show that the proposed algorithm achieves an mAP@0.5:0.95 of 93.3% on the RPC retail commodity dataset, which is 1.5% higher than the original algorithm and better than other mainstream detection algorithms. At the same time, the number of model parameters and the amount of computation decrease by 10.0% and 6.5% respectively, enabling accurate retail commodity detection in practical scenarios with limited storage and computing resources.

Aerial Infrared Target Detection Based on Attention and Quantization Awareness

ZHOU Jin , PEI Xiao-Fang

2024, 33(11):111-120. DOI: 10.15888/j.cnki.csa.009699

Abstract:Aiming at the problems of low contrast, poor recognition accuracy, and difficult detection of infrared targets in aerial scenes, this study proposes an aerial infrared target detection algorithm based on attention and quantization awareness. Firstly, the DC-ELAN module is constructed by using DCNv2 to replace the 3×3 convolution in the ELAN module, which effectively improves the ability of the model to capture local and global features, and then strengthens the feature representation ability of the network. Secondly, by cleverly integrating the SE attention mechanism into the SPPCSPC module and the ELAN module, the SE-SPPCSPC module and the SE-ELAN module are designed, which helps to enhance the spatial self-attention of the feature map, and the model can better focus on target areas. In addition, the QARepVGG module is introduced to improve the quantization awareness of the model and enhance its robustness to quantization errors. Finally, the DyHead module is introduced, which can dynamically adjust the detection head according to different input images, improve the detection ability of the model to targets of different sizes and shapes, and further improve the accuracy and robustness of infrared target detection. Experimental results show that compared with the original model, the improved YOLOv7-tiny model has 3.4% and 4.8% increases in mAP@0.5 and mAP@0.5:0.95 values without increasing the amount of calculation, which significantly improves model detection accuracy.

Tuberculosis Pathogen Detection Based on Improved Faster R-CNN

JU Rui-Wen , SUN Zhen , LI Qing-Dang

2024, 33(11):121-130. DOI: 10.15888/j.cnki.csa.009679

Abstract:In this study, a detection method for tuberculosis pathogens based on Faster R-CNN is proposed to detect tuberculosis with higher accuracy and a lower missed detection rate. First, the Mosaic data augmentation method is used to expand the dataset and improve the generalization ability of the model. At the same time, the K-means clustering algorithm is introduced to re-cluster the dataset and generate the initial candidate box sizes of the paired anchor points. Secondly, the original feature extraction network in Faster R-CNN is replaced with Res2Net, and all its convolution kernels are replaced with dilated (atrous) convolutions, which provide a larger receptive field than the original convolutions while the number of parameters remains unchanged. Furthermore, the improved GC-FPN module is introduced to make the model pay more attention to small target information while remaining lightweight. Finally, ROI Align is introduced to solve the problem of deviation between the candidate box and the initial regression position. The experimental results show that, compared with the original Faster R-CNN algorithm, the improved Faster R-CNN model has a 2.7% higher accuracy and a 1.4% higher recall rate on the open dataset. This algorithm has been verified on the dataset of tuberculosis images and possesses high accuracy.
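
The claim above that the receptive field grows while the parameter count stays fixed can be checked with the standard relation for a dilated kernel, k_eff = k + (k - 1)(d - 1). The sketch below is a generic illustration; the function names and the 64-channel example are assumptions and do not describe the Res2Net configuration in the paper.

```python
def effective_kernel_size(k: int, dilation: int) -> int:
    """Spatial extent covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


def conv_weight_count(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a k x k convolution; it does not depend on the dilation rate."""
    return k * k * c_in * c_out


if __name__ == "__main__":
    for d in (1, 2, 4):
        # Same 3 x 3 kernel and same weights, wider coverage as d grows.
        print(d, effective_kernel_size(3, d), conv_weight_count(64, 64, 3))
```

A dilation rate of 1 corresponds to an ordinary convolution, so the comparison in the abstract is between d = 1 and d > 1 for the same kernel size.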

Face Anti-spoofing Based on Supervised Multi-view Contrastive Learning and Two-stage Bilinear Feature Fusion

SUN Wen-Yun , LI Jin , JIN Zhong

2024, 33(11):131-141. DOI: 10.15888/j.cnki.csa.009701

Abstract:In this study, a multi-branch network that integrates multi-scale frequency features and depth map features trained by generative adversarial network (GAN) is proposed. Specifically, edge texture information in high-frequency features is beneficial to capturing moiré patterns. Low-frequency features are more sensitive to color distortion. As auxiliary information, depth maps are visually more discriminative than RGB images. Supervised multi-view contrastive learning is employed to further enhance multi-view feature learning. Moreover, a two-stage bilinear feature fusion method is proposed to effectively integrate multi-branch features from different views. To evaluate the model, ablation experiments, feature fusion comparison experiments, intra-set experiments, and inter-set experiments are conducted on four widely used public datasets, namely CASIA-FASD, Replay-Attack, MSU-MFSD, and OULU-NPU. Experimental results show that, in the inter-set evaluation, the average HTER of the proposed model on the four tested protocols is about 5 percentage points lower than that of the DFA method (15.0% versus 20.3%).
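
The HTER figure quoted above is the half total error rate, conventionally the mean of the false acceptance rate and the false rejection rate. The following sketch shows the standard calculation; the function name and the counts in the example are hypothetical and are not results from the paper.

```python
def hter(false_accepts: int, n_attacks: int, false_rejects: int, n_genuine: int) -> float:
    """Half total error rate: mean of false acceptance rate and false rejection rate."""
    far = false_accepts / n_attacks   # spoof samples wrongly accepted as genuine
    frr = false_rejects / n_genuine   # genuine samples wrongly rejected as spoof
    return (far + frr) / 2.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(hter(false_accepts=10, n_attacks=100, false_rejects=5, n_genuine=50))
    # The gap between the two averages reported in the abstract, in percentage points.
    print(round(20.3 - 15.0, 1))
```

Lower HTER is better, so the drop from 20.3% to 15.0% is an improvement of 5.3 percentage points.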

Large-scale Multi-objective Optimization Algorithm with Multiple Strategies

PEI Qian-Ru , ZOU Feng , CHEN De-Bao

2024, 33(11):142-156. DOI: 10.15888/j.cnki.csa.009672

Abstract:When dealing with large-scale multi-objective optimization problems (LSMOPs), the MOEA/D algorithm shows poor scalability in the decision space and a tendency to converge to local optima as the dimensionality of decision variables increases. To address this issue, this study proposes a large-scale MOEA/D algorithm with multiple strategies (MSMOEA/D). The MSMOEA/D algorithm introduces a hybrid initialization strategy based on autoencoders in the optimization process to expand the coverage of the initial population, thus promoting global search. Moreover, a neighborhood adjustment strategy based on aggregation functions is proposed, which can more accurately control the search range during the search process by adjusting neighborhood sizes, thereby avoiding low search efficiency caused by excessively large or small neighborhoods. Furthermore, a mutation-selection strategy based on non-dominated sorting is adopted during the optimization process. Different subproblems select their mutation strategies according to the number of individuals in the first level of non-dominated sorting to prevent the population from falling into local optima and enhance the overall performance of the algorithm. Finally, the MSMOEA/D algorithm and other existing algorithms are evaluated using LSMOP and DTLZ test problems. Experimental results verify the effectiveness of the proposed algorithm for solving LSMOPs.

Skin Lesion Segmentation Based on Edge Enhancement Combined with Multi-scale Information Fusion

QI Xiang-Ming , ZHANG Zhi-Wei

2024, 33(11):157-166. DOI: 10.15888/j.cnki.csa.009676

Abstract:To address the problems of skin lesions, such as varied sizes, low contrast with surrounding skin, blurred and irregular boundaries, artifacts, and hair interference, this study proposes a skin lesion segmentation algorithm that combines edge enhancement with multi-scale information fusion. The algorithm consists of an encoder, a multi-scale sensing module, an edge enhancement module, and a lightweight decoder. Firstly, a Transformer module is built in the encoder to extract global information, and convolution operations are used to extract local information. Secondly, a multi-scale sensing module is designed to integrate multi-scale features using a gated atrous convolution pyramid module with a dense connection structure. An edge enhancement module is constructed, utilizing deep features to promote the exploration of edge features to better retain details and edge information. Finally, a lightweight decoder is designed, employing the CARAFE lightweight operator for upsampling, to maintain high segmentation accuracy with fewer parameters. Comparative experiments on open data sets ISIC2016 and ISIC2018 show that the segmentation accuracy of the proposed algorithm is higher than that of other popular algorithms.

Depression Diagnosis Algorithm Based on Sleep EEG Signals

YANG Jia-Hao , ZHANG Jia-Hui , YAO Shao-Cong , QIU Qian , PAN Jia-Hui

2024, 33(11):167-176. DOI: 10.15888/j.cnki.csa.009671

Abstract:The diagnosis of depression is an important research direction in the medical field. However, existing methods for diagnosing depression face problems such as high cost, low efficiency, low accuracy, and weak interpretability. To solve these problems, this study proposes an automatic algorithm for depression diagnosis based on sleep EEG signals, combined with sleep staging. This method first combines convolutional neural networks with bidirectional long short-term memory neural networks to extract advanced features of sleep signals. At the same time, it analyzes the correlation among different sleep stages, improving the accuracy and interpretability of sleep staging. The experimental results show that this method achieves a highest accuracy of 95.82% on the public dataset Sleep-EDF, surpassing most existing methods. Subsequently, based on the results of sleep staging, the compression net 2 dimension (DepNet2D) model combined with convolutional neural networks is proposed to extract features and classify EEG data during the REM phase. This model can effectively learn the spatiotemporal dependencies of sleep EEG, capture the feature patterns of brain activity in patients with depression, and improve the accuracy of identifying the spectral features of patients. The experimental results show that in the diagnosis of depression, the proposed method reaches an accuracy of 88.82%, which is higher than that of traditional models. The proposed method enhances the interpretability of depression diagnosis and has practical value for modern depression research and analysis, providing new ideas and methods for research and clinical practice in the field of mental health.

Underwater Target Detection via Improved YOLOv8

ZHOU Xin , LI Yuan-Lu , WU Ming-Xuan , FAN Xiao-Ting , WANG Jian-Xiang

2024, 33(11):177-185. DOI: 10.15888/j.cnki.csa.009680

Abstract:An improved YOLOv8 algorithm for underwater target detection is proposed to prevent missed detection of objects with different scales and overlapping occlusion. Firstly, deformable convolutions (deformable convolution network, DCN) are introduced into the backbone network to improve the feature extraction capability of the model through the adaptive deformation mechanism of convolution kernels. Secondly, a module combining atrous convolution and spatial pyramid pooling, termed ASPF, is designed to expand the receptive field of the output feature map and improve the model's ability to perceive underwater targets of multiple scales. Finally, the loss function is improved to optimize the training process of the model and improve detection accuracy. The improved algorithm is tested on the URPC dataset, and the results show that its detection accuracy reaches 87.3%, which is 3.4% higher than that of the original YOLOv8 algorithm. Moreover, it can accurately detect underwater targets with different scales and overlapping occlusion.

Road Recognition in Remote Sensing Images Using SegFormer Fused with Attention Mechanism

WANG Xiao-Jie , CHEN Shao-Kang , YAN Hao-Wei , YANG He-Meng , YAN Zheng-Liang , WANG Sen

2024, 33(11):186-193. DOI: 10.15888/j.cnki.csa.009641

Abstract:Road information is of great significance and value in remote sensing images, and thus the accurate extraction of roads is crucial for many applications. However, there are two main challenges in road recognition. Firstly, the background of satellite images is complex and diverse, while the morphology of roads is also complex and diverse, which poses a challenge to automatic road recognition. Secondly, road pixels only account for a small portion of the entire image, leading to class imbalance. To address these challenges, this study proposes an automatic road recognition algorithm based on an improved SegFormer model. The algorithm employs two main strategies to improve the recognition performance. Firstly, spatial attention modules are added to the output of each stage of the SegFormer encoder. This module helps to weaken the interference from complex backgrounds and enhance the attention to road areas. By introducing spatial attention mechanisms, the model can better capture the features of roads, thereby improving recognition accuracy. Secondly, a hybrid loss function that combines pixel contrast loss and cross-entropy loss is used. Such a loss function can better handle class imbalance problems and make the model place more focus on training road categories. By optimizing the training process, the model can better learn road feature representation, thereby improving recognition accuracy. Comparative experimental analysis shows that the improved model achieves an approximate 3.3% improvement in the mIoU metric on the test set.
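
The mIoU metric reported above averages the intersection-over-union of each class, so a rare class such as road pixels counts as much as the dominant background. The sketch below shows the standard computation on flattened label lists; the binary road/background encoding (1 and 0) and the function names are assumptions for illustration.

```python
def class_iou(pred, truth, cls):
    """Intersection over union for one class on flattened label lists."""
    inter = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, truth) if p == cls or t == cls)
    return inter / union if union else float("nan")


def mean_iou(pred, truth, classes=(0, 1)):
    """Average the per-class IoU, skipping classes absent from both inputs."""
    ious = [class_iou(pred, truth, c) for c in classes]
    valid = [v for v in ious if v == v]  # NaN is the only value not equal to itself
    return sum(valid) / len(valid)


if __name__ == "__main__":
    # Hypothetical 4-pixel image: 1 = road, 0 = background.
    pred = [1, 1, 0, 0]
    truth = [1, 0, 0, 0]
    print(class_iou(pred, truth, 1), class_iou(pred, truth, 0), mean_iou(pred, truth))
```

Because the road class is averaged with equal weight, a model that labels everything as background scores poorly on mIoU even when its pixel accuracy is high, which is why the metric suits the class imbalance discussed in the abstract.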

PM2.5 Concentration Prediction Based on VE-GEP Algorithm

WANG Chao-Xue , ZOU Fei

2024, 33(11):194-201. DOI: 10.15888/j.cnki.csa.009688

Abstract:Accurate prediction of PM2.5 concentration is essential for public health and environmental protection, but its nonlinearity, variability, and complexity make it difficult. To overcome the shortcomings of traditional gene expression programming (GEP), this study proposes a GEP algorithm based on virus evolution (VE-GEP) to predict PM2.5 concentration. The algorithm adds a resurrection mechanism and a mutagenic restart mechanism to GEP. The resurrection mechanism removes poor-quality individuals from the population and improves individual quality in the population. The mutagenic restart mechanism increases population diversity and enhances the algorithm's optimization-seeking ability by introducing high-quality genes and new individuals. Experimental results show that the VE-GEP algorithm improves the prediction models to different degrees compared to GEP, DSCE-GEP, and CNN-LSTM in spring, summer, and fall, with improvements in the fitness of 1.28%/0.1%/0.13%, 1.86%/1.29%/0.42%, and 0.57%/0.24%/0.29%, respectively, which provides new ideas and methods for PM2.5 concentration prediction studies.

Lightweight Citrus Maturity Detection Based on Improved YOLOv8n

XIAO Yang , XIANG Ming-Yu , LI Xi

2024, 33(11):202-208. DOI: 10.15888/j.cnki.csa.009693

Abstract:To achieve intelligent citrus picking, fast and accurate identification of citrus in the orchard environment is critical. To address the poor environmental adaptability and low efficiency of existing target detection algorithms, this study proposes a lightweight citrus maturity detection algorithm based on the YOLOv8n model, YOLOv8n-CMD (YOLOv8n citrus maturity detection). Firstly, the backbone network structure is optimized to improve the detection of small targets. Secondly, the CBAM attention mechanism is added to improve the classification effect of the model. Then, Ghost convolution is introduced, and the neck C2f module in the original YOLOv8 model is combined with Ghost to reduce the amount of computation and the number of parameters. Finally, the SimSPPF module is used in place of the original pyramid pooling layer to improve model detection efficiency. Experimental results show that the YOLOv8n-CMD algorithm reduces the number of parameters and the amount of computation by 31.8% and 7.4%, respectively, and improves the accuracy by 3.0%, making it more suitable for citrus detection in the orchard environment.

Fine-grained User Intention Understanding for Mobile Applications Based on Multi-modality Fusion

ZHANG Yi-Han , HONG Geng , YANG Zhe-Min

2024, 33(11):209-223. DOI: 10.15888/j.cnki.csa.009653

Abstract:With the increasing complexity of mobile applications, existing privacy leak detection methods based on user intent face greater challenges. On the one hand, traditional privacy leak detection, which is based on APP-level user intent, only focuses on whether the privacy collection behavior of the application aligns with its core functional requirements. This approach is not suitable for today’s mobile APP security detection, which has broad functionalities and diverse user intents, necessitating a more fine-grained user intent classification. On the other hand, current research mainly focuses on evaluating whether the privacy collection behaviors triggered by interface widgets, such as icons, are consistent with user intent. However, the improper design and misuse of icons are very common, which limits the effectiveness of privacy risk assessments that rely solely on widget-based user intents. Therefore, a comprehensive understanding of user intent at the overall interface level is still needed. In response to the above issues, this study first extracts and summarizes a fine-grained user intent list suitable for privacy compliance detection based on Chinese privacy policies. Then, based on the characteristics of mobile application interface design, a multi-classification model with multi-modal feature fusion is designed and implemented to identify the user intent reflected by the entire mobile interface. Evaluation results show that the intent extraction tool in this study has achieved 83% in both precision and recall, and the user intent classification model reaches 80% and 83% in precision and recall, respectively, demonstrating good detection effectiveness and practical usability.
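
For readers comparing the reported precision and recall figures, the following minimal sketch shows how the two metrics are computed from prediction counts and how they combine into an F1 score. The counts are hypothetical and do not come from the paper. The F1 value derived from the reported 80% precision and 83% recall is a plain harmonic mean and is not stated in the abstract.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from true-positive, false-positive,
    and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = harmonic_mean(precision, recall)
    return precision, recall, f1


def harmonic_mean(precision: float, recall: float) -> float:
    """F1 score as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 80 correct intent labels, 20 spurious, 16 missed.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=16)

# Harmonic mean of the precision and recall reported for the classifier.
derived_f1 = harmonic_mean(0.80, 0.83)
print(f"p={p:.3f}, r={r:.3f}, f1={f1:.3f}, derived_f1={derived_f1:.3f}")
```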

Detection for Sensitive Data Collection Behaviors in Mini-programs

HUA Nan , YANG Zhe-Min

2024, 33(11):224-236. DOI: 10.15888/j.cnki.csa.009642

Abstract:Mini-programs have been widely used in recent years, causing widespread privacy and security concerns for carrying a large amount of sensitive user data. Existing privacy and security analysis techniques for traditional mobile applications cannot be directly applied to mini-programs. On the one hand, it is difficult for existing methods to effectively analyze the privacy transfer caused by the closed-source mini-program framework and the cross-scope privacy transfer caused by the JavaScript closures, resulting in a lack of analysis results. On the other hand, the mechanism of dynamic sub-package loading leads to incomplete analysis scope, further resulting in a lack of analysis results. This study proposes a hybrid dynamic/static method for analyzing the privacy collection behaviors in mini-programs. First, this method constructs a data propagation path based on either control flow or data dependency for different unit boundaries in the mini-programs, namely the mini-program privacy propagation flow graph. Furthermore, this method effectively explores the mini-program UI by learning and transferring traditional mobile application UI design knowledge, and using the control flow association between UI events and page transition information as a guide, thereby triggering the sub-package loading process. The corresponding sub-package code is analyzed and integrated with existing analysis results to form a more comprehensive mini-program privacy propagation flow graph. This study implements the tracking of sensitive data in mini-programs through the privacy propagation flow graph. Based on the above method, this study implements MiniSafe, a privacy collection behavior analysis tool for mini-programs. The evaluation results show that MiniSafe achieves 90.4% and 87.4% in precision and recall respectively, both of which outperform existing work. MiniSafe detects an average of 7 sensitive data collection behaviors in each mini-program. By considering sensitive data collection behaviors in mini-program sub-packages, the overall detection number has increased by 42.9%, demonstrating good detection performance and practical usability.

Multi-granularity Feature Fusion for Biomedical Named Entity Recognition and Normalization

LIU Tong , SHI Chang-Ling , NI Wei-Jian

2024, 33(11):237-246. DOI: 10.15888/j.cnki.csa.009640

Abstract:To extract rich entity information and normalized expressions from biomedical literature, this study proposes a multi-granularity feature fusion approach for biomedical named entity recognition and normalization (MGFFA). By integrating character-level, word-level, and concept-level textual information, the model significantly enhances its learning capability. It also incorporates a memory bank for storing and synthesizing information from different levels to achieve a deeper understanding of the complex relationships between entities and their normalized labels. With the integration of pre-trained models, MGFFA captures not only coarse-grained semantic representations of text but also conducts detailed analysis at the morphological level, thereby comprehensively improving the recognition accuracy of long-span entities. Experimental results on the NCBI and BC5CDR datasets demonstrate that the model outperforms other baseline models overall.

Elevator Risk Prediction Based on Deep Survival Analysis and SHAP

ZENG Qian-Xin , WANG Pan , YANG Huan , YANG Yong

2024, 33(11):247-256. DOI: 10.15888/j.cnki.csa.009685

Abstract:This study proposes a comprehensive solution that combines deep survival analysis, data segmentation, and data imputation to address the issue of statistical predictive maintenance for elevators, which is characterized by low frequency and irregular time periods. This study establishes both dynamic and static survival vectors to capture factors influencing major fault risks. Additionally, to tackle left censoring in recorded data, this research employs data imputation and explores the impact of different imputation methods and segmentation strategies on the accuracy of deep survival models. Finally, this study utilizes SHAP to analyze deep survival models in elevators to reveal the dynamic influence of various factors on fault risks. The results indicate that a model combining rough data segmentation with Cox imputation demonstrates strong predictive capability and accuracy. The DeepSurv model excels in predictive capability and stability. The contribution of factors such as elevator age and lifting height to major fault risks can shift under specific conditions.
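
DeepSurv-style models are commonly trained by minimizing the negative Cox partial log-likelihood of the network's risk scores. The sketch below is a minimal pure-Python version of that loss, assuming right-censored data and no special handling of tied event times. It is not the authors' implementation and does not cover the left-censoring imputation or data segmentation described in the abstract. The function name and the toy inputs are hypothetical.

```python
import math


def cox_neg_partial_log_likelihood(risk_scores, times, events):
    """Average negative Cox partial log-likelihood.

    risk_scores: predicted log-risk h(x) for each subject
    times:       observed time (fault or censoring) for each subject
    events:      1 if a fault was observed, 0 if the record is censored
    """
    total = 0.0
    n_events = 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored subjects only appear in risk sets
        # Risk set: every subject still under observation at time t_i.
        denominator = sum(
            math.exp(h) for h, t in zip(risk_scores, times) if t >= t_i
        )
        total += risk_scores[i] - math.log(denominator)
        n_events += 1
    if n_events == 0:
        raise ValueError("at least one observed event is required")
    return -total / n_events


# Toy data: three units that fail at times 1, 2, and 3.
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
uninformative = cox_neg_partial_log_likelihood([0.0, 0.0, 0.0], times, events)
well_ranked = cox_neg_partial_log_likelihood([2.0, 1.0, 0.0], times, events)
badly_ranked = cox_neg_partial_log_likelihood([0.0, 1.0, 2.0], times, events)
```

The loss drops when units that fail earlier receive higher risk scores, which is the ranking behavior that a survival model is trained to produce.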

Vehicle Target Recognition Based on Transfer Learning

LI Hui , WANG Yan-E

2024, 33(11):257-263. DOI: 10.15888/j.cnki.csa.009667

Abstract:To improve the accuracy and real-time performance of vehicle recognition, this study proposes a vehicle recognition method based on transfer learning. By integrating convolutional neural networks and support vector machines, the method improves recognition accuracy, reduces model training time, and improves model robustness. A convolutional neural network is first trained on the CIFAR-10 data set. Residual optimization is then applied to a deeper pre-trained network to extract fine-grained features. During parameter transfer, only the pre-trained parameters of the convolutional layers are transferred, and a fully connected layer is added for fine-tuning on the vehicle data set. Finally, the extracted features are fed directly into the support vector machine for classification. Detailed experiments and result analysis demonstrate that this method achieves a highest recognition accuracy of 97.56% and a recognition time of 260 ms per image, indicating improved performance in both recognition time and accuracy.
