Volume 34, Issue 1, 2025 Table of Contents

Survey on Prompt Engineering in Large Language Model

WANG Dong-Qing, LU Fei, ZHANG Bing-Hui, LI Dao-Tong, PENG Ji-Yang, WANG Bing, YAO Fan-Yi, AI Shan-Bin

2025, 34(1):1-10. DOI: 10.15888/j.cnki.csa.009782 CSTR: 32024.14.csa.009782

Abstract: Prompt engineering plays a crucial role in unlocking the potential of large language models. It guides a model’s responses through carefully designed prompt instructions so that the responses are relevant, coherent, and accurate. Because prompt engineering requires no fine-tuning of model parameters and connects seamlessly with downstream tasks, prompt engineering techniques have become a research hotspot in recent years. Accordingly, this study introduces the key steps for creating effective prompts, summarizes basic and advanced prompt engineering techniques such as chain of thought and tree of thought, and examines the advantages and limitations of each method in depth. It also discusses how to evaluate the effectiveness of prompt methods from different perspectives and with different methods. The rapid development of these techniques has enabled large language models to succeed in a wide range of applications, from education and healthcare to code generation. Finally, future research directions for prompt engineering are discussed.
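
A minimal sketch of the few-shot chain-of-thought (CoT) prompting surveyed above: each demonstration spells out its intermediate reasoning before the answer, and a closing cue asks the model to reason about the new question the same way. The `Example` type, the `build_cot_prompt` helper, the instruction wording, and the "Let's think step by step." cue are illustrative assumptions rather than the survey's method, and no model is called.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One worked demonstration: a question, its reasoning chain, and the final answer."""
    question: str
    reasoning: str
    answer: str


def build_cot_prompt(examples: list[Example], query: str,
                     instruction: str = "Answer the question. Show your reasoning.") -> str:
    """Assemble a few-shot chain-of-thought prompt (hypothetical helper).

    Each demonstration shows its reasoning before the answer, nudging the
    model to produce a reasoning chain for the new query as well.
    """
    parts = [instruction, ""]
    for ex in examples:
        parts.append(f"Q: {ex.question}")
        parts.append(f"A: {ex.reasoning} The answer is {ex.answer}.")
        parts.append("")
    # The unanswered query ends with a cue that invites step-by-step reasoning.
    parts.append(f"Q: {query}")
    parts.append("A: Let's think step by step.")
    return "\n".join(parts)


demo = Example(
    question="A shop has 3 boxes with 4 apples each. How many apples are there?",
    reasoning="There are 3 boxes and each holds 4 apples, so 3 * 4 = 12.",
    answer="12",
)
print(build_cot_prompt([demo], "Tom has 5 pens and buys 2 packs of 3 pens. How many pens does he have?"))
```

The same builder with an empty example list gives a zero-shot CoT prompt, which is a convenient baseline when comparing prompt variants.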

Survey on Deep Learning-based Lesion Segmentation and Detection in Acute Ischemic Stroke

MAO Tian-Chi, LI Yang, LI Ming, SUN Xing, MA Jin-Gang

2025, 34(1):11-25. DOI: 10.15888/j.cnki.csa.009709 CSTR: 32024.14.csa.009709

Abstract: Acute ischemic stroke is the most common type of stroke in clinical practice. Owing to its sudden onset and short treatment time window, it is one of the major causes of disability and death worldwide. With the rapid development of artificial intelligence, deep learning has shown great potential in the diagnosis and treatment of acute ischemic stroke: deep learning models can segment and detect lesions quickly and efficiently from patients’ brain images. This study introduces the development history of deep learning models and the public datasets commonly used in stroke research. For the various modalities and scanning sequences derived from computed tomography (CT) and magnetic resonance imaging (MRI), it reviews research progress in deep learning-based lesion segmentation and detection for acute ischemic stroke, and summarizes and analyzes the improvement strategies of related studies. Finally, it points out the existing challenges of deep learning in this field and proposes possible solutions.

Survey on Classification and Identification of Microscopic Remaining Oil Occurrence Forms Based on Deep Learning

ZHAO Ya, GUAN Yu, JIA Di

2025, 34(1):26-36. DOI: 10.15888/j.cnki.csa.009733 CSTR: 32024.14.csa.009733

Abstract: Research on the classification and identification of microscopic residual oil occurrence states plays a vital role in residual oil exploitation and is of great significance for improving oilfield recovery. In recent years, many studies have introduced deep learning into this field, advancing the technologies for identifying microscopic residual oil. However, no unified deep learning framework or standardized workflow for microscopic residual oil identification has yet been established. To guide future research, this study reviews existing methods for identifying residual oil and introduces machine vision-based identification technologies for microscopic residual oil from several aspects, including image acquisition and classification standards, image processing, and residual oil identification methods. Residual oil identification methods are categorized into traditional and deep learning-based methods. Traditional methods are further divided into those based on manual feature extraction and those based on machine learning classification, while deep learning-based methods are divided into single-stage and two-stage methods. Data augmentation, pre-training, image segmentation, and image classification are summarized in detail. Finally, this study discusses the challenges of applying deep learning to microscopic residual oil identification and explores future development trends.

Behavior Tree Generation Based on Scene Semantic Perception and Reasoning with Large Language Models

YAN Long-Wu, ZHENG Wang-Li, LIN Yun-Han

2025, 34(1):37-46. DOI: 10.15888/j.cnki.csa.009742 CSTR: 32024.14.csa.009742

Abstract: Embodied AI requires the ability to perceive and interact with the environment, as well as capabilities such as autonomous planning, decision-making, and action execution. Behavior trees (BTs) have become a widely used approach in robotics owing to their modularity and efficient control. However, existing behavior tree generation techniques still face challenges with complex tasks: they typically rely on domain expertise and have a limited capacity to generate behavior trees. In addition, many existing methods have deficiencies in language comprehension or cannot theoretically guarantee the success of the generated behavior tree, which makes practical robotic applications difficult. This study proposes a new method for automatic behavior tree generation that produces an initial behavior tree with task goals based on large language models (LLMs) and scene semantic perception. The method designs robot action primitives and related condition nodes according to the robot’s capabilities, uses them to design prompts that make the LLM output a behavior plan (generated plan), and then transforms this plan into an initial behavior tree. Although this paper uses a specific task as an example, the method is broadly applicable and can be adapted to other types of robotic tasks as needed. This study also applies the method to robot tasks and gives specific implementation details and examples. While the robot performs a task, the behavior tree can be dynamically updated in response to the robot’s operation errors and environmental changes, giving it a degree of robustness to changes in the external environment. Validation experiments on behavior tree generation are carried out in a simulated robot environment and demonstrate the effectiveness of the proposed method.
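
The plan-to-tree step can be illustrated with a small, hypothetical sketch: a linear plan of (action, expected effect) pairs, standing in for the LLM's generated plan, becomes a Sequence of Fallback nodes, and each Fallback skips its action when the effect already holds. The node classes, the `plan_to_tree` helper, and the action names are assumptions for illustration; the RUNNING status and the paper's dynamic tree updates are omitted.

```python
from enum import Enum
from typing import Callable


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


class Condition:
    """Leaf node: succeeds if a boolean fact holds in the world state."""
    def __init__(self, key: str):
        self.key = key

    def tick(self, world: dict) -> Status:
        return Status.SUCCESS if world.get(self.key, False) else Status.FAILURE


class Action:
    """Leaf node: runs an action primitive that returns True on success."""
    def __init__(self, name: str, primitive: Callable[[dict], bool]):
        self.name = name
        self.primitive = primitive

    def tick(self, world: dict) -> Status:
        return Status.SUCCESS if self.primitive(world) else Status.FAILURE


class Sequence:
    """Ticks children in order; fails as soon as one child fails."""
    def __init__(self, children: list):
        self.children = children

    def tick(self, world: dict) -> Status:
        for child in self.children:
            if child.tick(world) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


class Fallback:
    """Ticks children in order; succeeds as soon as one child succeeds."""
    def __init__(self, children: list):
        self.children = children

    def tick(self, world: dict) -> Status:
        for child in self.children:
            if child.tick(world) is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


def plan_to_tree(plan, primitives):
    """Build Sequence[Fallback(Condition(effect), Action(name)) for each plan step]."""
    return Sequence([Fallback([Condition(effect), Action(name, primitives[name])])
                     for name, effect in plan])


def make_primitive(effect: str) -> Callable[[dict], bool]:
    """Hypothetical primitive that always succeeds and records its effect in the world state."""
    def run(world: dict) -> bool:
        world[effect] = True
        return True
    return run


primitives = {"move_to_table": make_primitive("at_table"),
              "grasp_cup": make_primitive("holding_cup")}
plan = [("move_to_table", "at_table"), ("grasp_cup", "holding_cup")]  # stand-in for an LLM plan
world = {}
print(plan_to_tree(plan, primitives).tick(world), world)
```

Guarding each action with its own effect condition is what lets the tree be re-ticked after a disturbance: steps whose effects still hold are skipped, and only the broken part of the plan is re-executed.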

Multi-scale Deformable 3D Medical Image Registration Based on Transformer

CHEN Lu-Ying, YU Guo-Rong, BAO Hai-Zhou, BIAN Xiao-Yong, CHEN Cong-Peng

2025, 34(1):47-57. DOI: 10.15888/j.cnki.csa.009760 CSTR: 32024.14.csa.009760

Abstract: Deformable 3D medical image registration remains challenging because of the irregular deformations of human organs. This study proposes a Transformer-based multi-scale deformable 3D medical image registration method. First, the method adopts a multi-scale strategy with multi-level connections to capture information at different levels. A self-attention mechanism extracts global features, while dilated convolution captures broader context and finer local features, strengthening the registration network’s ability to fuse global and local features. Second, based on the sparse prior of image gradients, the normalized total gradient is introduced as a loss function, which effectively reduces the interference of noise and artifacts in registration and adapts better to different medical imaging modalities. The method is evaluated on the public brain MRI datasets OASIS and LPBA. The results show that it retains the run-time advantage of learning-based methods while also performing well in mean squared error and structural similarity. In addition, ablation experiments further confirm the validity of the proposed method and of the normalized total gradient loss design.
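
Normalized total gradient (NTG) is commonly defined as the L1 norm of the gradient of the difference image divided by the sum of the L1 gradient norms of the two images. It is therefore 0 for images that agree up to a constant intensity offset and at most 1 by the triangle inequality. The sketch below computes it on small 2D arrays with forward differences; the exact definition used in the paper, its 3D extension, and how the term is weighted in the registration loss are assumptions here.

```python
def total_gradient(img):
    """Anisotropic total gradient: sum of absolute forward differences along rows and columns."""
    rows, cols = len(img), len(img[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:
                total += abs(img[i + 1][j] - img[i][j])
            if j + 1 < cols:
                total += abs(img[i][j + 1] - img[i][j])
    return total


def ntg(fixed, warped, eps=1e-8):
    """Normalized total gradient: TG(fixed - warped) / (TG(fixed) + TG(warped)).

    0 when the images agree up to a constant intensity offset; at most 1 by the
    triangle inequality. eps guards against division by zero on flat images.
    """
    diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(fixed, warped)]
    return total_gradient(diff) / (total_gradient(fixed) + total_gradient(warped) + eps)


fixed = [[0, 0, 1, 1], [0, 0, 1, 1]]
shifted = [[0, 1, 1, 1], [0, 1, 1, 1]]    # same edge, moved one pixel to the left
brighter = [[v + 5 for v in row] for row in fixed]
print(ntg(fixed, brighter), ntg(fixed, shifted))
```

Because only gradients enter the measure, a global brightness change costs nothing while a misplaced edge costs the most, which is the behavior the sparse-gradient prior argument relies on.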

Electricity Market Clearing Provenance Model Based on PROV and Smart Contracts

XU Zhan-Yang, SHI Hong-Yan, YUE Zi-Yu, ZHAO Hong, XU Jian, WANG Zhe

2025, 34(1):58-68. DOI: 10.15888/j.cnki.csa.009722 CSTR: 32024.14.csa.009722

Abstract: In the current electricity market, the volume of daily spot market clearing data has reached millions or even tens of millions. With growing trading activity and an increasingly complex market structure, ensuring the integrity, transparency, and traceability of trading data has become a key research issue in market clearing in China. This study therefore proposes a data provenance method for electricity market clearing based on the PROV model and smart contracts, which uses smart contracts to automate the storage and updating of provenance information, improving the transparency of the clearing process and participants’ trust in it. The method uses the entity, activity, and agent elements of the PROV model, combined with the hierarchical storage and immutability of blockchain technology, to record and track trading activities and rule changes in the electricity market. It not only enhances data transparency and trust among market participants but also optimizes data management and storage strategies, reducing operational costs. In addition, it provides proof of compliance for electricity market clearing, helping market participants meet increasingly stringent regulatory requirements.

Efficient Tracking for UAVs Based on Siamese Network

WANG Jian-Hao, YE Ming, YAO Jia-Feng

2025, 34(1):69-79. DOI: 10.15888/j.cnki.csa.009744 CSTR: 32024.14.csa.009744

Abstract: In visual tracking, most deep learning-based trackers emphasize accuracy at the expense of efficiency, which hinders their deployment on mobile platforms such as drones. This study proposes a deep cross-guidance Siamese network (SiamDCG). For better deployment on edge computing devices, a custom backbone based on MobileNetV3-small is designed. Given the complexity of drone scenarios, the traditional approach of regressing target boxes with a Dirac δ distribution has significant drawbacks. To cope with the ambiguous boundaries inherent in bounding boxes, the regression branch is reformulated to predict a distribution over offsets, and the learned distribution is used to guide classification accuracy. Strong performance on multiple aerial tracking benchmarks demonstrates the robustness and efficiency of the proposed approach. On a 12th-generation Intel Core i5 CPU, SiamDCG runs 167 times faster than SiamRPN++ while using 98 times fewer parameters and 410 times fewer FLOPs.
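
One common way to replace a Dirac δ box regression with a predicted offset distribution, used for example in Generalized Focal Loss, is to predict a softmax over discrete offset bins for each box edge and decode the offset as the expectation of that distribution; the spread of the distribution then indicates how ambiguous the edge is. The sketch below shows only this decoding step for one edge. The bin layout, the `certainty` statistic, and how SiamDCG actually feeds the distribution to its classification branch are assumptions.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_offset(logits, bin_width=1.0):
    """Expected edge offset under the predicted discrete distribution: sum_i p_i * i * bin_width."""
    return sum(p * i * bin_width for i, p in enumerate(softmax(logits)))


def certainty(logits, top_k=2):
    """Probability mass of the top-k bins; a flat, uncertain distribution scores low.
    A statistic like this could be passed to the classification branch as a quality cue."""
    return sum(sorted(softmax(logits), reverse=True)[:top_k])


sharp = [0.0, 10.0, 0.0, 0.0]   # confident: offset close to 1 bin
flat = [0.0, 0.0, 0.0, 0.0]     # ambiguous edge: probability spread over all bins
print(decode_offset(sharp), certainty(sharp))
print(decode_offset(flat), certainty(flat))
```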

Autonomous Navigation System for Firefighting Robots in Urban Areas

QIU Zhong-Yu, YIN Xiao-Qia, CHEN Kai, SHI Chen-Wei, CHEN Ze-Hua, LIU Shuang

2025, 34(1):80-89. DOI: 10.15888/j.cnki.csa.009725 CSTR: 32024.14.csa.009725

Abstract: When firefighting robots are deployed for medium- to long-distance emergency tasks in urban areas, a global prior map of the environment usually cannot be obtained in advance. The robots therefore have to be remotely controlled by hand to reach the fire location, which is cumbersome and significantly reduces firefighting efficiency. To address these issues, this study designs a new autonomous navigation system for firefighting robots in urban areas. The system is based on commercial electronic maps (such as Amap, Baidu Maps, and other 2D electronic maps) and integrates the global navigation satellite system (GNSS) with local laser-based environmental sensing. First, the commercial electronic map is used to plan coarse global sub-goal points. The sequence of sub-goal points is then registered with the actual positioning information and sent to the local planner. Next, local planning is performed within a local grid map built from laser sensing, following the sequence of sub-goal points, and the improved local planner dynamically updates the sub-goal points according to real-time environmental changes during motion. Multiple simulation experiments are conducted, and the system is validated on a tracked vehicle in real-world scenarios. The results show that the system can accurately execute long-distance outdoor navigation tasks without a global prior map, providing an efficient and safe solution for the outdoor navigation of firefighting robots.

Instances Segmentation of Urban Streetscape Incorporating Attention and Multi-scale Feature

WANG Jun, LYU Jia, CHENG Yong

2025, 34(1):90-99. DOI: 10.15888/j.cnki.csa.009740 CSTR: 32024.14.csa.009740

Abstract: Instance segmentation algorithms for urban street scenes can significantly improve the accuracy and efficiency of urban environment perception and intelligent transportation systems. To address mutual occlusion between pedestrians and vehicles and strong background interference in urban street scenes, this study proposes FMInst, an instance segmentation model based on a frequency attention mechanism and multi-scale feature fusion. First, a high- and low-frequency attention mechanism is constructed for interactive encoding to enrich high-resolution detail information. Second, soft pooling is introduced into the Patch Merging layer of the Swin Transformer backbone to reduce the loss of feature information and improve the segmentation of small targets. Finally, multi-scale depthwise convolution is combined with an MLP layer to strengthen local information extraction and improve segmentation accuracy. Comparison experiments on the public Cityscapes dataset show that FMInst reaches an mAP of 35.6%, an improvement of 1.2%, and an AP50 of 61.4%, an improvement of 2.2%. Both the mask quality and the instance segmentation results are substantially improved.
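
SoftPool computes a softmax-weighted average inside each pooling window, so large activations dominate while smaller ones still contribute, unlike max pooling, which discards them. A minimal single-channel sketch with a 2×2 window and stride 2 follows; the function name is hypothetical, and how FMInst inserts this operation into the Swin Transformer Patch Merging layer is not reproduced.

```python
import math


def soft_pool_2x2(x):
    """SoftPool with a 2x2 window and stride 2 on a 2D map with even height and width.

    Each output is sum(exp(a) * a) / sum(exp(a)) over its window: a softmax-weighted
    average that leans toward large activations without discarding the others.
    """
    out = []
    for i in range(0, len(x), 2):
        row = []
        for j in range(0, len(x[0]), 2):
            window = [x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1]]
            m = max(window)  # subtracting the max keeps exp() stable and cancels in the ratio
            weights = [math.exp(a - m) for a in window]
            row.append(sum(w * a for w, a in zip(weights, window)) / sum(weights))
        out.append(row)
    return out


x = [[0.0, 4.0, 1.0, 1.0],
     [0.0, 0.0, 1.0, 1.0],
     [2.0, 2.0, 0.0, 3.0],
     [2.0, 2.0, 0.0, 0.0]]
print(soft_pool_2x2(x))
```

For the top-left window the result lies between the average (1.0) and the maximum (4.0), which is the compromise that motivates using it to reduce information loss when downsampling.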

MRI Image Segmentation of Knee Cartilage Based on Semi-supervised Learning and Conditional Probability

MA Chun-Shuai, CHENG Yuan-Zhi

2025, 34(1):100-109. DOI: 10.15888/j.cnki.csa.009759 CSTR: 32024.14.csa.009759

Abstract: To address the scarcity and limited quality of annotated samples in medical image segmentation, this study introduces a knee cartilage segmentation method based on semi-supervised learning and conditional probability. Because existing embedded deep learning models struggle to model the hierarchical relationships among network outputs, the study proposes an approach that combines conditional-to-unconditional mixed training with task-level consistency, making efficient use of the hierarchical relationships and correlations among labels to improve segmentation accuracy. Specifically, a dual-task deep network predicts both a pixel-level segmentation map and a geometry-aware level set representation of the target. The level set is converted into an approximate segmentation map through a differentiable task transformation layer. Meanwhile, task-level consistency regularization is imposed between the level set-based and directly predicted segmentation maps on both labeled and unlabeled data. Extensive experiments on two public datasets demonstrate that incorporating unlabeled data in this way significantly improves performance.
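
The differentiable task transformation and the task-level consistency term can be sketched as below, assuming (as in dual-task consistency approaches) that the level set is a signed distance that is negative inside the target and that the transform is sigmoid(−k·z) with a large k. The value k = 1500, the squared-error loss, and the function names are assumptions; the conditional-to-unconditional mixed training is not shown.

```python
import math


def lsf_to_mask(lsf, k=1500.0):
    """Differentiable level-set-to-mask transform sigmoid(-k * z).

    Assumes the level set is negative inside the target, so values map to ~1
    inside and ~0 outside. The exponent is clipped to avoid math.exp overflow.
    """
    return [1.0 / (1.0 + math.exp(min(k * z, 700.0))) for z in lsf]


def task_consistency_loss(seg_prob, lsf, k=1500.0):
    """Mean squared difference between the directly predicted foreground probabilities
    and the mask derived from the predicted level set. It needs no labels, so it can
    also be applied to unlabeled images."""
    approx = lsf_to_mask(lsf, k)
    return sum((p - q) ** 2 for p, q in zip(seg_prob, approx)) / len(seg_prob)


level_set = [-0.5, -0.1, 0.1, 0.5]   # flattened voxels: two inside, two outside
print(task_consistency_loss([0.95, 0.9, 0.1, 0.05], level_set))   # heads agree: small loss
print(task_consistency_loss([0.05, 0.1, 0.9, 0.95], level_set))   # heads disagree: large loss
```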

Traffic Sign Recognition Under Complex Conditions

HUANG Jian, ZHAN Yue, HU Fan

2025, 34(1):110-117. DOI: 10.15888/j.cnki.csa.009734 CSTR: 32024.14.csa.009734

Abstract: This study investigates the joint detection of traffic signs and traffic signals under complex and variable traffic conditions, analyzing and mitigating the detrimental effects of harsh weather, low lighting, and image background interference on detection accuracy. To this end, an improved RT-DETR network is proposed. Targeting resource-limited operating environments, this study adopts PE-ResNet, a ResNet with PConv and efficient multi-scale attention, as the backbone to enhance the model’s ability to detect occluded and small targets. To strengthen feature fusion, a new cross-scale feature-fusion module (NCFM) is introduced, which better integrates semantic and detailed information within images and provides a more comprehensive understanding of complex scenes. Additionally, the MPDIoU loss function is introduced to measure the positional relationship between bounding boxes more accurately. The improved network has approximately 14% fewer parameters than the baseline model. On the CCTSDB 2021 dataset, the S2TLD dataset, and the self-built multi-scene traffic signs (MTST) dataset, mAP50:95 increases by 1.9%, 2.2%, and 3.7%, respectively. Experimental results demonstrate that the improved RT-DETR model effectively increases detection accuracy in complex scenarios.
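
The sketch below assumes the commonly cited MPDIoU definition: IoU minus the squared distance between the two boxes' top-left corners and minus the squared distance between their bottom-right corners, each divided by w² + h², where (w, h) is the input image size. The box loss is then 1 − MPDIoU. Boxes are (x1, y1, x2, y2) pixel coordinates; how the improved RT-DETR combines this term with its other losses is not shown.

```python
def mpdiou(pred, gt, img_w, img_h):
    """MPDIoU between two boxes given as (x1, y1, x2, y2) in pixels.

    MPDIoU = IoU - d1^2 / (w^2 + h^2) - d2^2 / (w^2 + h^2), where d1 and d2 are the
    distances between the top-left and bottom-right corners of the two boxes and
    (w, h) is the input image size. The corresponding loss is 1 - MPDIoU.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union if union > 0 else 0.0
    norm = img_w ** 2 + img_h ** 2
    d1_sq = (px1 - gx1) ** 2 + (py1 - gy1) ** 2   # top-left corner distance squared
    d2_sq = (px2 - gx2) ** 2 + (py2 - gy2) ** 2   # bottom-right corner distance squared
    return iou - d1_sq / norm - d2_sq / norm


print(mpdiou((0, 0, 10, 10), (0, 0, 10, 10), 100, 100))   # identical boxes
print(mpdiou((0, 0, 10, 10), (5, 0, 15, 10), 100, 100))   # half-overlapping boxes
```

Unlike plain IoU, the corner-distance terms still produce a useful signal when the boxes do not overlap at all, because the value keeps decreasing as the corners move further apart.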

Lightweight Image Super-resolution Network Based on Hierarchical Progressive Fusion of Feature

ZHANG Hao, MA Ji, YUAN Jiang

2025, 34(1):118-127. DOI: 10.15888/j.cnki.csa.009723 CSTR: 32024.14.csa.009723

Abstract: In recent years, with the development of deep learning, convolutional neural networks (CNNs) and Transformers have made significant progress in image super-resolution (SR). However, to extract the global features of an image, existing methods commonly stack individual operators and repeat computations to gradually expand the receptive field. To make better use of global information, this study proposes to model local, regional, and global features explicitly. Specifically, the local, regional-local, and global-regional information of an image is extracted and fused hierarchically and progressively through, respectively, channel attention-enhanced convolution; a dual-branch parallel architecture consisting of a window-based Transformer and a CNN; and a dual-branch parallel architecture consisting of a standard Transformer and a window-based Transformer. In addition, a hierarchical feature fusion method is designed to fuse the local information extracted by the CNN branch with the regional information extracted by the window-based Transformer. Extensive experiments show that the proposed network achieves better results in lightweight SR. For example, in 4× upscaling experiments on the Manga109 dataset, the peak signal-to-noise ratio (PSNR) of the proposed network is 0.51 dB higher than that of SwinIR.

Next Point-of-interest Recommendation Based on Dual-granularity Sequence Fusion

PENG Jin, SHI Yan-Cui, LIU Ling-Yun

2025, 34(1):128-136. DOI: 10.15888/j.cnki.csa.009724 CSTR: 32024.14.csa.009724

Abstract: Existing methods fail to make effective use of check-in information to provide precise location recommendations. To address this problem, this study introduces a novel next point-of-interest (POI) recommendation model based on dual-granularity sequence fusion. First, the model integrates fine-grained spatio-temporal sequence information with the coarse-grained category sequence information that occurs naturally in real life, and uses gated recurrent units to capture long-term dependencies, enriching the context of check-ins. Then, the model uses the extracted information to turn the “hard” segmentation of long sequences into a “soft” segmentation, so that complete semantic information can be extracted from local sub-sequences. Finally, the model aggregates the salient information from each local sub-sequence to make recommendations. Experimental results on the Foursquare and Gowalla datasets show that the proposed model improves recall by 9.07% and 9.37% and normalized discounted cumulative gain (NDCG) by 9.72% and 10.24%, respectively, indicating superior recommendation performance.
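
The abstract reports gains in recall and NDCG, but the cutoff K and averaging protocol are not given here. The sketch below therefore assumes the usual next-POI setting with a single ground-truth next POI per test check-in. In that setting the ideal DCG is 1, so NDCG@K reduces to 1 / log2(rank + 1) when the true POI appears at 1-based rank ≤ K, and to 0 otherwise.

```python
import math


def recall_at_k(ranked, target, k):
    """1.0 if the true next POI appears in the top-k list, else 0.0 (single-target case)."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked, target, k):
    """With one relevant item, IDCG = 1, so NDCG@k = 1 / log2(rank + 1) for a 1-based rank <= k."""
    top = ranked[:k]
    if target not in top:
        return 0.0
    rank = top.index(target) + 1
    return 1.0 / math.log2(rank + 1)


def evaluate(rankings, targets, k):
    """Average Recall@k and NDCG@k over all test check-ins."""
    n = len(targets)
    rec = sum(recall_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    ndcg = sum(ndcg_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    return rec, ndcg


rankings = [["a", "b", "c"], ["c", "a", "b"], ["b", "c", "a"]]
targets = ["a", "a", "d"]   # hit at rank 1, hit at rank 2, miss
print(evaluate(rankings, targets, k=2))
```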

PAF-Net: Parallel Attention Network for Efficient Sacroiliac Joint Segmentation

YAN Wu-Jun, WANG Jia-Hui, QIU Yu-Ru

2025, 34(1):137-144. DOI: 10.15888/j.cnki.csa.009730 CSTR: 32024.14.csa.009730

Abstract: Lesions of the sacroiliac joint are one of the primary early warning signs of ankylosing spondylitis, and accurate, efficient automatic segmentation of the sacroiliac joint is crucial for assisting doctors in clinical diagnosis and treatment. However, feature extraction from sacroiliac joint CT images is limited by diverse gray levels, complex backgrounds, and partial volume effects caused by the narrow joint gap, which hinders improvements in segmentation accuracy. To address these problems, this study proposes the first U-shaped network for sacroiliac joint segmentation and diagnosis, which uses hierarchical cascading to compensate for information lost during downsampling and parallel attention to preserve cross-dimensional feature information. Moreover, to improve the efficiency of clinical diagnosis, the conventional convolutions in the U-shaped network are replaced with efficient partial convolution blocks. Experiments on a sacroiliac joint CT dataset provided by Shanxi Bethune Hospital validate that the proposed network balances segmentation accuracy and efficiency, achieving a Dice coefficient of 91.52% and an IoU of 84.41%. The results indicate that the improved U-shaped network effectively enhances the accuracy of sacroiliac joint segmentation and can reduce the workload of medical professionals.
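
The Dice coefficient and IoU reported above can be computed from binary masks as in the sketch below, which uses flattened masks with truthy values as foreground. Treating two empty masks as a perfect match is an assumption, since conventions differ.

```python
def dice_and_iou(pred, gt):
    """Dice = 2|A∩B| / (|A| + |B|) and IoU = |A∩B| / |A∪B| for flattened binary masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    size_p = sum(1 for p in pred if p)
    size_g = sum(1 for g in gt if g)
    union = size_p + size_g - inter
    dice = 2 * inter / (size_p + size_g) if size_p + size_g else 1.0  # empty vs empty: assumed 1.0
    iou = inter / union if union else 1.0
    return dice, iou


print(dice_and_iou([1, 1, 1, 0], [1, 1, 0, 0]))
```

For a single mask the two metrics are linked by IoU = Dice / (2 − Dice), so a Dice of 91.52% corresponds to an IoU of about 84.4%. Averages taken over many images need not satisfy this identity exactly.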

Integration of BERT and GCN for Automatic Software Requirement Classification

GUAN Hui, GAO Qi, HAN Zhi-Yuan

2025, 34(1):145-152. DOI: 10.15888/j.cnki.csa.009754 CSTR: 32024.14.csa.009754

Abstract: Considering the domain-specific information in software requirement texts, as well as the important contextual relationships and inherent ambiguities they contain, this study proposes BERT-FGCN (BERT-FusionGCN), a model that integrates a graph convolutional network (GCN) with BERT for automatic software requirement classification. The model uses the strengths of GCN in propagating information and aggregating features from neighboring nodes to capture contextual relationships between words or sentences in requirement statements, thereby improving classification. First, a text co-occurrence graph and a dependency syntax graph are constructed for the requirement texts and fused to capture the structural information of sentences. The GCN then performs convolution on the resulting graph of each requirement statement to obtain graph vectors. Finally, these graph vectors are fused with the features extracted by BERT to classify software requirement texts automatically. Experiments on the PROMISE dataset show that BERT-FGCN achieves an F1-score of 95% in binary classification and improves the F1-score by 2% in multi-class classification.
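
The graph side of BERT-FGCN can be sketched with the standard Kipf and Welling graph convolution, H′ = ReLU(D^(−1/2)(A + I)D^(−1/2) H W). The abstract does not state how the co-occurrence and dependency graphs are fused or how the graph and BERT vectors are combined, so the element-wise-max graph fusion, mean pooling, and concatenation below are hypothetical choices, and the weights are toy values rather than learned ones.

```python
import math


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def fuse_graphs(cooc, dep):
    """Hypothetical fusion: keep an edge if it appears in either graph (element-wise max)."""
    return [[max(x, y) for x, y in zip(r1, r2)] for r1, r2 in zip(cooc, dep)]


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} from the Kipf-Welling GCN."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(adj_norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]


def fuse_with_bert(node_feats, bert_vec):
    """Mean-pool node features into a graph vector and concatenate it with a BERT sentence vector."""
    dim = len(node_feats[0])
    graph_vec = [sum(row[d] for row in node_feats) / len(node_feats) for d in range(dim)]
    return graph_vec + list(bert_vec)


cooc = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]   # words 0 and 1 co-occur
dep = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]    # dependency edge between words 1 and 2
adj = normalize_adjacency(fuse_graphs(cooc, dep))
node_feats = gcn_layer(adj, [[1, 0, 0], [0, 1, 0], [0, 0, 1]], [[0.5], [-0.2], [0.8]])
bert_cls = [0.1, -0.3]                      # stand-in for a BERT sentence vector
print(fuse_with_bert(node_feats, bert_cls))
```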

Cross-modal Person Re-identification Based on Improved CLIP-ReID

JIA Jun-Ying, YANG Xin-Ru, YANG Hai-Bo, XU Zhan

2025, 34(1):153-160. DOI: 10.15888/j.cnki.csa.009741 CSTR: 32024.14.csa.009741

Abstract: Narrowing the gap between modalities has long been a challenge in image-to-text cross-modal person re-identification. To address this challenge, this study improves contrastive language-image pretraining for person re-identification (CLIP-ReID) by integrating a context adjustment network module and a cross-modal attention module. The former applies a deep nonlinear transformation to image features and combines them with learnable context vectors to enhance the semantic relevance between images and texts. The latter dynamically weights and fuses image and text features so that, when processing one modality, the model also takes the other into account, improving interaction between modalities. The method is evaluated on three public datasets. On MSMT17, mAP increases by 2.2% and R1 by 1.1%; on Market1501, mAP increases by 0.5% and R1 by 0.1%; and on DukeMTMC, mAP increases by 0.4% and R1 by 1.2%. The results show that the proposed method effectively improves person re-identification accuracy.
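
The cross-modal attention module can be illustrated with plain scaled dot-product attention in which image features act as queries and text features as keys and values, followed by a residual addition. This is a hedged sketch: the learned query, key, and value projections, multiple heads, and the paper's exact weighting and fusion scheme are omitted, and the residual fusion is an assumption.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Each query (from one modality) takes a softmax-weighted average of the values
    (from the other modality); the weights come from scaled dot products with the keys."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


def fuse(own, attended):
    """Hypothetical residual fusion: add the attended other-modality features to the original ones."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(own, attended)]


image_tokens = [[10.0, 0.0]]                # toy image feature, aligned with text token 0
text_tokens = [[1.0, 0.0], [0.0, 1.0]]      # toy text features used as keys
text_values = [[1.0, 2.0], [3.0, 4.0]]
attended = cross_attention(image_tokens, text_tokens, text_values)
print(attended, fuse(image_tokens, attended))
```

The attention weights depend on the inputs, so each image token draws mostly on the text tokens it aligns with; swapping the roles of the two modalities gives the text-to-image direction.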

Image Dehazing Algorithm Based on Frequency and Attention Mechanism

WANG Jun, MENG Ru-Jun, CHENG Yong

2025, 34(1):161-170. DOI: 10.15888/j.cnki.csa.009736 CSTR: 32024.14.csa.009736

Abstract: Atmospheric fog and aerosols can significantly reduce visibility and distort colors in images, which makes high-level image recognition difficult. Existing image dehazing algorithms often suffer from over-enhancement, loss of detail, and incomplete dehazing. To avoid over-enhancement and incomplete dehazing, this study proposes FANet, an image dehazing algorithm based on frequency and attention mechanisms. The algorithm adopts an encoder-decoder structure and constructs a dual-branch frequency extraction module to obtain both global and local high- and low-frequency information. A frequency fusion module then adjusts the relative weights of the high- and low-frequency information. To further improve dehazing, the algorithm adds a channel-pixel module and a channel-pixel attention module during downsampling. Experimental results show that FANet achieves a PSNR of 40.07 dB and an SSIM of 0.9958 on the SOTS-indoor dataset, and a PSNR of 39.77 dB and an SSIM of 0.9958 on the SOTS-outdoor dataset. It also performs well on the HSTS and Haze4k test sets and, compared with other dehazing algorithms, effectively alleviates color distortion and incomplete dehazing.

Knowledge Distillation Based on Energy and Entropy Balanced Transfer

SHENG Zi-Qiang, ZHU Zi-Qi

2025, 34(1):171-178. DOI: 10.15888/j.cnki.csa.009719 CSTR: 32024.14.csa.009719

Abstract: In most previous work, the temperature in knowledge distillation (KD) is fixed throughout the distillation process. However, re-examining the temperature reveals that a fixed value limits the use of the knowledge inherent in each sample. This study divides the dataset into low-energy and high-energy samples based on energy scores. Experiments confirm that low-energy samples have high confidence scores, indicating deterministic predictions, whereas high-energy samples have low confidence scores, indicating uncertain predictions. To extract the most useful knowledge by adjusting non-target class predictions, this study applies higher temperatures to low-energy samples to produce smoother distributions and lower temperatures to high-energy samples to produce sharper distributions. In addition, to address students’ unbalanced reliance on prominent features and their neglect of dark knowledge, this study introduces entropy-reweighted knowledge distillation, which uses the entropy of the teacher’s predictions to reweight the energy distillation loss on a per-sample basis. The method can easily be applied to other logit-based knowledge distillation methods to improve their performance, bringing them close to or even beyond feature-based methods. Extensive experiments on image classification datasets (CIFAR-100 and ImageNet) validate its effectiveness.
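
The energy-based temperature assignment and entropy reweighting can be sketched as follows. The energy score is taken as E(x) = −logsumexp(f(x)) over the teacher's logits, so lower energy means a more confident prediction. The median split, the temperature values 2 and 6, and the use of normalized teacher entropy directly as the per-sample weight are assumptions made for illustration, not the paper's settings.

```python
import math


def log_softmax(logits, t=1.0):
    scaled = [x / t for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]


def energy_score(logits, t=1.0):
    """Energy E(x) = -t * logsumexp(logits / t); lower energy means a more confident prediction."""
    scaled = [x / t for x in logits]
    m = max(scaled)
    return -t * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def assign_temperatures(teacher_logits, t_low=2.0, t_high=6.0):
    """Split samples at the median teacher energy: low-energy samples get the higher
    temperature (smoother targets), high-energy samples the lower one (sharper targets)."""
    energies = [energy_score(l) for l in teacher_logits]
    threshold = sorted(energies)[len(energies) // 2]
    return [t_high if e < threshold else t_low for e in energies]


def kd_loss(student_logits, teacher_logits, t):
    """Per-sample KL(teacher || student) at temperature t, scaled by t^2 as in standard KD."""
    log_p_t = log_softmax(teacher_logits, t)
    log_p_s = log_softmax(student_logits, t)
    return t * t * sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))


def entropy_weight(teacher_logits):
    """Teacher prediction entropy normalized to [0, 1] (hypothetical weighting scheme)."""
    log_p = log_softmax(teacher_logits)
    h = -sum(math.exp(lp) * lp for lp in log_p)
    return h / math.log(len(teacher_logits))


def batch_loss(student_batch, teacher_batch):
    temps = assign_temperatures(teacher_batch)
    losses = [entropy_weight(t) * kd_loss(s, t, tau)
              for s, t, tau in zip(student_batch, teacher_batch, temps)]
    return sum(losses) / len(losses)


teacher = [[8.0, 0.0, 0.0], [0.5, 0.4, 0.3]]   # confident sample, uncertain sample
student = [[2.0, 1.0, 0.0], [0.2, 0.6, 0.1]]
print(assign_temperatures(teacher), batch_loss(student, teacher))
```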

Road Damage Detection with Improved YOLOv8

WANG Han-Yi, LI Chun-Biao, SONG Heng

2025, 34(1):179-189. DOI: 10.15888/j.cnki.csa.009737 CSTR: 32024.14.csa.009737

Abstract: To address the challenges of road damage detection, including multi-scale targets, complex target structures, uneven sample distribution, and the influence of hard and easy samples on bounding box regression, this study proposes a road damage detection algorithm based on an improved YOLOv8. The algorithm replaces some of the Conv modules in the original C2f module (the faster implementation of CSP bottleneck with 2 convolutions) with dynamic snake convolution (DSConv) to adaptively focus on small, intricate local features and thereby enhance the perception of geometric structures. An efficient multi-scale attention (EMA) module is placed before each detection head to achieve cross-dimensional interaction and capture pixel-level relationships, improving generalization to complex global features. An additional small-object detection layer is also added to improve the precision of small-object detection. Finally, a strategy termed Flex-PIoUv2 is proposed, which alleviates sample distribution imbalance and anchor box enlargement through linear interval mapping and size-adaptive penalty factors. Experimental results show that the improved model increases the F1 score, mAP50, and mAP50-95 on the RDD2022 dataset by 1.5%, 2.1%, and 1.2%, respectively, and results on the GRDDC2020 and China road damage datasets validate its strong generalization.

Few-shot Relational Triple Extraction Based on Module Transfer and Semantic Similarity Inference

LIU Tong, LIU Bing-Xiao, NI Wei-Jian

2025, 34(1):190-199. DOI: 10.15888/j.cnki.csa.009721 CSTR: 32024.14.csa.009721

Abstract: Existing few-shot relational triple extraction methods often struggle with multiple triples in a single sentence and ignore the semantic similarity between the support set and the query set. To address these issues, this study proposes a few-shot relational triple extraction method based on module transfer and semantic similarity inference. The method repeatedly transfers control among three modules, namely relation extraction, entity recognition, and triple discrimination, to efficiently extract multiple relational triples from a query instance. In the relation extraction module, BiLSTM is integrated with a self-attention mechanism to better capture sequence information in emergency plan texts. In addition, a semantic similarity inference method is designed to recognize emergency organizational entities in sentences. Finally, extensive experiments are conducted on ERPs⁺, a dataset of emergency response plans. Experimental results show that the proposed model is better suited to relational triple extraction in the emergency plan domain than the baseline models.

Fast Apple Recognition Based on Lightweight YOLOv8 Model

NIE Zhong-Qiang , ZHU Ming

2025, 34(1):200-210. DOI: 10.15888/j.cnki.csa.009749 CSTR: 32024.14.csa.009749

Abstract:This study proposes a lightweight apple detection algorithm based on an improved YOLOv8n model for recognizing apple fruit in natural orchard environments. First, a combination of DSConv and FEM feature extraction modules replaces some regular convolutions in the backbone network, reducing the floating-point operations and computational cost of convolution. To maintain performance while lightening the model, a structured state space model is introduced in the feature processing stage to build the CBAMamba module, which processes features efficiently through the Mamba structure. Next, the convolutions in the detection head are replaced with RepConv and the number of convolution layers is reduced. Finally, the bounding box loss function is replaced with WIoU, which uses a dynamic non-monotonic focusing mechanism, to accelerate model convergence and further improve detection performance. Experiments on a public dataset show that the improved algorithm outperforms the original YOLOv8n by 1.6% in mAP@0.5 and 1.2% in mAP@0.5:0.95, while increasing FPS by 8.0% and reducing model parameters by 13.3%. This lightweight design makes the model highly practical for deployment on robots and embedded systems.
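
For reference, Wise-IoU v3 multiplies a distance-weighted IoU loss (v1) by a non-monotonic focusing coefficient r = β / (δ·α^(β−δ)), where the outlier degree β is an anchor's IoU loss divided by a running mean of IoU losses. The sketch follows the Wise-IoU paper's formulation as understood here; `alpha=1.9`, `delta=3.0`, and `momentum=0.01` are assumed defaults, and in a real training loop the enclosing-box size and the running mean would be detached from the gradient.

```python
import math

def wiou_v1(l_iou, pred_center, gt_center, enclose_wh):
    """v1: scale L_IoU by exp(centre distance^2 / (Wg^2 + Hg^2)), where Wg, Hg
    are the width and height of the smallest box enclosing both boxes."""
    (x, y), (xg, yg) = pred_center, gt_center
    wg, hg = enclose_wh
    r_wiou = math.exp(((x - xg) ** 2 + (y - yg) ** 2) / (wg ** 2 + hg ** 2))
    return r_wiou * l_iou

def focusing_coefficient(l_iou, l_iou_mean, alpha=1.9, delta=3.0):
    """Non-monotonic gain r; equals 1 when beta == delta and is small for
    both very low and very high outlier degrees."""
    beta = l_iou / l_iou_mean
    return beta / (delta * alpha ** (beta - delta))

def wiou_v3(l_iou, l_iou_mean, pred_center, gt_center, enclose_wh):
    return focusing_coefficient(l_iou, l_iou_mean) * wiou_v1(l_iou, pred_center, gt_center, enclose_wh)

def update_running_mean(mean, value, momentum=0.01):
    """Exponential moving average of L_IoU used to normalise beta."""
    return (1.0 - momentum) * mean + momentum * value
```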

Cigarette Laser Code Recognition Based on Dual-state Asymmetric Network

LIANG Shang-Rong , WANG Hui-Qin , MA Qi , WANG Ke , WEN Yu-Dong

2025, 34(1):211-222. DOI: 10.15888/j.cnki.csa.009729 CSTR: 32024.14.csa.009729

Abstract:Cigarette laser code recognition is an important tool for tobacco inspection. This study proposes a cigarette code recognition method based on a dual-state asymmetric network. Because distorted cigarette codes are insufficiently represented in training, the model generalizes poorly. To address this, a nonlinear local augmentation (NLA) method is designed: it applies spatial transformations driven by controllable reference points along the edges of cigarette codes to generate effective distorted training samples, enhancing the generalization ability of the model. To address low recognition accuracy caused by the similarity between cigarette codes and their background patterns, a dual-state asymmetric network (DSANet) is proposed, which splits the convolutional layers of the CRNN into a training mode and a deployment mode. The training mode introduces asymmetric convolution to optimize the distribution of feature weights, strengthening the model's ability to extract key features. For real-time performance, the deployment mode uses BN fusion and branch fusion: by computing fusion weights and initializing the convolution kernels accordingly, the convolutional layers are equivalently converted back to their original structure, which reduces inference time on the user side. Finally, a self-attention mechanism is introduced into the recurrent layer to dynamically adjust the weights of sequence features, improving the extraction of cigarette code features. Comparative experiments show that the method achieves higher recognition accuracy and speed, with recognition accuracy reaching 87.34%.
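
The BN fusion and branch fusion used in the deployment mode are standard structural re-parameterization steps. The sketch below shows both for a single input and output channel (a simplification): BatchNorm with scale γ, shift β, running mean μ, and variance σ² folds into a convolution as W' = Wγ/√(σ²+ε) and b' = (b − μ)γ/√(σ²+ε) + β, and 1×3 and 3×1 branches fold into the centre row and column of a 3×3 kernel. The function names are illustrative, not taken from the paper.

```python
import math

def fuse_bn(kernel, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm (gamma, beta, running mean/var) into a 3x3 kernel and bias."""
    scale = gamma / math.sqrt(var + eps)
    fused_kernel = [[w * scale for w in row] for row in kernel]
    fused_bias = (bias - mean) * scale + beta
    return fused_kernel, fused_bias

def fuse_asymmetric(k3x3, k1x3, k3x1):
    """Add a 1x3 kernel to the centre row and a 3x1 kernel to the centre column
    of a 3x3 kernel. Valid when all branches use padding that keeps their
    sliding windows centred on the same pixel."""
    fused = [row[:] for row in k3x3]
    for j in range(3):
        fused[1][j] += k1x3[j]
    for i in range(3):
        fused[i][1] += k3x1[i]
    return fused

def apply_3x3(kernel, patch, bias=0.0):
    """Response of a single-channel 3x3 kernel on one 3x3 input patch."""
    return sum(kernel[i][j] * patch[i][j] for i in range(3) for j in range(3)) + bias
```

In ACNet-style blocks each branch usually has its own BatchNorm, so each branch is BN-fused first and the resulting kernels and biases are then summed.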

TSEncoder: Fault Classification Based on SAVMD and Multi-source Data Fusion

JI Long-Bing , ZHOU Yu , QIAN Ju

2025, 34(1):223-235. DOI: 10.15888/j.cnki.csa.009750 CSTR: 32024.14.csa.009750

Abstract:Signals from mechanical equipment in actual operation are susceptible to noise interference, which makes fault features difficult to extract accurately, and information from a single position on the equipment cannot fully reflect its operating status. To address these problems, this study proposes an improved spatio-temporal fault classification method based on adaptive signal decomposition and multi-source data fusion. First, an improved adaptive decomposition algorithm, signal adaptive variational mode decomposition (SAVMD), is proposed, and a weighted kurtosis sparsity (WKS) index is constructed to select the intrinsic mode function (IMF) components rich in feature information for signal reconstruction. Second, multi-source data from sensors at different positions are fused, and the dataset obtained by periodic sampling is used as the model input. Finally, a spatio-temporal fault classification model is built to process the multi-source data; it reduces noise interference through an improved sparse self-attention mechanism and uses a dual-encoder mechanism to process time-step and spatial-channel information effectively. Experiments on three public mechanical equipment fault datasets achieve average accuracies of 99.1%, 98.5%, and 99.4%, respectively. Compared with other fault classification methods, the proposed method performs better, shows good adaptability and robustness, and offers a feasible approach to fault diagnosis of mechanical equipment.
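
The WKS index is specific to the paper and its weighting is not given in the abstract. As a simplified stand-in, the sketch below ranks IMFs by plain (Pearson, non-excess) kurtosis, which favours impulsive, fault-like components, keeps the top `top_k`, and sums them to reconstruct the signal. `top_k` and the function names are assumptions.

```python
def kurtosis(x):
    """Pearson kurtosis E[(x - mu)^4] / sigma^4 (returns 0.0 for a constant signal)."""
    n = len(x)
    mu = sum(x) / n
    m2 = sum((v - mu) ** 2 for v in x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n
    return m4 / (m2 * m2) if m2 > 0 else 0.0

def select_imfs(imfs, top_k=2):
    """Keep the top_k IMFs by kurtosis (indices returned in original order)
    and sum them sample-by-sample to reconstruct the signal."""
    ranked = sorted(range(len(imfs)), key=lambda i: kurtosis(imfs[i]), reverse=True)
    keep = sorted(ranked[:top_k])
    reconstructed = [sum(imfs[i][t] for i in keep) for t in range(len(imfs[0]))]
    return keep, reconstructed
```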

Modeling and Solution of Sustainable Online Ride-hailing Scheduling Problem Based on Multi-objective and Dynamic Solution Space Programming

LI Zhen , GUO Yu-Han

2025, 34(1):236-247. DOI: 10.15888/j.cnki.csa.009735 CSTR: 32024.14.csa.009735

Abstract:To balance economic, environmental, and social benefits in ride-hailing operations, this study proposes a multi-objective scheduling model that balances these three benefits, together with an algorithm based on dynamic solution space programming. The model integrates traditional taxi services and shared transport for the first time, covering four different interaction scenarios between drivers and passengers, and uses optimization strategies to improve the three benefits jointly. The algorithm combines the LAPJV algorithm with the branch and bound method so that the optimal matching strategy satisfying the multi-objective optimization can be explored and determined efficiently under given threshold constraints. Compared with SCIP, the algorithm's average error is within 4%, and its average solving speed is improved by 99.1%. The study applies this algorithm systematically to generate Pareto frontier graphs under different threshold constraints, showing how any one of the three objectives (economic, environmental, and social benefits) trades off and changes when the other two are constrained. This provides a decision-making basis for ride-hailing operations.
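
The Pareto frontier and threshold-constrained trade-offs described above can be illustrated on a finite set of candidate schedules. The sketch assumes each candidate is already scored as a tuple of three costs to minimize (for example operating cost, emissions, and passenger waiting time, which are hypothetical choices); it filters non-dominated points and applies ε-constraint selection. It does not reproduce the LAPJV plus branch-and-bound search that generates the candidates.

```python
def dominates(a, b):
    """a dominates b: no worse on every objective and strictly better on one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Non-dominated subset of the candidate cost tuples."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def epsilon_constraint(points, target, bounds):
    """Minimise objective `target` subject to p[i] <= bounds[i] for each constrained objective."""
    feasible = [p for p in points if all(p[i] <= b for i, b in bounds.items())]
    return min(feasible, key=lambda p: p[target]) if feasible else None

def tradeoff_curve(points, target, vary, fixed, thresholds):
    """Best value of `target` as the bound on objective `vary` is swept
    while the bounds in `fixed` stay constant."""
    curve = []
    for t in thresholds:
        best = epsilon_constraint(points, target, {**fixed, vary: t})
        curve.append((t, best[target] if best else None))
    return curve
```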

Small Object Detection Based on Layer Aggregation Network and Cross Stage-adaptive Spatial Feature Fusion

YU Long-Kun , ZHAN Qiang-Bo , SHEN Hong , WANG Zi-Hao

2025, 34(1):248-257. DOI: 10.15888/j.cnki.csa.009686 CSTR: 32024.14.csa.009686

Abstract:Traditional object detection algorithms often suffer from poor detection performance and low detection efficiency on small objects. To address these problems, this study proposes a small object detection method based on an improved YOLOv7 network. The method adds more paths to the efficient layer aggregation network (ELAN) module of the original network and effectively integrates the feature information from these paths before introducing the selective kernel network (SKNet). This lets the model attend to features at different scales and extract more useful information. To enhance the model's perception of spatial information for small objects, an eSE module is designed and attached to the end of ELAN, forming a new efficient layer aggregation network module (EF-ELAN), which preserves image feature information more completely and improves the generalization ability of the network. Additionally, a cross stage-adaptive spatial feature fusion module (CS-ASFF) is designed to address inconsistent feature scales in small object detection. Built on the ASFF network and the Nest connection method, it extracts weights through operations such as convolution and pooling on each level of the feature pyramid, applies the weighted feature information to a specific layer, and uses the other feature layers to enhance the network's feature processing capability. Experimental results show that the proposed algorithm improves average precision by 1.5% and 2.1% on the DIOR and DOTA datasets, respectively, validating its effectiveness for small object detection.
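
CS-ASFF builds on adaptive spatial feature fusion (ASFF), whose core step weights each pyramid level per pixel with a softmax over learned logits and sums the resized maps. The sketch shows only that step, for one channel using nested lists, assuming the maps are already resized to a common H × W and the logits are given (in ASFF they come from 1×1 convolutions). The cross-stage and Nest-connection parts are not modelled.

```python
import math

def asff_fuse(features, logits):
    """features: L single-channel maps (H x W nested lists), already resized.
    logits:   L weight-logit maps of the same size.
    Returns the per-pixel softmax-weighted sum of the L maps."""
    h, w = len(features[0]), len(features[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            m = max(lg[i][j] for lg in logits)  # subtract the max for numerical stability
            exps = [math.exp(lg[i][j] - m) for lg in logits]
            total = sum(exps)
            # Softmax weights sum to 1 at every pixel
            fused[i][j] = sum(e / total * f[i][j] for e, f in zip(exps, features))
    return fused
```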

WSN Coverage Reliability Optimization Based on Confident Information Coverage Model

MEI Si-Man , GAO Peng-Yi , CHEN Kai , LI Sheng-Hui , WU Ya-Huan

2025, 34(1):258-266. DOI: 10.15888/j.cnki.csa.009700 CSTR: 32024.14.csa.009700

Abstract:The rapid development and application of Artificial Intelligence and the Internet of Things (AIoT) pose new challenges to network lifetime, reliability, and coverage. A wireless sensor network (WSN) consists of a large number of self-organizing sensor nodes deployed in a monitoring area and offers low cost, energy efficiency, self-organization, and large-scale deployment. However, further extending network lifetime and enhancing the coverage reliability of WSNs remain primary challenges in current research. To address these challenges, a coverage reliability assessment model is proposed that integrates the backbone network with coverage models, collaborative sensing among sensor nodes, and spatial correlation. A coverage reliability optimization algorithm based on the confident information coverage model is then proposed. On one hand, the algorithm uses the confident information coverage model to ensure collaborative sensing of data, improving network quality of service. On the other hand, it optimizes routing over the backbone network to reduce energy consumption. To validate the proposed algorithm, sensor multi-states and coverage rate are used as evaluation metrics, with the RMSE threshold and energy consumption as performance indicators, and the algorithm is compared with the ACR and CICR algorithms. Finally, a verification model is built in MATLAB. Simulation results demonstrate that the proposed algorithm significantly improves coverage reliability.
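
Coverage rate, one of the metrics above, is the fraction of the monitored area that counts as covered. The confident information coverage model itself is more involved: roughly, a point is covered when its field value can be reconstructed from cooperating sensors with error below the RMSE threshold. As a deliberately simplified stand-in, the sketch below computes coverage rate on a grid under the binary disk model; it is not the paper's model, and the grid `step` and function name are assumptions.

```python
import math

def coverage_rate(sensors, radius, width, height, step=1.0):
    """Fraction of grid points in [0, width] x [0, height] lying within
    `radius` of at least one active sensor (binary disk model)."""
    nx, ny = int(width / step) + 1, int(height / step) + 1
    covered = 0
    for i in range(nx):
        for j in range(ny):
            x, y = i * step, j * step
            if any(math.hypot(x - sx, y - sy) <= radius for sx, sy in sensors):
                covered += 1
    return covered / (nx * ny)
```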

Visibility Estimation Based on Improved Dark Channel Prior

ZHOU Lei , ZHANG Hao-Rui , WANG Meng-Yuan , XU Xing-Chen , YAN Wei-Ming , HU Bin , ZHAO Dong

2025, 34(1):267-275. DOI: 10.15888/j.cnki.csa.009738 CSTR: 32024.14.csa.009738

Abstract:Existing methods for measuring atmospheric visibility are easily affected by subjective factors and depend on complex equipment. To address this issue, this study proposes a new image-processing-based algorithm for estimating atmospheric visibility. First, building on the dark channel prior, the global atmospheric light is estimated from the difference between image brightness and saturation, and the atmospheric transmittance is then obtained. Next, curvature filtering is used to refine the transmittance. Atmospheric visibility is then estimated using lane line detection and the extinction coefficient. Finally, a visibility correction model based on a linear regression equation is established to correct the estimated visibility. Experimental results show that the proposed algorithm is accurate and practical for visibility estimation in traffic monitoring scenes in foggy weather.
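
The dark channel and extinction-based visibility steps can be sketched as follows. Transmission uses He et al.'s estimate t = 1 − ω·dark(I/A); with the Koschmieder relation t = e^(−βd), a pixel at a known distance d gives the extinction coefficient β = −ln t / d and visibility V = −ln ε / β. Assumptions: the atmospheric light A is supplied rather than estimated from brightness and saturation, the distance would come from something like lane-line spacing, and ε = 0.05 is an assumed contrast threshold (2% is also common). Curvature-filter refinement and the regression correction are omitted.

```python
import math

def dark_channel(img, patch=1):
    """img: H x W nested list of (r, g, b) tuples in [0, 1].
    Minimum over colour channels, then over a (2*patch+1)^2 window."""
    h, w = len(img), len(img[0])
    per_pixel = [[min(px) for px in row] for row in img]
    return [[min(per_pixel[y][x]
                 for y in range(max(0, i - patch), min(h, i + patch + 1))
                 for x in range(max(0, j - patch), min(w, j + patch + 1)))
             for j in range(w)] for i in range(h)]

def transmission(img, airlight, omega=0.95, patch=1):
    """t = 1 - omega * dark_channel(I / A), with A given per channel."""
    normalised = [[tuple(c / a for c, a in zip(px, airlight)) for px in row] for row in img]
    return [[1.0 - omega * d for d in row] for row in dark_channel(normalised, patch)]

def visibility_from_transmission(t, distance_m, contrast_threshold=0.05):
    """Koschmieder: beta = -ln(t) / d, visibility V = -ln(eps) / beta (metres)."""
    beta = -math.log(t) / distance_m
    return -math.log(contrast_threshold) / beta
```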

Adaptation Fine-tuning Based on Singular Value Decomposition

LIN Zhi-Peng , GUO Zheng-Rong , ZHANG Wei-Zhi , GUO Gong-De

2025, 34(1):276-284. DOI: 10.15888/j.cnki.csa.009731 CSTR: 32024.14.csa.009731

Abstract:The rise of large language models has profoundly affected natural language processing. As computational resources grow and models expand, the potential applications of large language models in natural language processing become increasingly evident. However, as model sizes increase, the widely used low-rank adaptation (LoRA) method faces challenges in fine-tuning efficiency and storage cost. To address this issue, this study proposes an adaptation fine-tuning method based on singular value decomposition (SVD). The method makes only the diagonal matrix and the scaling vector obtained from SVD trainable, which reduces training cost while improving performance on multiple natural language processing tasks. Experimental results show that the proposed method outperforms other methods with trainable parameter counts of the same order of magnitude on the GLUE and E2E benchmarks. Compared with commonly used parameter-efficient fine-tuning methods, it significantly reduces the number of trainable parameters and improves fine-tuning efficiency, achieving the highest performance gains in the experiments on trainable-parameter efficiency. Future research will focus on optimizing the method for more efficient fine-tuning across a wider range of tasks and on larger models.
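
To show why training only singular values and a scaling vector is cheap, the sketch compares trainable-parameter counts with LoRA and applies one hypothetical parameterization, ΔW = diag(s) · U · diag(λ) · Vᵀ, with U and Vᵀ frozen from the SVD of the pretrained weight. Where the scaling vector is applied, and its length, are assumptions rather than the paper's formulation.

```python
def lora_trainable(d_out, d_in, r):
    """LoRA trains B (d_out x r) and A (r x d_in)."""
    return r * (d_out + d_in)

def svd_adapter_trainable(d_out, r):
    """Hypothetical: r trainable singular values plus a length-d_out scaling
    vector; the frozen singular vectors add no trainable parameters."""
    return r + d_out

def delta_weight(u, lam, v_t, scale):
    """Hypothetical update dW = diag(scale) @ U @ diag(lam) @ V^T with
    U (d_out x r) and V^T (r x d_in) frozen; only lam and scale are trained."""
    d_out, r, d_in = len(u), len(lam), len(v_t[0])
    return [[scale[i] * sum(u[i][k] * lam[k] * v_t[k][j] for k in range(r))
             for j in range(d_in)] for i in range(d_out)]
```

For a 768 × 768 projection with rank 8, LoRA trains 8 × (768 + 768) = 12,288 parameters, while this parameterization trains 8 + 768 = 776.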

Time Series Trend Feature Extraction Based on Angular Key Points and Inflection Points

LIU Bing-Ke , REN Rui-Bin , WANG Xi

2025, 34(1):285-293. DOI: 10.15888/j.cnki.csa.009717 CSTR: 32024.14.csa.009717

Abstract:Piecewise linear representation algorithms for time series represent the whole series with fewer points according to trend changes in the series. However, most of these algorithms focus on local sequence points and rarely consider global information, and some target only fitting accuracy on datasets rather than classification. To solve these problems, this study proposes an algorithm for extracting trend features from time series based on angular key points and inflection points. The algorithm selects angular key points according to the angle changes in the sequence data and then extracts inflection points from these key points. It then decides whether interpolation is needed according to the segmentation requirements, yielding a segmented sequence that meets them. Fitting and classification experiments are conducted on simulated data and 40 public datasets. On the simulated data, the proposed algorithm fits better than piecewise aggregate approximation (PAA), the TD algorithm, the BU algorithm, the FFTO algorithm based on inflection points, the Trend algorithm based on turning points and trend segments, and the ITTP algorithm based on trend turning points. On the UCR public datasets, it achieves an average fitting error of 1.165, and its classification accuracy is 2.8% higher than that of the DTW-1NN algorithm published by Keogh.
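
Angular key point selection can be sketched by measuring, at each interior point, the change in direction between the incoming and outgoing segments and keeping points where it exceeds a threshold; linear interpolation between the kept points then gives a piecewise linear representation. The 30° threshold, unit x-spacing, and function names are assumptions (angles depend on the y scale, so series are usually normalised first); the paper's inflection-point extraction and interpolation rule are not reproduced.

```python
import math

def turning_angles(series):
    """Absolute direction change (degrees) at each interior point, x spaced 1 apart."""
    angles = []
    for i in range(1, len(series) - 1):
        a_in = math.atan2(series[i] - series[i - 1], 1.0)
        a_out = math.atan2(series[i + 1] - series[i], 1.0)
        angles.append(abs(math.degrees(a_out - a_in)))
    return angles

def angle_key_points(series, threshold_deg=30.0):
    """Indices whose turning angle exceeds the threshold; endpoints always kept."""
    keys = [0]
    for idx, ang in enumerate(turning_angles(series), start=1):
        if ang > threshold_deg:
            keys.append(idx)
    keys.append(len(series) - 1)
    return keys

def reconstruct(series, keys):
    """Piecewise linear reconstruction by interpolating between key points."""
    out = []
    for a, b in zip(keys, keys[1:]):
        for t in range(a, b):
            out.append(series[a] + (series[b] - series[a]) * (t - a) / (b - a))
    out.append(series[keys[-1]])
    return out
```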

Fuzzing for Binary Software Based on Program Analysis

WANG Wen-Ting , SUN Jia-Jun , WAN Yi-Feng , WANG Wen-Jie , TIAN Dong-Hai

2025, 34(1):294-307. DOI: 10.15888/j.cnki.csa.009703 CSTR: 32024.14.csa.009703

Abstract:Existing binary fuzzing methods struggle to reach deep into programs to find vulnerabilities. To address this problem, this study proposes a multi-angle optimization method that integrates hardware-assisted program tracing, static analysis, and concolic execution. First, static analysis and hardware-assisted tracing are used to calculate program path complexity and execution probability. Seed selection and mutation energy allocation are then performed according to these two measures. Meanwhile, concolic execution assists seed generation and records key bytes for targeted mutation. Experimental results show that, compared with other fuzzing methods, this method finds more program paths and more crashes in most cases.
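
The idea of steering seed selection and mutation energy by path complexity and execution probability can be illustrated with a toy scoring rule that favours seeds whose paths are complex yet rarely executed. The score `complexity × (1 − probability)`, the `base` and `max_energy` constants, and the function names are hypothetical; they are not the scheme used in the paper or in any particular fuzzer.

```python
def seed_energy(path_complexity, exec_probability, base=100, max_energy=1600):
    """Hypothetical energy schedule: more mutations for complex, rarely executed
    paths. exec_probability is in [0, 1]; the result is capped at max_energy."""
    rarity = 1.0 - exec_probability
    energy = base * (1.0 + path_complexity * rarity)
    return int(min(energy, max_energy))

def pick_seed(seeds):
    """seeds: list of (name, complexity, probability); return the highest-scoring name."""
    return max(seeds, key=lambda s: s[1] * (1.0 - s[2]))[0]
```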
