WANG Yan , CHEN Yan-Yan , LIU Jing-Jing , HU Jin-Yuan
2025, 34(2):1-10. DOI: 10.15888/j.cnki.csa.009780 CSTR: 32024.14.csa.009780
Abstract: Existing image dehazing algorithms still suffer from incomplete dehazing, blurred edges in the dehazed images, and loss of detail information. To address these problems, this study presents an image dehazing algorithm based on Transformer and a gated fusion mechanism. Global image features are extracted by an improved channel self-attention mechanism, which improves the efficiency of the model in processing images. A multi-scale gated fusion block is designed to capture features at different scales. By dynamically adjusting weights, the gated fusion mechanism improves the adaptability of the model to different degrees of dehazing while better preserving image edges and detail information. Residual connections are used to enhance feature reuse and improve the generalization ability of the model. Experiments show that the proposed algorithm can effectively restore the content information in real hazy images. On the synthesized hazy image dataset SOTS, the peak signal-to-noise ratio reaches 34.841 dB, and the structural similarity reaches 0.984. The dehazed images retain complete content information, without blurred details or residual haze.
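The idea of fusing two feature streams with dynamically adjusted weights can be illustrated with a minimal sketch. This is not the paper's multi-scale gated fusion block: the scalar gate parameters `w_a`, `w_b`, and `bias` are hypothetical stand-ins for learned convolutional weights, and the features are plain lists rather than image tensors.

```python
import math

def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(feat_a, feat_b, w_a, w_b, bias):
    """Fuse two equal-length feature vectors with a per-element gate.

    The gate is computed from both inputs, so the mixing weight changes
    with the content instead of being fixed (hypothetical scalar weights).
    """
    fused = []
    for a, b in zip(feat_a, feat_b):
        gate = sigmoid(w_a * a + w_b * b + bias)
        fused.append(gate * a + (1.0 - gate) * b)
    return fused

coarse = [0.2, 0.8, 0.5]   # e.g. features from a coarse scale
fine = [0.6, 0.4, 0.5]     # e.g. features from a fine scale
fused = gated_fusion(coarse, fine, w_a=1.0, w_b=-1.0, bias=0.0)
```

Because the gate lies strictly between 0 and 1, every fused value is a convex combination of the two inputs at that position.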
2025, 34(2):11-18. DOI: 10.15888/j.cnki.csa.009773 CSTR: 32024.14.csa.009773
Abstract: In the field of knowledge distillation (KD), feature-based methods can effectively extract the rich knowledge embedded in the teacher model, whereas logit-based methods often suffer from insufficient knowledge transfer and low efficiency. Decoupled knowledge distillation (DKD) performs distillation by dividing the logits output by the teacher and student models into target and non-target classes. While this improves distillation accuracy, its single-instance-based approach fails to capture the dynamic relationships among samples within a batch. In particular, when the output distributions of the teacher and student models differ significantly, decoupled distillation alone cannot effectively bridge the gap. To address these issues in DKD, this study proposes a perception reconstruction method. The method introduces a perception matrix that exploits the representational capabilities of the model to recalibrate logits, analyze intra-class dynamic relationships in detail, and reconstruct finer-grained inter-class relationships. Since the objective of the student model is to minimize representational disparity, the method is extended to decoupled knowledge distillation: the outputs of the teacher and student models are mapped onto the perception matrix, enabling the student model to learn richer knowledge from the teacher model. Experiments on the CIFAR-100 and ImageNet-1K datasets show that the student model trained with this method achieves a classification accuracy of 74.98% on CIFAR-100, which is 0.87 percentage points higher than that of baseline methods, thereby improving the image classification performance of the student model. Comparative experiments with various methods further verify the superiority of this method.
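The target/non-target split that DKD builds on can be sketched for a single sample as follows. This covers only the standard decoupling, not the perception matrix proposed in the abstract; the function names, logits, and temperature are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dkd_terms(teacher_logits, student_logits, target, temperature=4.0):
    """Return (target-class KD term, non-target-class KD term)."""
    pt = softmax(teacher_logits, temperature)
    ps = softmax(student_logits, temperature)
    # Binary distribution: target class versus all other classes.
    tckd = kl_divergence([pt[target], 1.0 - pt[target]],
                         [ps[target], 1.0 - ps[target]])
    # Distribution renormalized over the non-target classes only.
    nt = [p / (1.0 - pt[target]) for i, p in enumerate(pt) if i != target]
    ns = [p / (1.0 - ps[target]) for i, p in enumerate(ps) if i != target]
    nckd = kl_divergence(nt, ns)
    return tckd, nckd

teacher = [5.0, 2.0, 1.0, 0.5]   # illustrative logits, class 0 is the target
student = [3.0, 2.5, 0.5, 1.0]
tckd, nckd = dkd_terms(teacher, student, target=0)
p_teacher = softmax(teacher, 4.0)
classic_kd = kl_divergence(p_teacher, softmax(student, 4.0))
```

The classic KD loss equals `tckd + (1 - p_teacher[0]) * nckd`, so the non-target term is down-weighted when the teacher is confident; decoupling lets the two terms be weighted independently.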
LING Gang , ZHAO Jie , MO Ding-Jie , ZHANG Dong-Qing
2025, 34(2):19-27. DOI: 10.15888/j.cnki.csa.009743 CSTR: 32024.14.csa.009743
Abstract: Insufficient lighting and complex environments in mines, coupled with the small size of safety helmets as targets, cause general object detection models to perform poorly in safety helmet detection. To solve these issues, an improved mine safety helmet wearing detection model based on YOLOv8s is proposed. Firstly, the effectiveSE module is combined with the C2f module in the neck network of YOLOv8s to design a new C2f-eSE module, improving the feature extraction ability of the network. Secondly, the CIoU loss function is replaced by the Wise-EIoU loss function to improve the model’s robustness. In addition, the spatial and channel reconstruction convolution (SCConv) module is introduced into the detection head, and a new lightweight SPS detection head is designed based on the parameter sharing concept, reducing the number of parameters and the computational complexity of the model. Finally, a P2 detection layer is added to the model, enabling the feature extraction network to incorporate more shallow information and improving the detection ability for small targets. Experimental results show that the mAP50 of the improved model increases by 3.2%, the number of parameters decreases by 1.6%, and GFLOPs decrease by 5.6%.
TAN Chen-Han , JIA Ke-Bin , WANG Hao-Yu
2025, 34(2):28-36. DOI: 10.15888/j.cnki.csa.009779 CSTR: 32024.14.csa.009779
Abstract: Automatic text summarization is an important branch of natural language processing (NLP), and one of its main difficulties lies in evaluating the quality of generated summaries quickly, objectively, and accurately. Existing methods for evaluating text summary quality suffer from low evaluation accuracy, the need for reference texts, and high consumption of computing resources. To address these problems, this study proposes a text summary quality evaluation method based on large language models. It designs a prompt construction method based on the chain-of-thought (CoT) principle to improve the performance of large language models in evaluating text summary quality. At the same time, a chain-of-thought dataset is generated and used to fine-tune a small-scale large language model, significantly reducing computing requirements. The proposed method first determines the evaluation dimensions according to the characteristics of text summaries and constructs the prompt based on the chain-of-thought principle. The prompt then guides the large language model to generate the chain-of-thought process and evaluation results for the summary samples, from which a chain-of-thought dataset is built. This dataset is used to fine-tune the small-scale large language model. Finally, the fine-tuned small-scale large language model performs the quality evaluation of text summaries. Comparative experiments and analyses on the SummEval dataset show that this method significantly improves the evaluation accuracy of the small-scale large language model in text summary quality evaluation. The resulting method offers high evaluation accuracy, low computing requirements, and easy deployment, and requires no reference texts.
XU Fei , ZHAO Qian-Ben , YANG Xue
2025, 34(2):37-48. DOI: 10.15888/j.cnki.csa.009755 CSTR: 32024.14.csa.009755
Abstract: An unmanned aerial vehicle (UAV) equipped with an edge server constitutes a mobile edge server, which can provide computing services for user equipment (UE) in scenarios where base stations are difficult to deploy. An agent trained with deep reinforcement learning can formulate reasonable offloading decisions in a continuous and complex state space and offload part of the computation-intensive tasks generated by users to edge servers for execution, thus improving the working efficiency and response time of the system. However, the fully connected neural networks currently used by deep reinforcement learning algorithms cannot handle the time-series data in UAV-assisted mobile edge computing (MEC) scenarios. In addition, the training efficiency of these algorithms is low, and their decision-making performance is poor. To address these problems, this study proposes a twin delayed deep deterministic policy gradient algorithm based on long short-term memory (LSTM-TD3), which uses LSTM to improve the Actor-Critic network structure of the TD3 algorithm. The network is divided into three parts: a memory extraction unit containing LSTM, a current feature extraction unit, and a perceptual integration unit. Besides, the sample data in the experience pool are improved and historical data are defined, which gives the memory extraction unit a better training effect. Simulation results show that, compared with the AC, DQN, and DDPG algorithms, the LSTM-TD3 algorithm achieves the best performance when optimizing the offloading strategy with the minimum total system delay as the objective.
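For readers unfamiliar with TD3, the two ingredients that distinguish its critic target from DDPG can be sketched in a few lines. This shows only the generic TD3 target for scalar actions, not the LSTM-based network of the abstract; the hyperparameter values and function names are illustrative assumptions.

```python
import random

def smoothed_target_action(action, noise_std=0.2, noise_clip=0.5,
                           low=-1.0, high=1.0, rng=random):
    """Target policy smoothing: add clipped Gaussian noise, then clip to bounds."""
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    return max(low, min(high, action + noise))

def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of the two critics."""
    bootstrap = 0.0 if done else min(q1_next, q2_next)
    return reward + gamma * bootstrap

# Illustrative values: the two target critics disagree about the next state.
y = td3_target(reward=1.0, done=False, q1_next=5.0, q2_next=4.0, gamma=0.9)
y_terminal = td3_target(reward=1.0, done=True, q1_next=5.0, q2_next=4.0, gamma=0.9)
a_next = smoothed_target_action(0.95)
```

Taking the minimum of the two critics counteracts value overestimation. The "delayed" part of TD3, which updates the actor less often than the critics, is a training-loop detail not shown here.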
FAN Hai-Wei , ZHANG Chao-Liang , NIU Xin-Yang , WAN Qing-Song , DENG Yu-Lian
2025, 34(2):49-60. DOI: 10.15888/j.cnki.csa.009753 CSTR: 32024.14.csa.009753
Abstract: Traditional knowledge-aware propagation recommendation algorithms face challenges including low correlation of higher-order features, unbalanced information utilization, and noise introduction. To address these challenges, this study proposes a knowledge-aware propagation recommendation algorithm based on multi-level contrastive learning with knowledge enhancement (MCLK-KE). By constructing enhanced views and using mask reconstruction-based self-supervised pre-training, the algorithm extracts deeper information from key triples to effectively suppress noise signals. It achieves balanced utilization of knowledge and interaction signals and enhances feature representation by contrasting graphs to capture effective node attributes globally. Multi-task training, which incorporates recommendation prediction, contrastive learning, and mask reconstruction tasks, significantly improves model performance. In tests on three publicly available datasets, MCLK-KE achieves maximum improvements of 3.3% in AUC and 5.3% in F1 over the best baseline model.
PENG Bo , WANG Xiao-Bo , WEI Xiang-Lin , CHENG Jie , QIN Hua-Wang , FAN Jian-Hua
2025, 34(2):61-73. DOI: 10.15888/j.cnki.csa.009763 CSTR: 32024.14.csa.009763
Abstract: In complex terrain, UAV formation path planning based on deep reinforcement learning can optimize the path of the UAV formation, achieving better path length and environmental adaptability than traditional heuristic algorithms. However, it still suffers from insufficient training stability and poor real-time planning. For UAV clusters in leader-follower mode, this study proposes a real-time 3D path planning method for UAV formations based on the SPER-TD3 algorithm. Firstly, a prioritized experience replay mechanism based on SumTree is integrated into the TD3 algorithm, and the resulting SPER-TD3 algorithm is used to determine the path of the UAV formation. Then, an angle formation control method is used to optimize the paths of the followers, and a dynamic path smoothing algorithm is applied to optimize the steering angle. To accelerate training convergence, improve the stability of the SPER-TD3 algorithm, and solve the long-term dependence problem, a network structure combining LSTM, a self-attention mechanism, and a multilayer perceptron is designed. Simulation experiments are conducted in environments with various obstacles. The results show that the proposed method is superior to eight mainstream deep reinforcement learning algorithms in terms of path safety coverage rate, flight path smoothness, success rate, and reward. Its comprehensive importance evaluation value is 8.5% to 72.9% higher than that of existing methods, and it has the best training stability.
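A SumTree, the data structure behind the prioritized experience replay mentioned above, can be sketched as follows. This is a generic textbook-style version rather than the paper's implementation; the class layout, method names, and stored items are illustrative assumptions.

```python
class SumTree:
    """Binary tree stored in a list; each internal node holds the sum of its children.

    Leaves hold transition priorities, so sampling proportional to priority
    takes O(log n) by walking down from the root.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)   # internal nodes first, then leaves
        self.data = [None] * capacity
        self.write = 0                            # next slot to overwrite

    def add(self, priority, item):
        leaf = self.write + self.capacity - 1
        self.data[self.write] = item
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, leaf, priority):
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf > 0:                           # propagate the change to the root
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def total(self):
        return self.tree[0]

    def get(self, value):
        """Return (leaf index, priority, item) for a value in [0, total)."""
        idx = 0
        while idx < self.capacity - 1:            # stop when a leaf is reached
            left = 2 * idx + 1
            if value < self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]

tree = SumTree(capacity=4)
for priority, item in [(1.0, "a"), (2.0, "b"), (3.0, "c"), (4.0, "d")]:
    tree.add(priority, item)
picked = [tree.get(v)[2] for v in (0.5, 2.5, 5.0, 9.5)]
```

To sample a batch, one would draw values uniformly from `[0, tree.total())` and call `get` on each, so that item "d" is returned four times as often as item "a".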
LIU Yao , CHEN Dong-Fang , WANG Xiao-Feng
2025, 34(2):74-83. DOI: 10.15888/j.cnki.csa.009765 CSTR: 32024.14.csa.009765
Abstract: Transformer-based object detection algorithms often suffer from insufficient accuracy and slow convergence. Although many studies have proposed improvements and achieved certain results, most of them overlook two key shortcomings in applying the Transformer structure to object detection. Firstly, the results of self-attention computation lack diversity. Secondly, due to the complexity of set prediction, the models are unstable during target matching. To overcome these deficiencies, this study proposes several enhancements. Firstly, an adaptive token pooling module is designed to increase the diversity of self-attention weights. Secondly, a rough-prediction-based anchor box localization module is introduced, which provides positional prior information for queries to enhance stability during bipartite matching. Lastly, a group-based denoising task is designed, which trains the model to distinguish between positive and negative queries near the target, thereby improving the model’s ability to perform set prediction. Experimental results on the COCO dataset show that the improved algorithm significantly outperforms the baseline model in both detection accuracy and convergence speed.
ZHOU Di , LIU Hao , CHENG Yuan-Zhi , LI Hui , LIU Xiao-Ya
2025, 34(2):84-91. DOI: 10.15888/j.cnki.csa.009768 CSTR: 32024.14.csa.009768
Abstract: For spectral 3D CT data, traditional convolution has a poor ability to capture global features, while the full-scale self-attention mechanism consumes substantial resources. To solve this problem, this study introduces a new visual attention paradigm, wave self-attention (WSA). Compared with ViT, this mechanism uses fewer resources to obtain the same amount of self-attention information. In addition, to extract the relative dependencies among organs more adequately and to improve the robustness and execution speed of the model, a plug-and-play module, the wave random-encoder (WRE), is designed for the WSA mechanism. The encoder generates a pair of mutually inverse asymmetric global (local) position information matrices. The global position matrix is used to randomly sample the wave features globally, and the local position matrix is used to complement the local relative dependencies lost due to random sampling. Experiments are performed on kidney and lung parenchyma segmentation tasks using the standard Synapse and COVID-19 datasets. The results show that this method outperforms existing models such as nnFormer and Swin-UNETR in accuracy, number of parameters, and inference rate, reaching the state-of-the-art (SOTA) level.
YANG Wen-Hao , KUANG Li-Qun , WANG Song , ZHANG Jue
2025, 34(2):92-101. DOI: 10.15888/j.cnki.csa.009762 CSTR: 32024.14.csa.009762
Abstract: High-precision 3D object detection is a significant challenge for autonomous vehicles equipped with multiple sensors in dusty wilderness environments. The variable wilderness terrain aggravates the regional feature differences of detected objects, and dust particles can blur object features. To address these issues, this study proposes a 3D object detection method based on multi-modal feature dynamic fusion and constructs a multi-level feature self-adaptive fusion module and a feature alignment augmentation module. The former dynamically adjusts the model’s attention to global-level and regional-level features, leveraging multi-level receptive fields to reduce the impact of regional variances on recognition performance. The latter strengthens the feature representation of regions of interest before multi-modal feature alignment, effectively suppressing interference factors such as dust. Experimental results show that, compared with the baseline, the average precision of this approach improves by 2.79% on the self-built wilderness dataset and by 1.7% on the hard-level test of the KITTI dataset, indicating that the proposed method has good robustness and precision.
ZHANG Yu , ZHANG Wen-Tian , ZHANG Wei-Xi , SHANG Ying
2025, 34(2):102-110. DOI: 10.15888/j.cnki.csa.009757 CSTR: 32024.14.csa.009757
Abstract: The uncertain execution order of asynchronous messages in Android applications is the main cause of test flakiness. Most existing flaky test studies trigger flakiness by randomly determining the execution order of asynchronous messages, which is ineffective and inefficient. This study proposes a concurrent flaky test detection method for Android applications based on the happens-before (HB) relationship. After analyzing the HB relationships between asynchronous messages in the execution trace of Android application test cases, the method determines the working scope of the asynchronous messages. It then designs a maximum-differentiation scheduling strategy that guides the execution order of asynchronous messages so that the order in the scheduled test execution trace differs as much as possible from that in the original trace. In this way, the method attempts to change the test execution results and thereby detect flakiness in the test. To verify its effectiveness, experiments are conducted on 50 test cases from 40 Android applications. The results show that the method detects all the flaky tests, improving the detection effect by 6% and shortening the average detection time by 31.78% compared with current state-of-the-art techniques.
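How a happens-before relation limits which messages may be reordered can be sketched as follows. This illustrates only the general idea, not the paper's analysis or scheduling strategy; the message names and HB edges are hypothetical.

```python
from itertools import combinations

def transitive_closure(messages, hb_edges):
    """Map each message to the set of messages that must run after it."""
    reach = {m: set() for m in messages}
    for before, after in hb_edges:
        reach[before].add(after)
    # Warshall-style closure: if i reaches k, i also reaches whatever k reaches.
    for k in messages:
        for i in messages:
            if k in reach[i]:
                reach[i] |= reach[k]
    return reach

def reorderable_pairs(messages, hb_edges):
    """Pairs with no HB order in either direction; a scheduler may swap them."""
    reach = transitive_closure(messages, hb_edges)
    return [(a, b) for a, b in combinations(messages, 2)
            if b not in reach[a] and a not in reach[b]]

# Hypothetical trace: m1 posts m2 and m3, and m2 posts m4.
messages = ["m1", "m2", "m3", "m4"]
hb_edges = [("m1", "m2"), ("m2", "m4"), ("m1", "m3")]
pairs = reorderable_pairs(messages, hb_edges)
```

Here `pairs` is `[("m2", "m3"), ("m3", "m4")]`: only `m3` can move relative to `m2` and `m4`, so a scheduler guided by HB has far fewer orders to explore than one that permutes all four messages at random.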
2025, 34(2):111-121. DOI: 10.15888/j.cnki.csa.009761 CSTR: 32024.14.csa.009761
Abstract: Distributed storage systems achieve highly reliable, low-overhead data storage through erasure coding. To provide different levels of reliability and access performance, storage systems need to perform redundancy transitions on erasure-coded data by changing the coding parameters. The stripe merging mechanism offers one way to perform redundancy transitions. However, stripe merging based on traditional erasure codes can incur substantial I/O overhead for data block redistribution and checksum block re-computation. Worse still, this I/O is amplified across multiple merging operations. In response to these issues, this study proposes new Tree Reed-Solomon (TRS) codes, which eliminate data block redistribution I/O by decentralizing data blocks and save checksum block re-computation I/O through the design of the coding matrices. TRS codes further introduce storage units that organize the stripes taking part in merging into a tree, so that multiple merging operations can be completed efficiently from bottom to top along the tree structure. To test the performance of TRS codes, this study designs and implements a distributed storage prototype. Experiments show that, compared with other erasure codes, TRS codes greatly reduce stripe merging time.
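Why a linear code can, in principle, merge stripes without re-reading data blocks is easiest to see with a single XOR parity block. The sketch below deliberately swaps in plain XOR parity, which is much simpler than Reed-Solomon and is not the TRS construction; the block contents are made up.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (a single-parity erasure code)."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# Two small stripes with two data blocks each.
stripe_a = [b"\x01\x02", b"\x10\x20"]
stripe_b = [b"\x0f\x0f", b"\xf0\x00"]
parity_a = xor_blocks(stripe_a)
parity_b = xor_blocks(stripe_b)

# Merge: the new parity is computed from the two old parity blocks alone.
merged_parity = xor_blocks([parity_a, parity_b])

# Reference: parity recomputed from all four data blocks.
recomputed_parity = xor_blocks(stripe_a + stripe_b)
```

`merged_parity` equals `recomputed_parity`, so the merge touches two parity blocks instead of four data blocks. With general Reed-Solomon parities the coding coefficients of the two stripes must line up for a similar shortcut to work, which is presumably what the abstract's coding matrix design addresses.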
LYU Ming-Hai , WANG Yu-Bo , LYU Fu , FENG Yong-An
2025, 34(2):122-134. DOI: 10.15888/j.cnki.csa.009751 CSTR: 32024.14.csa.009751
Abstract: The YOLOv8n algorithm performs suboptimally when dealing with complex backgrounds, dense targets, and small objects with limited pixel information, leading to reduced precision, missed detections, and misclassification. To address these issues, this study proposes LNCE-YOLOv8n, an algorithm for safety equipment detection. The algorithm includes a linear multi-scale fusion attention (LMSFA) mechanism, which adaptively focuses on key features to improve the extraction of information from small targets while reducing the computational load. An architecture called C2f_New networks (C2f_NewNet) is also introduced, which maintains high performance and reduces depth through an effective parallelization design. Combined with content-aware reassembly of features (CARAFE), a lightweight universal up-sampling operator, the proposed algorithm realizes efficient cross-scale feature fusion and propagation and aggregates contextual information within a large receptive field. Based on the SIoU (symmetric intersection over union) loss function, this study proposes an enhanced SIoU (ESIoU) to improve the adaptability and accuracy of the model in complex environments. Tested on a safety equipment dataset, LNCE-YOLOv8n outperforms YOLOv8n, with a 5.1% increase in accuracy, a 2.7% rise in mAP50, and a 3.4% gain in mAP50-95, significantly enhancing the detection accuracy of safety equipment for workers in complex construction conditions.
PENG Jun-Feng , YU Kai , LI Guo-Jing
2025, 34(2):135-144. DOI: 10.15888/j.cnki.csa.009764 CSTR: 32024.14.csa.009764
Abstract: Key sentence extraction refers to using artificial intelligence to automatically find key sentences in a long text. The technology can be used for preprocessing in information retrieval and is of great significance for downstream tasks such as text classification and extractive summarization. Traditional unsupervised key sentence extraction methods are mostly based on statistics and graphical models, and they suffer from low accuracy and the need to build a large-scale corpus in advance. This study proposes T5KSEChinese, a method that extracts key sentences without supervision in the Chinese context. The method uses an encoder-decoder architecture and, by feeding in and outputting prompt words, sidesteps the length mismatch between the target sentence and the original text to obtain more accurate results. In addition, a method for constructing positive samples for contrastive learning is proposed and combined with contrastive learning to conduct semi-supervised training on the encoder part of the model, which improves performance on downstream tasks. With a lightweight model, the method outperforms a large language model with tens of times as many parameters in the unsupervised downstream task. The final experimental results demonstrate the accuracy and reliability of the proposed method.
ZHENG Guang-Hai , ZHANG Hai-Ning , QU Ying-Wei
2025, 34(2):145-153. DOI: 10.15888/j.cnki.csa.009778 CSTR: 32024.14.csa.009778
Abstract: Images captured under harsh weather conditions such as haze, rain, and snow are degraded and blurred, which makes accurate recognition and detection challenging. To address this, this study proposes a pedestrian and vehicle detection algorithm for blurred scenes, the lightweight blur vision network (LiteBlurVisionNet). The backbone network uses a lightweight MobileNetV3 module improved with global context enhancer attention, which reduces the number of parameters and makes the model more efficient in processing images under harsh weather conditions such as haze and rain. The neck network adopts the lighter Ghost module and a spectral ghost unit module improved from the Ghost bottleneck module. These modules capture global context information more effectively, improve the discriminative and expressive ability of features, and help reduce the number of parameters and the computational complexity, thereby improving the network’s processing speed and efficiency. In the prediction part, DIoU-NMS, a non-maximum suppression method, performs a local maximum search to remove redundant detection boxes and improve detection accuracy in blurred scenes. Experimental results show that the parameter count of LiteBlurVisionNet is 96.8% lower than that of RTDETR-ResNet50 and 55.5% lower than that of YOLOv8n. Its computational load is 99.9% lower than that of Faster R-CNN and 57% lower than that of YOLOv8n. Its mAP0.5 is 13.71% higher than that of IAL-YOLO and 2.4% higher than that of YOLOv5s. The model is therefore more efficient in terms of storage and computation and is particularly suitable for resource-constrained environments or mobile devices.
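DIoU-NMS differs from plain NMS only in the overlap measure: it subtracts a center-distance penalty from IoU before comparing against the threshold. The sketch below assumes boxes in `(x1, y1, x2, y2)` format; the threshold and sample boxes are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def diou(a, b):
    """IoU minus squared center distance over squared enclosing-box diagonal."""
    center_dist_sq = (((a[0] + a[2]) - (b[0] + b[2])) / 2) ** 2 \
                   + (((a[1] + a[3]) - (b[1] + b[3])) / 2) ** 2
    diag_sq = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
            + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    penalty = center_dist_sq / diag_sq if diag_sq > 0 else 0.0
    return iou(a, b) - penalty

def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS: keep a box unless its DIoU with a kept box exceeds the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(diou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = diou_nms(boxes, scores)
```

Here `kept` is `[0, 2]`: the second box overlaps the first heavily and is suppressed. Because of the distance penalty, two boxes with the same IoU but distant centers are less likely to suppress each other, which helps with adjacent pedestrians or vehicles.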
YAN Bo-Wen , LIU Yong-Ze , XIA Hai-Dong , SONG Xiao-Qiang
2025, 34(2):154-164. DOI: 10.15888/j.cnki.csa.009767 CSTR: 32024.14.csa.009767
Abstract: Cartoon character face detection is more challenging than ordinary face detection because it involves many difficult scenarios. Given the huge differences among the faces of different cartoon characters, this study proposes a cartoon character face detection algorithm named YOLOv8-DEL. Firstly, the DBBNCSPELAN module is designed based on GELAN fused with DBB to reduce model size and enhance detection performance. Next, a multi-scale attention mechanism called ELA is introduced to improve the SPPF structure and enhance the feature extraction ability of the backbone. Finally, a new shared-convolution detection head is designed to make the network lighter. At the same time, the original CIoU loss function is replaced by Shape-IoU to improve the convergence efficiency of the model. Experiments, including ablation experiments, are carried out on the iCartoonFace dataset to verify the proposed model, which is also compared with the YOLOv3-tiny, YOLOv5n, and YOLOv6 models. The mAP of the improved YOLOv8-DEL model reaches 90.3%, 1.2% higher than that of YOLOv8. Its number of parameters is 1.69M, 47% lower than that of YOLOv8, and its GFLOPs value is 44% lower than that of YOLOv8. Experimental results show that the proposed method effectively improves cartoon character face detection precision while compressing the size of the network model.
2025, 34(2):165-173. DOI: 10.15888/j.cnki.csa.009774 CSTR: 32024.14.csa.009774
Abstract: In current research on unsupervised deep hashing, methods based on contrastive learning are predominant. However, the sampling bias caused by randomly drawing negative samples in contrastive learning degrades image retrieval accuracy. To address this issue, this study proposes a novel unsupervised deep hashing method based on bias-suppressing contrastive learning (BSCDH). Within a contrastive learning framework, it introduces a bias suppression method (BSS), which approximates incorrect negative samples as extremely hard negative samples and designs a bias suppression coefficient to suppress them, thereby alleviating the negative impact of sampling bias. The value of the suppression coefficient is determined by the similarity between the current negative sample and the query sample. The distance relationship between the current negative sample and adjacent hash centers is then introduced to correct the coefficient, reducing the possibility of excessively suppressing normal negative samples. The mAP@5000 of the BSCDH method (64 bits) reaches 0.696, 0.833, and 0.819 on the CIFAR-10, FLICKR25K, and NUS-WIDE datasets, respectively, demonstrating a significant performance advantage over the baseline. Extensive experiments verify that BSCDH achieves high retrieval accuracy among unsupervised image retrieval methods and can effectively address sampling bias.
ZHI Yuan , LEI Hai-Wei , ZHANG Bin-Long
2025, 34(2):174-182. DOI: 10.15888/j.cnki.csa.009771 CSTR: 32024.14.csa.009771
Abstract: Existing hierarchical text classification models have two problems: underutilization of the label information across hierarchical instances, and a lack of handling for unbalanced label distribution. To solve these problems, this study proposes a hierarchical text classification method for label co-occurrence and long-tail distribution (LC-LTD), which learns the global semantics of text based on shared labels and a balanced loss function for long-tail distribution. First, a contrastive learning objective based on shared labels is devised to narrow the semantic distance in feature space between text representations that share more labels and to guide the model to generate discriminative semantic representations. Second, a distribution-balanced loss function is introduced to replace the binary cross-entropy loss and alleviate the long-tail distribution problem inherent in hierarchical classification, improving the generalization ability of the model. LC-LTD is compared with various mainstream models on the WOS and BGC public datasets, and the results show that the proposed method achieves better classification performance and is more suitable for hierarchical text classification.
2025, 34(2):183-194. DOI: 10.15888/j.cnki.csa.009772 CSTR: 32024.14.csa.009772
Abstract: Image steganalysis aims to detect whether an image has undergone steganographic processing and thus carries secret information. Steganalysis algorithms based on Siamese networks determine whether an image carries secret information by calculating the dissimilarity between the left and right partitions of the image under test. This approach currently achieves relatively high accuracy among deep learning-based image steganalysis algorithms. However, Siamese network-based image steganalysis algorithms still have certain limitations. First, the convolutional blocks stacked in the preprocessing and feature extraction layers of the Siamese network overlook the fact that steganographic signals are easily lost as they are transmitted from shallow to deep layers. Second, the SRM filters used in existing Siamese networks still employ high-pass filters from other networks to suppress image content, ignoring that the generated residual maps are of a single size. To address these problems, this study proposes a Siamese network image steganalysis method based on enhanced residual features. The method designs an attention-based inverted residual module and adds it after the convolutional blocks in the preprocessing and feature extraction layers. The module reuses image features and introduces an attention mechanism, enabling the network to assign more weight to feature maps of image regions with complex textures. Meanwhile, to better suppress image content, a multi-scale filter is proposed, which adapts the residual types to convolutional kernels of different sizes, thereby enriching the residual features. Experimental results show that the proposed attention-based inverted residual module and multi-scale filter provide better classification performance than existing methods.
LI Xin-Ya , HE Xing-Xing , REN Rui-Bin
2025, 34(2):195-205. DOI: 10.15888/j.cnki.csa.009770 CSTR: 32024.14.csa.009770
Abstract: The density peaks clustering (DPC) algorithm performs clustering by identifying cluster centers based on local density and relative distance. However, for data with uneven density distribution and unbalanced cluster sizes, it tends to overlook cluster centers in low-density regions, and the number of clusters needs to be set manually. Besides, if one data point is assigned incorrectly, the points assigned after it will also be assigned incorrectly. To address these issues, this study proposes an adaptive sparse-aware density peaks clustering algorithm. Firstly, fuzzy points are introduced to minimize their impact on the subcluster merging process. Secondly, the subtractive clustering method is used to identify the centers of low-density regions. Then, noise is identified and subcluster centers are updated based on a new local density and reverse nearest neighbors. Finally, a redefined global overlap metric combined with global separability guides subcluster merging, and the clustering result is determined automatically from these metrics. Experimental results demonstrate that, compared with DPC and its improved variants, the proposed algorithm effectively identifies sparse clusters in both synthetic and UCI datasets while reducing the chain reactions caused by the assignment of non-center points. The proposed algorithm can also automatically determine the optimal number of clusters, ultimately yielding more accurate clustering results.
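The two quantities that the original DPC algorithm uses to pick cluster centers, and the reason it can miss sparse clusters, can be sketched as follows. This shows baseline DPC with a cutoff-kernel density, not the proposed adaptive variant; the points and cutoff are made up.

```python
import math

def dpc_scores(points, cutoff):
    """Return (rho, delta): local density and distance to the nearest denser point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: number of other points closer than the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # The densest point has no denser neighbor; use its largest distance.
        delta.append(min(denser) if denser else max(dist[i]))
    return rho, delta

# A dense cluster of five points and a sparse cluster of two points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (10, 12)]
rho, delta = dpc_scores(points, cutoff=1.2)
gamma = [r * d for r, d in zip(rho, delta)]   # common center-selection score
best_center = max(range(len(points)), key=lambda i: gamma[i])
```

The point `(0.5, 0.5)` gets the highest `gamma` and is picked as a center, while both sparse points have `rho == 0` and therefore `gamma == 0` despite their large `delta`. This is the low-density blind spot the abstract describes.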
WENG Hui-Min , GUO Gong-De , LIN Shi-Shui
2025, 34(2):206-215. DOI: 10.15888/j.cnki.csa.009769 CSTR: 32024.14.csa.009769
Abstract: Most existing knowledge graph link prediction methods, when learning semantic information, focus only on the semantic relationships among the head entity h, the relation r, and the tail entity t within a single triple. They do not consider the links between related entities and entity relations in different triples. To address this problem, this study proposes the DeepE_CL model. Firstly, the DeepE model is used to learn the semantic information of related triples and of entities with the same entity-relation pairs or entity-relation pairs with the same entities. Secondly, the extracted semantic information of the related triples is used to calculate the corresponding scoring function and cross-entropy loss, and the extracted semantic information of entities with the same entity-relation pairs or entity-relation pairs with the same entities is optimized through a contrastive learning model, so as to predict the missing information of the related triples. The proposed method is validated on four common datasets and compared with other baseline models using four evaluation indicators: MR, MRR, Hit@1, and Hit@10. The experimental results show that the DeepE_CL model achieves the best results on all indicators. To further validate the usefulness of the model, this study also applies it to a real traditional Chinese medicine (TCM) dataset. The results show that, compared with the DeepE model, the DeepE_CL model reduces MR by 18 and improves MRR and Hit@1 by 0.8% and 1.1%, respectively, while Hit@10 remains unchanged. The experiments demonstrate that introducing a contrastive learning model makes DeepE_CL very effective in improving the performance of knowledge graph link prediction.
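The four indicators named above are all computed from the rank that the model assigns to the correct entity for each test triple. A minimal sketch under the usual definitions follows; the ranks are made-up values, not results from the paper.

```python
def ranking_metrics(ranks, ks=(1, 10)):
    """Compute mean rank, mean reciprocal rank, and Hit@k from 1-based ranks."""
    n = len(ranks)
    mean_rank = sum(ranks) / n                      # MR: lower is better
    mean_reciprocal_rank = sum(1.0 / r for r in ranks) / n   # MRR: higher is better
    hits = {k: sum(1 for r in ranks if r <= k) / n for k in ks}
    return mean_rank, mean_reciprocal_rank, hits

# Rank of the correct tail entity for four hypothetical test triples.
ranks = [1, 2, 5, 20]
mr, mrr, hits = ranking_metrics(ranks)
```

MR is dominated by a few badly ranked triples (the rank of 20 here), whereas MRR and Hit@k saturate, which is why a model can improve MR and MRR while Hit@10 stays unchanged, as reported for the TCM dataset.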
YUN Kai , JIA Rong-Hao , WEI Guo-Hui , ZHAO Shuang , LI Xue-Hui , MA Zhi-Qing
2025, 34(2):216-224. DOI: 10.15888/j.cnki.csa.009752 CSTR: 32024.14.csa.009752
Abstract:Pneumonia is a prevalent respiratory disease for which early diagnosis is crucial to effective treatment. This study proposes a hybrid model, CTFNet, which combines convolutional neural network (CNN) and Transformer to aid in the effective and accurate diagnosis of pneumonia. The model integrates a convolutional tokenizer and a focused linear attention mechanism. The convolutional tokenizer performs more compact feature extraction through convolution operations, retaining key local features of images while reducing computational complexity to enhance model expressiveness. The focused linear attention mechanism reduces the computational demands of the Transformer and optimizes the attention framework, significantly improving model performance. On the Chest X-ray Images dataset, CTFNet demonstrates outstanding performance in pneumonia classification tasks, achieving an accuracy of 99.32%, a precision of 99.55%, a recall of 99.55%, and an F1-score of 99.55%. The impressive performance highlights the model’s potential for clinical applications. The model is evaluated on the COVID-19 Radiography Database dataset for its generalization ability. In this dataset, CTFNet achieves an accuracy above 98% in multiple binary classification tasks. These results indicate that CTFNet exhibits strong generalization ability and reliability across various tasks in pneumonia image classification.
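For reference, the four reported classification metrics follow from the entries of a binary confusion matrix. The sketch below uses hypothetical counts, not the paper's data, and the function name classification_metrics is an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, F1) from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical counts in which false positives equal false negatives.
accuracy, precision, recall, f1 = classification_metrics(tp=90, fp=10, fn=10, tn=190)
```

When the number of false positives equals the number of false negatives, precision, recall, and F1 coincide, while accuracy can still differ from them.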
WANG Jun , CHEN Ying-Ying , CHENG Yong
2025, 34(2):225-236. DOI: 10.15888/j.cnki.csa.009747 CSTR: 32024.14.csa.009747
Abstract:Existing super-resolution reconstruction methods based on convolutional neural networks are limited by their receptive fields, which makes it difficult to fully utilize the rich contextual information and auto-correlation in remote sensing images and results in suboptimal reconstruction performance. To address this issue, this study proposes a novel network, termed MDT, a remote sensing image super-resolution reconstruction method based on multi-distillation and Transformer. Firstly, the network combines multiple distillations with a dual attention mechanism to progressively extract multi-scale features from low-resolution images, thereby reducing feature loss. Next, a convolutional modulation-based Transformer is constructed to capture global information in the images, recovering more complex texture details and enhancing the visual quality of the reconstructed images. Finally, a global residual path is added during upsampling to improve the propagation efficiency of features within the network, effectively reducing image distortion and artifacts. Experiments conducted on the AID and UCMerced datasets demonstrate that the proposed method achieves a peak signal-to-noise ratio (PSNR) of 29.10 dB and a structural similarity index (SSIM) of 0.7807 on ×4 super-resolution tasks. The quality of the reconstructed images is significantly improved, with better visual effects in terms of detail preservation.
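PSNR, the first of the two reported quality measures, is defined from the mean squared error between a reference image and its reconstruction. A minimal sketch follows, assuming 8-bit pixel values flattened into equal-length sequences; the function name psnr and the sample values are hypothetical.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical pixels: every value is off by one grey level, so the MSE is 1.
value_db = psnr([10, 20, 30, 40], [11, 21, 31, 41])
```

Since the scale is logarithmic, halving the mean squared error raises PSNR by about 3 dB.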
LI Ming-Wei , CHEN Hao-Peng , LI Feng-Huan , CHEN Chen
2025, 34(2):237-245. DOI: 10.15888/j.cnki.csa.009766 CSTR: 32024.14.csa.009766
Abstract:Existing work on fake news detection frequently ignores the semantic sparsity of news text and the potential relationships among rich information, which limits the model’s capacity to understand and recognize fake news. To address this, this study proposes a fake news detection method based on heterogeneous subgraph attention networks. Heterogeneous graphs are constructed to model the abundant features of fake news, such as the text, party affiliation, and topic of news samples. A heterogeneous graph attention network is constructed at the feature layer to capture the correlations between different types of information, and a subgraph attention network is constructed at the sample layer to mine the interactions between news samples. Moreover, a mutual information mechanism based on self-supervised contrastive learning focuses on discriminative subgraph representations within the global graph structure to capture the specificity of news samples. Experimental results demonstrate that the proposed method achieves improvements of about 9% in accuracy and 12% in F1 score over existing methods on the Liar dataset, significantly improving the performance of fake news detection.
LIU Jia-Hui , GUAN Jing-Chao , FANG Hong-Qing , CHAO Jian-Shu
2025, 34(2):246-253. DOI: 10.15888/j.cnki.csa.009758 CSTR: 32024.14.csa.009758
Abstract:In autonomous driving, 3D object detection from the bird’s eye view (BEV) has attracted significant attention. Existing camera-to-BEV transformation methods suffer from insufficient real-time performance and high deployment complexity. To address these issues, this study proposes a simple and efficient view transformation method that can be deployed without any special engineering operations. First, to address the redundancy in complete image features, a width feature extractor is introduced and supplemented by a monocular 3D detection task to refine the key features of the image, which keeps information loss in the process to a minimum. Second, a feature-guided polar coordinate positional encoding method is proposed to strengthen the mapping between the camera view and the BEV representation, as well as the spatial understanding of the model. Lastly, learnable BEV embeddings interact with the width image features through a single-layer cross-attention mechanism to generate high-quality BEV features. Experimental results on the nuScenes validation set show that, compared with lift, splat, shoot (LSS), this network structure improves mAP from 29.5% to 32.0%, a relative increase of 8.5%, and NDS from 37.1% to 38.0%, a relative increase of 2.4%. This demonstrates the effectiveness of the model in 3D object detection tasks in autonomous driving scenarios. Additionally, compared with LSS, it reduces latency by 41.12%.
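The two quoted increases are relative gains over the LSS baseline rather than percentage-point differences, which a short calculation confirms. The function name relative_gain is hypothetical; the four input values are the ones quoted in the abstract.

```python
def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100


# Values quoted in the abstract (nuScenes validation set, LSS as the baseline).
map_gain = relative_gain(29.5, 32.0)   # 2.5 points on a 29.5 baseline
nds_gain = relative_gain(37.1, 38.0)   # 0.9 points on a 37.1 baseline
```

Rounded to one decimal place, these give the 8.5% and 2.4% figures stated above.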
ZHANG Qiang , CHEN Cheng , LI Qing , XUE Bing
2025, 34(2):254-263. DOI: 10.15888/j.cnki.csa.009781 CSTR: 32024.14.csa.009781
Abstract:Given the insufficient adaptability of existing polymer dosage splitting algorithms when dealing with well groups in different blocks, this study proposes a polymer flooding well group splitting method based on an improved bald eagle search algorithm. First, the preliminary splitting coefficients are obtained through grey correlation analysis. Then, the difference between the cumulative injection volume and the actual fluid production volume of each extraction well is calculated, and a reasonable threshold range and constraint conditions are set. Next, the bald eagle search algorithm is improved by introducing the Sobol sequence and ICMIC mapping, a golden sine Lévy flight guidance mechanism, a nonlinear convergence factor, and an adaptive inertia weighting strategy, which enhances the algorithm’s search capability and convergence accuracy. Finally, the improved bald eagle search algorithm is used to solve the optimization model of well group splitting coefficients in an actual oilfield block. The results show that the calculated split injection volumes agree closely with the actual fluid production volumes, indicating good splitting accuracy.
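The grey correlation analysis step can be illustrated with the standard grey relational grade. This is a sketch under stated assumptions: the series are taken to be already normalized, the distinguishing coefficient rho is set to the common default of 0.5, and turning grades into preliminary coefficients by normalizing them to sum to one is an illustrative choice, not necessarily the paper's procedure. All names and data are hypothetical.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate series against the reference series."""
    diffs = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    flat = [d for row in diffs for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        return [1.0] * len(candidates)  # every series matches the reference exactly
    return [
        # Mean of the grey relational coefficients over all time steps.
        sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
        for row in diffs
    ]


# Hypothetical normalized series: one injection history, two production histories.
grades = grey_relational_grades([1.0, 2.0, 3.0], [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
# Assumed conversion of grades into preliminary coefficients that sum to one.
coefficients = [g / sum(grades) for g in grades]
```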
LI Gui-Yong , LIAO Fu-Jian , TIAN Xu
2025, 34(2):264-271. DOI: 10.15888/j.cnki.csa.009748 CSTR: 32024.14.csa.009748
Abstract:In computation-intensive and latency-sensitive tasks, unmanned aerial vehicle (UAV)-assisted mobile edge computing has been extensively studied due to its high mobility and low deployment costs. However, the energy consumption of UAVs limits their ability to work for extended periods, and there are often dependencies among different modules within offloading tasks. To address these issues, directed acyclic graph (DAG) is utilized to model the dependencies among internal modules of tasks. Considering the impacts of system latency and energy consumption, an optimal offloading strategy is derived to minimize system costs. To achieve optimization, a binary grey wolf optimization algorithm based on subpopulation, Gaussian mutation, and reverse learning (BGWOSGR) is proposed. Simulation results show that the proposed algorithm reduces system costs by around 19%, 27%, 16%, and 13% compared to four other methods, with a faster convergence speed.
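Modelling a task as a DAG means that a module can start only after all of its predecessors have finished, so task latency is the length of the longest dependency chain. The sketch below illustrates this with Python's standard graphlib module; the module names, execution times, energy value, and the weighted-sum form of the system cost are assumptions for illustration, not the paper's model, and the BGWOSGR search itself is not shown.

```python
from graphlib import TopologicalSorter


def completion_time(deps, exec_time):
    """Latency of a task whose modules obey the predecessor sets in `deps`."""
    finish = {}
    # static_order() yields every module after all of its predecessors.
    for module in TopologicalSorter(deps).static_order():
        ready = max((finish[p] for p in deps.get(module, ())), default=0.0)
        finish[module] = ready + exec_time[module]
    return max(finish.values())


def system_cost(latency, energy, w_latency=0.5):
    """Assumed cost: a weighted sum of latency and energy consumption."""
    return w_latency * latency + (1 - w_latency) * energy


# Hypothetical task: each module maps to the set of modules it depends on.
deps = {"decode": set(), "detect": {"decode"}, "track": {"decode"}, "fuse": {"detect", "track"}}
exec_time = {"decode": 1.0, "detect": 2.0, "track": 3.0, "fuse": 1.0}
latency = completion_time(deps, exec_time)
cost = system_cost(latency, energy=3.0)
```

An offloading strategy would change each module's execution time and energy depending on where it runs, and an optimizer would then search over those assignments to minimize the cost.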
CHEN Peng-Yun , FANG Min-Ying , HOU Xiao-Bo , DU Jun-Wei
2025, 34(2):272-280. DOI: 10.15888/j.cnki.csa.009746 CSTR: 32024.14.csa.009746
Abstract:This study proposes an analysis method based on association mining between historical accident reports and a root cause index system to fully leverage experts’ experience in root cause analysis of past accidents and enhance the accuracy and comprehensiveness of such analysis, thereby reducing chemical safety incidents. By constructing an association matrix between accident reports and the index system, this method utilizes a pre-trained model to represent accident and index texts. It integrates secondary and tertiary index information based on an attention mechanism and finally employs a graph convolutional neural network for root cause analysis. Validation on a dataset of 1351 samples demonstrates that this method significantly improves the accuracy of root cause prediction, effectively utilizing expert analysis of historical accidents to analyze current accidents and uncover the limitations in previous accident analysis. Additionally, this method accurately identifies the root causes of accidents even with incomplete incident descriptions. The application of this method will enhance accident prevention and risk management in occupational safety.
LUO Wei , CHEN Shi-Jun , WU Hua-Wei
2025, 34(2):281-291. DOI: 10.15888/j.cnki.csa.009739 CSTR: 32024.14.csa.009739
Abstract:To solve the vehicle routing problem with time windows (VRPTW), this study establishes a mixed-integer programming model aimed at minimizing total distance and proposes a hybrid ant colony optimization algorithm with relaxed time window constraints. Firstly, an improved ant colony algorithm, combined with TSP-Split encoding and decoding, is proposed to construct routing solutions that are allowed to violate time-window constraints, which improves the global optimization ability of the algorithm. Then, a repair strategy based on variable neighborhood search is proposed to repair infeasible solutions using the principle of return in time and the penalty function method. Finally, 56 Solomon and 12 Homberger benchmark instances are tested. The results show that the proposed algorithm is superior to the comparison algorithms from the literature. The known optimal solution is obtained in 50 instances, and quasi-optimal solutions are obtained in the remaining instances within acceptable computing time. The results demonstrate the effectiveness of the proposed algorithm.
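Relaxing the time windows means that a route arriving late is still evaluated, with the lateness charged through a penalty term. The sketch below shows one simple way to score a single route under that idea; it assumes travel time equals distance, that vehicles wait when they arrive early, and a linear penalty with a hypothetical weight. It does not reproduce the paper's ant colony construction or its variable neighborhood repair.

```python
def evaluate_route(route, travel, windows, service, penalty=10.0):
    """Return (distance, lateness, penalized cost) for one route; the depot is node 0."""
    time = distance = lateness = 0.0
    prev = 0
    for customer in route:
        distance += travel[prev][customer]
        time += travel[prev][customer]        # assumption: travel time equals distance
        earliest, latest = windows[customer]
        time = max(time, earliest)            # wait if the vehicle arrives early
        lateness += max(0.0, time - latest)   # amount by which the window is violated
        time += service[customer]
        prev = customer
    distance += travel[prev][0]               # return to the depot
    return distance, lateness, distance + penalty * lateness


# Hypothetical instance with two customers.
travel = [[0, 4, 6], [4, 0, 3], [6, 3, 0]]
windows = {1: (0, 5), 2: (0, 6)}
service = {1: 1, 2: 1}
result_a = evaluate_route([1, 2], travel, windows, service)
result_b = evaluate_route([2, 1], travel, windows, service)
```

Both visiting orders cover the same distance here, but the penalty term separates them, which is the information a repair step can use to move an infeasible route toward feasibility.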