LING Gang, ZHAO Jie, MO Ding-Jie, ZHANG Dong-Qing
Online: December 19, 2024 DOI: 10.15888/j.cnki.csa.009743
Abstract: Poor lighting and a complex environment in mines, coupled with the small size of safety helmets as targets, cause general object detection models to detect helmets poorly. To address these issues, an improved mine safety helmet wearing detection model based on YOLOv8s is proposed. First, the effectiveSE module is combined with the C2f module in the neck network of YOLOv8s to form a new C2f-eSE module, which improves the feature extraction ability of the network. Second, the CIoU loss function is replaced by the Wise-EIoU loss function to improve the model’s robustness. Third, the spatial and channel reconstruction convolution (SCConv) module is introduced into the detection head, and a new lightweight SPS detection head is designed based on parameter sharing, reducing the number of parameters and the computational complexity of the model. Finally, a P2 detection layer is added so that the feature extraction network incorporates more shallow information, which improves detection of small targets. Experimental results show that the mAP50 of the improved model increases by 3.2%, the number of parameters decreases by 1.6%, and GFLOPs decrease by 5.6%.
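As a rough illustration of the loss replacement mentioned above, the following Python sketch computes the plain EIoU loss for a single pair of boxes. It is an assumption-laden sketch, not the authors' implementation: the abstract does not give the form of the "Wise" weighting, so that part is omitted, and the function name `eiou_loss` and the `(x1, y1, x2, y2)` box format are chosen here for illustration only.

```python
def eiou_loss(box, gt, eps=1e-9):
    """Plain EIoU loss for two boxes given as (x1, y1, x2, y2).

    Illustrative only: the Wise-style weighting used in the paper is
    not described in the abstract and is not modeled here.
    """
    # Widths and heights of the predicted and ground-truth boxes.
    w, h = box[2] - box[0], box[3] - box[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]

    # Intersection over union.
    iw = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    ih = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    inter = iw * ih
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(box[2], gt[2]) - min(box[0], gt[0])
    ch = max(box[3], gt[3]) - min(box[1], gt[1])

    # Squared distance between the two box centers.
    dx = (box[0] + box[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (box[1] + box[3]) / 2 - (gt[1] + gt[3]) / 2

    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)
    width_term = (w - gw) ** 2 / (cw * cw + eps)
    height_term = (h - gh) ** 2 / (ch * ch + eps)
    return 1.0 - iou + center_term + width_term + height_term
```

Unlike CIoU, which penalizes the aspect-ratio mismatch through a single term, EIoU penalizes width and height differences separately, each normalized by the enclosing box.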
PENG Bo, WANG Xiao-Bo, WEI Xiang-Lin, CHENG Jie, QIN Hua-Wang, FAN Jian-Hua
Online: December 19, 2024 DOI: 10.15888/j.cnki.csa.009763
Abstract: In complex terrain, UAV formation path planning based on deep reinforcement learning can optimize the formation’s path and achieves better path length and environmental adaptability than traditional heuristic algorithms. However, it still suffers from insufficient training stability and poor real-time planning performance. For UAV clusters in leader-follower mode, this study proposes a real-time 3D path planning method for UAV formations based on the SPER-TD3 algorithm. First, a prioritized experience replay mechanism based on SumTree is integrated into the TD3 algorithm, yielding the SPER-TD3 algorithm, which determines the path of the UAV formation. Then, an angle formation control method is used to optimize the paths of the followers, and a dynamic path smoothing algorithm is applied to optimize the steering angle. To improve the training convergence speed and stability of the SPER-TD3 algorithm and to solve the long-term dependence problem, a network model structure combining LSTM, a self-attention mechanism, and multiple perceptrons is designed. Simulation experiments are conducted in environments with various obstacles. Results show that the proposed method is superior to eight mainstream deep reinforcement learning algorithms in terms of path safety coverage rate, flight path smoothness, success rate, and reward. Its comprehensive evaluation value of importance is 8.5% to 72.9% higher than that of existing methods, and it has the best training stability.
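The SumTree-based prioritized experience replay mentioned above can be sketched as follows. This is a minimal, generic SumTree in Python, assuming the usual array-backed layout; the class and method names are illustrative and are not taken from the paper, and the coupling to TD3 (priorities derived from TD errors, importance-sampling weights) is left out.

```python
import random


class SumTree:
    """Binary tree whose leaves hold priorities and whose root holds their sum."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # internal nodes, then leaves
        self.data = [None] * capacity           # stored transitions
        self.write = 0                          # next slot to overwrite

    def total(self):
        return self.tree[0]

    def add(self, priority, item):
        """Store an item, overwriting the oldest one when the buffer is full."""
        leaf = self.write + self.capacity - 1
        self.data[self.write] = item
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, leaf, priority):
        """Set a leaf priority and propagate the change up to the root."""
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def get(self, value):
        """Return (leaf index, priority, item) for a cumulative-priority value."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]

    def sample(self, batch_size, rng=random):
        """Stratified sampling: one draw from each equal-width priority segment."""
        segment = self.total() / batch_size
        return [self.get(rng.uniform(i * segment, (i + 1) * segment))
                for i in range(batch_size)]
```

Both `update` and `get` walk a single root-to-leaf path, so sampling in proportion to priority costs O(log n) per transition instead of the O(n) scan a flat list would need.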
PENG Jun-Feng, YU Kai, LI Guo-Jing
Online: December 19, 2024 DOI: 10.15888/j.cnki.csa.009764
Abstract: Key sentence extraction refers to using artificial intelligence to automatically find key sentences in a long text. The technology can serve as a preprocessing step for information retrieval and is of great significance for downstream tasks such as text classification and extractive summarization. Traditional unsupervised key sentence extraction methods are mostly based on statistics and graph models, and they suffer from low accuracy and the need to build a large-scale corpus in advance. This study proposes T5KSEChinese, a method that extracts key sentences without supervision in the Chinese context. The method uses an encoder-decoder architecture and sidesteps the length mismatch between the target sentence and the original text by using prompt words in both the input and the output, which yields more accurate results. In addition, a method for constructing positive samples for contrastive learning is proposed and combined with contrastive learning to conduct semi-supervised training on the encoder part of the model, which improves performance on downstream tasks. With a lightweight model, the method outperforms a large language model with tens of times as many parameters on the unsupervised downstream task. The final experimental results demonstrate the accuracy and reliability of the proposed method.
LI Ming-Wei, CHEN Hao-Peng, LI Feng-Huan, CHEN Chen
Online: December 19, 2024 DOI: 10.15888/j.cnki.csa.009766
Abstract: Existing work on fake news detection frequently ignores the semantic sparsity of news text and the potential relationships among rich information, which limits a model’s capacity to understand and recognize fake news. To address this, this study proposes a fake news detection method based on heterogeneous subgraph attention networks. Heterogeneous graphs are constructed to model the abundant features of fake news, such as the text, party affiliation, and topic of news samples. A heterogeneous graph attention network is constructed at the feature layer to capture the correlations between different types of information, and a subgraph attention network is constructed at the sample layer to mine the interactions between news samples. Moreover, a mutual information mechanism based on self-supervised contrastive learning focuses on discriminative subgraph representations within the global graph structure to capture the specificity of news samples. Experimental results on the Liar dataset demonstrate that the proposed method improves accuracy and F1 score by about 9% and 12%, respectively, compared with existing methods, which significantly improves the performance of fake news detection.
ZHOU Di, LIU Hao, CHENG Yuan-Zhi, LI Hui, LIU Xiao-Ya
Online: December 19, 2024 DOI: 10.15888/j.cnki.csa.009768
Abstract: In spectral 3D CT data, traditional convolution captures global features poorly, and the full-scale self-attention mechanism consumes substantial resources. To solve this problem, this study introduces a new visual attention paradigm, wave self-attention (WSA). Compared with ViT, this mechanism uses fewer resources to obtain the same amount of self-attention information. In addition, to extract the relative dependency among organs more adequately and to improve the robustness and execution speed of the model, a plug-and-play module, the wave random-encoder (WRE), is designed for the WSA mechanism. The encoder generates a pair of mutually inverse, asymmetric position information matrices, one global and one local. The global position matrix is used to randomly sample the wave features globally, and the local position matrix is used to restore the local relative dependency lost through random sampling. Experiments are performed on kidney and lung parenchyma segmentation using the standard datasets Synapse and COVID-19. The results show that the method outperforms existing models such as nnFormer and Swin-UNETR in terms of accuracy, number of parameters, and inference speed, reaching state-of-the-art (SOTA) performance.
Online: December 19, 2024 DOI: 10.15888/j.cnki.csa.009773
Abstract: In the field of knowledge distillation (KD), feature-based methods can effectively extract the rich knowledge embedded in the teacher model, whereas logit-based methods often suffer from insufficient knowledge transfer and low efficiency. Decoupled knowledge distillation (DKD) conducts distillation by dividing the logits output by the teacher and student models into target and non-target classes. While this improves distillation accuracy, its single-instance-based approach fails to capture the dynamic relationships among samples within a batch. In particular, when the output distributions of the teacher and student models differ significantly, decoupled distillation alone cannot effectively bridge the gap. To address these issues in DKD, this study proposes a perception reconstruction method. The method introduces a perception matrix that uses the representational capabilities of the model to recalibrate logits, analyze intra-class dynamic relationships in detail, and reconstruct finer-grained inter-class relationships. Since the objective of the student model is to minimize representational disparity, the method is extended to decoupled knowledge distillation: the outputs of the teacher and student models are mapped onto the perception matrix, enabling the student model to learn richer knowledge from the teacher model. Validation on the CIFAR-100 and ImageNet-1K datasets shows that the student model trained with this method achieves a classification accuracy of 74.98% on CIFAR-100, which is 0.87 percentage points higher than that of baseline methods, thereby improving the image classification performance of the student model. Comparative experiments with various methods further verify the superiority of this method.
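The target/non-target split that DKD applies to logits can be sketched for a single sample as follows. This is a minimal pure-Python sketch of the baseline DKD loss only, following the commonly cited formulation (a binary target-versus-rest term plus a term over the renormalized non-target classes). The weights `alpha` and `beta`, the temperature value, and all function names are assumptions made here for illustration, and the perception matrix proposed in the study is not modeled.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def dkd_terms(student_logits, teacher_logits, target, temperature=4.0):
    """Return (target-class term, non-target-class term) for one sample."""
    ps = softmax(student_logits, temperature)
    pt = softmax(teacher_logits, temperature)

    # Target-class term: binary distribution "target vs. everything else".
    tckd = kl([pt[target], 1.0 - pt[target]], [ps[target], 1.0 - ps[target]])

    # Non-target term: distribution over the remaining classes, renormalized.
    eps = 1e-12
    nt = [p / (1.0 - pt[target] + eps) for i, p in enumerate(pt) if i != target]
    ns = [p / (1.0 - ps[target] + eps) for i, p in enumerate(ps) if i != target]
    nckd = kl(nt, ns)
    return tckd, nckd


def dkd_loss(student_logits, teacher_logits, target,
             alpha=1.0, beta=8.0, temperature=4.0):
    """Weighted sum of the two terms, scaled by T^2 as in standard KD."""
    tckd, nckd = dkd_terms(student_logits, teacher_logits, target, temperature)
    return (alpha * tckd + beta * nckd) * temperature ** 2
```

The decoupling lets the two terms be weighted independently; in classical KD the non-target term is implicitly scaled by the teacher's non-target probability mass, which shrinks it when the teacher is confident.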
ZHENG Guang-Hai, ZHANG Hai-Ning, QU Ying-Wei
Online: December 19, 2024 DOI: 10.15888/j.cnki.csa.009778
Abstract: Images captured under harsh weather conditions such as haze, rain, and snow are degraded and blurred, which makes accurate recognition and detection challenging. This study proposes a pedestrian and vehicle detection algorithm for blurred scenes, the lightweight blur vision network (LiteBlurVisionNet). The backbone network uses a lightweight MobileNetV3 module improved with GlobalContextEnhancer attention, which reduces the number of parameters and makes the model more efficient at processing images under harsh weather conditions such as haze and rain. The neck network adopts a lighter Ghost module and the SpectralGhostUnit module, which is improved from the GhostBottleneck module. These modules capture global context information more effectively, improve the discriminative and expressive ability of features, and help reduce the number of parameters and the computational complexity, thereby improving the network’s processing speed and efficiency. In the prediction part, DIoU-NMS, a variant of non-maximum suppression, performs a local maximum search to remove redundant detection boxes and improve detection accuracy in blurred scenes. Experimental results show that the parameter count of LiteBlurVisionNet is reduced by 96.8% compared with RTDETR-ResNet50 and by 55.5% compared with YOLOv8n. Its computational load is reduced by 99.9% compared with Faster R-CNN and by 57% compared with YOLOv8n. Its mAP0.5 is improved by 13.71% compared with IAL-YOLO and by 2.4% compared with YOLOv5s. The model is therefore more efficient in terms of storage and computation and is particularly suitable for resource-constrained environments or mobile devices.
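The DIoU-based non-maximum suppression step described above can be sketched in plain Python as follows. The sketch assumes the standard DIoU-NMS rule (IoU minus a normalized center-distance penalty, compared against a threshold); the function names, the `(x1, y1, x2, y2)` box format, and the default threshold are illustrative assumptions, not details from the paper.

```python
def diou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes minus the center-distance penalty."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    iou = inter / union if union > 0 else 0.0

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    diag2 = cw * cw + ch * ch

    # Squared distance between the two box centers.
    dx = (a[0] + a[2]) / 2 - (b[0] + b[2]) / 2
    dy = (a[1] + a[3]) / 2 - (b[1] + b[3]) / 2
    return iou - (dx * dx + dy * dy) / diag2 if diag2 > 0 else iou


def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS that suppresses a box only if its DIoU with a kept box is high."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Keep only boxes that are not too close to the current best box.
        order = [i for i in order if diou(boxes[best], boxes[i]) < threshold]
    return keep
```

For example, `diou_nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)], [0.9, 0.8, 0.7])` keeps indices 0 and 2: the second box overlaps the first heavily and is suppressed, while the third is far away. Because the center-distance penalty lowers the score of overlapping boxes whose centers are far apart, two nearby but distinct objects are less likely to be merged than with IoU-only NMS.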
TAN Chen-Han, JIA Ke-Bin, WANG Hao-Yu
Online: December 19, 2024 DOI: 10.15888/j.cnki.csa.009779
Abstract: Automatic text summarization is an important branch of natural language processing (NLP), and one of its main difficulties is evaluating the quality of generated summaries quickly, objectively, and accurately. Existing methods for evaluating text summary quality suffer from low evaluation accuracy, the need for reference texts, and heavy consumption of computing resources. To address these problems, this study proposes an evaluation method for text summary quality based on large language models. It designs a prompt construction method based on the chain-of-thought principle to improve the performance of large language models in evaluating text summary quality. At the same time, a chain-of-thought dataset is generated and used to fine-tune a small-scale large language model, which significantly reduces the computing requirements. The proposed method first determines the evaluation dimensions according to the characteristics of the text summary and constructs the prompt based on the chain-of-thought principle. The prompt guides the large language model to generate a chain-of-thought process and evaluation results for the summary samples, from which the chain-of-thought dataset is built. This dataset is then used to fine-tune the small-scale large language model. Finally, the fine-tuned small-scale large language model is used to evaluate the quality of text summaries. Comparative experiments and analyses on the SummEval dataset show that this method significantly improves the evaluation accuracy of the small-scale large language model on the text summary quality evaluation task. The study thus provides a text summary quality evaluation method with high evaluation accuracy, low computing requirements, and easy deployment, without the need for reference texts.
ZHANG Qiang, CHEN Cheng, LI Qing, XUE Bing
Online: December 19, 2024 DOI: 10.15888/j.cnki.csa.009781
Abstract: Given the insufficient adaptability of existing polymer dosage splitting algorithms when dealing with well groups in different blocks, this study proposes a polymer flooding well group splitting method based on an improved bald eagle search algorithm. First, preliminary splitting coefficients are obtained through grey correlation analysis. Then, the difference between the cumulative injection volume and the actual fluid production volume of each extraction well is calculated, and a reasonable threshold range and constraint conditions are set. Next, the bald eagle search algorithm is improved by introducing the Sobol sequence and ICMIC mapping, a golden sine Lévy flight guidance mechanism, a nonlinear convergence factor, and an adaptive inertia weighting strategy, which enhance the algorithm’s search capability and convergence accuracy. Finally, the improved bald eagle search algorithm is used to solve the optimization model of well group splitting coefficients for an actual block of an oilfield. The results show that the calculated split injection volume agrees closely with the actual fluid production volume, indicating good splitting accuracy.
Online: December 16, 2024 DOI: 10.15888/j.cnki.csa.009774
Abstract: In current research on unsupervised deep hashing, methods based on contrastive learning are predominant. However, the sampling bias caused by randomly drawing negative samples in contrastive learning degrades image retrieval accuracy. To address this issue, this study proposes a novel unsupervised deep hashing method based on bias-suppressing contrastive learning (BSCDH), which introduces a bias suppression method (BSS) within a contrastive learning framework. The method approximates incorrect negative samples as extremely hard negative samples and designs a bias suppression coefficient to suppress them, thereby alleviating the negative impact of sampling bias. The value of the suppression coefficient is determined by the similarity between the current negative sample and the query sample. The distance relationship between the current negative sample and adjacent hash centers is then introduced to correct the coefficient, reducing the possibility of excessively suppressing normal negative samples. The mAP@5000 of the BSCDH method (64 bits) reaches 0.696, 0.833, and 0.819 on the CIFAR-10, FLICKR25K, and NUS-WIDE datasets, respectively, demonstrating a significant performance advantage over the baseline. Extensive experiments verify that BSCDH achieves high retrieval accuracy among unsupervised image retrieval methods and effectively addresses sampling bias.