AIPUB Guizhi Journal Alliance
LI Wen-Zhe , LI Hao-Ran , WANG Tao , MA Zi-Han , WANG Chuan-Lei , GUO Li-Xue
Online: April 01, 2025 DOI: 10.15888/j.cnki.csa.009802
Abstract:The intelligent diagnosis of premium threaded connections (PTC) is crucial for ensuring the stability and sealing of oil pipes under high-temperature, high-pressure, and acidic gas conditions. Accurate diagnosis relies on analyzing PTC curves under different operating conditions to reflect the make-up quality of the connection, but obtaining a large amount of valid data in actual industrial inspections is challenging. This study introduces an end-to-end classification model that combines an asynchronously optimized 2D deep convolutional generative adversarial network (AoT-DCGAN) and a 2D convolutional neural network (P-CNN), aiming to improve classification performance with small sample sizes. The proposed method first utilizes AoT-DCGAN to identify the distribution pattern of the original samples and generate corresponding synthetic samples. At the same time, a novel weight optimization strategy, asynchronous optimization (AO), is implemented to alleviate the gradient vanishing problem during the generator optimization phase. Subsequently, a novel P-CNN model is designed and trained on the expanded dataset to achieve automatic classification of PTC curves. The method is evaluated based on recall, specificity, F1 score, precision, and confusion matrix under different data augmentation ratios. The results indicate that as the dataset size increases, the model's classification ability improves, peaking at a dataset size of 1200. In addition, within the same training set, the P-CNN model outperforms traditional machine learning and deep learning models, achieving optimal classification accuracies of 95.9%, 95.5%, and 96.7% on the AC, ATI, and NDT curves, respectively. Finally, the research confirms that applying asynchronous optimization during DCGAN training results in a more stable decrease in the loss function.
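The evaluation metrics named above all derive from a confusion matrix; as a reference sketch (the counts below are illustrative, not the paper's data):

```python
# Per-class classification metrics computed from binary confusion-matrix
# counts: tp/fp/fn/tn. Illustrative values only.

def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive recall, specificity, precision, and F1 from confusion-matrix counts."""
    recall = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "specificity": specificity,
            "precision": precision, "f1": f1}

m = metrics(tp=90, fp=10, fn=10, tn=90)
```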
Online: April 01, 2025 DOI: 10.15888/j.cnki.csa.009877
Abstract:Road damage poses a great threat to the service life and safety of roads. Early detection of road defects facilitates maintenance and repair. Traditional road defect detection methods typically rely on manual visual inspection and vehicle-mounted pavement monitoring systems. However, these methods are largely influenced by the experience of road maintenance personnel. With the advancement of deep learning, increasing numbers of researchers have applied it to road defect detection. Among these, the YOLO series of object detection methods and their variants are the most common. However, most of these methods require post-processing operations, which hinder model optimization, impair robustness, and delay the detector's inference. To address these issues, as well as the multi-scale challenges in road defect detection, an improved RT-DETR model is proposed. The backbone network is fine-tuned, and the MSaE attention module is introduced. In the encoder, GhostConv convolution and the DySample module are used to optimize upsampling, while the ADown module optimizes downsampling. Comparative experiments are conducted on the public SVRDD dataset. Experimental results show that the proposed improved method achieves 72.5% mAP@50 on the SVRDD dataset, 3.8 percentage points higher than the benchmark RT-DETR-R18, significantly enhancing road defect detection performance.
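For context, the post-processing that YOLO-style detectors require, and that DETR-style models such as RT-DETR avoid, is typically non-maximum suppression (NMS). A minimal greedy NMS sketch over (x1, y1, x2, y2) boxes:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: visit boxes by descending score, drop heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
```

DETR-style detectors predict a fixed set of boxes matched one-to-one to objects, so this pruning step (and the tuning of its threshold) disappears from the pipeline.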
LI Jian-Dong , JIAO Xiao-Guang , QU Hai-Cheng
Online: April 01, 2025 DOI: 10.15888/j.cnki.csa.009878
Abstract:To address the low accuracy and high miss detection rates in pedestrian detection caused by complex background interference, this study proposes an adaptive dual-branch dense pedestrian detection algorithm, DACD-YOLO, incorporating improved attention mechanisms. First, the backbone network employs an adaptive dual-branch structure, which fuses different features through dynamic weighting while introducing depthwise separable convolution to reduce the computational cost, effectively mitigating the information loss present in traditional single-branch networks. Second, an adaptive vision center is proposed to enhance intra-layer feature extraction through dynamic optimization, with channel numbers reconfigured to balance accuracy and computational load. A coordinate dual-channel attention mechanism is then introduced, combining a heterogeneous convolution kernel design within a lightweight fusion module to reduce computational complexity and improve the capture of key features. Lastly, a dilation convolution detection head is utilized, fusing multi-scale features through convolutions with varying dilation rates, effectively enhancing feature extraction for small and occluded objects. Experimental results show that, compared to the original YOLOv8n, the proposed algorithm improves mAP@0.5 and mAP@0.5:0.95 by 2.3% and 2.2%, respectively, on the WiderPerson dataset, and by 3.5% and 4.6%, respectively, on the CrowdHuman dataset. The experiments demonstrate that the proposed algorithm significantly enhances accuracy in dense pedestrian detection compared to the original method.
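As background on the dilation convolution detection head: a k×k kernel with dilation rate d covers an effective extent of k + (k − 1)(d − 1) pixels per axis, which is how parallel branches with varying dilation rates see multiple scales at the same parameter cost. A small illustration:

```python
def effective_kernel(k: int, d: int) -> int:
    """Effective receptive field (per axis) of a k x k convolution with dilation d."""
    return k + (k - 1) * (d - 1)

# A 3x3 kernel at dilation rates 1, 2, 3 covers 3, 5, 7 pixels per axis.
fields = [effective_kernel(3, d) for d in (1, 2, 3)]
```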
WANG Zhe-Kai , FENG Yun-Xia , WANG Jia-Wen
Online: April 01, 2025 DOI: 10.15888/j.cnki.csa.009879
Abstract:Sign language is a communication tool commonly used by people with hearing impairments or those who are unable to communicate verbally. It utilizes gestures to convey actions and simulate images or syllables that form specific meanings or words. With the continuous development of computer vision and deep learning, sign language recognition technology has emerged and continued to develop, making it possible for hearing individuals to communicate with the deaf or mute. However, the complexity and variability of dynamic sign language still pose challenges for its accurate detection and recognition. To promote research in this field, this study conducts an in-depth review of existing dynamic sign language recognition methods and technologies. First, the development history and current research status of dynamic sign language recognition technology, commonly used dynamic sign language datasets, and evaluation metrics for sign language recognition methods are reviewed. Second, deep learning models frequently used in dynamic sign language recognition are examined, and the challenges faced by dynamic sign language recognition technology, along with corresponding solutions, are discussed. Finally, based on the current status of sign language recognition, the challenges of dynamic sign language recognition are summarized, and an analysis and outlook are provided regarding the potential improvements to sign language recognition performance in the next stage.
YU Feng-Yuan , JIANG Zhong-Ding
Online: April 01, 2025 DOI: 10.15888/j.cnki.csa.009890
Abstract:High-definition, low-latency display of Chinese paintings is essential for Chinese painting VR exhibition applications. Due to the limited memory and GPU resources on mobile VR headsets, it is challenging to display a large number of high-resolution Chinese painting textures simultaneously. Moreover, direct viewing of fine details is hindered by mipmap management and the low resolution of mobile VR devices. This study proposes an improved virtual texture method, optimizing both tile request calculation and tile loading stages based on existing virtual texture methods. In the tile request calculation phase, the method incorporates tile request computations for magnified perspectives. Compute Shader is utilized to parallelize the processing of tile request parameters, and hashing is applied to minimize overhead when constructing result caches. In the tile loading phase, lock-free queues are implemented to enhance loading efficiency. A direct loading strategy for request tiles, constrained by a quantity threshold, reduces display latency. The performance and texture display effects are evaluated in scenarios with single or multiple Chinese paintings, simulating user behavior. Results show that the proposed method supports high-definition, low-latency display of high-resolution Chinese painting textures on mobile VR devices. Magnification-assisted perspectives allow for clear viewing of the finest texture details. Compared to existing virtual texture methods, such as Unreal SVT, the proposed method achieves higher frame rates and reduces display latency for high-resolution texture tiles of multiple Chinese paintings.
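As an illustration of the hashing used when constructing tile request caches (the tile key format here is hypothetical, not the paper's): deduplicating per-pixel requests through a hash set ensures each tile page is queued for loading at most once per frame.

```python
# Sketch: collapse per-pixel virtual-texture tile requests into a unique,
# ordered load list using O(1) hash-set membership tests.
# key = (mip level, tile x, tile y) -- an illustrative format.

def dedupe_requests(requests):
    """Return each requested tile key once, preserving first-seen order."""
    seen = set()
    load_list = []
    for key in requests:
        if key not in seen:       # O(1) lookup via hashing
            seen.add(key)
            load_list.append(key)
    return load_list

frame_requests = [(0, 3, 7), (0, 3, 7), (1, 1, 3), (0, 3, 7), (1, 1, 3)]
```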
Online: April 01, 2025 DOI: 10.15888/j.cnki.csa.009894
Abstract:An improved YOLOv8 model (FCU-YOLOv8) is proposed to enhance the accuracy and efficiency of rice disease detection, addressing the challenges of diverse rice diseases, complex backgrounds, and subtle differences in characteristics between diseases. The FasterNeXt module is used to replace the C2f module in the YOLOv8 backbone network. By optimizing the network structure, the FasterNeXt module reduces computation and memory access while improving feature extraction efficiency, thus lowering the inference cost of the model. The C3K module (multi-scale convolution module) and CPSA module (convolutional attention mechanism) are designed to further enhance the model's ability to perceive disease region features. The C3K module allows the model to adapt to disease characteristics at various scales through flexible convolutional kernel selection, while the CPSA module employs an attention mechanism to enhance the model's ability to capture key information. To improve the quality of detection boxes and the detection performance of dense disease targets, the optimized unified intersection over union (UIoU) loss function is adopted. This function improves detection performance by balancing the accuracy and consistency of bounding boxes during the regression phase. On a custom-made image dataset of eight common rice diseases, FCU-YOLOv8 demonstrates significant improvements over the original YOLOv8 in several performance metrics. The mAP@0.5 index reaches 94.7%, a 2.4% improvement over the baseline model, and the mAP@0.5:0.95 index reaches 67.2%, a 3.3% improvement. In terms of model lightweighting, the model's parameters are reduced by 24.2% and floating-point operations by 28.7% compared to the baseline model. The proposed algorithm also outperforms current mainstream algorithms, demonstrating the effectiveness of the network.
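The abstract does not spell out the UIoU formulation; as background, the quantity it builds on is the intersection over union of predicted and ground-truth boxes, with 1 − IoU serving as the basic box regression loss that IoU-family variants refine. A minimal sketch:

```python
def iou(pred, gt):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_p + area_g - inter)

def iou_loss(pred, gt):
    """Basic IoU regression loss: 0 for a perfect box, 1 for no overlap."""
    return 1.0 - iou(pred, gt)
```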
ZHANG Xiao-Rui , MO Yun-Fei , SUN Wei
Online: March 31, 2025 DOI: 10.15888/j.cnki.csa.009846
Abstract:Medical image segmentation serves as a fundamental and critical component in numerous clinical applications. Recent advancements in interactive segmentation methods have attracted significant attention due to their high accuracy and robustness in complex clinical tasks. However, current deep learning-based interactive segmentation methods exhibit limitations in leveraging user interactions, particularly in interactive encoding design and pixel classification. To address these limitations, this study proposes a hybrid interaction design incorporating "near-center points" and "outer-edge points", which ensures low interaction costs while accurately capturing user intent. Additionally, the existing geodesic distance encoding method is enhanced by a Gaussian attenuation function to mitigate image noise interference and improve the robustness and accuracy of interaction encoding. Furthermore, a Gaussian process classification method based on a hybrid kernel function is integrated to fully exploit user interaction information during pixel classification, enhancing segmentation accuracy while endowing the model with interpretability. Extensive experiments on five segmentation tasks across four representative subsets of the medical segmentation decathlon (MSD) dataset demonstrate that the proposed method achieves consistently high segmentation accuracy. In particular, for complex tasks such as pancreas tumor and colon image segmentation, this method achieves significantly better Dice coefficients and ASSD values than existing methods, showing its strengths in precise segmentation and boundary refinement.
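A minimal sketch of geodesic-distance interaction encoding with Gaussian attenuation, as the abstract describes; the step-cost weighting and the attenuation form below are illustrative assumptions, not the paper's exact formulation:

```python
import heapq
import math

def geodesic_map(image, seeds, lam=1.0):
    """Geodesic distance from user-clicked seed pixels on a 2D intensity grid,
    via Dijkstra. Step cost = spatial distance + lam * intensity difference,
    so the distance grows faster across strong edges."""
    h, w = len(image), len(image[0])
    dist = [[math.inf] * w for _ in range(h)]
    pq = [(0.0, r, c) for r, c in seeds]
    for _, r, c in pq:
        dist[r][c] = 0.0
    heapq.heapify(pq)
    while pq:
        d, r, c = heapq.heappop(pq)
        if d > dist[r][c]:
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                nd = d + 1.0 + lam * abs(image[nr][nc] - image[r][c])
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(pq, (nd, nr, nc))
    return dist

def gaussian_encoding(dist, sigma=2.0):
    """Attenuate the distance map into a (0, 1] encoding that decays smoothly
    away from the interaction, damping far-away noise."""
    return [[math.exp(-(d * d) / (2 * sigma * sigma)) for d in row]
            for row in dist]

img = [[0.0] * 3 for _ in range(3)]        # uniform toy image
enc = gaussian_encoding(geodesic_map(img, [(0, 0)]))
```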
DING Chuan-Long , HUA Guo-Xiang , JIANG Liang , GUO Yong-Xin
Online: March 31, 2025 DOI: 10.15888/j.cnki.csa.009861
Abstract:Rolling bearings are crucial components in mechanical systems. Because low-frequency faults occur rarely, the corresponding data samples are scarce, which complicates the collection and processing of related data. If not properly addressed, such faults can lead to severe safety hazards and substantial economic losses. To deal with this problem, this study proposes a dual-path fault diagnosis model that integrates traditional signal processing methods with a convolutional neural network (CNN) and a multilayer perceptron (MLP). For feature extraction, the study employs a combination of discrete wavelet transform (DWT) and continuous wavelet transform (CWT), along with average downsampling, to extract multi-scale time-frequency and time-domain features from the raw signals. The model contains two paths: one extracts the engineered time-frequency features by embedding the efficient channel attention (ECA) mechanism into a residual CNN, while the other uses an MLP to process the downsampled multi-scale time-domain features; the outputs of the two paths are finally fused for classification. Small-sample evaluation shows that the feature engineering method achieves an average diagnostic accuracy of 99.34% on the Case Western Reserve University (CWRU) dataset, higher than the 98.97% achieved by the traditional method. The hybrid CNN-MLP dual-path model achieves an accuracy of 99.90% on the CWRU dataset and 98.38% on the Jiangnan University (JNU) dataset, demonstrating its application potential in small-sample rolling bearing fault diagnosis.
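The average downsampling that produces the multi-scale time-domain views can be sketched as follows (the scale factors here are illustrative, not the paper's settings):

```python
def avg_downsample(signal, factor):
    """Average-pool a 1D signal by an integer factor (trailing remainder dropped)."""
    n = len(signal) // factor
    return [sum(signal[i * factor:(i + 1) * factor]) / factor for i in range(n)]

def multiscale_views(signal, factors=(2, 4, 8)):
    """Multi-scale time-domain views of a raw vibration signal,
    one smoothed, shorter copy per downsampling factor."""
    return {f: avg_downsample(signal, f) for f in factors}

views = multiscale_views(list(range(8)))
```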
BU Li-Ping , CHANG Gui-Yong , YU Bi-Hui , LIU Da-Wei , WEI Jing-Xuan , SUN Lin-Zhuang , LIU Long-Yi
Online: March 31, 2025 DOI: 10.15888/j.cnki.csa.009881
Abstract:Large visual language models (LVLMs) demonstrate remarkable capabilities in understanding visual information and generating verbal expressions. However, LVLMs are often affected by the phenomenon of object hallucination, where the outputs appear plausible but do not align with the visual information in the images. This discrepancy between the generated text and the images presents a significant challenge in achieving accurate image-to-text alignment. To address this issue, this study identifies the lack of object attention as a key factor contributing to object hallucination. To mitigate this, the image contrast enhancement (ICE) method is proposed. ICE is a simple, user-friendly approach that compares the output distributions from the original and the augmented visual inputs. This method enhances the model's ability to perceive images more accurately, ensuring that the generated content aligns closely with the visual input and produces contextually consistent outputs. Experimental results demonstrate that the ICE method effectively mitigates object hallucination across various LVLMs without requiring additional training or external tools. Furthermore, the method performs well on the MME benchmark for large visual language models, indicating its broad applicability and effectiveness. The code will be released at ChangGuiyong/ICE.
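The abstract does not give ICE's exact combination rule; the sketch below follows the common contrastive-decoding form, comparing the output distributions from the original and contrast-enhanced visual inputs at the logit level (the mixing weight alpha is an assumption):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def contrast_decode(logits_enhanced, logits_original, alpha=1.0):
    """Amplify tokens supported by the enhanced image and down-weight tokens
    the model would emit regardless of the visual input (hallucination-prone)."""
    combined = [(1 + alpha) * e - alpha * o
                for e, o in zip(logits_enhanced, logits_original)]
    return softmax(combined)

p = contrast_decode([2.0, 1.0], [1.0, 1.0])
```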
YUAN Xu , ZHU Yi , QIANG Ji-Peng , YUAN Yun-Hao , LI Yun
Online: March 31, 2025 DOI: 10.15888/j.cnki.csa.009891
Abstract:Clickbait refers to the use of sensational or exaggerated headlines to attract users into clicking, a practice that has proliferated in recent years across online platforms such as news portals and social media. This trend has led to user dissatisfaction and, in some cases, facilitated online fraud. Large language models (LLMs), known for their robust natural language understanding and text generation capabilities, have demonstrated outstanding performance across various natural language processing tasks. However, when faced with specific challenges like clickbait detection, where decision boundaries are often unclear, LLMs are prone to hallucination. To address this issue, a method based on a dual-layer multi-agent large language model is proposed, which significantly enhances clickbait detection accuracy without the need to fine-tune the entire model. Specifically, internal voting within agents in the first layer and cross-voting among different agents in the second layer together enhance detection performance. Validation on three benchmark datasets shows that the proposed method outperforms state-of-the-art large-scale models and prompt learning techniques by nearly 13% and 10% in accuracy, respectively.
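The two voting layers can be sketched as follows; the agent count and sampled judgments are illustrative, not the paper's configuration:

```python
from collections import Counter

def majority(votes):
    """Most common label among a list of votes."""
    return Counter(votes).most_common(1)[0][0]

def dual_layer_vote(agent_votes):
    """Layer 1: each agent votes internally across its own sampled judgments.
    Layer 2: the per-agent decisions are cross-voted for the final label."""
    layer1 = [majority(v) for v in agent_votes]
    return majority(layer1)

example = [
    ["clickbait", "clickbait", "normal"],     # agent A's sampled judgments
    ["normal", "normal", "normal"],           # agent B
    ["clickbait", "clickbait", "clickbait"],  # agent C
]
```

Aggregating repeated samples inside each agent first smooths out single-sample hallucinations before the agents' decisions are compared.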