    2024,33(12):1-15, DOI: 10.15888/j.cnki.csa.009708, CSTR: 32024.14.csa.009708
    Abstract:
    Skin cancer is one of the most common and deadliest types of cancer, with its incidence rapidly increasing worldwide. Failure to diagnose it in its early stages can lead to metastasis and high mortality rates. This study provides a systematic review of recent literature on the application of traditional machine learning and deep learning in the diagnosis of skin cancer lesions, providing a valuable reference for further research in skin cancer diagnosis. Firstly, several publicly available datasets of skin diseases are compiled. Secondly, the application of different machine learning algorithms in the classification of skin cancer lesions is analyzed and compared to better understand their advantages and limitations in practical applications, with a focus on convolutional neural networks in diagnostic classification. With a thorough understanding of these algorithms, their performance differences and improvement strategies in dealing with skin diseases are discussed. Ultimately, through discussions on current challenges and future directions, beneficial insights and recommendations are provided to further enhance the performance and reliability of early skin cancer diagnosis systems.
    2024,33(12):16-29, DOI: 10.15888/j.cnki.csa.009711, CSTR: 32024.14.csa.009711
    Abstract:
    In mobile edge computing (MEC), load imbalance among edge servers occurs due to irrational task offloading strategies and resource allocation, as well as a sharp increase in the number of multi-type tasks. To address the above-mentioned issues, this study proposes a load prediction and balanced assignment scheme for multi-type tasks (LBMT) in a multi-user, multi-MEC edge environment. The LBMT scheme includes three components: task type classification, task load prediction, and task adaptive mapping. Firstly, considering the diversity of task types, a task type model is designed to classify tasks. Secondly, a task load prediction model is developed, considering the varying loads imposed by different tasks on servers, and employs an improved K-nearest neighbor (KNN) algorithm for load prediction. Thirdly, taking into account the heterogeneity of MEC servers and the limitation of resources, a task allocation model is designed in conjunction with a server load balancing model. Additionally, a task allocation method based on an adaptive task mapping algorithm is proposed. Finally, the LBMT scheme optimizes resource utilization and task processing rates for MEC servers to achieve the optimal load-balanced task offloading strategy. Simulation experiments compare LBMT with improved min-min offloading, intermediate node-based offloading, and weighted bipartite graph-based offloading schemes. The results show that LBMT improves the resource utilization rate by more than 12.5% and the task processing rate by more than 20.3%. Additionally, LBMT significantly reduces the standard deviation of load balancing, more effectively achieving load balance among servers.
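    A minimal sketch of the load-prediction step described above, assuming each task is represented by a small feature vector and that a distance-weighted KNN regressor stands in for the paper's improved KNN (the actual improvement is not specified in the abstract); the feature names and values below are hypothetical.

```python
# Distance-weighted KNN load prediction sketch (illustrative only, not the LBMT algorithm itself).
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical historical records: [task_type_id, input_size_MB, cpu_megacycles] -> observed server load.
X_hist = np.array([
    [0, 1.2,  300],
    [0, 2.5,  650],
    [1, 0.8,  900],
    [1, 1.1, 1200],
    [2, 4.0,  400],
    [2, 5.5,  520],
])
y_load = np.array([0.12, 0.26, 0.35, 0.47, 0.18, 0.24])  # fraction of server capacity used

# Distance weighting lets closer historical tasks contribute more to the prediction.
knn = KNeighborsRegressor(n_neighbors=3, weights="distance")
knn.fit(X_hist, y_load)

new_tasks = np.array([[1, 1.0, 1000], [2, 4.8, 480]])
predicted_load = knn.predict(new_tasks)
print(predicted_load)  # estimated load each new task would impose on a server
```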
    2024,33(12):30-42, DOI: 10.15888/j.cnki.csa.009727, CSTR: 32024.14.csa.009727
    Abstract:
    Computed tomography (CT) scanning provides valuable material for detecting hepatic lesions. Manual detection of hepatic lesions is laborious and heavily relies on the expertise of physicians. Existing algorithms for liver lesion detection exhibit suboptimal performance in detecting subtle lesions. To address this issue, this study proposes a self-supervised liver lesion detection algorithm based on frequency-aware image restoration. Firstly, this algorithm designs a self-supervised task based on synthetic anomalies to generate a broader and more suitable set of pseudo-anomalous images, thereby alleviating the issue of insufficient abnormal data during model training. Secondly, to suppress the sensitivity of the reconstruction network to synthetic liver anomalies, a module is designed to extract high-frequency information from images. By restoring the images from their high-frequency components, the adverse generalization of the reconstruction network to anomalies is mitigated. Lastly, the algorithm adopts weight decay to train the segmentation sub-networks, reducing the occurrence of trivial solutions during the early stages of training and enabling the detection of local and subtle lesions. Extensive experiments conducted on publicly available real datasets demonstrate that the proposed method achieves state-of-the-art performance in liver lesion detection.
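    One common way to obtain a high-frequency component is to subtract a Gaussian-blurred (low-pass) version of the image; the sketch below illustrates this idea only and is not the paper's specific frequency-aware module.

```python
# Illustrative high-frequency extraction by subtracting a low-pass (Gaussian-blurred) version of a CT slice.
import numpy as np
from scipy.ndimage import gaussian_filter

def high_frequency_component(image: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Return the residual between the image and its Gaussian-smoothed version."""
    low_freq = gaussian_filter(image.astype(np.float32), sigma=sigma)
    return image.astype(np.float32) - low_freq

slice_ct = np.random.rand(256, 256).astype(np.float32)   # stand-in for a liver CT slice
hf = high_frequency_component(slice_ct)
# A restoration network would then be trained to reconstruct the original slice from `hf`.
print(hf.shape, hf.mean())
```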
    2024,33(12):43-54, DOI: 10.15888/j.cnki.csa.009698, CSTR: 32024.14.csa.009698
    Abstract:
    Group activity recognition (GAR) is one of the most actively researched areas in the field of computer vision, aiming to detect the overall behavior performed by multiple individual actions and interactions. However, due to difficulties in determining individual interaction relationships, the tightness of connections, and the key actor, current methods often focus on individual character features while neglecting connections with the scene context. To address this issue, a novel reasoning model for GAR, GIFFNet, is proposed based on global-individual feature fusion (GIFF). To compensate for the lack of scene information in predicting group activity, GIFFNet, while focusing on key information, effectively integrates scene context and individual character features by constructing the GIFF module, obtaining more representative fusion features. Subsequently, GIFFNet utilizes the fusion features to calculate the interaction relationship graph between characters in the scene and uses a graph convolutional network (GCN) for training and predicting group behavior categories. In addition, to address the issue of imbalanced samples in the dataset, GIFFNet adopts a strategy of dynamically assigning weights to optimize the loss function. Experimental results demonstrate that GIFFNet achieves a multi-class classification accuracy (MCA) of 93.8% and 96.1% on the Volleyball and Collective Activity datasets, and the mean per class accuracy (MPCA) is 93.9% and 95.8%, respectively, outperforming other existing deep learning methods. GIFFNet provides features with a more powerful characterization ability for activity classification through feature fusion, which effectively improves GAR accuracy.
    2024,33(12):55-66, DOI: 10.15888/j.cnki.csa.009691, CSTR: 32024.14.csa.009691
    Abstract:
    As an Internet infrastructure, DNS is rarely subjected to deep monitoring by firewalls, allowing hackers and advanced persistent threat (APT) groups to exploit DNS covert tunnels for data theft or network control, which poses a significant threat to network security. Since existing detection methods are easily bypassed and generalize poorly, this study enhances the characterization method of DNS traffic and introduces the pcap feature extraction CNN-Transformer (PFEC-Transformer) model. This model takes characterized decimal numerical sequences as input, conducts local feature extraction through CNN modules, and then analyzes long-distance dependency patterns between local features with a Transformer for classification. The research builds datasets by collecting Internet traffic and data packets generated by various DNS covert tunnel tools and conducts generalization testing with publicly available datasets containing traffic from unknown tunneling tools. Experimental results demonstrate that the model achieves an accuracy of 99.97% on the testing dataset and 92.12% on the generalization testing dataset, effectively showcasing its exceptional performance in detecting unknown DNS covert tunnels.
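    A rough sketch of the byte-sequence characterization and CNN-followed-by-Transformer pipeline described for PFEC-Transformer; the layer sizes, sequence length, and packet preprocessing are assumptions.

```python
# Sketch: DNS packet bytes (decimal 0-255) -> embedding -> CNN (local features) -> Transformer (long-range) -> classifier.
import torch
import torch.nn as nn

class CNNTransformerDetector(nn.Module):
    def __init__(self, seq_len=512, d_model=64, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(256, d_model)          # each packet byte is a value in 0..255
        self.conv = nn.Sequential(                       # local feature extraction
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)  # long-distance dependencies
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, byte_seq):                          # byte_seq: (batch, seq_len) integer tensor
        x = self.embed(byte_seq).transpose(1, 2)          # (batch, d_model, seq_len)
        x = self.conv(x).transpose(1, 2)                  # (batch, seq_len/2, d_model)
        x = self.transformer(x)
        return self.head(x.mean(dim=1))                   # tunnel vs. benign logits

model = CNNTransformerDetector()
dummy = torch.randint(0, 256, (8, 512))
print(model(dummy).shape)  # torch.Size([8, 2])
```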
    2024,33(12):67-77, DOI: 10.15888/j.cnki.csa.009713, CSTR: 32024.14.csa.009713
    Abstract:
    Braille conversion technology is crucial for advancing information accessibility for the blind. With the rapid advancement of information globalization, the blind are increasingly exposed to bilingual information in both Chinese and English. While existing braille conversion systems have successfully translated Chinese and English into braille, they fall short in accurately converting punctuation, including poor differentiation of punctuation with multiple uses and lack of error correction for the mixed use of Chinese and English punctuation. Failure to address these issues may lead to misunderstanding of text by the blind. This study delves into these problems, designing and implementing a bilingual braille conversion system capable of distinguishing multipurpose punctuation and correcting the mixed use of punctuation. The performance of the system is evaluated by using a dataset based on BLCU Chinese Corpus. The results demonstrate that the proposed system accurately distinguishes multipurpose punctuation and corrects the mixed use of Chinese and English punctuation according to language types and context, outperforming other braille conversion systems. Overall, this research has significant potential for promoting information accessibility in China.
    Available online:  December 19, 2024 , DOI: 10.15888/j.cnki.csa.009743
    Abstract:
    The lack of lighting and the complex environment in the mine, coupled with the small target size of safety helmets, lead to poor detection performance of safety helmets by general object detection models. To solve these issues, an improved mine safety helmet wearing detection model based on YOLOv8s is proposed. Firstly, the effectiveSE module is combined with the C2f module in the neck network of YOLOv8s to design a new C2f-eSE module, improving the feature extraction ability of the network structure. The CIoU loss function is replaced by the Wise-EIoU loss function to improve the model’s robustness. In addition, the spatial and channel reconstruction convolution (SCConv) module is introduced into the detection head. A new lightweight SPS detection head is designed based on the parameter sharing concept, reducing the number of parameters and computational complexity of the model. Finally, adding a P2 detection layer to the model enables the feature extraction network to incorporate more shallow information and improves the detection ability for small-sized targets. Experimental results show that the mAP50 index of the improved model increases by 3.2%, the number of parameters decreases by 1.6%, and GFLOPs decreases by 5.6%.
    Available online:  December 19, 2024 , DOI: 10.15888/j.cnki.csa.009763
    Abstract:
    In complex terrain conditions, UAV formation path planning based on deep reinforcement learning can optimize the path of a UAV formation, with better path length and environmental adaptability than traditional heuristic algorithms. However, it still suffers from insufficient training stability and poor real-time planning performance. For UAV clusters in a leader-follower mode, this study proposes a real-time 3D path planning method for UAV formation based on the SPER-TD3 algorithm. Firstly, the prioritized experience replay mechanism based on SumTree is integrated into the TD3 algorithm, and the SPER-TD3 algorithm is designed to determine the path of the UAV formation. Then, an angle formation control method is used to optimize the paths of the followers, and a dynamic path smoothing algorithm is applied to optimize the steering angle. To accelerate the training convergence speed and stability of the SPER-TD3 algorithm and to solve the long-term dependence problem, a network model structure combining LSTM, a self-attention mechanism, and multilayer perceptrons is designed. Simulation experiments are conducted in environments with various obstacles. Results show that the proposed method is superior to eight mainstream deep reinforcement learning algorithms in terms of path safety coverage rate, flight path smoothness, success rate, and reward value. Its comprehensive importance-weighted evaluation score is 8.5% to 72.9% higher than that of existing methods, and it has the best training stability.
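    The SumTree used by prioritized experience replay is a standard data structure; the sketch below shows its core add/update/sample operations in a simplified form and does not reproduce the rest of SPER-TD3.

```python
# Minimal SumTree for prioritized experience replay: leaves hold transition priorities,
# internal nodes hold priority sums, so sampling proportional to priority is O(log N).
import numpy as np

class SumTree:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.tree = np.zeros(2 * capacity - 1)   # internal nodes store priority sums
        self.data = [None] * capacity
        self.write = 0

    def add(self, priority: float, transition) -> None:
        idx = self.write + self.capacity - 1
        self.data[self.write] = transition
        self.update(idx, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, idx: int, priority: float) -> None:
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx != 0:                            # propagate the change up to the root
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def sample(self, value: float):
        """Walk down the tree to find the leaf whose cumulative priority covers `value`."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):        # descend until a leaf is reached
            left, right = 2 * idx + 1, 2 * idx + 2
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = right
        return self.data[idx - self.capacity + 1], self.tree[idx]

tree = SumTree(capacity=4)
for p, t in zip([1.0, 3.0, 0.5, 2.0], ["t0", "t1", "t2", "t3"]):
    tree.add(p, t)
# Transitions with larger priority (e.g. larger TD error) are drawn more often.
print(tree.sample(np.random.uniform(0, tree.tree[0])))
```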
    Available online:  December 19, 2024 , DOI: 10.15888/j.cnki.csa.009764
    Abstract:
    Key sentence extraction technology refers to using artificial intelligence to automatically find key sentences in a long text. This technology can be used for preprocessing in information retrieval and is of great significance for downstream tasks such as text classification and extractive summarization. Traditional unsupervised key sentence extraction technologies are mostly based on statistics and graphical model methods, which suffer from problems such as low accuracy and the need to build a large-scale corpus in advance. This study proposes T5KSEChinese, a method that can extract key sentences without supervision in the Chinese context. This method uses an encoder-decoder architecture with prompt words in both the input and the output, so that the length mismatch between the target sentence and the original text can be ignored and more accurate results obtained. At the same time, a positive sample construction method for contrastive learning is proposed and used to conduct semi-supervised training on the encoder part of the model, which improves the performance of downstream tasks. The method uses a lightweight model to outperform large language models with tens of times more parameters on the unsupervised downstream task. The final experimental results prove the accuracy and reliability of the proposed method.
    Available online:  December 19, 2024 , DOI: 10.15888/j.cnki.csa.009766
    Abstract:
    Since existing work on the task of fake news detection frequently ignores the semantic sparsity of news text and the potential relationships between rich information, which limits the model’s capacity to understand and recognize fake news, this study proposes a fake news detection method based on heterogeneous subgraph attention networks. Heterogeneous graphs are constructed to model the abundant features of fake news, such as text, party affiliation, and topic of news samples. The heterogeneous graph attention network is constructed at the feature layer to capture the correlations between different types of information, and a subgraph attention network is constructed at the sample layer to mine the interactions between news samples. Moreover, the mutual information mechanism based on self-supervised contrastive learning focuses on discriminative subgraph representations within the global graph structure to capture the specificity of news samples. Experimental results demonstrate that the method proposed in this study achieves about 9% and 12% improvement in accuracy and F1 score, respectively, compared with existing methods on the Liar dataset, which significantly improves the performance of fake news detection.
    Available online:  December 19, 2024 , DOI: 10.15888/j.cnki.csa.009768
    Abstract:
    In spectral 3D CT data, traditional convolution has a poor ability to capture global features, while the full-scale self-attention mechanism consumes substantial resources. To solve this problem, this study introduces a new visual attention paradigm, wave self-attention (WSA). Compared with ViT, this mechanism uses fewer resources to obtain the same amount of self-attention information. In addition, to extract the relative dependencies among organs more adequately and to improve the robustness and execution speed of the model, a plug-and-play module, the wave random-encoder (WRE), is designed for the WSA mechanism. The encoder is capable of generating a pair of mutually inverse asymmetric global (local) position information matrices. The global position matrix is used to globally conduct random sampling of the wave features, and the local position matrix is used to complement the local relative dependencies lost due to random sampling. In this study, experiments are performed on the task of segmenting the kidney and lung parenchyma in the standard datasets Synapse and COVID-19. The results show that this method outperforms existing models such as nnFormer and Swin-UNETR in terms of accuracy, the number of parameters, and inference rate, reaching the SOTA level.
    Available online:  December 19, 2024 , DOI: 10.15888/j.cnki.csa.009773
    Abstract:
    In the field of knowledge distillation (KD), feature-based methods can effectively extract the rich knowledge embedded in the teacher model. However, Logit-based methods often face issues such as insufficient knowledge transfer and low efficiency. Decoupled knowledge distillation (DKD) conducts distillation by dividing the Logits output by the teacher and student models into target and non-target classes. While this method improves distillation accuracy, its single-instance-based distillation approach fails to capture the dynamic relationships among samples within a batch. Especially when there are significant differences in the output distributions of the teacher and student models, relying solely on decoupled distillation cannot effectively bridge these differences. To address the issues inherent in DKD, this study proposes a perception reconstruction method. This method introduces a perception matrix. By utilizing the representational capabilities of the model, it recalibrates Logits, meticulously analyzes intra-class dynamic relationships, and reconstructs finer-grained inter-class relationships. Since the objective of the student model is to minimize representational disparity, this method is extended to decoupled knowledge distillation. The outputs of the teacher and student models are mapped onto the perception matrix, enabling the student model to learn richer knowledge from the teacher model. A series of validations on the CIFAR-100 and ImageNet-1K datasets demonstrate that the student model trained with this method achieves a classification accuracy of 74.98% on the CIFAR-100 dataset, which is 0.87 percentage points higher than that of baseline methods, thereby enhancing the image classification performance of the student model. Additionally, comparative experiments with various methods further verify the superiority of this method.
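    For context, the sketch below shows the target/non-target decoupling of logits that DKD (the baseline discussed above) is built on; the perception-matrix reconstruction proposed in this paper is not reproduced, and the loss weights and temperature are illustrative.

```python
# Sketch of the decoupled KD loss: a target-class term (TCKD) plus a non-target-class term (NCKD).
import torch
import torch.nn.functional as F

def dkd_loss(student_logits, teacher_logits, target, alpha=1.0, beta=8.0, T=4.0):
    mask = F.one_hot(target, student_logits.size(1)).bool()

    # TCKD: binary distribution over "target class" vs. "all other classes".
    def split(logits):
        p = F.softmax(logits / T, dim=1)
        p_t = (p * mask).sum(dim=1, keepdim=True)          # probability mass on the target class
        return torch.cat([p_t, 1.0 - p_t], dim=1)
    tckd = F.kl_div(split(student_logits).log(), split(teacher_logits), reduction="batchmean")

    # NCKD: distribution over non-target classes only (target logit masked out before softmax).
    neg_inf = torch.finfo(student_logits.dtype).min
    s_non = F.log_softmax(student_logits.masked_fill(mask, neg_inf) / T, dim=1)
    t_non = F.softmax(teacher_logits.masked_fill(mask, neg_inf) / T, dim=1)
    nckd = F.kl_div(s_non, t_non, reduction="batchmean")
    return (alpha * tckd + beta * nckd) * T * T

s = torch.randn(16, 100)           # student logits, e.g. CIFAR-100
t = torch.randn(16, 100)           # teacher logits
y = torch.randint(0, 100, (16,))
print(dkd_loss(s, t, y))
```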
    Available online:  December 19, 2024 , DOI: 10.15888/j.cnki.csa.009778
    Abstract:
    Images captured under harsh weather conditions such as haze, rain, and snow are degraded and blurred, making accurate recognition and detection challenging. To address this, this study proposes a pedestrian and vehicle detection algorithm, lightweight blur vision network (LiteBlurVisionNet), for blurred scenes. In the backbone network, a lightweight MobileNetV3 module improved with GlobalContextEnhancer attention is used, reducing the number of parameters and making the model more efficient in image processing under harsh weather conditions such as haze and rain. The neck network adopts a lighter Ghost module and the SpectralGhostUnit module improved from the GhostBottleneck module. These modules can more effectively capture global context information, improve the discrimination and expressive ability of features, and help reduce the number of parameters and computational complexity, thereby improving the network’s processing speed and efficiency. In the prediction part, DIoU-NMS, a distance-aware variant of non-maximum suppression, is used for local maximum search to remove redundant detection boxes and improve the accuracy of the detection algorithm in blurred scenes. Experimental results show that the parameter count of the LiteBlurVisionNet model is reduced by 96.8% compared to the RTDETR-ResNet50 model and by 55.5% compared to the YOLOv8n model. The computational load of the LiteBlurVisionNet model is reduced by 99.9% compared to the Faster R-CNN model and by 57% compared to the YOLOv8n model. The mAP0.5 of the LiteBlurVisionNet model is improved by 13.71% compared to the IAL-YOLO model and by 2.4% compared to the YOLOv5s model. This means the model is more efficient in terms of storage and computation and is particularly suitable for resource-constrained environments or mobile devices.
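    A sketch of DIoU-NMS, the suppression rule named above: standard NMS where the criterion is IoU minus a normalized center-distance term, so nearby but distinct objects are less likely to be suppressed; the thresholds and boxes below are toy values.

```python
# DIoU-NMS sketch: suppress a box only when IoU minus the normalized center-distance term exceeds the threshold.
import numpy as np

def diou(box, boxes):
    """box: (4,), boxes: (N, 4), both in x1, y1, x2, y2 format."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    iou = inter / (area_a + area_b - inter + 1e-9)
    # Squared distance between box centers, normalized by the enclosing box diagonal.
    cx_a, cy_a = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    cx_b, cy_b = (boxes[:, 0] + boxes[:, 2]) / 2, (boxes[:, 1] + boxes[:, 3]) / 2
    ex1 = np.minimum(box[0], boxes[:, 0]); ey1 = np.minimum(box[1], boxes[:, 1])
    ex2 = np.maximum(box[2], boxes[:, 2]); ey2 = np.maximum(box[3], boxes[:, 3])
    diag = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9
    return iou - ((cx_a - cx_b) ** 2 + (cy_a - cy_b) ** 2) / diag

def diou_nms(boxes, scores, threshold=0.5):
    order = scores.argsort()[::-1]            # process boxes in descending score order
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        order = rest[diou(boxes[i], boxes[rest]) <= threshold]
    return keep

boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(diou_nms(boxes, scores))                # e.g. keeps box 0 and the distant box 2
```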
    Available online:  December 19, 2024 , DOI: 10.15888/j.cnki.csa.009779
    Abstract:
    Automatic text summarization is an important branch in the field of natural language processing (NLP), and one of its main difficulties lies in how to evaluate the quality of generated summaries quickly, objectively, and accurately. Given the low evaluation accuracy, the need for reference texts, and the large consumption of computing resources in existing text summary quality evaluation methods, this study proposes an evaluation method for the quality of text summaries based on large language models. It designs a prompt construction method based on the chain-of-thought principle to improve the performance of large language models in text summary quality evaluation. At the same time, a chain-of-thought dataset is generated and a small-scale large language model is trained through fine-tuning, significantly reducing the computing requirements. The proposed method first determines the evaluation dimensions according to the characteristics of text summaries and constructs the prompt based on the chain-of-thought principle. The prompt is used to guide the large language model to generate the chain-of-thought process and evaluation results from the summary samples, and a chain-of-thought dataset is generated accordingly. The generated chain-of-thought dataset is used to fine-tune the small-scale large language model. Finally, the study uses the fine-tuned small-scale large language model to complete the quality evaluation of text summaries. Comparative experiments and analyses on the Summeval dataset show that this evaluation method significantly improves the evaluation accuracy of the small-scale large language model in the task of text summary quality evaluation. The study thus provides a text summary quality evaluation method with high evaluation accuracy and low computing requirements that is easy to deploy and requires no reference texts.
    Available online:  December 19, 2024 , DOI: 10.15888/j.cnki.csa.009781
    Abstract:
    Given the insufficient adaptability of existing polymer dosage splitting algorithms when dealing with well groups in different blocks, this study proposes a polymer flooding well group splitting method based on an improved bald eagle search algorithm. Firstly, the preliminary splitting coefficients are obtained through grey correlation analysis. Then, the difference between the cumulative injection volume and the actual fluid production volume of each extraction well is calculated, and a reasonable threshold range and constraint conditions are set. Secondly, the bald eagle search algorithm is improved by introducing Sobol sequence and ICMIC mapping, golden sine Lévy flight guidance mechanism, nonlinear convergence factor, and adaptive inertia weighting strategy, which enhances the algorithm's searching capability and convergence accuracy. Finally, the improved bald eagle search algorithm is used to solve the optimization model of well group splitting coefficients in the actual block of an oilfield. The results show that the calculated splitting injection volume has a high degree of agreement with the actual fluid production volume and has good splitting accuracy.
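    The sketch below illustrates only the Sobol-sequence population initialization mentioned among the improvements; the bounds and population size are hypothetical, and the other mechanisms (ICMIC mapping, golden sine Lévy flight, adaptive inertia weights) are omitted.

```python
# Sobol-sequence initialization: spread the initial population more evenly over the search box
# than uniform random draws, which typically improves early exploration of swarm optimizers.
import numpy as np
from scipy.stats import qmc

def sobol_init(pop_size: int, dim: int, lower: np.ndarray, upper: np.ndarray) -> np.ndarray:
    sampler = qmc.Sobol(d=dim, scramble=True)
    unit = sampler.random(n=pop_size)                # low-discrepancy points in [0, 1)^dim
    return qmc.scale(unit, lower, upper)             # map to the splitting-coefficient bounds

lower = np.zeros(5)          # hypothetical lower bounds of 5 splitting coefficients
upper = np.ones(5)           # hypothetical upper bounds
population = sobol_init(pop_size=32, dim=5, lower=lower, upper=upper)
print(population.shape)      # (32, 5)
```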
    Available online:  December 16, 2024 , DOI: 10.15888/j.cnki.csa.009774
    Abstract:
    In the contemporary field of unsupervised deep hashing research, methods predicated on contrastive learning are predominant. However, sampling bias brought about by the random extraction of negative samples in contrastive learning deteriorates image retrieval accuracy. To address this issue, this study proposes a novel unsupervised deep hashing method based on bias-suppressing contrastive learning (BSCDH), which introduces a bias suppression scheme (BSS) within a contrastive learning framework. This scheme approximates incorrect negative samples as extremely hard negative samples and designs a bias suppression coefficient to suppress these extremely hard negative samples, thereby alleviating the negative impact of sampling bias. The corresponding suppression coefficient value is determined based on the similarity between the current negative sample and the query sample. The distance relationship between the current negative sample and adjacent hash centers is introduced to correct the suppression coefficient value, reducing the possibility of excessive suppression of normal negative samples. Ultimately, the mAP@5000 of the BSCDH method (64 bits) achieves 0.696, 0.833, and 0.819 on the CIFAR-10, FLICKR25K, and NUS-WIDE datasets, respectively, demonstrating a significant performance advantage over the baseline. Extensive experiments conducted in this paper verify that BSCDH achieves high retrieval accuracy among unsupervised image retrieval methods and can effectively address sampling bias.
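    A loose illustration of where a suppression coefficient for extremely hard negatives could enter an InfoNCE-style loss; the weighting formula below is an assumption and does not reproduce BSCDH's coefficient or its hash-center correction.

```python
# Weighted InfoNCE sketch: "negatives" that look too similar to the query are likely false
# negatives, so their contribution to the denominator is shrunk by a coefficient w in (0, 1).
import torch
import torch.nn.functional as F

def suppressed_contrastive_loss(query, positive, negatives, tau=0.2):
    q = F.normalize(query, dim=-1)                 # (d,)
    pos = F.normalize(positive, dim=-1)            # (d,)
    neg = F.normalize(negatives, dim=-1)           # (N, d)
    sim_pos = (q * pos).sum() / tau
    sim_neg = neg @ q / tau                        # (N,)
    with torch.no_grad():                          # assumed (illustrative) suppression coefficient
        w = 1.0 - torch.sigmoid(sim_neg - sim_pos)
    logits = torch.cat([sim_pos.view(1), sim_neg])
    weights = torch.cat([torch.ones(1), w])
    # -log( exp(sim_pos) / (exp(sim_pos) + sum_i w_i * exp(sim_neg_i)) )
    return -sim_pos + torch.logsumexp(logits + weights.log(), dim=0)

q = torch.randn(64); p = torch.randn(64); n = torch.randn(128, 64)
print(suppressed_contrastive_loss(q, p, n))
```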
    Available online:  December 16, 2024 , DOI: 10.15888/j.cnki.csa.009769
    Abstract:
    Most existing knowledge graph link prediction methods focus only on the semantic relationships between a head entity h, a relationship r, and a tail entity t in a single triad when learning semantic information. They do not consider the links between related entities and entity relationships in different triads. To address this problem, this study proposes the DeepE_CL model. Firstly, the study uses the DeepE model to learn the semantic information of related triads and of entities with the same entity relationship pairs or entity relationship pairs with the same entities. Secondly, the extracted semantic information of the related triads is used to calculate the corresponding scoring function and cross-entropy loss, and the extracted semantic information of entities with the same entity relationship pairs or entity relationship pairs with the same entities is optimized through a contrastive learning model, so as to predict the missing information of the related triads. This paper validates the proposed method on four common datasets and compares it with other baseline models using four evaluation indicators: MR, MRR, Hit@1, and Hit@10. The experimental results show that the DeepE_CL model achieves the best results on all indicators. To further validate the usefulness of the model, this study also applies it to a real TCM dataset. The experimental results show that compared with the DeepE model, the DeepE_CL model reduces MR by 18 and improves MRR and Hit@1 by 0.8% and 1.1%, respectively, while Hit@10 remains unchanged. The experiments demonstrate that the DeepE_CL model, which introduces a contrastive learning model, is very effective in improving the performance of knowledge graph link prediction.
    Available online:  December 16, 2024 , DOI: 10.15888/j.cnki.csa.009770
    Abstract:
    The density peaks clustering (DPC) algorithm achieves clustering by identifying cluster centers based on local density and relative distance. However, for data with uneven density distribution and unbalanced cluster sizes, it tends to overlook cluster centers in low-density regions, so the number of clusters needs to be set manually. Besides, if a data point is allocated incorrectly, the error propagates and leads to the incorrect allocation of subsequent points. To address these issues, this study proposes an adaptive sparse-aware density peaks clustering algorithm. Firstly, fuzzy points are introduced to minimize their impact on the subcluster merging process. Secondly, the subtractive clustering method is used to identify the centers of low-density regions. Then, noise is identified and subcluster centers are updated based on a new local density and reverse nearest neighbors. Finally, a redefined global overlap metric combined with global separability guides subcluster merging, while the clustering result is determined automatically using these metrics. Experimental results demonstrate that compared to DPC and its improved algorithms, the proposed algorithm effectively identifies sparse clusters in both synthetic and UCI datasets while reducing chain reactions caused by non-center assignments. Also, the proposed algorithm can automatically determine the optimal number of clusters, ultimately yielding more accurate clustering results.
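    For reference, the sketch below computes the two quantities classic DPC starts from, local density rho and relative distance delta, on toy data; the sparse-aware improvements described above (fuzzy points, subtractive clustering, reverse nearest neighbors) are not shown.

```python
# Classic DPC quantities: rho = Gaussian-kernel local density, delta = distance to the nearest denser point.
import numpy as np
from scipy.spatial.distance import cdist

def dpc_rho_delta(X: np.ndarray, dc: float):
    d = cdist(X, X)
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0        # exclude the point itself
    delta = np.zeros(len(X))
    order = np.argsort(-rho)                               # points sorted by decreasing density
    delta[order[0]] = d[order[0]].max()                    # densest point: farthest distance
    for rank, i in enumerate(order[1:], start=1):
        delta[i] = d[i, order[:rank]].min()                # distance to the nearest denser point
    return rho, delta

X = np.vstack([np.random.randn(100, 2), np.random.randn(20, 2) * 0.3 + 5.0])
rho, delta = dpc_rho_delta(X, dc=0.5)
# Cluster centers are points where both rho and delta are large (the "decision graph").
print(rho.argmax(), delta.argmax())
```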
    Available online:  December 16, 2024 , DOI: 10.15888/j.cnki.csa.009765
    Abstract:
    Transformer-based object detection algorithms often suffer from problems such as insufficient accuracy and slow convergence. Although many studies have proposed improvements to address these problems and have achieved certain outcomes, most of them overlook two key shortcomings when applying Transformer structure to the field of object detection. Firstly, self-attention computation results are not diversified. Secondly, due to the complexity of set prediction, the models are unstable during target matching. To overcome these deficiencies, this study proposes several enhancements. Firstly, an adaptive token pooling module is designed to increase self-attention weight diversity. Secondly, a rough-prediction-based anchor box localization module is introduced, which provides positional prior information for queries to enhance stability during bipartite matching. Lastly, a group-based denoising task is designed, which trains the model to distinguish between positive and negative queries near the target, thereby improving the model’s ability to perform set prediction. Experimental results show that the proposed improved algorithm achieves better training results on the COCO dataset. Compared with the baseline model, the improved algorithm significantly outperforms in both detection accuracy and convergence speed.
    Available online:  December 16, 2024 , DOI: 10.15888/j.cnki.csa.009746
    Abstract:
    This study proposes an analysis method based on association mining between historical accident reports and a root cause index system to fully leverage experts’ experience in root cause analysis of past accidents and enhance the accuracy and comprehensiveness of such analysis, thereby reducing chemical safety incidents. By constructing an association matrix between accident reports and the index system, this method utilizes a pre-trained model to represent accident and index texts. It integrates secondary and tertiary index information based on an attention mechanism and finally employs a graph convolutional neural network for root cause analysis. Validation on a dataset of 1351 samples demonstrates that this method significantly improves the accuracy of root cause prediction, effectively utilizing expert analysis of historical accidents to analyze current accidents and uncover the limitations in previous accident analysis. Additionally, this method accurately identifies the root causes of accidents even with incomplete incident descriptions. The application of this method will enhance accident prevention and risk management in occupational safety.
    Available online:  December 16, 2024 , DOI: 10.15888/j.cnki.csa.009751
    Abstract:
    The YOLOv8n algorithm exhibits suboptimal performance when dealing with complex backgrounds, dense targets, and small-sized objects with limited pixel information, leading to reduced precision, missed detection, and misclassification. To address these issues, this study proposes an algorithm, LNCE-YOLOv8n, for safety equipment detection. This algorithm includes a linear multi-scale fusion attention (LMSFA) mechanism, which adaptively focuses on key features to improve the extraction of information from small targets while reducing computational loads. An architecture called C2f_New networks (C2f_NewNet) is also introduced, which maintains high performance and reduces depth through an effective parallelization design. Combined with a lightweight universal up-sampling operator, content-aware reassembly of features (CARAFE), the proposed algorithm realizes efficient cross-scale feature fusion and propagation and aggregates contextual information within a large receptive field. Based on the SIoU (symmetric intersection over union) loss function, this study proposed enhanced SIoU (ESIoU) to improve the adaptability and accuracy of the model in complex environments. Tested on a safety equipment dataset, LNCE-YOLOv8n outperforms YOLOv8n, exhibiting a 5.1% increase in accuracy, a 2.7% rise in mAP50, and a 3.4% boost in mAP50-95, significantly enhancing the detection accuracy of safety equipment for workers in complex construction conditions.
    Available online:  December 16, 2024 , DOI: 10.15888/j.cnki.csa.009752
    Abstract:
    Pneumonia is a prevalent respiratory disease for which early diagnosis is crucial to effective treatment. This study proposes a hybrid model, CTFNet, which combines convolutional neural network (CNN) and Transformer to aid in the effective and accurate diagnosis of pneumonia. The model integrates a convolutional tokenizer and a focused linear attention mechanism. The convolutional tokenizer performs more compact feature extraction through convolution operations, retaining key local features of images while reducing computational complexity to enhance model expressiveness. The focused linear attention mechanism reduces the computational demands of the Transformer and optimizes the attention framework, significantly improving model performance. On the Chest X-ray Images dataset, CTFNet demonstrates outstanding performance in pneumonia classification tasks, achieving an accuracy of 99.32%, a precision of 99.55%, a recall of 99.55%, and an F1 score of 99.55%. The impressive performance highlights the model’s potential for clinical applications. The model is evaluated on the COVID-19 Radiography Database dataset for its generalization ability. In this dataset, CTFNet achieves an accuracy above 98% in multiple binary classification tasks. These results indicate that CTFNet exhibits strong generalization ability and reliability across various tasks in pneumonia image classification.
    Available online:  December 13, 2024 , DOI: 10.15888/j.cnki.csa.009753
    Abstract:
    Traditional algorithms for knowledge-aware propagation recommendation face challenges including low correlation of higher-order features, unbalanced information utilization, and noise introduction. To address these challenges, this study proposes a multi-level contrastive learning for knowledge-aware propagation recommender algorithm utilizing knowledge enhancement (MCLK-KE). By constructing enhanced views and utilizing mask reconstruction-based self-supervised pre-training, the algorithm extracts deeper information from key triples to effectively suppress noise signals. It achieves a balanced utilization of knowledge and interactive signals while enhancing feature representation by comparing graphs to capture effective node attributes globally. Multi-task training significantly improves model performance by incorporating recommendation prediction, contrastive learning, and mask reconstruction tasks. In tests on three publicly available datasets, MCLK-KE demonstrates a maximum increase of 3.3% in AUC and 5.3% in F1 scores compared to the best baseline model.
    Available online:  December 13, 2024 , DOI: 10.15888/j.cnki.csa.009767
    Abstract:
    Cartoon character face detection is more challenging than human face detection because it involves many difficult scenarios. Given the huge differences between different cartoon characters’ faces, this study proposes a cartoon character face detection algorithm named YOLOv8-DEL. Firstly, the DBBNCSPELAN module is designed by fusing DBB into GELAN to reduce model size and enhance detection performance. Next, a multi-scale attention mechanism called ELA is introduced to improve the SPPF structure and enhance the feature extraction ability of the backbone. Finally, a new detection head with shared convolution is designed to make the network lighter, and the original CIoU loss function is replaced by Shape-IoU to improve the convergence efficiency of the model. Experiments are carried out on the iCartoonFace dataset, with ablation experiments to verify the proposed model, and the model is compared with the YOLOv3-tiny, YOLOv5n, and YOLOv6 models. The mAP of the improved YOLOv8-DEL model reaches 90.3%, 1.2% higher than that of YOLOv8, while its parameter count is 1.69M, 47% lower than that of YOLOv8, and its GFLOPs are 44% lower. Experimental results show that the proposed method effectively improves cartoon character face detection precision while compressing the size of the network model, proving its effectiveness.
    Available online:  December 13, 2024 , DOI: 10.15888/j.cnki.csa.009762
    Abstract:
    It is a significant challenge for high-precision 3D object detection for autonomous vehicles equipped with multiple sensors in the dusty wilderness. The variable wilderness terrain aggravates the regional feature differences of detected objects. Additionally, dust particles can blur the object features. To address these issues, this study proposes a 3D object detection method based on multi-modal feature dynamic fusion and constructs a multi-level feature self-adaptive fusion module and a feature alignment augmentation module. The former module dynamically adjusts the model’s attention to global-level features and regional-level features, leveraging multi-level receptive fields to reduce the impact of regional variances on recognition performance. The latter module bolsters the feature representation of regions of interest before multi-modal feature alignment, effectively suppressing interference factors such as dust. Experimental results show that compared with the average precision of the baseline, that of this approach is improved by 2.79% in the self-built wilderness dataset and by 1.7% in the hard-level test of the KITTI dataset. This shows our method has good robustness and precision.
    Available online:  December 09, 2024 , DOI: 10.15888/j.cnki.csa.009777
    Abstract:
    In response to challenges faced in crowd counting, such as non-uniform head sizes, uneven crowd density distribution, and complex background interference, a convolutional neural network (CNN) model (multi-scale feature weighted fusion attention convolutional neural network, MSFANet) that focuses on crowd regions and addresses multi-scale changes is proposed. The front end of the network adopts an improved VGG-16 model to perform the first step of coarse-grained feature extraction on the input crowd image. A multi-scale feature extraction module is added in the middle to extract the multi-scale feature information of the image. Then, an attention module is added to weigh the multi-scale features. At the back end, a sawtooth shaped dilated convolution module is adopted to increase the receptive field, extract the detailed features of the image, and generate high-quality crowd density maps. Experiments on this model are conducted on three public datasets. The results show that on the Shanghai Tech Part B dataset, the mean absolute error (MAE) is reduced to 7.8, and the mean squared error (MSE) decreases to 12.5. On the Shanghai Tech Part A dataset, the MAE is reduced to 64.9, and the MSE decreases to 108.4. On the UCF_CC_50 dataset, the MAE is reduced to 185.1, and the MSE decreases to 249.8. These experimental results affirm that the proposed model exhibits strong accuracy and robustness.
    Available online:  December 09, 2024 , DOI: 10.15888/j.cnki.csa.009780
    Abstract:
    Existing image dehazing algorithms still have problems such as incomplete dehazing, blurred edges in dehazed images, and loss of detail information. To address these problems, this study presents an image dehazing algorithm based on a Transformer and a gated fusion mechanism. Global features of the image are extracted by an improved channel self-attention mechanism to improve the efficiency of the model in processing images. A multi-scale gated fusion block is designed to capture features of different scales, and the gated fusion mechanism improves the adaptability of the model to different degrees of haze by dynamically adjusting weights while better preserving image edges and details. Residual connections are used to enhance the reusability of features and improve the generalization ability of the model. Experimental verification shows that the proposed dehazing algorithm can effectively restore the content information in real hazy images. On the synthesized hazy image dataset SOTS, the peak signal-to-noise ratio reaches 34.841 dB and the structural similarity reaches 0.984, and the dehazed images retain complete content information without blurred details or residual haze.
    Available online:  December 09, 2024 , DOI: 10.15888/j.cnki.csa.009784
    Abstract:
    Faced with insufficient labeled data in the field of video quality assessment, researchers begin to turn to self-supervised learning methods, aiming to learn video quality assessment models with the help of large amounts of unlabeled data. However, existing self-supervised learning methods primarily focus on video distortion types and content information, while ignoring dynamic information and spatiotemporal features of videos changing over time. This leads to unsatisfactory evaluation performance in complex dynamic scenes. To address these issues, a new self-supervised learning method is proposed. By taking playback speed prediction as an auxiliary pretraining task, the model can better capture dynamic changes and spatiotemporal features of videos. Combined with distortion type prediction and contrastive learning, the model’s sensitivity to video quality differences is enhanced. At the same time, to more comprehensively capture the spatiotemporal features of videos, a multi-scale spatiotemporal feature extraction module is further designed to enhance the model’s spatiotemporal modeling capability. Experimental results demonstrate that the proposed method significantly outperforms existing self-supervised learning-based approaches on the LIVE, CSIQ, and LIVE-VQC datasets. On the LIVE-VQC dataset, the proposed method achieves an average improvement of 7.90% and a maximum improvement of 17.70% in the PLCC index. Similarly, it also shows considerable competitiveness on the KoNViD-1k dataset. These results indicate that the proposed self-supervised learning framework effectively enhances the dynamic feature capture ability of the video quality assessment model and exhibits unique advantages in processing complex dynamic videos.
    Available online:  December 06, 2024 , DOI: 10.15888/j.cnki.csa.009747
    Abstract:
    Existing super-resolution reconstruction methods based on convolutional neural networks are limited by their receptive fields, which makes it difficult to fully utilize the rich contextual information and auto-correlation in remote sensing images, resulting in suboptimal reconstruction performance. To address this issue, this study proposes a novel network, termed MDT, a remote sensing image super-resolution reconstruction method based on multi-distillation and Transformer. Firstly, the network combines multiple distillations with a dual attention mechanism to progressively extract multi-scale features from low-resolution images, thereby reducing feature loss. Next, a convolutional modulation-based Transformer is constructed to capture global information in the images, recovering more complex texture details and enhancing the visual quality of the reconstructed images. Finally, a global residual path is added during upsampling to improve the propagation efficiency of features within the network, effectively reducing image distortion and artifacts. Experiments conducted on the AID and UCMerced datasets demonstrate that the proposed method achieves a peak signal-to-noise ratio (PSNR) of 29.10 dB and a structural similarity index (SSIM) of 0.7807 on ×4 super-resolution tasks. The quality of the reconstructed images is significantly improved, with better visual effects in terms of detail preservation.
    Available online:  December 06, 2024 , DOI: 10.15888/j.cnki.csa.009748
    Abstract:
    In computation-intensive and latency-sensitive tasks, unmanned aerial vehicle (UAV)-assisted mobile edge computing has been extensively studied due to its high mobility and low deployment costs. However, the energy consumption of UAVs limits their ability to work for extended periods, and there are often dependencies among different modules within offloading tasks. To address these issues, directed acyclic graph (DAG) is utilized to model the dependencies among internal modules of tasks. Considering the impacts of system latency and energy consumption, an optimal offloading strategy is derived to minimize system costs. To achieve optimization, a binary grey wolf optimization algorithm based on subpopulation, Gaussian mutation, and reverse learning (BGWOSGR) is proposed. Simulation results show that the proposed algorithm reduces system costs by around 19%, 27%, 16%, and 13% compared to four other methods, with a faster convergence speed.
    Available online:  November 28, 2024 , DOI: 10.15888/j.cnki.csa.009738
    Abstract:
    Existing methods for detecting atmospheric visibility are easily influenced by subjective factors and equipment complexity. To address this issue, this study proposes a new algorithm for estimating atmospheric visibility based on image processing. First, combined with the dark channel prior theory, a method for estimating global atmospheric light values based on the difference between image brightness and saturation is introduced to obtain the atmospheric transmittance. Next, curvature filtering is used to refine the transmittance. Then, atmospheric visibility is estimated through lane line detection and the extinction coefficient. Finally, a visibility correction model based on a linear regression equation is established to correct the estimated atmospheric visibility. Experimental results show that the proposed algorithm is accurate and practical for visibility estimation in traffic monitoring scenes in foggy weather.
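    A sketch of the dark channel prior steps that the estimation builds on (dark channel, global atmospheric light, transmittance); the brightness/saturation-difference refinement, curvature filtering, and lane-line-based visibility estimation from the paper are not reproduced.

```python
# Dark channel prior sketch: per-pixel channel minimum, local minimum filter, then estimate
# atmospheric light from the haziest pixels and derive a transmittance map.
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img: np.ndarray, patch: int = 15) -> np.ndarray:
    """img: HxWx3 float in [0, 1]; minimum over channels, then over a local patch."""
    return minimum_filter(img.min(axis=2), size=patch)

def atmospheric_light(img: np.ndarray, dark: np.ndarray, top_frac: float = 0.001) -> np.ndarray:
    """Average the pixels at the top dark-channel (haziest) locations."""
    n = max(1, int(dark.size * top_frac))
    idx = np.argsort(dark.ravel())[-n:]
    return img.reshape(-1, 3)[idx].mean(axis=0)

def transmission(img: np.ndarray, A: np.ndarray, omega: float = 0.95) -> np.ndarray:
    return 1.0 - omega * dark_channel(img / A)

hazy = np.clip(np.random.rand(120, 160, 3), 0, 1)   # stand-in for a foggy traffic frame
A = atmospheric_light(hazy, dark_channel(hazy))
t = transmission(hazy, A)
print(A, t.min(), t.max())
```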
    Available online:  November 28, 2024 , DOI: 10.15888/j.cnki.csa.009740
    Abstract:
    Algorithms for the instance segmentation of urban street scenes can significantly improve the accuracy and efficiency of urban environment perception and intelligent transportation system. To address mutual occlusions between pedestrians and vehicles and significant background interference in urban street scenes, this study proposes an instance segmentation model, FMInst, based on a frequency attention mechanism and multi-scale feature fusion. Firstly, a high and low-frequency attention mechanism is constructed for interactive coding to increase high-resolution detail information. Secondly, a soft pooling operation is introduced into the Patch Merging layer of the Swin Transformer backbone network to reduce the loss of feature information and effectively improve the segmentation of small-scale targets. Finally, an MLP layer is combined to construct multi-scale deep convolution, which effectively enhances the extraction of local information and improves the segmentation accuracy. Comparison experiments conducted on the public dataset Cityscapes show that FMInst reaches an mAP of 35.6%, with an improvement of 1.2%, and an AP50 of 61.4%, with an improvement of 2.2%. The mask quality and the segmentation effect of the instance segmentation are greatly improved.
    Available online:  November 28, 2024 , DOI: 10.15888/j.cnki.csa.009741
    Abstract:
    Narrowing the gap between modalities is a persistent challenge in cross-modal person re-identification between images and texts. To address this challenge, this study proposes an improved method based on contrastive language-image pretraining-person re-identification (CLIP-ReID) by integrating a context adjustment network module and a cross-modal attention mechanism module. The former module performs a deep nonlinear transformation on image features and effectively combines them with learnable context vectors to enhance the semantic relevance between images and texts. The latter module dynamically weights and fuses features from images and texts so that the model can take the other modality into account when processing the information of one modality, improving the interaction between different modalities. The method is evaluated on three public datasets. Experimental results show that the mAP on the MSMT17 dataset is increased by 2.2% and R1 is increased by 1.1%. On the Market1501 dataset, there is a 0.5% increase in mAP and a 0.1% rise in R1. The DukeMTMC dataset sees a 0.4% enhancement in mAP and a 1.2% increase in R1. The results show that the proposed method effectively improves the accuracy of person re-identification.
    Available online:  November 28, 2024 , DOI: 10.15888/j.cnki.csa.009754
    Abstract:
    Considering the unique domain-specific information inherent in software requirement texts, as well as the important contextual relationships and inherent ambiguities they contain, this study proposes a model that integrates graph convolutional network (GCN) with BERT for automatic software requirements classification, named BERT-FGCN (BERT-FusionGCN). This model leverages the advantages of GCN in propagating information and aggregating features from neighboring nodes to capture the contextual relationships between words or sentences in requirement statements, thereby improving the classification results. Initially, a text co-occurrence graph and a dependency syntax graph of requirement texts are constructed. These graphs are then fused to capture the structural information of the sentences. The GCN is then employed to perform convolution on the graph structure of the modeled requirement statements to obtain graph vectors. Finally, these graph vectors are fused with the vectors obtained from BERT feature extraction to achieve automatic classification of software requirement texts. Experiments conducted on the PROMISE dataset demonstrate that BERT-FGCN achieves an F1-score of 95% in binary classification, and increases the F1-score by 2% in multi-class classification tasks.
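    A minimal sketch of one GCN propagation step over a requirement-text graph, the operation BERT-FGCN uses to aggregate neighboring node features before fusing with BERT vectors; the toy adjacency matrix and feature sizes are assumptions.

```python
# One symmetric-normalized GCN layer: H' = relu(D^-1/2 (A + I) D^-1/2 H W).
import torch

def gcn_layer(A: torch.Tensor, H: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    A_hat = A + torch.eye(A.size(0))                         # add self-loops
    d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
    A_norm = d_inv_sqrt.unsqueeze(1) * A_hat * d_inv_sqrt.unsqueeze(0)
    return torch.relu(A_norm @ H @ W)

A = (torch.rand(6, 6) > 0.6).float()                         # toy co-occurrence/dependency adjacency
A = ((A + A.T) > 0).float()                                  # make it symmetric
H = torch.randn(6, 16)                                       # initial word/sentence node features
W = torch.randn(16, 8)
print(gcn_layer(A, H, W).shape)                              # torch.Size([6, 8])
```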
    Available online:  November 28, 2024 , DOI: 10.15888/j.cnki.csa.009757
    Abstract:
    The uncertain execution order of asynchronous messages in Android applications is the main cause of flaky tests. Most existing studies on flaky tests expose flakiness by randomly determining the execution order of asynchronous messages, which is ineffective and inefficient. This study proposes concurrent flaky test detection based on the Happens-Before (HB) relationship for Android applications. After analyzing the HB relationships between asynchronous messages in the execution trace of an Android application test case, the proposed method determines the schedulable scope of each asynchronous message. It then designs a maximum-differentiation scheduling strategy that guides the execution order of asynchronous messages so that the scheduled execution trace differs as much as possible from the original one, and tries to change the test execution result to detect flakiness in the test. To verify the effectiveness of the method, experiments are conducted on 50 test cases from 40 Android applications. The experimental results show that the method can detect all the flaky tests, improving the detection effect by 6% and shortening the average detection time by 31.78% compared with the current state-of-the-art techniques.
    Available online:  November 28, 2024 , DOI: 10.15888/j.cnki.csa.009771
    Abstract:
    There are two problems in existing hierarchical text classification models: underutilization of label information across hierarchical instances and a lack of handling of unbalanced label distribution. To solve these problems, this study proposes a hierarchical text classification method for label co-occurrence and long-tail distribution (LC-LTD), which learns the global semantics of text based on shared labels and adopts a balanced loss function for long-tail distributions. First, a contrastive learning objective based on shared labels is devised to narrow the semantic distance between text representations with more shared labels in the feature space and to guide the model to generate discriminative semantic representations. Second, a distribution-balanced loss function is introduced to replace binary cross-entropy loss to alleviate the long-tail distribution problem inherent in hierarchical classification, improving the generalization ability of the model. LC-LTD is compared with various mainstream models on the WOS and BGC public datasets, and the results show that the proposed method achieves better classification performance and is more suitable for hierarchical text classification.
    Available online:  November 28, 2024 , DOI: 10.15888/j.cnki.csa.009772
    Abstract:
    Image steganalysis aims to detect whether an image undergoes steganography processing and thus carries secret information. Steganalysis algorithm based on Siamese networks determines whether an image carries secret information by calculating the dissimilarity between the left and right partitions of the image to be detected. This approach currently boasts relatively high accuracy among deep learning image steganalysis algorithms. However, Siamese network-based image steganalysis algorithms still have certain limitations. First, the convolutional blocks stacked in the preprocessing and feature extraction layers of the Siamese network overlook the issue of steganographic signals easily being lost as they are transmitted from shallow to deep layers. Second, SRM filters used in existing Siamese networks still employ high-pass filters from other networks to suppress image content, ignoring single-sized generated residual maps. To address the above problems, this study proposes a Siamese network image steganalysis method based on enhanced residual features. The proposed method designs an attention-based inverted residual module. By adding the attention-based inverted residual module after the convolutional blocks in the preprocessing and feature extraction layers, it reuses image features, introduces an attention mechanism, and enables the network to assign more weights to feature maps of complex-textured image regions. Meanwhile, to better suppress image content, a multi-scale filter is proposed, adjusting the residual types to operate with convolutional kernels of different sizes, thereby enriching residual features. Experimental results show that the proposed attention-based inverted residual module and multi-scale filter provide better classification performance compared to existing methods.
    Available online:  November 28, 2024 , DOI: 10.15888/j.cnki.csa.009761
    Abstract:
    Distributed storage systems achieve high-reliability and low-overhead data storage with erasure codes. To provide different levels of reliability and access performance, storage systems need to perform redundancy transitions on erasure-coded data by changing coding parameters. The stripe merging mechanism provides a way to perform redundancy transitions in storage systems. However, the stripe merging process based on traditional erasure codes can result in a large amount of I/O overhead from data block redistribution and checksum block re-computation. Worse still, these I/Os are amplified across multiple merging operations. In response to these issues, this study proposes new Tree Reed-Solomon (TRS) codes that eliminate data block redistribution I/Os by decentralizing data blocks and save checksum block re-computation I/Os by designing coding matrices. TRS codes further design storage units that organize the stripes taking part in merging into a tree, enabling multiple merging operations to be completed efficiently from bottom to top based on the tree structure. To test the performance of TRS codes, this study designs and implements a distributed storage prototype. Experiments show that compared to other erasure codes, TRS codes can greatly reduce stripe merging time.
    Available online:  November 28, 2024 , DOI: 10.15888/j.cnki.csa.009724
    Abstract:
    Existing methods fail to effectively leverage check-in information to provide precise location recommendation services. To address this problem, this study introduces a novel model for the next point-of-interest (POI) recommendation based on dual-granularity sequence fusion. Firstly, the model integrates fine-grained spatio-temporal sequence information with naturally occurring coarse-grained categorical sequence information in real life. It effectively captures long-term dependency relationships using gated recurrent units to enrich the context of check-ins. Subsequently, the model uses the extracted information to transform the “hard” segmentation of long sequences into a “soft” segmentation, enabling the extraction of complete semantic information from local sub-sequences. Finally, the recommendation model aggregates salient information from each local sub-sequence. Experimental results on the Foursquare and Gowalla datasets show that the proposed model improves the recall by 9.07% and 9.37%, respectively, and enhances the normalized discounted cumulative gain by 9.72% and 10.24%, respectively. These results indicate that the proposed model exhibits superior recommendation performance.
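    As a rough illustration of the dual-granularity idea, the sketch below encodes a fine-grained POI check-in sequence and a coarse-grained category sequence with separate GRUs and fuses their final states. The vocabulary sizes, dimensions, and fusion layer are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DualGranularityEncoder(nn.Module):
    """Encode a POI check-in sequence and a category sequence with GRUs,
    then fuse the two long-term contexts into one check-in representation."""
    def __init__(self, n_pois=10000, n_cats=300, dim=64):
        super().__init__()
        self.poi_emb = nn.Embedding(n_pois, dim)
        self.cat_emb = nn.Embedding(n_cats, dim)
        self.poi_gru = nn.GRU(dim, dim, batch_first=True)
        self.cat_gru = nn.GRU(dim, dim, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, poi_seq, cat_seq):
        # the last hidden state of each GRU summarizes long-term dependencies
        _, h_poi = self.poi_gru(self.poi_emb(poi_seq))
        _, h_cat = self.cat_gru(self.cat_emb(cat_seq))
        ctx = torch.cat([h_poi[-1], h_cat[-1]], dim=-1)
        return self.fuse(ctx)    # fused check-in context, shape (B, dim)
```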
    Available online:  November 28, 2024 , DOI: 10.15888/j.cnki.csa.009735
    Abstract:
    Considering the balance among economic, environmental, and social benefits in ride-hailing operations, this study proposes a multi-objective scheduling model that balances these three benefits, together with an algorithm based on dynamic space programming. The model integrates traditional taxi services and shared transport for the first time, comprehensively covering four different interaction scenarios between drivers and passengers, to achieve synergistic improvement of the three benefits through optimization strategies. The algorithm creatively combines the LAPJV algorithm with the branch and bound method to ensure that the optimal matching strategy satisfying multi-objective optimization can be efficiently explored and determined under the given threshold constraints. Compared with SCIP, the average error of the algorithm is within 4%, and the average solving speed is improved by 99.1%. This study systematically applies the algorithm to solve for and generate Pareto frontier graphs under different threshold constraints, intuitively displaying the trade-offs and trends of each of the three objectives (economic, environmental, and social benefits) under the constraints of the other two. This study provides a decision-making basis for ride-hailing operations.
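    The driver-passenger assignment step can be illustrated with a generic linear assignment solver. The cost matrix below is invented, and SciPy's linear_sum_assignment (a modified Jonker-Volgenant solver) stands in for the LAPJV routine mentioned in the abstract.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i, j]: an assumed composite cost (e.g., detour distance, emissions,
# and wait time) for pairing driver i with request j.
cost = np.array([[4.0, 1.5, 9.0],
                 [2.0, 6.0, 3.5],
                 [7.0, 2.5, 5.0]])
rows, cols = linear_sum_assignment(cost)       # optimal one-to-one matching
print(list(zip(rows, cols)), cost[rows, cols].sum())
```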
    Available online:  November 28, 2024 , DOI: 10.15888/j.cnki.csa.009736
    Abstract:
    Atmospheric fog and aerosols can significantly reduce visibility and distort colors in images, bringing great difficulties to high-level image recognition tasks. Existing image dehazing algorithms often suffer from problems such as over-enhancement, loss of details, and insufficient dehazing. To avoid over-enhancement and insufficient dehazing, this study proposes an image dehazing algorithm based on frequency and attention mechanisms (FANet). The algorithm adopts an encoder-decoder structure and constructs a dual-branch frequency extraction module to obtain both global and local high- and low-frequency information. A frequency fusion module is then constructed to adjust the weight proportions of the high- and low-frequency information. To optimize the dehazing effect, the algorithm introduces an additional channel-pixel module and a channel-pixel attention module during downsampling. Experimental results show that FANet achieves a PSNR of 40.07 dB and an SSIM of 0.9958 on the SOTS-indoor dataset, and a PSNR of 39.77 dB and an SSIM of 0.9958 on the SOTS-outdoor dataset. The proposed algorithm also achieves good results on the HSTS and Haze4k test sets and effectively alleviates color distortion and incomplete dehazing compared with other dehazing algorithms.
    Available online:  November 25, 2024 , DOI: 10.15888/j.cnki.csa.009731
    Abstract:
    The rise of large language models has profoundly impacted natural language processing. With the growth of computational resources and the expansion of model sizes, the potential applications of large language models in natural language processing are increasingly evident. However, the widely used low-rank adaptation (LoRA) method faces challenges in fine-tuning efficiency and storage cost as model sizes increase. To address this issue, this study proposes a singular value decomposition-based adaptation fine-tuning method. The method requires only the diagonal matrix and scaling vector obtained from singular value decomposition to be trainable parameters, achieving performance improvements on multiple natural language processing tasks while reducing training costs. Experimental results show that the proposed method outperforms other methods of the same parameter scale on the GLUE and E2E benchmarks. Compared with commonly used parameter-efficient fine-tuning methods, it shows significant advantages in reducing the number of trainable parameters and improving fine-tuning efficiency, achieving the highest performance gains in experiments on the fine-tuning efficiency of trainable parameters. Future research will focus on optimizing the proposed method to achieve more efficient fine-tuning across a wider range of tasks and larger models.
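    A minimal sketch of the idea, as we read it, is a linear layer whose frozen weight is factorized once by SVD, with only the singular values and a per-output scaling vector left trainable. The class name and interface are illustrative assumptions.

```python
from typing import Optional
import torch
import torch.nn as nn

class SVDAdapterLinear(nn.Module):
    """Frozen weight factorized by SVD; only the singular values (diagonal)
    and a scaling vector are trained."""
    def __init__(self, weight: torch.Tensor, bias: Optional[torch.Tensor] = None):
        super().__init__()
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        self.register_buffer("U", U)              # frozen left singular vectors
        self.register_buffer("Vh", Vh)            # frozen right singular vectors
        self.s = nn.Parameter(S.clone())          # trainable diagonal (singular values)
        self.scale = nn.Parameter(torch.ones(weight.shape[0]))  # trainable scaling vector
        self.bias = None if bias is None else nn.Parameter(bias.clone())

    def forward(self, x):
        w = (self.U * self.s) @ self.Vh           # reassemble W = U diag(s) V^T
        y = x @ w.t() * self.scale
        return y if self.bias is None else y + self.bias
```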
    Available online:  November 25, 2024 , DOI: 10.15888/j.cnki.csa.009733
    Abstract:
    The research on the classification and identification of microscopic residual oil occurrence states plays a vital role in residual oil exploitation and is of great significance for improving oil field recovery. In recent years, a large number of studies in this field have promoted the development of technologies for identifying microscopic residual oil by introducing deep learning. However, deep learning has not yet established a unified framework for microscopic residual oil identification, nor has it formed a standardized operation process. To guide future research, this study reviews existing methods for identifying residual oil and introduces the identification technologies for microscopic residual oil based on machine vision from several aspects, including image acquisition and classification standards, image processing, and residual oil identification methods. Residual oil identification methods are categorized into traditional and deep learning-based methods. The traditional methods are further divided into those based on manual feature extraction and those based on machine learning classification. The deep learning-based methods are divided into single-stage and two-stage methods. Detailed summaries are provided for data enhancement, pre-training, image segmentation, and image classification. Finally, this study discusses the challenges of applying deep learning to microscopic residual oil identification and explores future development trends.
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009729
    Abstract:
    Cigarette laser code recognition is an important tool for tobacco inspection. This study proposes a cigarette code recognition method based on a dual-state asymmetric network. Insufficient training on samples of distorted cigarette codes leads to weak generalization ability of the model. To address this issue, a nonlinear local augmentation (NLA) method is designed, which generates effective distorted training samples through spatial transformation using controllable datum points at the edges of cigarette codes, thereby enhancing the generalization ability of the model. To address the low recognition accuracy caused by the similarity between cigarette codes and their background patterns, a dual-state asymmetric network (DSANet) is proposed, which divides the convolutional layers of the CRNN into a training mode and a deployment mode. The training mode enhances the key feature extraction capability of the model by introducing asymmetric convolution to optimize the feature weight distribution. For real-time performance, BN fusion and branch fusion methods are designed for the deployment mode: by calculating fusion weights and initializing convolutional kernels, the convolutional layers are equivalently converted back to their original structures, which reduces user-side inference time. Finally, a self-attention mechanism is introduced into the recurrent layer to enhance the model's ability to extract cigarette code features by dynamically adjusting the weights of sequence features. Comparative experiments show that this method achieves higher recognition accuracy and speed, with recognition accuracy reaching 87.34%.
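    The BN fusion used for a deployment mode can be illustrated with the standard reparameterization that folds BatchNorm statistics into the preceding convolution. The sketch below shows this generic fold, not the paper's exact branch-fusion code.

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BatchNorm layer into the preceding convolution so that
    inference needs only a single, equivalent convolution."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    std = torch.sqrt(bn.running_var + bn.eps)
    t = bn.weight.data / std                                  # per-channel scale
    fused.weight.data = conv.weight.data * t.reshape(-1, 1, 1, 1)
    conv_bias = torch.zeros(conv.out_channels) if conv.bias is None else conv.bias.data
    fused.bias.data = (conv_bias - bn.running_mean) * t + bn.bias.data
    return fused
```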
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009730
    Abstract:
    A lesion of the sacroiliac joint is one of the primary signs for the early warning of ankylosing spondylitis. Accurate and efficient automatic segmentation of the sacroiliac joint is crucial for assisting doctors in clinical diagnosis and treatment. The limitations in feature extraction in sacroiliac joint CT images, due to diverse gray levels, complex backgrounds, and volume effects resulting from the narrow sacroiliac joint gap, hinder the improvement of segmentation accuracy. To address these problems, this study proposes the first U-shaped network for sacroiliac joint segmentation diagnosis, utilizing the concept of hierarchical cascade compensation for downsampling information loss and parallel attention preservation of cross-dimensional information features. Moreover, to enhance the efficiency of clinical diagnosis, the traditional convolutions in the U-shaped network are replaced with efficient partial convolution blocks. The experiment, conducted on a sacroiliac joint CT dataset provided by Shanxi Bethune Hospital, validates the effectiveness of the proposed network in balancing segmentation accuracy and efficiency. The network achieves a DICE value of 91.52% and an IoU of 84.41%. The results indicate that the improved U-shaped segmentation network effectively enhances the accuracy of sacroiliac joint segmentation and reduces the workload of medical professionals.
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009700
    Abstract:
    With the rapid development and application of artificial intelligence and the Internet of Things (AIoT), new challenges are posed to network lifetime, reliability, and coverage. A wireless sensor network (WSN) consists of a large number of self-organizing sensor nodes deployed in monitoring areas and offers advantages such as low cost, energy efficiency, self-organization, and large-scale deployment. However, how to further extend network lifetime and enhance the coverage reliability of wireless sensor networks remains a primary challenge in current research. To address these challenges, a coverage reliability assessment model is proposed that integrates a backbone network with coverage models, collaborative sensing among sensor nodes, and spatial correlation. Subsequently, a coverage reliability optimization algorithm based on the confident information coverage model is proposed. On one hand, the algorithm uses the confident information coverage model to ensure collaborative sensing of data and enhance network service quality; on the other hand, it employs backbone network optimization for routing to reduce energy consumption. Furthermore, to validate the superiority of the proposed algorithm, sensor multi-states and coverage rate are taken as evaluation metrics, with the RMSE threshold and energy consumption as performance indicators, and the proposed algorithm is compared with the ACR and CICR algorithms. Finally, a verification model is built in MATLAB. Simulation results demonstrate that the proposed algorithm significantly improves coverage reliability.
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009709
    Abstract:
    Acute ischemic stroke is the most common type of stroke in clinical practice. Due to its sudden onset and short treatment time window, it is one of the leading causes of disability and death worldwide. With the rapid development of artificial intelligence, deep learning technology shows great potential in the diagnosis and treatment of acute ischemic stroke. Deep learning models can quickly and efficiently segment and detect lesions in patients' brain images. This study introduces the development history of deep learning models and the public datasets commonly used in stroke research. For the various modalities and scanning sequences derived from computed tomography (CT) and magnetic resonance imaging (MRI), it elaborates on the research progress of deep learning technology in lesion segmentation and detection for acute ischemic stroke and summarizes and analyzes the improvement ideas of related research. Finally, it points out the existing challenges of deep learning in this field and proposes possible solutions.
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009721
    Abstract:
    Existing few-shot relational triple extraction methods often struggle with handling multiple triples in a single sentence and fail to consider the semantic similarity between the support set and the query set. To address these issues, this study proposes a few-shot relational triple extraction method based on module transfer and semantic similarity inference. The method uses a mechanism that constantly transfers among three modules, namely relation extraction, entity recognition, and triple discrimination, to extract multiple relational triples efficiently from a query instance. In the relation extraction module, BiLSTM and a self-attention mechanism are integrated to better capture the sequence information of the emergency plan text. In addition, a method based on semantic similarity inference is designed to recognize emergency organizational entities in sentences. Finally, extensive experiments are conducted on ERPs+, a dataset for emergency response plans. Experimental results show that the proposed model is more suitable for relational triple extraction in the field of emergency plans compared with other baseline models.
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009722
    Abstract:
    In the current electricity market, the volume of daily spot market clearing data has reached millions or tens of millions. With the increase in trading activities and the complexity of the market structure, ensuring the integrity, transparency, and traceability of trading data has become a key issue to be studied in the field of market clearing in China. Therefore, this study proposes a data provenance method for power market clearing based on the PROV model and smart contracts, aiming to automate the storage and updating of provenance information through smart contracts to improve the transparency of the clearing process and the trust of the participants. The proposed method utilizes the elements of entities, activities, and agents in the PROV model, combined with the hierarchical storage and immutability of blockchain technology, to record and track trading activities and rule changes in the electricity market. The method not only enhances data transparency and trust among market participants but also optimizes data management and storage strategies, reducing operational costs. In addition, the method provides proof of compliance for power market clearing, helping market participants meet increasing regulatory requirements.
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009723
    Abstract:
    In recent years, with the development of deep learning techniques, convolutional neural networks (CNNs) and Transformers have made significant progress in image super-resolution. However, to extract the global features of an image, it is common to stack individual operators and repeat the computation to gradually expand the receptive field. To better utilize global information, this study proposes that local, regional, and global features should be modeled explicitly. Specifically, local information, regional-local information, and global-regional information of an image are extracted and fused hierarchically and progressively through channel attention-enhanced convolution, a dual-branch parallel architecture consisting of a window-based Transformer and a CNN, and a dual-branch parallel architecture consisting of a standard Transformer and a window-based Transformer. In addition, a hierarchical feature fusion method is designed to fuse the local information extracted from the CNN branch and the regional information extracted from the window-based Transformer. Extensive experiments show that the proposed network achieves better results in lightweight super-resolution (SR). For example, in the 4× upscaling experiments on the Manga109 dataset, the peak signal-to-noise ratio (PSNR) of the proposed network is improved by 0.51 dB compared with SwinIR.
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009758
    Abstract:
    In autonomous driving, 3D object detection from a bird's eye view (BEV) has attracted significant attention. Existing camera-to-BEV transformation methods face challenges of insufficient real-time performance and high deployment complexity. To address these issues, this study proposes a simple and efficient view transformation method that can be deployed without any special engineering operations. First, to address the redundancy in complete image features, a width feature extractor is introduced and supplemented by a monocular 3D detection task to refine the key features of the image, so that information loss in the process is kept to a minimum. Second, a feature-guided polar coordinate positional encoding method is proposed to strengthen the mapping relationship between the camera view and the BEV representation, as well as the spatial understanding of the model. Lastly, interaction between learnable BEV embeddings and width image features is achieved through a single-layer cross-attention mechanism, generating high-quality BEV features. Experimental results show that, compared with lift, splat, shoot (LSS) on the nuScenes validation set, this network structure improves mAP from 29.5% to 32.0%, a relative increase of 8.5%, and NDS from 37.1% to 38.0%, a relative increase of 2.4%, demonstrating the effectiveness of the model in 3D object detection for autonomous driving scenarios. Additionally, it reduces latency by 41.12% compared with LSS.
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009759
    Abstract:
    This study introduces a knee cartilage segmentation method based on semi-supervised learning and conditional probability, to address the scarcity and quality issues of annotated samples in medical image segmentation. As it is difficult for existing embedded deep learning models to effectively model the hierarchical relationships among network outputs, the study proposes an approach combining conditional-to-unconditional mixed training and task-level consistency. In this way, the hierarchical relationships and relevance among labels are efficiently utilized, and the segmentation accuracy is enhanced. Specifically, the study employs a dual-task deep network predicting both pixel-level segmentation images and geometric perception level set representations of the target. The level set is shifted into an approximate segmentation map through a differentiable task transformation layer. Meanwhile, the study also introduces task-level consistency regularization between level line-based and directly predicted segmentation maps on labeled and unlabeled data. Extensive experiments on two public datasets demonstrate that this approach can significantly improve performance through the incorporation of unlabeled data.
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009760
    Abstract:
    Deformable 3D medical image registration remains challenging due to the irregular deformations of human organs. This study proposes a multi-scale deformable 3D medical image registration method based on Transformer. Firstly, the method adopts a multi-scale strategy to realize multi-level connections that capture information at different levels. A self-attention mechanism is employed to extract global features, and dilated convolution is used to capture broader context and finer local features, enhancing the registration network's ability to fuse global and local features. Secondly, based on the sparse prior of image gradients, the normalized total gradient is introduced as a loss function, effectively reducing the interference of noise and artifacts in the registration process and adapting better to different modalities of medical images. The performance of the proposed method is evaluated on publicly available brain MRI datasets (OASIS and LPBA). The results show that the proposed method not only retains the run-time advantage of learning-based methods but also performs well in terms of mean squared error and structural similarity. In addition, ablation results further confirm the validity of the proposed method and of the normalized total gradient loss design.
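    One common formulation of a normalized total gradient loss, which we assume is close in spirit to the one used here, normalizes the total gradient of the difference image by the total gradients of the two inputs. The 3D finite-difference implementation below is purely illustrative.

```python
import torch

def normalized_total_gradient_loss(warped, fixed, eps=1e-8):
    """NTG-style similarity for volumes of shape (B, 1, D, H, W): total gradient
    of the difference image, normalized by the total gradients of both inputs."""
    def grads(vol):
        dz = vol[:, :, 1:, :, :] - vol[:, :, :-1, :, :]
        dy = vol[:, :, :, 1:, :] - vol[:, :, :, :-1, :]
        dx = vol[:, :, :, :, 1:] - vol[:, :, :, :, :-1]
        return dz, dy, dx

    diff = warped - fixed
    num = sum(g.abs().sum() for g in grads(diff))
    den = sum(g.abs().sum() for g in grads(warped)) + sum(g.abs().sum() for g in grads(fixed))
    return num / (den + eps)
```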
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009755
    Abstract:
    An unmanned aerial vehicle (UAV) equipped with an edge server constitutes a mobile edge server, which can provide computing services for user equipment (UE) in scenarios where base stations are difficult to deploy. With deep reinforcement learning used to train the agent, reasonable offloading decisions can be made in a continuous and complex state space, and part of the computation-intensive tasks produced by users can be offloaded to edge servers for execution, thus improving the system's processing and response time. However, the fully connected neural networks currently used in deep reinforcement learning algorithms are unable to handle the time-series data arising in UAV-assisted mobile edge computing (MEC) scenarios, and the algorithms suffer from low training efficiency and poor decision-making performance. To address these problems, this study proposes a twin delayed deep deterministic policy gradient algorithm based on long short-term memory (LSTM-TD3), using LSTM to improve the Actor-Critic network structure of the TD3 algorithm. The network is divided into three parts: a memory extraction unit containing the LSTM, a current feature extraction unit, and a perceptual integration unit. In addition, the sample data in the experience pool are improved and historical data are defined, which provides the memory extraction unit with a better training effect. Simulation results show that, compared with the AC, DQN, and DDPG algorithms, the LSTM-TD3 algorithm performs best when optimizing the offloading strategy with the minimum total system delay as the objective.
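    A minimal sketch of an LSTM-TD3 style actor, assuming the three-part split described above (memory extraction with an LSTM, current feature extraction, and perceptual integration); all dimensions and layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    """Actor: an LSTM summarizes the recent observation history, a fully
    connected branch encodes the current observation, and the two are
    integrated before producing the (bounded) offloading action."""
    def __init__(self, obs_dim=16, act_dim=4, hidden=128):
        super().__init__()
        self.memory = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.current = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.integrate = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh())   # actions bounded in [-1, 1]

    def forward(self, history, obs):
        # history: (B, T, obs_dim) past observations; obs: (B, obs_dim)
        _, (h, _) = self.memory(history)
        feat = torch.cat([h[-1], self.current(obs)], dim=-1)
        return self.integrate(feat)
```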
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009749
    Abstract:
    This study proposes a lightweight apple detection algorithm based on an improved YOLOv8n model for apple fruit recognition in natural orchard environments. Firstly, a combination of DSConv and FEM feature extraction modules is used to replace some regular convolutions in the backbone network for lightweight improvement, reducing the floating-point operations and computation of the convolution process. To maintain performance during this lightweight redesign, a structured state space model is introduced to construct the CBAMamba module, which processes features efficiently through the Mamba structure. Subsequently, the convolutions in the detection head are replaced with RepConv and the number of convolution layers is reduced. Finally, the bounding box loss function is changed to WIoU, which uses a dynamic non-monotonic focusing mechanism, to accelerate model convergence and further enhance detection performance. Experiments show that, on a public dataset, the improved YOLOv8 algorithm outperforms the original YOLOv8n algorithm by 1.6% in mAP@0.5 and 1.2% in mAP@0.5:0.95, while increasing FPS by 8.0% and reducing model parameters by 13.3%. The lightweight design makes it highly practical for deployment in robotics and embedded systems.
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009750
    Abstract:
    In actual operation, mechanical equipment signals are susceptible to noise interference, which makes it difficult to extract fault features accurately, and information from a single position on the equipment cannot fully reflect its operating status. To address these problems, this study proposes an improved spatio-temporal fault classification method combining signal-adaptive decomposition and multi-source data fusion. Firstly, an improved signal-adaptive decomposition algorithm, signal adaptive variational mode decomposition (SAVMD), is proposed, and a weighted kurtosis sparsity (WKS) index is constructed to filter out the intrinsic mode function (IMF) components rich in feature information for signal reconstruction. Secondly, multi-source data from sensors at different positions are fused, and the dataset obtained by periodic sampling is used as the input of the model. Finally, a spatio-temporal fault classification model is built to process the multi-source data, which reduces noise interference through an improved sparse self-attention mechanism and effectively processes time-step and spatial-channel information with a dual-encoder mechanism. Experiments on three public mechanical equipment fault datasets achieve average accuracy rates of 99.1%, 98.5%, and 99.4%, respectively. Compared with other fault classification methods, the proposed method offers better performance, good adaptability, and robustness, providing a feasible approach for fault diagnosis of mechanical equipment.
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009782
    Abstract:
    Prompt engineering plays a crucial role in unlocking the potential of large language models. It guides the model's response by designing prompt instructions to ensure the relevance, coherence, and accuracy of the response. Prompt engineering does not require fine-tuning model parameters and can be seamlessly connected with downstream tasks, so various prompt engineering techniques have become a research hotspot in recent years. Accordingly, this study introduces the key steps for creating effective prompts, summarizes basic and advanced prompt engineering techniques such as chain of thought and tree of thought, and explores in depth the advantages and limitations of each method. It also discusses how to evaluate the effectiveness of prompting methods from different perspectives and with different methods. The rapid development of these technologies enables large language models to succeed in a variety of applications, ranging from education and healthcare to code generation. Finally, future research directions for prompt engineering are discussed.
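    As a simple illustration of the kind of techniques surveyed, the snippet below contrasts a plain zero-shot prompt with a chain-of-thought style prompt; the task text and wording are invented for illustration only.

```python
# Hypothetical example question; any model/client could consume these strings.
question = ("A warehouse ships 120 parcels on Monday and 35% more on Tuesday. "
            "How many parcels were shipped in total?")

# Plain zero-shot prompt: ask for the answer directly.
zero_shot_prompt = f"Q: {question}\nA:"

# Chain-of-thought style prompt: ask the model to reason step by step first.
cot_prompt = (
    f"Q: {question}\n"
    "A: Let's think step by step. First compute Tuesday's parcels, "
    "then add Monday's count, and finish with 'Answer: <number>'."
)
```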
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009744
    Abstract:
    In the field of visual tracking, most deep learning-based trackers overemphasize accuracy while overlooking efficiency, which hinders their deployment on mobile platforms such as drones. In this study, a deep cross guidance Siamese network (SiamDCG) is put forward. To better deploy on edge computing devices, a unique backbone structure based on MobileNetV3-small is devised. Given the complexity of drone scenarios, the traditional way of regressing target boxes with a Dirac δ distribution has significant drawbacks. To overcome the blurring effects inherent in bounding boxes, the regression branch is converted to predicting an offset distribution, and the learned distribution is used to guide classification accuracy. Excellent performance on multiple aerial tracking benchmarks demonstrates the robustness and efficiency of the proposed approach. On an Intel i5 12th generation CPU, SiamDCG runs 167 times faster than SiamRPN++, while using 98 times fewer parameters and 410 times fewer FLOPs.
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009742
    Abstract:
    Embodied AI requires the ability to interact with and perceive the environment, as well as capabilities such as autonomous planning, decision making, and action taking. Behavior trees (BTs) have become a widely used approach in robotics due to their modularity and efficient control. However, existing behavior tree generation techniques still face certain challenges when dealing with complex tasks: they typically rely on domain expertise and have a limited capacity for generating behavior trees. In addition, many existing methods have language comprehension deficiencies or cannot theoretically guarantee the success of the behavior tree, leading to difficulties in practical robotic applications. In this study, a new method for automatic behavior tree generation is proposed, which generates an initial behavior tree with task goals based on large language models (LLMs) and scene semantic perception. The method designs robot action primitives and related condition nodes based on the robot's capabilities, uses them to design prompts that make the LLMs output a behavior plan (generated plan), and then transforms the plan into an initial behavior tree. Although this study takes this setting as an example, the method is widely applicable and can be adapted to other types of robotic tasks according to different needs. Meanwhile, this study applies the method to robot tasks and gives specific implementation details and examples. During task execution, the behavior tree can be dynamically updated in response to the robot's operation errors and environmental changes, and it shows a certain degree of robustness to changes in the external environment. Initial validation experiments on behavior tree generation are carried out in a simulated robot environment, demonstrating the effectiveness of the proposed method.
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009739
    Abstract:
    To solve the vehicle routing problem with time windows (VRPTW), this study establishes a mixed-integer programming model aimed at minimizing total distance and proposes a hybrid ant colony optimization algorithm with relaxed time window constraints. Firstly, an improved ant colony algorithm, combined with TSP-Split encoding and decoding, is proposed to construct routing solutions that are allowed to violate time-window constraints, so as to improve the global optimization ability of the algorithm. Then, a repair strategy based on variable neighborhood search is proposed to repair infeasible solutions using the principle of returning in time and the penalty function method. Finally, 56 Solomon and 12 Homberger benchmark instances are tested. The results show that the proposed algorithm is superior to the comparison algorithms from the literature: the known optimal solution is obtained in 50 instances, and quasi-optimal solutions are obtained in the remaining instances within acceptable computing time, proving the effectiveness of the proposed algorithm.
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009717
    Abstract:
    Piecewise linear representation algorithms for time series represent the whole series with fewer points according to trend changes in the series. However, most of these algorithms focus on the information of local sequence points and rarely pay attention to global data, and some only focus on fitting datasets rather than being applied to classification. To solve these problems, this study proposes an algorithm for extracting trend features from time series based on angle key points and inflection points. The algorithm selects angle key points according to the angle change values of the sequence data and then extracts inflection points based on these key points. It determines whether interpolation is needed according to the segmentation requirements, so as to obtain a segmented sequence that meets the requirements. Fitting and classification experiments are conducted on simulated data and 40 public datasets. Experimental results show that the proposed algorithm fits the simulated data better than other algorithms such as piecewise aggregate approximation (PAA), the TD algorithm, the BU algorithm, the FFTO algorithm based on inflection points, the Trend algorithm based on turning points and trend segments, and the ITTP algorithm based on trend turning points. On the UCR public datasets, the proposed algorithm achieves an average fitting error of 1.165, and its classification accuracy is 2.8% higher than that of the DTW-1NN algorithm proposed by Keogh.
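    The angle key point idea can be sketched as follows: for each interior point, compute the angle formed with its two neighbors and keep the points where the series clearly turns. The threshold value and the treatment of the time axis are our assumptions for illustration.

```python
import numpy as np

def angle_key_points(series, angle_threshold=170.0):
    """Keep interior points whose turning angle (with the two neighbors)
    deviates clearly from a straight line (180 degrees)."""
    t = np.arange(len(series), dtype=float)
    keep = [0]                                        # always keep the first point
    for i in range(1, len(series) - 1):
        v1 = np.array([t[i - 1] - t[i], series[i - 1] - series[i]])
        v2 = np.array([t[i + 1] - t[i], series[i + 1] - series[i]])
        cos_a = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12)
        angle = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
        if angle < angle_threshold:                   # a sharp turn: keep as key point
            keep.append(i)
    keep.append(len(series) - 1)                      # always keep the last point
    return keep
```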
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009719
    Abstract:
    In most previous work, the temperature in knowledge distillation (KD) is set to a fixed value during the distillation process. However, when the temperature is reexamined, it is found that a fixed temperature restricts the utilization of the knowledge inherent in each sample. This study divides the dataset into low-energy and high-energy samples based on energy scores. Experiments confirm that the confidence of low-energy samples is high, indicating deterministic predictions, while the confidence of high-energy samples is low, indicating uncertain predictions. To extract the best knowledge by adjusting non-target class predictions, this study applies higher temperatures to low-energy samples to generate smoother distributions and lower temperatures to high-energy samples to obtain sharper distributions. In addition, to address students' imbalanced reliance on prominent features and their neglect of dark knowledge, this study introduces entropy-reweighted knowledge distillation, which uses the entropy of the teacher's predictions to reweight the energy-based distillation loss on a per-sample basis. This method can be easily applied to other logit-based knowledge distillation methods and achieves better performance, which can be close to or even better than that of feature-based methods. Extensive experiments on image classification datasets (CIFAR-100, ImageNet) validate the effectiveness of this method.
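    A hedged sketch of sample-adaptive temperature with entropy reweighting: the teacher's energy score splits the batch, low-energy samples are distilled with a higher temperature and high-energy samples with a lower one, and each sample's KL term is reweighted by the teacher's prediction entropy. The temperature values, the median split, and the normalization are our assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def energy_adaptive_kd_loss(student_logits, teacher_logits,
                            t_low_energy=6.0, t_high_energy=2.0):
    """Higher temperature for confident (low-energy) samples, lower for
    uncertain (high-energy) ones; per-sample reweighting by teacher entropy."""
    energy = -torch.logsumexp(teacher_logits, dim=1)         # energy score per sample
    temps = torch.where(energy <= energy.median(),
                        torch.full_like(energy, t_low_energy),
                        torch.full_like(energy, t_high_energy))

    p_t = F.softmax(teacher_logits / temps[:, None], dim=1)
    log_p_s = F.log_softmax(student_logits / temps[:, None], dim=1)
    kl = F.kl_div(log_p_s, p_t, reduction='none').sum(dim=1) * temps ** 2

    entropy = -(p_t * p_t.clamp(min=1e-8).log()).sum(dim=1)  # teacher uncertainty
    weights = entropy / (entropy.mean() + 1e-8)               # entropy reweighting
    return (weights * kl).mean()
```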
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009703
    Abstract:
    Existing binary fuzzing methods have difficulty diving deep into programs to find vulnerabilities. To address this problem, this study proposes a multi-angle optimization method that integrates hardware-assisted program tracing, static analysis, and concolic execution. Firstly, static analysis and hardware-assisted tracing are used to calculate program path complexity and execution probability. Then, seed selection and mutation energy allocation are performed according to the path complexity and execution probability. Meanwhile, concolic execution is leveraged to assist seed generation and to record key bytes for targeted mutations. Experimental results show that, in most cases, this method finds more program paths and more crashes than other fuzzing methods.
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009686
    Abstract:
    Traditional object detection algorithms often face challenges such as poor detection performance and low detection efficiency. To address these problems, this study proposes a method for detecting small objects based on an improved YOLOv7 network. This method adds more paths to the efficient layer aggregation module (ELAN) of the original network and effectively integrates the feature information from different paths before introducing the selective kernel network (SKNet). This allows the model to pay more attention to features of different scales in the network and extract more useful information. To enhance the model’s perception of spatial information for small objects, an eSE module is designed and connected to the end of ELAN, thus forming a new efficient layer aggregation network module (EF-ELAN). This module preserves image feature information more completely and improves the generalization ability of the network. Additionally, a cross stage-adaptively spatial feature fusion module (CS-ASFF) is designed to address the issue of inconsistent feature scales in small object detection. This module is improved based on the ASFF network and the Nest connection method. It extracts weights through operations such as convolution and pooling on each image of the feature pyramid, applies the feature information to a specific layer, and utilizes other feature layers to enhance the network’s feature processing capabilities. Experimental results show that the proposed algorithm improves the average precision rate by 1.5% and 2.1% on the DIOR and DOTA datasets, respectively, validating its effectiveness in enhancing the detection performance of small objects.
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009725
    Abstract:
    When firefighting robots are deployed for medium to long-distance emergency tasks in urban areas, they often struggle with the inability to obtain a global prior map of the environment in advance. Consequently, they require manual remote control to reach the fire location, which involves cumbersome operations and significantly reduces firefighting efficiency. To address these issues, this study designs a new autonomous navigation system for firefighting robots in urban areas. This system is based on commercial electronic maps (such as Amap, Baidu Maps, and other 2D electronic maps) and effectively integrates the global navigation satellite system (GNSS) with local laser-based environmental sensing technologies. Firstly, commercial electronic maps are used to plan rough global sub-goal points. The sequence of global goal points is then registered with the actual positioning information and sent to the local planner. Subsequently, local planning tasks are performed within the local grid map established by laser sensing, following the sequence of sub-goal points. The improved local planner updates the sub-goal points dynamically based on real-time environmental changes during movement. Multiple simulations are conducted in a simulated environment, and validation is performed using a tracked vehicle in real-world scenarios. The results indicate that the designed system can accurately execute long-distance outdoor navigation tasks without a global prior map of the environment, providing an efficient and safe solution for the outdoor navigation of firefighting robots.
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009734
    Abstract:
    This study aims to delve into the joint detection of traffic signs and signals under complex and variable traffic conditions, analyzing and resolving the detrimental effects of harsh weather, low lighting, and image background interference on detection accuracy. To this end, an improved RT-DETR network is proposed. Based on a resource-limited operating environment, this study introduces a network, ResNet with PConv and efficient multi-scale attention (PE-ResNet), as the backbone to enhance the model’s capability to detect occlusions and small targets. To augment the feature fusion capability, a new cross-scale feature-fusion module (NCFM) is introduced, which facilitates better integration of semantic and detailed information within images, offering a more comprehensive understanding of complex scenes. Additionally, the MPDIoU loss function is introduced to more accurately measure the positional relationships among target boxes. The improved network reduces the parameter count by approximately 14% compared to the baseline model. On the CCTSDB 2021 dataset, S2TLD dataset, and the self-developed multi-scene traffic signs (MTST) dataset, the mAP50:95 increases by 1.9%, 2.2%, and 3.7%, respectively. Experimental results demonstrate that the enhanced RT-DETR model effectively improves target detection accuracy in complex scenarios.
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009737
    Abstract:
    This study proposes an algorithm for road damage detection based on an improved YOLOv8 to address challenges in road damage detection, including multi-scale targets, complex target structures, uneven sample distribution, and the impact of hard and easy samples on bounding box regression. The algorithm introduces dynamic snake convolution (DSConv) to replace some of the Conv modules in the original faster implementation of CSP bottleneck with 2 convolutions (C2f) module, aiming to adaptively focus on small and intricate local features, thereby enhancing the perception of geometric structures. By incorporating an efficient multi-scale attention (EMA) module before each detection head, the algorithm achieves cross-dimensional interaction and captures pixel-level relationships, improving its generalization capability for complex global features. Additionally, an extra small object detection layer is added to enhance the precision of small object detection. Finally, a strategy termed Flex-PIoUv2 is proposed, which alleviates sample distribution imbalance and anchor box inflation through linear interval mapping and size-adaptive penalty factors. Experimental results demonstrate that the improved model increases the F1 score, mAP50, and mAP50-95 on the RDD2022 dataset by 1.5%, 2.1%, and 1.2%, respectively. Additionally, results on the GRDDC2020 and China road damage datasets validate the strong generalization of the proposed algorithm.
  • Full-text download ranking (overall / annual / per issue)
    Abstract click ranking (overall / annual / per issue)

    2000,9(2):38-41, DOI:
    [Abstract] (12721) [HTML] (0) [PDF ] (22418)
    Abstract:
    This paper discusses in detail how VRML technology can be combined with other data access technologies to achieve real-time interaction with databases, and briefly describes the syntactic structure and technical requirements of the relevant technical specifications. The techniques used are secure and reliable, perform well in practical applications, and facilitate system porting.
    1993,2(8):41-42, DOI:
    [Abstract] (9772) [HTML] (0) [PDF ] (32206)
    Abstract:
    This paper presents the author's experience in recent years of using the utility software NU to remove viruses from disk boot sectors and hard disk master boot records and to repair disks with damaged boot sectors; practice has shown the approach to be simple and effective.
    1995,4(5):2-5, DOI:
    [Abstract] (9319) [HTML] (0) [PDF ] (14476)
    Abstract:
    This paper briefly introduces the definition, overview, and significance of the customs EDI automated clearance system, and gives a practice-oriented analysis of the legal issues involved in the business operation model under this EDI application system, the adoption of the EDIFACT international standard, network and software technology issues, and engineering management issues.
    2016,25(8):1-7, DOI: 10.15888/j.cnki.csa.005283
    [Abstract] (8958) [HTML] () [PDF 1167952] (39213)
    Abstract:
    Since 2006, deep neural networks have achieved great success in big data processing and artificial intelligence fields such as image/speech recognition and autonomous driving, and unsupervised learning methods, serving as pre-training methods for deep neural networks, have played a very important role in this success. This paper therefore introduces and analyzes unsupervised learning methods in deep learning, summarizing two commonly used classes of methods: deterministic autoencoder approaches and learning methods such as contrastive divergence based on probabilistic restricted Boltzmann machines. It also describes the application of these two classes of methods in deep learning systems, and finally summarizes the problems and challenges facing unsupervised learning and gives an outlook on future work.
    2008,17(5):122-126, DOI:
    [Abstract] (7968) [HTML] (0) [PDF ] (48925)
    Abstract:
    With the rapid development of the Internet, network resources are becoming increasingly abundant, and how to extract information from the network has become crucial; in particular, Deep Web information retrieval, which accounts for 80% of network resources, is a difficult problem deserving close attention. To better study Deep Web crawler technology, this paper gives a comprehensive and detailed introduction to Deep Web crawlers. It first expounds the definition and research goals of Deep Web crawlers, then introduces and analyzes recent research progress on Deep Web crawlers at home and abroad, and on this basis looks ahead to research trends in Deep Web crawlers, laying a foundation for further research.
    2011,20(11):80-85, DOI:
    [Abstract] (7688) [HTML] () [PDF 863160] (43070)
    Abstract:
    Based on a study of current mainstream video transcoding schemes, a distributed transcoding system is proposed. The system uses HDFS (Hadoop Distributed File System) for video storage and performs distributed transcoding with the MapReduce paradigm and FFMPEG. The segmentation strategy for distributed video storage and the effect of segment size on access time are discussed in detail, and metadata formats for video storage and conversion are defined. A distributed transcoding scheme based on the MapReduce programming framework is proposed, in which transcoding is performed on the Mapper side and video merging on the Reducer side. Experimental data show how transcoding time varies with video segment size and the number of transcoding machines.
    1999,8(7):43-46, DOI:
    [Abstract] (7362) [HTML] (0) [PDF ] (24089)
    Abstract:
    Representing a large color space with fewer colors has long been a research topic. This paper discusses halftoning and dithering techniques in detail, extends the discussion to practical true-color spaces, and presents algorithms for their implementation.
    2022,31(5):1-20, DOI: 10.15888/j.cnki.csa.008463
    [Abstract] (6630) [HTML] (4064) [PDF 2584043] (6039)
    Abstract:
    The advent of deep learning has brought great breakthroughs to machine learning research, but it requires a large amount of manually annotated data. In practical problems, constrained by labor costs, many applications need to reason about instance classes that have never been seen before; zero-shot learning (ZSL) emerged to meet this need. As a natural data structure for representing relationships between entities, graphs are receiving increasing attention in zero-shot learning. This paper provides a systematic review of zero-shot graph learning methods. It first outlines the definitions of zero-shot learning and graph learning and summarizes the ideas behind existing zero-shot learning solutions, then categorizes current zero-shot graph learning methods according to how graphs are utilized, discusses the evaluation criteria and datasets involved, and finally points out the problems to be solved and possible future directions for further research on zero-shot graph learning.
    2012,21(3):260-264, DOI:
    [Abstract] (6574) [HTML] () [PDF 336300] (45381)
    Abstract:
    The core issues for open platforms are user authentication and authorization. OAuth is currently the internationally accepted authorization approach; its characteristic is that access to a user's protected resources can be requested without the user entering a username and password in the third-party application. The latest version is OAuth 2.0, whose authentication and authorization flow is simpler and more secure. This paper studies the working principle of OAuth 2.0, analyzes the workflow for refreshing access tokens, and presents a server-side design scheme for OAuth 2.0 together with a concrete application example.
    2007,16(9):22-25, DOI:
    [Abstract] (6540) [HTML] (0) [PDF ] (7213)
    Abstract:
    Considering the actual security status of a legacy logistics system, this paper analyzes the shortcomings of object-oriented programming in handling crosscutting and core concerns, points out the advantages of aspect-oriented programming solutions in separating concerns, analyzes AspectJ as a concrete implementation of aspect-oriented programming, and proposes a method for evolving IC card security in the legacy logistics system based on AspectJ.
    2011,20(7):184-187,120, DOI:
    [Abstract] (6414) [HTML] () [PDF 731903] (33972)
    Abstract:
    To meet the practical requirements of smart homes, environmental monitoring, and similar applications, a long-range wireless sensor node is designed. The system uses the second-generation system-on-chip CC2530, which integrates an RF transceiver and a controller, as its core module, with an external CC2591 RF front-end power amplifier; the software is based on the ZigBee 2006 protocol stack, implementing the application-layer functions on top of the generic ZStack modules. The construction of a wireless data acquisition network based on the ZigBee protocol is introduced, and hardware design schematics and software flowcharts are given for the sensor and coordinator nodes. Experiments show that the nodes perform well, communication is reliable, and the communication range is markedly greater than that of TI's first-generation products.
    2019,28(6):1-12, DOI: 10.15888/j.cnki.csa.006915
    [Abstract] (6120) [HTML] (19367) [PDF 672566] (26404)
    Abstract:
    A knowledge graph is a knowledge base that represents concepts and entities in the objective world and the relationships between them in graph form, and it is one of the foundational technologies for intelligent services such as semantic search, intelligent question answering, and decision support. At present, the connotation of knowledge graphs is still not clear enough, and because documentation is incomplete, the usage and reuse rates of existing knowledge graphs are low. Therefore, this paper gives a definition of knowledge graphs and clarifies their relationship to ontologies and other related concepts: an ontology is the schema layer and logical foundation of a knowledge graph, and a knowledge graph is an instantiation of an ontology; ontology research results can serve as the basis for knowledge graph research and promote the faster development and wider application of knowledge graphs. The paper lists and analyzes the main existing general-purpose and domain knowledge graphs at home and abroad, together with their construction, storage, and retrieval methods, in order to improve their usage and reuse. Finally, future research directions for knowledge graphs are pointed out.
    2004,13(10):7-9, DOI:
    [Abstract] (6068) [HTML] (0) [PDF ] (12291)
    Abstract:
    This paper introduces the composition of a vehicle monitoring system, studies how to use a Rockwell GPS OEM board and the WISMO QUIK Q2406B module for the hardware and software design of the mobile unit, and describes the design of the GIS software at the monitoring center. It focuses on how the Q2406B module, with embedded TCP/IP protocol handling, connects to the Internet via AT commands and exchanges TCP data with the monitoring center.
    2008,17(1):113-116, DOI:
    [Abstract] (6007) [HTML] (0) [PDF ] (50403)
    Abstract:
    Sorting is an important operation in computer programming. This paper discusses an improvement of the quicksort algorithm in C, namely an implementation that combines quicksort with straight insertion sort. When implementing large-scale internal sorting applications in C, the goal is to find a simple, effective, and fast algorithm. The paper focuses on the process of improving quicksort, from basic performance characteristics to basic algorithmic improvements, and through repeated analysis and experiments arrives at the best improved algorithm.
    2008,17(8):87-89, DOI:
    [Abstract] (5939) [HTML] (0) [PDF ] (42377)
    Abstract:
    With the widespread application of object-oriented software development and the demand for test automation, model-based software testing has gradually gained recognition and acceptance among developers and testers. Model-based testing is one of the main testing methods in the coding stage, offering high testing efficiency and good effectiveness at eliminating logically complex faults, but false positives, missed detections, and fault mechanisms still require further study. This paper analyzes and classifies the main testing models, gives a preliminary analysis of parameters such as fault density, and finally proposes a model-based software testing process.
    2008,17(8):2-5, DOI:
    [Abstract] (5810) [HTML] (0) [PDF ] (33049)
    Abstract:
    This paper presents the design and implementation of a single sign-on system for an enterprise information portal. The system, built on a Java EE architecture and combining credential encryption with Web Services, provides unified authentication and access control for portal users. The paper describes in detail the system's overall structure, design ideas, working principle, and concrete implementation scheme; the system has already been applied successfully in the broadcasting-industry information portal platforms of several provinces and cities.
    2004,13(8):58-59, DOI:
    [Abstract] (5762) [HTML] (0) [PDF ] (28579)
    Abstract:
    This paper introduces several methods in Visual C++ 6.0 for moving focus between multiple text boxes in a dialog using the Enter key, and proposes an improved method.
    2009,18(5):182-185, DOI:
    [Abstract] (5730) [HTML] (0) [PDF ] (34952)
    Abstract:
    DICOM is the international standard for storing and transmitting medical images, and DCMTK is a free, open-source toolkit for the DICOM standard. Understanding the DICOM file format and solving DICOM medical image display problems are fundamental to medical image processing and of great significance to research on medical imaging technology. This paper interprets the DICOM file format, introduces the principle of window-level adjustment, and implements medical image display and windowing functions using VC++ and DCMTK.
  • Full-text download ranking (overall / annual / per issue)
    Abstract click ranking (overall / annual / per issue)

    2007,16(10):48-51, DOI:
    [Abstract] (4869) [HTML] (0) [PDF 0.00 Byte] (89117)
    Abstract:
    This paper studies the HDF data format and its function library. Taking raster images as the main example, it discusses in detail how to read and process raster data with VC++.NET and VC#.NET, and then displays the image point by point from the resulting pixel matrix. The work was carried out in the context of the National Meteorological Center's development of MICAPS 3.0 (Meteorological Information Comprehensive Analysis and Processing System).
    2002,11(12):67-68, DOI:
    [Abstract] (4160) [HTML] (0) [PDF 0.00 Byte] (60049)
    Abstract:
    This paper describes a method for developing real-time data acquisition with Visual C++ 6.0 under Windows 2000, a non-real-time operating system. The data acquisition card used is the Advantech PCL-818L. Using the API functions in the PCL-818L's DLLs, three approaches to high-speed real-time data acquisition are proposed, along with their advantages and disadvantages.
