    2025,34(1):1-10, DOI: 10.15888/j.cnki.csa.009782, CSTR: 32024.14.csa.009782
    Abstract:
Prompt engineering plays a crucial role in unlocking the potential of large language models. This method guides the model’s responses by designing prompt instructions to ensure their relevance, coherence, and accuracy. Prompt engineering does not require fine-tuning model parameters and can be seamlessly connected with downstream tasks. Therefore, various prompt engineering techniques have become a research hotspot in recent years. Accordingly, this study introduces the key steps for creating effective prompts, summarizes basic and advanced prompt engineering techniques such as chain of thought and tree of thought, and explores in depth the advantages and limitations of each method. It also discusses how to evaluate the effectiveness of prompting methods from different perspectives and with different methods. The rapid development of these techniques has enabled large language models to succeed in a variety of applications, ranging from education and healthcare to code generation. Finally, future research directions for prompt engineering are discussed.
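As a minimal illustration of the chain-of-thought prompting surveyed above (not code from the paper; the `build_cot_prompt` helper and the exemplar questions are hypothetical), a few-shot prompt pairing questions with worked, step-by-step answers can be assembled like this:

```python
def build_cot_prompt(question: str, exemplars: list[tuple[str, str]]) -> str:
    """Assemble a few-shot chain-of-thought prompt.

    Each exemplar pairs a question with a worked, step-by-step answer,
    nudging the model to reason before stating its final answer.
    """
    parts = []
    for q, worked_answer in exemplars:
        parts.append(f"Q: {q}\nA: Let's think step by step. {worked_answer}")
    # The target question ends with the reasoning cue, left open for the model.
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)

prompt = build_cot_prompt(
    "A farm has 3 pens with 4 hens each. How many hens in total?",
    [("If 2 boxes hold 5 pens each, how many pens?",
      "Each box holds 5 pens and there are 2 boxes, so 2 * 5 = 10. The answer is 10.")],
)
```

The assembled string would then be sent to a model; the survey's evaluation section discusses how to compare such prompting strategies.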
    2025,34(1):11-25, DOI: 10.15888/j.cnki.csa.009709, CSTR: 32024.14.csa.009709
    Abstract:
Acute ischemic stroke is the most common type of stroke in clinical practice. Due to its sudden onset and short treatment time window, it is one of the leading causes of disability and death worldwide. With the rapid development of artificial intelligence, deep learning technology shows great potential in the diagnosis and treatment of acute ischemic stroke. Deep learning models can quickly and efficiently segment and detect lesions based on patients’ brain images. This study introduces the development history of deep learning models and commonly used public datasets for stroke research. For the various modalities and scanning sequences derived from computerized tomography (CT) and magnetic resonance imaging (MRI), it elaborates on the research progress of deep learning technology in lesion segmentation and detection for acute ischemic stroke and summarizes and analyzes the improvement ideas of related research. Finally, it points out the existing challenges of deep learning in this field and proposes possible solutions.
    2025,34(1):26-36, DOI: 10.15888/j.cnki.csa.009733, CSTR: 32024.14.csa.009733
    Abstract:
The research on the classification and identification of microscopic residual oil occurrence states plays a vital role in residual oil exploitation and is of great significance for improving oilfield recovery. In recent years, a large number of studies in this field have promoted the development of technologies for identifying microscopic residual oil by introducing deep learning. However, deep learning has not yet established a unified framework for microscopic residual oil identification, nor has it formed a standardized operation process. To guide future research, this study reviews existing methods for identifying residual oil and introduces machine vision-based identification technologies for microscopic residual oil from several aspects, including image acquisition and classification standards, image processing, and residual oil identification methods. Residual oil identification methods are categorized into traditional and deep learning-based methods. The traditional methods are further divided into those based on manual feature extraction and those based on machine learning classification. The deep learning-based methods are divided into single-stage and two-stage methods. Detailed summaries are provided for data augmentation, pre-training, image segmentation, and image classification. Finally, this study discusses the challenges of applying deep learning to microscopic residual oil identification and explores future development trends.
    2025,34(1):37-46, DOI: 10.15888/j.cnki.csa.009742, CSTR: 32024.14.csa.009742
    Abstract:
Embodied AI requires the ability to interact with and perceive the environment, as well as capabilities such as autonomous planning, decision making, and action taking. Behavior trees (BTs) have become a widely used approach in robotics due to their modularity and efficient control. However, existing behavior tree generation techniques still face certain challenges when dealing with complex tasks. These methods typically rely on domain expertise and have a limited capacity to generate behavior trees. In addition, many existing methods have language comprehension deficiencies or are theoretically unable to guarantee the success of the behavior tree, leading to difficulties in practical robotic applications. In this study, a new method for automatic behavior tree generation is proposed, which generates an initial behavior tree with task goals based on large language models (LLMs) and scene semantic perception. The method designs robot action primitives and related condition nodes based on the robot’s capabilities and then uses these to design prompts that make the LLMs output a behavior plan, which is transformed into an initial behavior tree. Although this study takes this setting as an example, the method has wide applicability and can be applied to other types of robotic tasks according to different needs. Meanwhile, this study applies the method to robot tasks and gives specific implementation methods and examples. While the robot performs a task, the behavior tree can be dynamically updated in response to the robot’s operation errors and environmental changes, giving it a certain degree of robustness to changes in the external environment. Validation experiments on behavior tree generation are carried out in a simulated robot environment, demonstrating the effectiveness of the proposed method.
    2025,34(1):47-57, DOI: 10.15888/j.cnki.csa.009760, CSTR: 32024.14.csa.009760
    Abstract:
Deformable 3D medical image registration remains challenging due to irregular deformations of human organs. This study proposes a multi-scale deformable 3D medical image registration method based on the Transformer. Firstly, the method adopts a multi-scale strategy to realize multi-level connections that capture information at different levels. A self-attention mechanism is employed to extract global features, and dilated convolution is used to capture broader context and finer local features, so as to enhance the registration network’s ability to fuse global and local features. Secondly, according to the sparse prior of the image gradient, the normalized total gradient is introduced as a loss function, effectively reducing the interference of noise and artifacts in the registration process and better adapting to different modalities of medical images. The performance of the proposed method is evaluated on publicly available brain MRI datasets (OASIS and LPBA). The results show that the proposed method not only maintains the run-time advantage of learning-based methods but also performs well in mean square error and structural similarity. In addition, ablation results further prove the validity of the proposed method and of the normalized total gradient loss function design.
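The normalized total gradient admits a compact formulation: the total gradient of the difference image, normalized by the total gradients of the two inputs. The following NumPy sketch implements one common variant of this measure (the paper's exact loss term may differ):

```python
import numpy as np

def normalized_total_gradient(moved: np.ndarray, fixed: np.ndarray,
                              eps: float = 1e-8) -> float:
    """One common NTG formulation: smaller values indicate better alignment,
    reaching 0 when the two images differ only by a constant."""
    def total_gradient(img: np.ndarray) -> float:
        gy, gx = np.gradient(img.astype(np.float64))
        return float(np.sum(np.abs(gx) + np.abs(gy)))
    return total_gradient(moved - fixed) / (
        total_gradient(moved) + total_gradient(fixed) + eps)

rng = np.random.default_rng(0)
a = rng.random((8, 8))
b = rng.random((8, 8))
```

An aligned pair yields an NTG of exactly 0, while unrelated images yield a strictly positive value, which is what makes it usable as a registration loss.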
    2025,34(1):58-68, DOI: 10.15888/j.cnki.csa.009722, CSTR: 32024.14.csa.009722
    Abstract:
    In the current electricity market, the volume of daily spot market clearing data has reached millions or tens of millions. With the increase in trading activities and the complexity of the market structure, ensuring the integrity, transparency, and traceability of trading data has become a key issue to be studied in the field of market clearing in China. Therefore, this study proposes a data provenance method for power market clearing based on the PROV model and smart contracts, aiming to automate the storage and updating of provenance information through smart contracts to improve the transparency of the clearing process and the trust of the participants. The proposed method utilizes the elements of entities, activities, and agents in the PROV model, combined with the hierarchical storage and immutability of blockchain technology, to record and track trading activities and rule changes in the electricity market. The method not only enhances data transparency and trust among market participants but also optimizes data management and storage strategies, reducing operational costs. In addition, the method provides proof of compliance for power market clearing, helping market participants meet increasing regulatory requirements.
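To make the PROV-plus-blockchain idea concrete, the sketch below (a hypothetical schema, not the paper's contract code; all field names are illustrative) records an entity, activity, and agent per clearing run and chains records by hash, as a smart contract might store them on-chain:

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass
class ProvRecord:
    """Minimal PROV-style provenance record: an entity (the cleared trade
    data), an activity (the clearing run), and an agent (the operator),
    chained to the previous record by its hash for immutability."""
    entity: str
    activity: str
    agent: str
    prev_hash: str = "0" * 64  # genesis records point at an all-zero hash

    def digest(self) -> str:
        # Canonical JSON so the same record always hashes identically.
        payload = json.dumps(
            {"entity": self.entity, "activity": self.activity,
             "agent": self.agent, "prev": self.prev_hash},
            sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

genesis = ProvRecord("trade-batch-001", "clearing-run-1", "market-operator")
nxt = ProvRecord("trade-batch-002", "clearing-run-2", "market-operator",
                 prev_hash=genesis.digest())
```

Tampering with any earlier record changes its digest and breaks the chain, which is the property the paper relies on for traceability.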
    Article Search
    Search by issue
    Select AllDeselectExport
    Display Method:
Available online: January 21, 2025, DOI: 10.15888/j.cnki.csa.009800
    Abstract:
Facing the complex marine environment, it is extremely challenging to utilize ship-radiated noise for hydroacoustic target feature extraction and recognition. In this study, 3D dynamic Mel-frequency cepstrum coefficient (3D-MFCC) features of ship audio signals are fused with 3D dynamic Mel-spectrogram (3D-Mel) features as model inputs. Based on this, a new deep neural network model for underwater target recognition is proposed. The model is based on a serial architecture combining a convolutional neural network (CNN) and long short-term memory (LSTM). Here, the traditional CNN is replaced by a multi-scale depthwise convolutional network (MSDC), and multi-scale channel attention (MSCA) is added. The experimental results show that the average recognition rate of this method on the DeepShip and ShipsEar datasets reaches 85.92% and 97.32%, respectively, demonstrating good classification performance.
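"Dynamic" MFCC features of this kind are conventionally built by stacking the static coefficients with their first and second temporal derivatives (deltas). The sketch below shows the standard regression-based delta and the resulting three-channel stack; it is a generic illustration, not the paper's pipeline:

```python
import numpy as np

def delta(features: np.ndarray, n: int = 2) -> np.ndarray:
    """Regression-based delta of a (frames, coeffs) feature matrix,
    the usual way 'dynamic' MFCC features are derived."""
    padded = np.pad(features, ((n, n), (0, 0)), mode="edge")
    denom = 2 * sum(i * i for i in range(1, n + 1))
    out = np.zeros_like(features, dtype=np.float64)
    for t in range(features.shape[0]):
        out[t] = sum(i * (padded[t + n + i] - padded[t + n - i])
                     for i in range(1, n + 1)) / denom
    return out

def three_d_features(static: np.ndarray) -> np.ndarray:
    """Stack static, delta, and delta-delta as three channels."""
    d1 = delta(static)
    d2 = delta(d1)
    return np.stack([static, d1, d2], axis=0)

# A linear ramp per coefficient: its interior delta is 1, delta-delta is 0.
static = np.tile(np.arange(10.0)[:, None], (1, 4))
feats = three_d_features(static)
```

The resulting (3, frames, coeffs) tensor is the kind of image-like input a CNN-LSTM model can consume.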
Available online: January 21, 2025, DOI: 10.15888/j.cnki.csa.009803
    Abstract:
Currently, research on multi-label text classification integrates label information. However, in the field of sentiment analysis, existing methods often overlook the correlations of labels based on the intensity and polarity of the emotions themselves, which are crucial for accurate classification. To address these issues, this study proposes the MGE-BERT model, which features multi-label interaction, graph enhancement, and emotion perception. The model first prioritizes sentiment label sorting through the correlations of sentiment intensity and hierarchy and then combines these sorted labels with text data as inputs to the BERT model. During this process, syntactic analysis techniques and sentiment lexicons are employed, and through a unique graph construction method, intricate dependency and emotion graphs are built. To further deepen the integration of label information and text features, the study uses BERT outputs as inputs to a graph convolutional network (GCN), enabling it to capture and transmit contextual relationships between nodes more precisely. Experimental results demonstrate that the proposed MGE-BERT model outperforms state-of-the-art models, achieving improvements in Macro-F1 scores of 1.6% and 2.0% on the SemEval2018 Task-1C and GoEmotions datasets, respectively.
Available online: January 21, 2025, DOI: 10.15888/j.cnki.csa.009823
    Abstract:
Currently, most explainable multimodal fake news detection methods overlook further research on and utilization of explanation data and cross-modal features. As a result, while these explainable fake news detection methods provide explanations for model decisions, their detection performance does not surpass that of advanced multimodal detection methods. To address these issues, this study proposes an iterative explainable multimodal fake news detection framework. The method consists of a main model and an explanation module, both of which receive multimodal news as input. First, the explanation module uses the explanation data calculated by the DeepLIFT algorithm as one of the inputs to the main model, contributing to the decision-making process. Next, the main model calculates cross-modal relevant features and cross-modal supplementary features through a multi-task network framework. It refines the cross-modal supplementary features by re-weighting them with the coarse prediction scores from the cross-modal relevant features and combines multiple features to make the final model decision. Finally, the explanation module is trained by transferring decision knowledge from the main model through knowledge distillation. The main model and the explanation module are trained alternately, forming an iterative framework that enhances detection performance while providing decision explanations. Extensive experiments on two publicly available fake news detection datasets demonstrate that the proposed method outperforms state-of-the-art multimodal fake news detection methods.
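The knowledge-distillation step used to transfer decision knowledge to the explanation module is conventionally the KL divergence between temperature-softened teacher and student distributions. A minimal NumPy sketch of that standard objective follows (the paper's exact loss and temperature are not specified here, so treat this as the textbook form):

```python
import numpy as np

def softmax(z, t: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=np.float64) / t
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits,
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2
    as in the standard distillation objective."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q)))) * temperature ** 2
```

The loss is zero when the student matches the teacher exactly and grows as their softened predictions diverge.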
Available online: January 21, 2025, DOI: 10.15888/j.cnki.csa.009824
    Abstract:
Learning-based multi-view stereo matching algorithms have achieved remarkable results but still suffer from limited convolutional receptive fields and neglect of image frequency information, which lead to insufficient matching performance on low-texture, repetitive, and non-Lambertian surfaces. To address these problems, this study proposes CAF-MVSNet, a context-enhanced and image-frequency-guided multi-view stereo matching network. First, a context enhancement module is fused into the feature pyramid network in the feature extraction stage to effectively expand the receptive field of the network. Then an image-frequency-guided attention module is introduced to obtain information about lines, shapes, textures, and colors by encoding different frequencies of the images, which strengthens long-range contextual connections and further addresses the accurate matching of low-texture, repetitive, and non-Lambertian surfaces for reliable feature matching. Experimental results on the DTU dataset show that CAF-MVSNet achieves a 12.3% improvement in overall error compared to the classical cascade model CasMVSNet, demonstrating excellent performance. In addition, good results are achieved on the Tanks and Temples dataset, reflecting the good generalization performance of CAF-MVSNet.
Available online: January 21, 2025, DOI: 10.15888/j.cnki.csa.009830
    Abstract:
This study improves the UnifiedGesture model to enhance the realism of audio-driven human body animation generation. Firstly, an encoder-decoder architecture is introduced to extract facial features from audio, compensating for the deficiencies of the original model in facial expression generation. Secondly, a cross-local attention mechanism and a multi-head attention mechanism based on Transformer-XL are combined to enhance the temporal dependency within long sequences. Simultaneously, the vector quantized variational autoencoder (VQ-VAE) is utilized to integrate and generate full-body motion sequences, enhancing the diversity and integrity of the generated motions. Finally, experiments are conducted on the BEAT dataset. The quantitative and qualitative analysis results demonstrate that the improved UnifiedGesture-F model achieves a significant improvement in the synchronicity between audio and human body movements, as well as in overall realism, compared to the original model.
Available online: January 21, 2025, DOI: 10.15888/j.cnki.csa.009831
    Abstract:
The TensorGCN model is one of the state-of-the-art (SOTA) models applying graph neural networks to text classification. However, in processing text semantic information, the long short-term memory (LSTM) network used by the model has difficulty fully extracting the semantic features of short texts and performs poorly on complex semantic information. At the same time, due to the large number of semantic and syntactic features contained in long texts, feature sharing is incomplete when heterogeneous information is shared among graphs, which affects the accuracy of text classification. To solve these two problems, the TensorGCN model is improved, and a text classification method based on a tensor graph convolutional network fusing BERT and the self-attention mechanism (BTSGCN) is proposed. Firstly, BERT is used to replace the LSTM module in the TensorGCN architecture for semantic feature extraction. It captures the dependencies between words by considering the surrounding words on both sides of a given word, thus extracting the semantic features of short texts more accurately. Then, the self-attention mechanism is added during propagation among graphs to help the model better capture the features of different graphs and complete the feature fusion. Experimental results on the MR, R8, R52, and 20NG datasets show that BTSGCN achieves higher classification accuracy than the other compared methods.
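The self-attention added during inter-graph propagation is, at its core, scaled dot-product attention. A minimal NumPy sketch of that primitive (a generic illustration, not the BTSGCN implementation) is:

```python
import numpy as np

def scaled_dot_product_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.
    Returns the attended values and the attention weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))           # 5 node features of width 8
out, w = scaled_dot_product_attention(X, X, X)  # self-attention: Q = K = V
```

When the same feature matrix supplies Q, K, and V, each node's output becomes a learned-weight mixture of all nodes, which is how features from different graphs can be fused.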
Available online: January 21, 2025, DOI: 10.15888/j.cnki.csa.009829
    Abstract:
The Transformer, relying on its self-attention mechanism, exhibits remarkable performance in image super-resolution reconstruction. Nevertheless, the self-attention mechanism also brings a very high computational cost. To address this issue, a lightweight image super-resolution reconstruction model based on a hybrid generalized Transformer is proposed. The model is built on the SwinIR network architecture. Firstly, the rectangular window self-attention (RWSA) mechanism is adopted. It utilizes horizontal and vertical rectangular windows with different heads to replace the traditional square window pattern, integrating features across different windows. Secondly, the recursive generalized self-attention (RGSA) mechanism is introduced to recursively aggregate input features into representative feature maps, followed by cross-attention to extract global information. Meanwhile, RWSA and RGSA are alternately combined to make more effective use of global context information. Finally, to activate more pixels for better recovery, the channel attention mechanism and the self-attention mechanism are used in parallel to extract features from the input image. Test results on five benchmark datasets show that this model achieves better reconstruction performance while keeping the parameter count lightweight.
Available online: January 21, 2025, DOI: 10.15888/j.cnki.csa.009798
    Abstract:
Core images, as a crucial digital image resource in the fields of geology, oil, and gas, are essential for scientific research and engineering practice. Their security is often ensured by adding digital watermarks. During digitization, core images frequently undergo JPEG compression when they are stored, transmitted, or published on Web pages. However, existing deep learning-based image watermarking algorithms still have significant shortcomings in visual quality and robustness under JPEG compression. This study proposes an end-to-end robust image watermarking algorithm to address robust watermark embedding in core images under JPEG compression. To efficiently integrate the features of the host image and the watermark, the study introduces a pyramid efficient multi-scale attention (PEMA) module. Through a unique cross-spatial interaction strategy and channel-wise relationship construction, the module effectively captures long-range dependencies in different directions and feature information at various scales. To achieve visual imperceptibility, the study embeds the digital watermark into the low-frequency components of the host image using the discrete wavelet transform (DWT) and introduces the DWT LL sub-band (DLL) loss function to improve the visual quality of the watermarked image. Experimental results demonstrate that the proposed algorithm outperforms existing mainstream algorithms in both robustness against JPEG compression and visual imperceptibility.
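Classical low-frequency DWT embedding, which the learned approach above builds on, can be sketched with a one-level Haar transform: add a scaled watermark to the LL sub-band and invert. This is a minimal hand-rolled sketch (the paper's network learns the embedding instead; `alpha` is an illustrative strength parameter):

```python
import numpy as np

def haar_dwt2(img: np.ndarray):
    """One-level 2D Haar DWT; returns (LL, LH, HL, HH) sub-bands."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    return ((a + b + c + d) / 4, (a + b - c - d) / 4,
            (a - b + c - d) / 4, (a - b - c + d) / 4)

def haar_idwt2(LL, LH, HL, HH) -> np.ndarray:
    """Exact inverse of haar_dwt2."""
    h, w = LL.shape
    img = np.zeros((2 * h, 2 * w))
    img[0::2, 0::2] = LL + LH + HL + HH
    img[0::2, 1::2] = LL + LH - HL - HH
    img[1::2, 0::2] = LL - LH + HL - HH
    img[1::2, 1::2] = LL - LH - HL + HH
    return img

def embed_watermark(host: np.ndarray, wm: np.ndarray,
                    alpha: float = 2.0) -> np.ndarray:
    """Additively embed a watermark into the low-frequency LL sub-band."""
    LL, LH, HL, HH = haar_dwt2(host.astype(np.float64))
    return haar_idwt2(LL + alpha * wm, LH, HL, HH)

rng = np.random.default_rng(1)
host = rng.random((8, 8))
marked = embed_watermark(host, np.ones((4, 4)), alpha=2.0)
```

Embedding in LL spreads each watermark bit over a 2x2 pixel block, which is why low-frequency embedding survives JPEG compression better than high-frequency embedding.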
Available online: January 21, 2025, DOI: 10.15888/j.cnki.csa.009810
    Abstract:
Explainable recommendation algorithms utilize behavioral and other relevant information not only to generate recommendation results but also to provide recommendation explanations, thereby increasing the transparency and credibility of recommendations. Traditional explainable recommendation algorithms are often limited to analyzing rating and text data and fail to fully utilize data such as images. They also do not consider effective fusion methods between modalities, making it difficult to fully unearth the intrinsic relationships between different modalities. An explainable recommendation model that fuses multimodal features is proposed to address these issues. The model improves the quality and personalization of recommendation explanations from a multimodal perspective through feature fusion technology. Firstly, a multimodal feature extraction method is designed based on the CLIP image and text encoders to extract the text and image features of users and items, respectively. Secondly, cross-attention is used to achieve cross-modal fusion of text and images, enhancing the semantic correlation between modalities. Finally, multimodal information is combined with interaction information to jointly optimize modal alignment, rating prediction, and explanation generation. Experimental results show that the proposed method exhibits significant advantages on three multimodal recommendation datasets, especially in improving explanation quality.
Available online: January 21, 2025, DOI: 10.15888/j.cnki.csa.009794
    Abstract:
To address the difficulty of balancing a global receptive field against efficient computation, as well as unclear details in image reconstruction, an attribute-guided network based on CNN-Mamba (CMANet) is proposed. Firstly, during reconstruction, attribute information is introduced and the interrelationships among attributes are considered, which helps the model improve the reliability and accuracy of the whole reconstruction process. Secondly, an hourglass state space module is introduced to explore the key features of face images while maintaining the linear-complexity advantage of long-distance dependency modeling. Finally, an adaptive Mamba fusion module is introduced. As image features learn long-distance dependencies in multiple directions, attributes are adaptively supplemented in different directions, and the features supplemented in different directions are adaptively fused, making the model more flexible and efficient in processing diverse images. Extensive experiments demonstrate the superiority of the proposed method.
Available online: January 21, 2025, DOI: 10.15888/j.cnki.csa.009795
    Abstract:
Object detection models often perform poorly on remote sensing images due to complex background interference and dense target distribution. To this end, this study improves the YOLOv5s object detection model. First, a mixed attention mechanism is utilized to improve the convolutional block attention module (CBAM), which is added to the backbone network. Accordingly, the features extracted by the model contain both local and global information, enhancing the model’s ability to identify targets against complex backgrounds. Then the study uses the ultra-lightweight dynamic upsampler DySample to reduce model parameters and improve model performance. Finally, the study employs the EIoU loss function to improve the localization of the targets to be detected. Experimental verification on the RSOD and DIOR datasets shows that the improved YOLOv5s achieves 7.8% higher accuracy than the original model in detecting targets in remote sensing images, meeting real-time detection requirements. In addition, the improved model retains its advantages in comparison with other object detection models.
Available online: January 21, 2025, DOI: 10.15888/j.cnki.csa.009786
    Abstract:
Terrain classification is a crucial research direction in remote sensing imagery, and the joint classification of hyperspectral images and LiDAR data has drawn much attention in recent years. The classification performance of existing deep learning models depends significantly on the richness and quality of labeled samples, which often poses a major challenge in practical applications. In addition, many models fail to effectively exploit the complementarity between hyperspectral images and LiDAR data. To solve these problems, this study proposes a semi-supervised double-branch classification network with cross-modal channel weight adjustment. Through an attention mechanism, the similarity between the channels of the two data sources is analyzed in depth, and the weight of each channel is adaptively adjusted accordingly. At the same time, a semi-supervised method combining consistency regularization and pseudo-labeling is used to effectively exploit the information in unlabeled samples. In joint classification experiments on the two benchmark datasets of Houston and MUUFL, the proposed method shows significant advantages over existing classification models, effectively improving classification accuracy and efficiency.
Available online: January 21, 2025, DOI: 10.15888/j.cnki.csa.009792
    Abstract:
Federated learning is a distributed machine learning technique that allows participants to train models locally and upload updates to a central server. The central server aggregates the updates to generate a better global model, ensuring data privacy and solving the problem of data silos. However, gradient aggregation relies on a central server, which may be a single point of failure, and the central server is also a potential malicious attacker. Therefore, federated learning needs to be decentralized. Existing decentralized solutions ignore external adversaries and the performance bottlenecks caused by data communication. To address these issues, this study proposes a decentralized federated learning method that considers external adversaries. The method applies Shamir’s secret sharing scheme to divide model updates into multiple shares to protect gradient privacy. It proposes a flooding consensus protocol that randomly selects a participant as the central server in each round to complete global aggregation, efficiently achieving the decentralization of federated learning. At the same time, the method introduces BLS aggregate signatures to prevent external adversary attacks and improve verification efficiency. Theoretical analysis and experimental results indicate that the method is safe and more efficient than similar federated learning methods.
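Shamir's (t, n) secret sharing, which the method uses to protect gradients, splits a secret into n points on a random degree t-1 polynomial; any t points reconstruct it by Lagrange interpolation. A minimal sketch over a prime field (illustrative only; a real system would share each model-update coordinate and use a field sized to the data):

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime; all arithmetic is modulo this field

def split_secret(secret: int, n_shares: int, threshold: int, rng=None):
    """Sample a degree (threshold-1) polynomial with the secret as its
    constant term; each share is a point (x, f(x)) on it."""
    rng = rng or random.Random()
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, f(x)) for x in range(1, n_shares + 1)]

def reconstruct(shares) -> int:
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Fewer than `threshold` shares reveal nothing about the secret, which is what lets participants hold share fragments of each other's updates without learning the gradients.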
Available online: January 17, 2025, DOI: 10.15888/j.cnki.csa.009790
    Abstract:
Aiming at noise and pseudo-edge interference in the edge extraction of automobile coating images, caused by the complex environment and uneven lighting in production plants, an edge extraction algorithm for automobile coating images with an improved Canny operator is proposed. Firstly, the algorithm adopts a cascade filter composed of a multi-level median rational hybrid filter and a guided filter to denoise and smooth the image, retaining the target edge information during noise reduction. Secondly, an improved Sobel operator convolution template is applied to extract gradient vectors in four directions (horizontal, vertical, 45°, and 135°) to improve edge localization accuracy. Finally, in the edge connection stage, the improved Otsu method (maximum between-class variance method) is used to select the high and low thresholds, increasing the adaptability of the algorithm. Experimental results show that, in image denoising, compared with traditional median filtering, the algorithm ensures that the peak signal-to-noise ratio of the denoised image is higher than 35 dB and the structural similarity is greater than 0.9; the overall peak signal-to-noise ratio increases by more than 6%, and the structural similarity improves by more than 6.5%. In terms of edge extraction, it effectively reduces pseudo-edge interference and achieves a high degree of edge connectivity.
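The baseline Otsu method the algorithm improves upon picks the gray level maximizing between-class variance of the binarization. A compact NumPy sketch of the classical (unimproved) method:

```python
import numpy as np

def otsu_threshold(img: np.ndarray) -> int:
    """Classical Otsu's method on an 8-bit image: choose the gray level
    that maximizes the between-class variance of the two resulting classes."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    global_mean = float(np.dot(np.arange(256), prob))
    best_t, best_var = 0, -1.0
    cum_p, cum_mean = 0.0, 0.0
    for t in range(256):
        cum_p += prob[t]          # probability mass of class 0 (<= t)
        cum_mean += t * prob[t]   # first moment of class 0
        if cum_p <= 0.0 or cum_p >= 1.0:
            continue
        var = (global_mean * cum_p - cum_mean) ** 2 / (cum_p * (1.0 - cum_p))
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# A strongly bimodal test image: the threshold should fall between the modes.
img = np.concatenate([np.full(120, 10), np.full(80, 200)]).astype(np.uint8)
t = otsu_threshold(img)
```

In the improved Canny pipeline above, the high threshold would come from such a computation and the low threshold as a fraction of it.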
Available online: January 17, 2025, DOI: 10.15888/j.cnki.csa.009787
    Abstract:
In response to the problem that current plug-and-play image restoration methods cannot accurately model image degradation in blind restoration tasks such as low-light image enhancement, this study constructs a solution that combines a plug-and-play splitting algorithm with a guided diffusion model. The solution avoids directly solving the complex data sub-problems caused by complex degradation models. Instead, it uses real degraded images to solve the data sub-problems and takes their solutions as “anchor points” to indirectly constrain and optimize the solving of the prior sub-problems. This ensures that the restoration results more closely approximate the real restoration target. The method is validated on multiple public datasets. The results show that the proposed algorithm achieves an average improvement of 4.89% in PSNR and 9.48% in SSIM compared to current representative methods. Experiments prove that the proposed method performs better on restoration metrics, validating its effectiveness.
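The PSNR metric reported above is defined as 10·log10(MAX² / MSE) between the reference and restored images; a minimal NumPy sketch:

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher is better, and identical
    images give infinity (MSE of zero)."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

a = np.zeros((4, 4))
b = np.full((4, 4), 255.0)  # worst case: every pixel off by the full range
```

A uniformly maximal error yields 0 dB, so typical restoration scores (20-40 dB) sit well inside this scale; SSIM, the other reported metric, is structural and computed differently.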
Available online: January 17, 2025, DOI: 10.15888/j.cnki.csa.009821
    Abstract:
    To solve the problems that denial of service (DoS) attacks in the Internet of Vehicles are difficult to prevent and the existing supervised learning methods cannot effectively detect zero-day attacks, this study proposes a hybrid DoS attack intrusion detection system. Firstly, the dataset is preprocessed to improve data quality. Secondly, feature selection is used to filter out redundant features, which aims to obtain more representative features. Thirdly, the ensemble learning method is used to integrate five tree-based supervised classifiers through stacking to detect known DoS attacks. Finally, an unsupervised anomaly detection method is proposed, which combines the convolutional denoising autoencoder with the attention mechanism to establish a normal behavior model. It is used to detect unknown DoS attacks that are missed by stacking ensemble models. Experimental results show that for the detection of known DoS attacks, the detection accuracy of the proposed system on the Car-Hacking dataset and the CICIDS2017 dataset is 100% and 99.967%, respectively. For the detection of unknown DoS attacks, the detection accuracy of the proposed system on the above two datasets is 100% and 83.953%, respectively, and the average test time on the two datasets is 0.072 ms and 0.157 ms, respectively, which verifies the effectiveness and feasibility of the proposed system.
Available online: January 17, 2025, DOI: 10.15888/j.cnki.csa.009801
    Abstract:
As a core component of urban transportation, improving the safety and efficiency of the subway system is of great significance in ensuring the safety of passengers’ lives and property. Pedestrian gate-breaking behavior can not only cause equipment damage and traffic delays but also threaten the safety of other passengers. Therefore, accurately detecting and recognizing the behavior of pedestrians breaking through subway gates has become an important task in intelligent transportation management. This study proposes a pedestrian gate-breaking threat detection algorithm. Firstly, the algorithm uses a mobile network convolution module in the feature extractor of the RAFT optical flow method and adds the ECA channel attention mechanism. At the same time, a 3D structure is used in the correlation volume building block and the field radius is reduced, lowering the number of model parameters and improving detection speed. Experimental results show that the average endpoint error of the proposed algorithm for pedestrian detection is 0.79, the detection speed reaches 55.98 frames per second, and the number of model parameters is reduced by 35.3%. To obtain the threat value of passengers breaking through subway gates, this study uses the improved optical flow method to calculate the motion information of adjacent video frames and combines it with the proposed gate-breaking threat calculation formula to obtain the threat value of passengers in the current frame. This method meets the requirements of real-time performance, accuracy, and lightweight design and can be effectively deployed to better meet the engineering requirements of pedestrian threat detection and emergency management for large passenger flows within stations.
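The average endpoint error (AEPE) quoted above is the standard optical flow metric: the mean Euclidean distance between predicted and ground-truth flow vectors. A minimal sketch:

```python
import numpy as np

def average_endpoint_error(flow_pred: np.ndarray, flow_gt: np.ndarray) -> float:
    """AEPE for dense optical flow fields of shape (H, W, 2):
    mean Euclidean distance between per-pixel flow vectors."""
    diff = flow_pred.astype(np.float64) - flow_gt.astype(np.float64)
    return float(np.mean(np.sqrt(np.sum(diff ** 2, axis=-1))))

# Every pixel's prediction is off by the vector (3, 4), so the AEPE is 5.
gt = np.zeros((4, 4, 2))
pred = np.zeros((4, 4, 2))
pred[..., 0] = 3.0
pred[..., 1] = 4.0
```

An AEPE of 0.79 therefore means the predicted motion vectors deviate from ground truth by under one pixel on average.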
    Available online:  January 16, 2025 , DOI: 10.15888/j.cnki.csa.009796
    Abstract:
3D object recognition and detection based on point clouds is an important research topic in the fields of computer vision and autonomous navigation. Nowadays, deep learning algorithms have greatly improved the accuracy and robustness of 3D point cloud classification. However, deep learning networks usually suffer from problems such as complex network structures and time-consuming training. This study proposes a three-dimensional point cloud classification network named Point-GBLS, which combines deep learning and a broad learning system. The network structure is simple and the training time is short. Firstly, point cloud features are extracted by a deep learning-based feature extraction network. Then, an improved broad learning system is used to classify them. Experiments on the ModelNet40 and ScanObjectNN datasets show that the recognition accuracy of Point-GBLS exceeds 92% and 78%, respectively. The training time is less than 50% of that of similar deep learning methods. It is superior to deep learning networks with the same backbone.
    Available online:  January 16, 2025 , DOI: 10.15888/j.cnki.csa.009797
    Abstract:
To address unclear boundaries and incoherent, incomplete, or even missing segmentation results in the semantic segmentation of colon polyp images, a colon polyp image segmentation network based on multi-scale features and contextual aggregation (MFCA-Net) is proposed. The network selects PVTv2 as the backbone network for feature extraction. The multi-scale feature complement module (MFCM) is designed to extract rich multi-scale local information and reduce the influence of polyp morphology changes on segmentation results. The global information enhancement module (GIEM) is designed: a large-kernel depth-wise convolution embedded with positional attention is constructed to accurately locate polyps and improve the network’s ability to distinguish complex backgrounds. The high-level semantic-guided context aggregation module (HSCAM) is designed to guide local features with global features, complement their differences, and cross-fuse shallow details with deep semantic information to improve the coherence and integrity of segmentation. The boundary perception module (BPM) is designed, in which boundary features are optimized by combining traditional image processing and deep learning methods to achieve fine-grained segmentation and clearer boundaries. Experiments show that the proposed network obtains higher mDice and mIoU scores than current mainstream algorithms on the publicly available colon polyp image datasets Kvasir, ClinicDB, ColonDB, and ETIS, and has higher segmentation accuracy and robustness.
    Available online:  January 16, 2025 , DOI: 10.15888/j.cnki.csa.009788
    Abstract:
To address the problems that the traditional artificial potential field (APF) does not fully consider the variability of vehicle collision avoidance risk distribution and that falling into local extrema leads to path planning failure, this study proposes an adaptive elliptic scope APF based on a gradient statistical mutation quantum genetic algorithm (GSM-QGA). Based on the traditional circular scope of the repulsive field, the study designs a calculation method for the dynamic elliptic scope of the repulsive potential field by analyzing the relative motion state of vehicles and obstacles. At the same time, through analysis of the influencing factors of the potential field function, the velocity factor is introduced to complete the design of the repulsive and gravitational potential field functions. The GSM-QGA is used as the local optimum correction strategy for the improved artificial potential field. When the vehicle falls into a local extremum and moves back and forth, a pseudo-global map is constructed according to the current position of the vehicle, and a feasible path is planned to jump out of the local extremum range. The simulation results show that the path planned by the improved algorithm not only effectively prevents vehicles from getting stuck in local extrema and reduces unnecessary obstacle avoidance operations but also has advantages over the traditional APF algorithm and the APF algorithm based on a fixed elliptic scope in terms of path smoothness and path length. The lengths of the planned paths are shortened by 6.37% and 9.14%, respectively.
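The elliptic repulsive scope can be illustrated with standard potential-field formulas. The attractive/repulsive forms below and the way the ellipse's major axis grows with relative speed are assumptions made for this sketch, not the paper's exact functions or coefficients.

```python
import math

# Test whether a point lies inside an ellipse centered on the obstacle,
# with its major axis aligned with the relative-motion heading.
def inside_ellipse(p, obstacle, heading, a, b):
    dx, dy = p[0] - obstacle[0], p[1] - obstacle[1]
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    u = dx * cos_h + dy * sin_h      # coordinate along the motion axis
    v = -dx * sin_h + dy * cos_h     # coordinate across the motion axis
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0

# Total potential: quadratic attraction to the goal, plus a repulsive term
# that is active only inside the speed-dependent elliptic scope.
def potential(p, goal, obstacle, heading, speed, k_att=1.0, k_rep=1.0):
    d_goal = math.dist(p, goal)
    u_total = 0.5 * k_att * d_goal ** 2
    a = 2.0 + 0.5 * speed            # major axis stretches with relative speed
    b = 2.0                          # minor axis kept fixed in this sketch
    if inside_ellipse(p, obstacle, heading, a, b):
        d_obs = max(math.dist(p, obstacle), 1e-6)
        u_total += 0.5 * k_rep * (1.0 / d_obs - 1.0 / a) ** 2
    return u_total
```

Outside the ellipse the obstacle contributes nothing, which is what confines collision-avoidance effort to the region where the relative motion actually creates risk.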
    Available online:  January 16, 2025 , DOI: 10.15888/j.cnki.csa.009789
    Abstract:
Digital watermarking algorithms attract widespread attention due to their important application value in the fields of copyright protection, content authentication, and data hiding. In practical applications, images with embedded watermarks are often affected by differentiable noises such as image distortion, sharpening, and blurring, while also facing interference from non-differentiable noises such as JPEG compression and transmission errors. Existing studies mostly focus on scheme design in a single noise environment, or attempt to use differentiable models to approximately simulate non-differentiable noises. These methods limit the robustness of watermarking algorithms to a certain extent. To solve this problem, this study proposes an end-to-end one-stage digital watermarking scheme based on an invertible neural network. The scheme uses an invertible neural network to simulate non-differentiable noise, enhancing the algorithm’s adaptability and robustness in actual noisy environments. Compared with existing algorithms, this algorithm improves the peak signal-to-noise ratio (PSNR) by 3.12 dB and the average extraction accuracy (ACC) by 35.36% in the case of multiple superposed noises.
    Available online:  January 16, 2025 , DOI: 10.15888/j.cnki.csa.009791
    Abstract:
The matrix factorization model is one of the classic models in recommendation systems. It can be used to predict users’ ratings on items and then make recommendations to improve user experience. Current matrix factorization models cannot effectively extract the local similarity relationships between users, which leads to poor rating prediction and aggravates the cold start problem. With the development of social networks, the trust relationship between users has become an important research resource for recommendation systems. Therefore, this study proposes a local Bayesian probabilistic matrix factorization model based on user trust relationships (TLBPMF) for rating prediction. The model studies users’ ratings by combining the trust relationship information of users. It identifies user groups with similar preferences and clusters them. According to the clustering results, rating submatrices are obtained, and a probabilistic matrix factorization model is established for each submatrix to deeply explore the local similarity relationships between users. The parameters of the model are estimated by the Gibbs sampling algorithm. A rating dataset from a film website is selected for experiments. The results show that the model outperforms the benchmark model in prediction accuracy and performs better on cold start users.
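As background for the factorization component, a minimal rating predictor can be sketched with SGD; note the paper instead estimates a Bayesian model per trust-based cluster via Gibbs sampling, and the hyperparameters here are illustrative only.

```python
import random

# Minimal matrix factorization: predict rating(u, i) as the dot product of
# user and item latent vectors, trained by SGD with L2 regularization.
def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=800, seed=0):
    rnd = random.Random(seed)
    U = [[rnd.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rnd.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(U[u][f] * V[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                U[u][f] += lr * (err * V[i][f] - reg * U[u][f])
                V[i][f] += lr * (err * U[u][f] - reg * V[i][f])
    return U, V
```

In the paper's setting, one such model would be fitted per rating submatrix produced by the trust-based clustering, so each factorization captures only locally similar users.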
    Available online:  January 16, 2025 , DOI: 10.15888/j.cnki.csa.009793
    Abstract:
    To address the issue of image quality decline caused by existing reflection removal algorithms when handling complex scenes, this study proposes a color-aware dual-channel reflection removal algorithm. First, a background color generator is designed to accurately predict the background color information of an image, provide background support for the basic reflection removal network, and generate preliminary reflection removal results. Subsequently, a dual-channel reflection removal network is proposed to further optimize these preliminary results. Additionally, the algorithm designs a sparse Transformer module, a channel attention module, and a feature fusion module within the dual-channel reflection removal network, thereby enhancing the precision and effect of reflection removal. Experimental results demonstrate that this method performs excellently on the RRID and Flash datasets, effectively removing reflected light and significantly enhancing image realism.
    Available online:  January 16, 2025 , DOI: 10.15888/j.cnki.csa.009775
    Abstract:
In the research on low-light image enhancement, although existing technologies make progress in improving image brightness, the issues of insufficient detail restoration and color distortion still persist. To tackle these problems, this study introduces DARFormer, a dual-attention Retinex-based Transformer network. The network consists of an illumination estimation network and a corruption restoration network, aiming to enhance the brightness of low-light images while preserving more details and preventing color distortion. The illumination estimation network uses an image prior to estimate a brightness mapping, which is used to enhance the brightness of low-light images. The corruption restoration network optimizes the quality of the brightness-enhanced image, employing a Transformer architecture with spatial attention and channel attention. Experiments carried out on the public datasets LOL_v1, LOL_v2, and SID show that compared with prevalent enhancement methods, DARFormer achieves better enhancement results in both quantitative and qualitative indicators.
    Available online:  January 16, 2025 , DOI: 10.15888/j.cnki.csa.009776
    Abstract:
With the development of information technology, back translation plagiarism carried out with translation tools has become increasingly complex and covert, posing higher requirements for plagiarism detection methods. For this reason, a plagiarism detection method based on prompt engineering is proposed. This method guides a large language model (LLM) to pay attention to potential similarities between sentence texts at the semantic level by designing prompt words, which can effectively identify highly semantically similar content. Firstly, existing plagiarism detection technologies and applications of prompt engineering are reviewed. On this basis, a back translation plagiarism detection process based on prompt engineering is designed. Secondly, a prompt template is designed to propose a plagiarism detection index based on the sentence compression ratio, obtained by merging and compressing the pairs of sentences to be detected. Finally, experiments demonstrate that the plagiarism detection method based on prompt engineering has significant advantages over traditional methods in detecting back translation plagiarism.
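The compression-ratio index can be illustrated as below. The `compress` step stands in for the LLM-driven merge-and-reduce operation described in the abstract; here it is replaced by a toy duplicate-word remover, so all names and the scoring interpretation are assumptions.

```python
# Index idea: merge the two candidate sentences and compress the result.
# If the sentences say the same thing, compression removes the overlap,
# so a low compressed/original length ratio suggests plagiarism.
def compression_ratio(sent_a, sent_b, compress):
    merged = sent_a + " " + sent_b
    return len(compress(merged)) / len(merged)

# Toy stand-in for the LLM compression step: drop repeated words.
def toy_compress(text):
    seen, kept = set(), []
    for w in text.split():
        if w.lower() not in seen:
            seen.add(w.lower())
            kept.append(w)
    return " ".join(kept)
```

With a real LLM doing semantic compression, paraphrased or back-translated pairs would also compress well, which is what lexical-overlap methods miss.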
    Available online:  January 16, 2025 , DOI: 10.15888/j.cnki.csa.009783
    Abstract:
Remote sensing hyperspectral image super-resolution (HSISR) has made considerable progress in recent years, and methods based on deep convolutional neural network (CNN) technology are widely employed. However, most CNN-based super-resolution models tend to ignore the spectral structure of remote sensing hyperspectral images. Meanwhile, because convolutional networks are limited by the size of their convolutional kernels, long-distance feature dependencies are ignored, which in turn affects reconstruction accuracy. To solve these problems, this study proposes a dual-branch remote sensing hyperspectral image super-resolution network based on grouped ConvLSTM and Transformer (DGCTNet), which combines the advantages of the Transformer in capturing long-distance dependencies and of ConvLSTM in extracting sequential features. It enhances the reconstructed image by extracting spatial features while maintaining spectral orderliness. In addition, DGCTNet designs an edge learning network to diffuse edge information into the image space. At the same time, to recalibrate the spectral response, the proposed dual-group level channel self-attention mechanism (DSA) is added. Experiments on the Houston dataset show that the proposed DGCTNet method outperforms current state-of-the-art comparison models in terms of quantitative evaluation metrics and visual quality in a wide variety of scenarios.
    Available online:  January 16, 2025 , DOI: 10.15888/j.cnki.csa.009785
    Abstract:
    The audio-visual event localization (AVEL) task locates events in a video by observing audio information and corresponding visual information. In this paper, a cross-modal time alignment network named CMTAN is designed for the AVEL task. The network consists of four parts: preprocessing, cross-modal interaction, time alignment, and feature fusion. Specifically, in the preprocessing part, the background and noise in the modal information are reduced by the processing of a new cross-modal audio guidance module and a noise reduction module. Then, in the cross-modal interaction part, the information reinforcement and information complementation modules based on the multi-head attention mechanism are used for cross-modal interaction, and the unimodal information is optimized with global information. In the time alignment part, a time alignment module focusing on the unimodal global information before and after cross-modal interaction is designed to perform feature alignment of modal information. Finally, in the feature fusion process, two kinds of modal information are fused from shallow to deep by a multi-stage fusion module. The fused modal information is ultimately used for event localization. Extensive experiments demonstrate that CMTAN has excellent performance in both weakly and fully supervised AVEL tasks.
    Available online:  December 19, 2024 , DOI: 10.15888/j.cnki.csa.009781
    Abstract:
    Given the insufficient adaptability of existing polymer dosage splitting algorithms when dealing with well groups in different blocks, this study proposes a polymer flooding well group splitting method based on an improved bald eagle search algorithm. Firstly, the preliminary splitting coefficients are obtained through grey correlation analysis. Then, the difference between the cumulative injection volume and the actual fluid production volume of each extraction well is calculated, and a reasonable threshold range and constraint conditions are set. Secondly, the bald eagle search algorithm is improved by introducing Sobol sequence and ICMIC mapping, golden sine Lévy flight guidance mechanism, nonlinear convergence factor, and adaptive inertia weighting strategy, which enhances the algorithm's searching capability and convergence accuracy. Finally, the improved bald eagle search algorithm is used to solve the optimization model of well group splitting coefficients in the actual block of an oilfield. The results show that the calculated splitting injection volume has a high degree of agreement with the actual fluid production volume and has good splitting accuracy.
    Available online:  December 19, 2024 , DOI: 10.15888/j.cnki.csa.009778
    Abstract:
Aiming at degraded and blurred images captured under harsh weather conditions such as haze, rain, and snow, which make accurate recognition and detection challenging, this study proposes a pedestrian and vehicle detection algorithm, lightweight blur vision network (LiteBlurVisionNet), for blurred scenes. In the backbone network, a lightweight MobileNetV3 module improved with GlobalContextEnhancer attention is used, reducing the number of parameters and making the model more efficient in image processing under harsh weather conditions such as haze and rain. The neck network adopts a lighter Ghost module and the SpectralGhostUnit module improved from the GhostBottleneck module. These modules can more effectively capture global context information, improve the discrimination and expressive ability of features, and help reduce the number of parameters and computational complexity, thereby improving the network’s processing speed and efficiency. In the prediction part, DIoU-based non-maximum suppression (DIoU-NMS) is used for local maximum search to remove redundant detection boxes and improve the accuracy of the detection algorithm in blurred scenes. Experimental results show that the parameter count of the LiteBlurVisionNet model is reduced by 96.8% compared to the RTDETR-ResNet50 model and by 55.5% compared to the YOLOv8n model. The computational load of the LiteBlurVisionNet model is reduced by 99.9% compared to the Faster R-CNN model and by 57% compared to the YOLOv8n model. The mAP0.5 of the LiteBlurVisionNet model is improved by 13.71% compared to the IAL-YOLO model and by 2.4% compared to the YOLOv5s model. This means the model is more efficient in terms of storage and computation and is particularly suitable for resource-constrained environments or mobile devices.
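DIoU-NMS used in the prediction part follows the standard formulation: the IoU of two boxes is penalized by their normalized center distance, and candidates whose penalized score against a kept box exceeds the threshold are suppressed. A plain-Python sketch:

```python
# Boxes are (x1, y1, x2, y2); scores are per-box confidences.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

# DIoU = IoU - (squared center distance / squared enclosing-box diagonal).
def diou(a, b):
    cax, cay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    cbx, cby = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    ex1, ey1 = min(a[0], b[0]), min(a[1], b[1])
    ex2, ey2 = max(a[2], b[2]), max(a[3], b[3])
    diag = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9
    dist = (cax - cbx) ** 2 + (cay - cby) ** 2
    return iou(a, b) - dist / diag

# Greedy NMS: keep the highest-scoring box, drop boxes with DIoU above thresh.
def diou_nms(boxes, scores, thresh=0.5):
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if diou(boxes[i], boxes[j]) <= thresh]
    return keep
```

The distance penalty lets two overlapping boxes with well-separated centers both survive, which helps in crowded or blurred scenes where plain IoU-NMS merges distinct objects.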
    Available online:  December 19, 2024 , DOI: 10.15888/j.cnki.csa.009779
    Abstract:
Automatic text summarization is an important branch of natural language processing (NLP), and one of its main difficulties lies in how to evaluate the quality of generated summaries quickly, objectively, and accurately. Given the problems of low evaluation accuracy, the need for reference texts, and the large consumption of computing resources in existing text summary quality evaluation methods, this study proposes an evaluation method for text summary quality based on large language models. It designs a prompt construction method based on the chain-of-thought principle to improve the performance of large language models in text summary quality evaluation. At the same time, a chain-of-thought dataset is generated and a small-scale large language model is trained through fine-tuning, significantly reducing the computing requirements. The proposed method first determines the evaluation dimensions according to the characteristics of text summaries and constructs prompts based on the chain-of-thought principle. The prompts guide the large language model to generate chain-of-thought processes and evaluation results from summary samples, from which a chain-of-thought dataset is generated. The generated dataset is then used to fine-tune the small-scale large language model. Finally, the fine-tuned small-scale model completes the quality evaluation of text summaries. Comparative experiments and analyses on the Summeval dataset show that this method significantly improves the evaluation accuracy of the small-scale large language model in the text summary quality evaluation task, providing an evaluation method with high accuracy, low computing requirements, and easy deployment that requires no reference texts.
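The prompt construction step can be sketched as follows. The dimension names and template wording are assumptions for illustration, not the paper's actual prompts.

```python
# Illustrative evaluation dimensions (commonly used for summary quality;
# the paper's own dimensions may differ).
DIMENSIONS = ["coherence", "consistency", "fluency", "relevance"]

# Assemble a chain-of-thought style prompt that asks the model to reason
# step by step before emitting a score on the last line.
def build_cot_prompt(source, summary, dimension):
    assert dimension in DIMENSIONS
    return (
        f"You will evaluate the {dimension} of a summary.\n"
        f"Source text:\n{source}\n\n"
        f"Summary:\n{summary}\n\n"
        "Think step by step: first list the key points of the source, "
        "then check how the summary handles each point, and finally "
        f"give a 1-5 {dimension} score on the last line as 'Score: N'."
    )
```

Prompts like this, paired with the large model's generated reasoning and scores, form the chain-of-thought dataset used to fine-tune the small-scale model.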
    Available online:  December 19, 2024 , DOI: 10.15888/j.cnki.csa.009768
    Abstract:
In spectral 3D CT data, traditional convolution has a poor ability to capture global features, while the full-scale self-attention mechanism consumes substantial resources. To solve this problem, this study introduces a new visual attention paradigm, wave self-attention (WSA). Compared with the ViT technology, this mechanism uses fewer resources to obtain the same amount of self-attention information. In addition, to extract the relative dependency among organs more adequately and to improve the robustness and execution speed of the model, a plug-and-play module, the wave random encoder (WRE), is designed for the WSA mechanism. The encoder is capable of generating a pair of mutually inverse asymmetric global (local) position information matrices. The global position matrix is used to conduct global random sampling of the wave features, and the local position matrix is used to complement the local relative dependency lost due to random sampling. In this study, experiments are performed on the task of segmenting the kidney and lung parenchyma in the standard datasets Synapse and COVID-19. The results show that this method outperforms existing models such as nnFormer and Swin-UNETR in terms of accuracy, the number of parameters, and inference rate, reaching the SOTA level.
    Available online:  December 19, 2024 , DOI: 10.15888/j.cnki.csa.009766
    Abstract:
Existing work on fake news detection frequently ignores the semantic sparsity of news text and the potential relationships among rich auxiliary information, which limits a model’s capacity to understand and recognize fake news. To address this, this study proposes a fake news detection method based on heterogeneous subgraph attention networks. Heterogeneous graphs are constructed to model the abundant features of fake news, such as the text, party affiliation, and topic of news samples. A heterogeneous graph attention network is constructed at the feature layer to capture the correlations between different types of information, and a subgraph attention network is constructed at the sample layer to mine the interactions between news samples. Moreover, a mutual information mechanism based on self-supervised contrastive learning focuses on discriminative subgraph representations within the global graph structure to capture the specificity of news samples. Experimental results demonstrate that the proposed method achieves about 9% and 12% improvements in accuracy and F1 score, respectively, compared with existing methods on the Liar dataset, significantly improving fake news detection performance.
    Available online:  December 19, 2024 , DOI: 10.15888/j.cnki.csa.009773
    Abstract:
In the field of knowledge distillation (KD), feature-based methods can effectively extract the rich knowledge embedded in the teacher model. However, logit-based methods often face issues such as insufficient knowledge transfer and low efficiency. Decoupled knowledge distillation (DKD) conducts distillation by dividing the logits output by the teacher and student models into target and non-target classes. While this method improves distillation accuracy, its single-instance-based distillation approach fails to capture the dynamic relationships among samples within a batch. Especially when there are significant differences in the output distributions of the teacher and student models, relying solely on decoupled distillation cannot effectively bridge these differences. To address the issues inherent in DKD, this study proposes a perception reconstruction method. This method introduces a perception matrix: by utilizing the representational capabilities of the model, it recalibrates logits, meticulously analyzes intra-class dynamic relationships, and reconstructs finer-grained inter-class relationships. Since the objective of the student model is to minimize representational disparity, this method is extended to decoupled knowledge distillation. The outputs of the teacher and student models are mapped onto the perception matrix, enabling the student model to learn richer knowledge from the teacher model. A series of validations on the CIFAR-100 and ImageNet-1K datasets demonstrates that the student model trained with this method achieves a classification accuracy of 74.98% on the CIFAR-100 dataset, 0.87 percentage points higher than that of baseline methods, thereby enhancing the image classification performance of the student model. Additionally, comparative experiments with various methods further verify the superiority of this method.
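For reference, the target/non-target split of DKD, the baseline this work extends, can be written compactly. The alpha, beta, and temperature values below are the commonly used DKD defaults, not necessarily this paper's settings.

```python
import math

def softmax(logits, t=1.0):
    exps = [math.exp(z / t) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q):
    # KL divergence between two discrete distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# DKD loss: TCKD is the KL on the binary target/non-target distribution,
# NCKD is the KL on the renormalized non-target classes.
def dkd_loss(t_logits, s_logits, target, alpha=1.0, beta=8.0, temp=4.0):
    pt, ps = softmax(t_logits, temp), softmax(s_logits, temp)
    bt = [pt[target], 1.0 - pt[target]]
    bs = [ps[target], 1.0 - ps[target]]
    tckd = kl(bt, bs)
    nt = [p / (1.0 - pt[target]) for i, p in enumerate(pt) if i != target]
    ns = [p / (1.0 - ps[target]) for i, p in enumerate(ps) if i != target]
    nckd = kl(nt, ns)
    return alpha * tckd + beta * nckd
```

The perception reconstruction method in the abstract would recalibrate the logits through the perception matrix before a loss of this form is applied, so that batch-level sample relationships also shape the distillation signal.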
    Available online:  December 19, 2024 , DOI: 10.15888/j.cnki.csa.009743
    Abstract:
    The lack of lighting and the complex environment in the mine, coupled with the small target size of safety helmets, lead to poor detection performance of safety helmets by general object detection models. To solve these issues, an improved mine safety helmet wearing detection model based on YOLOv8s is proposed. Firstly, the effectiveSE module is combined with the C2f module in the neck network of YOLOv8s to design a new C2f-eSE module, improving the feature extraction ability of the network structure. The CIoU loss function is replaced by the Wise-EIoU loss function to improve the model’s robustness. In addition, the spatial and channel reconstruction convolution (SCConv) module is introduced into the detection head. A new lightweight SPS detection head is designed based on the parameter sharing concept, reducing the number of parameters and computational complexity of the model. Finally, adding a P2 detection layer to the model enables the feature extraction network to incorporate more shallow information and improves the detection ability for small-sized targets. Experimental results show that the mAP50 index of the improved model increases by 3.2%, the number of parameters decreases by 1.6%, and GFLOPs decreases by 5.6%.
    Available online:  December 19, 2024 , DOI: 10.15888/j.cnki.csa.009763
    Abstract:
In complex terrain conditions, UAV formation path planning based on deep reinforcement learning can optimize the path of a UAV formation, with better path length and environmental adaptability than traditional heuristic algorithms. However, it still has problems such as insufficient training stability and poor real-time performance. For UAV clusters in a leader-follower mode, this study proposes a real-time 3D path planning method for UAV formation based on the SPER-TD3 algorithm. Firstly, the prioritized experience replay mechanism based on SumTree is integrated into the TD3 algorithm, and the SPER-TD3 algorithm is designed to determine the path of the UAV formation. Then, an angle formation control method is used to optimize the paths of the followers, and a dynamic path smoothing algorithm is applied to optimize the steering angle. To accelerate the training convergence speed and stability of the SPER-TD3 algorithm and solve the long-term dependence problem, a network model structure combining LSTM, a self-attention mechanism, and multilayer perceptrons is designed. Simulation experiments are conducted in environments with various obstacles. Results show that the proposed method is superior to eight mainstream deep reinforcement learning algorithms in terms of path safety coverage rate, flight path smoothness, success rate, and reward size. Its comprehensive importance evaluation value is 8.5% to 72.9% higher than that of existing methods, and it has the best training stability.
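The SumTree underpinning the prioritized experience replay in SPER-TD3 can be sketched as follows; this is a generic implementation of the standard scheme, not necessarily the paper's exact variant.

```python
# A SumTree stores priorities in the leaves and partial sums in the internal
# nodes, so a transition can be sampled with probability proportional to its
# priority in O(log n) by walking a random prefix value down the tree.
class SumTree:
    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)   # index 1 is the root (total sum)
        self.data = [None] * capacity
        self.next = 0

    def add(self, priority, item):
        idx = self.next % self.capacity      # overwrite oldest when full
        self.data[idx] = item
        self.update(idx, priority)
        self.next += 1

    def update(self, idx, priority):
        i = idx + self.capacity              # leaf position
        delta = priority - self.tree[i]
        while i >= 1:                        # propagate the change to the root
            self.tree[i] += delta
            i //= 2

    def sample(self, value):
        # value is drawn uniformly from [0, total priority) by the caller
        i = 1
        while i < self.capacity:
            if value <= self.tree[2 * i]:
                i = 2 * i
            else:
                value -= self.tree[2 * i]
                i = 2 * i + 1
        return self.data[i - self.capacity]
```

In TD3-style training, priorities are typically set from TD errors and refreshed via `update` after each learning step, so informative transitions are replayed more often.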
    Available online:  December 19, 2024 , DOI: 10.15888/j.cnki.csa.009764
    Abstract:
Key sentence extraction technology refers to using artificial intelligence to automatically find key sentences in a long text. This technology can be used to preprocess information retrieval and is of great significance for downstream tasks such as text classification and extractive summarization. Traditional unsupervised key sentence extraction technologies are mostly based on statistics and graphical model methods, which suffer from problems such as low accuracy and the need to build a large-scale corpus in advance. This study proposes T5KSEChinese, a method that can extract key sentences without supervision in the Chinese context. This method uses an encoder-decoder architecture with prompt words in the input and output so that the length mismatch between the target sentence and the original text can be ignored, obtaining more accurate results. At the same time, a contrastive learning positive sample construction method is proposed and combined with contrastive learning to conduct semi-supervised training on the encoder part of the model, which can improve the performance of downstream tasks. The method uses a lightweight model to outperform large language models with tens of times more parameters on the unsupervised downstream task. The final experimental results prove the accuracy and reliability of the proposed method.
    Available online:  December 16, 2024 , DOI: 10.15888/j.cnki.csa.009774
    Abstract:
In the contemporary field of unsupervised deep hashing research, methods predicated on contrastive learning are predominant. However, sampling bias brought about by the random extraction of negative samples in contrastive learning deteriorates image retrieval accuracy. To address this issue, this study proposes a novel unsupervised deep hashing based on bias-suppressing contrastive learning (BSCDH). BSCDH incorporates a bias suppression method (BSS) based on a contrastive learning framework. The method approximates incorrect negative samples as extremely hard negative samples and designs a bias suppression coefficient to suppress them, thereby alleviating the negative impact of sampling bias. The suppression coefficient value is determined by the similarity between the current negative sample and the query sample, and the distance relationship between the current negative sample and adjacent hash centers is introduced to correct the coefficient, reducing the possibility of excessively suppressing normal negative samples. Ultimately, the mAP@5000 of the BSCDH method (64 bits) reaches 0.696, 0.833, and 0.819 on the CIFAR-10, FLICKR25K, and NUS-WIDE datasets, respectively, demonstrating a significant performance advantage over the baseline. Extensive experiments verify that BSCDH exhibits high retrieval accuracy among unsupervised image retrieval methods and can effectively address sampling bias.
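The bias suppression idea can be illustrated with a weighted InfoNCE loss. The sigmoid-shaped coefficient below is an assumed monotone form chosen for the sketch, and BSCDH's hash-center correction term is omitted.

```python
import math

# Suppression coefficient: ~1 for ordinary negatives, decaying toward 0 for
# negatives so similar to the query that they are likely false negatives.
# The cut point and sharpness are illustrative hyperparameters.
def suppression(sim, cut=0.9, sharp=20.0):
    return 1.0 / (1.0 + math.exp(sharp * (sim - cut)))

# InfoNCE where each negative's contribution is scaled by its suppression
# coefficient, so probable false negatives barely repel the query.
def weighted_info_nce(pos_sim, neg_sims, temp=0.2):
    pos = math.exp(pos_sim / temp)
    neg = sum(suppression(s) * math.exp(s / temp) for s in neg_sims)
    return -math.log(pos / (pos + neg))
```

Down-weighting near-duplicate negatives is what prevents the loss from pushing apart samples that actually share semantics, the core failure mode of randomly drawn negatives.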
    Available online:  December 16, 2024 , DOI: 10.15888/j.cnki.csa.009769
    Abstract:
Most existing knowledge graph link prediction methods focus only on the semantic relationships among a head entity h, a relation r, and a tail entity t within a single triple when learning semantic information. They do not consider the links between related entities and entity relations across different triples. To address this problem, this study proposes the DeepE_CL model. Firstly, the study uses the DeepE model to learn the semantic information of related triples and of entities with the same entity-relation pairs or entity-relation pairs with the same entities. Secondly, the extracted semantic information of related triples is used to calculate the corresponding scoring function and cross-entropy loss, and the extracted semantic information of entities with the same entity-relation pairs or entity-relation pairs with the same entities is optimized through a contrastive learning model, so as to predict the missing information of related triples. This paper validates the proposed method on four common datasets and compares it with other baseline models using four evaluation indicators, namely MR, MRR, Hit@1, and Hit@10. The experimental results show that the DeepE_CL model achieves the best results on all indicators. To further validate its usefulness, the model is also applied to a real TCM dataset. The experimental results show that, compared with the DeepE model, the DeepE_CL model reduces the MR indicator by 18 and improves the MRR and Hit@1 indicators by 0.8% and 1.1%, respectively, while Hit@10 remains unchanged. The experiments demonstrate that the DeepE_CL model, which introduces a contrastive learning model, is very effective in improving knowledge graph link prediction performance.
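The evaluation indicators used here (MR, MRR, Hit@k) are standard ranking metrics and can be computed from the rank of the correct entity for each test triple:

```python
# Given the rank of the correct entity for each test triple, compute
# mean rank (MR), mean reciprocal rank (MRR), and Hit@k proportions.
def rank_metrics(ranks, ks=(1, 10)):
    n = len(ranks)
    out = {
        "MR": sum(ranks) / n,                    # lower is better
        "MRR": sum(1.0 / r for r in ranks) / n,  # higher is better
    }
    for k in ks:
        out[f"Hit@{k}"] = sum(1 for r in ranks if r <= k) / n
    return out
```

Because MR averages raw ranks while MRR averages their reciprocals, a single badly ranked triple inflates MR far more than it depresses MRR, which is why papers report both.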
    Available online:  December 16, 2024 , DOI: 10.15888/j.cnki.csa.009770
    Abstract:
The density peaks clustering (DPC) algorithm achieves clustering by identifying cluster centers based on local density and relative distance. However, for data with uneven density distribution and unbalanced cluster sizes, it tends to overlook cluster centers in low-density regions, so the number of clusters has to be set manually. Besides, once a data point is allocated incorrectly, the one-step allocation strategy propagates the error to subsequent points. To address these issues, this study proposes an adaptive sparse-aware density peaks clustering algorithm. Firstly, fuzzy points are introduced to minimize their impact on the subcluster merging process. Secondly, the subtractive clustering method is used to identify centers in low-density regions. Then, noise is identified and subcluster centers are updated based on a new local density and reverse nearest neighbors. Finally, a redefined global overlap metric combined with global separability guides subcluster merging, while the clustering result is determined automatically from these metrics. Experimental results demonstrate that compared with DPC and its improved algorithms, the proposed algorithm effectively identifies sparse clusters on both synthetic and UCI datasets while reducing the chain reactions caused by misallocated non-center points. The algorithm can also automatically determine the optimal number of clusters, ultimately yielding more accurate clustering results.
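The two quantities DPC builds on, local density rho and relative distance delta, can be computed as follows; this is standard DPC with a cutoff kernel, not the paper's sparse-aware variant.

```python
import math

# rho_i: number of points within cutoff distance d_c of point i.
# delta_i: distance from i to its nearest higher-density point
# (for the global density peak, the maximum distance is used instead).
def dpc_rho_delta(points, d_c):
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta
```

Cluster centers are the points where both rho and delta are large; the failure mode the paper targets is visible here, since a genuine center in a sparse region has a small rho and is easily missed by this criterion.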
    Available online:  December 16, 2024 , DOI: 10.15888/j.cnki.csa.009765
    Abstract:
    Transformer-based object detection algorithms often suffer from insufficient accuracy and slow convergence. Although many studies have proposed improvements and achieved certain outcomes, most overlook two key shortcomings of applying the Transformer structure to object detection. First, the results of self-attention computation lack diversity. Second, due to the complexity of set prediction, the models are unstable during target matching. To overcome these deficiencies, this study proposes several enhancements. Firstly, an adaptive token pooling module is designed to increase the diversity of self-attention weights. Secondly, a rough-prediction-based anchor box localization module is introduced, which provides positional priors for queries to enhance stability during bipartite matching. Lastly, a group-based denoising task is designed, which trains the model to distinguish between positive and negative queries near the target, thereby improving its ability to perform set prediction. Experimental results show that the improved algorithm achieves better training results on the COCO dataset, significantly outperforming the baseline model in both detection accuracy and convergence speed.
    Available online:  December 16, 2024 , DOI: 10.15888/j.cnki.csa.009746
    Abstract:
    This study proposes an analysis method based on association mining between historical accident reports and a root cause index system, to fully leverage experts' experience from root cause analysis of past accidents, enhance the accuracy and comprehensiveness of such analysis, and thereby reduce chemical safety incidents. By constructing an association matrix between accident reports and the index system, the method uses a pre-trained model to represent accident and index texts, integrates secondary and tertiary index information through an attention mechanism, and finally employs a graph convolutional neural network for root cause analysis. Validation on a dataset of 1351 samples demonstrates that the method significantly improves the accuracy of root cause prediction, effectively using expert analysis of historical accidents to analyze current ones and to uncover limitations in earlier accident analyses. Moreover, the method accurately identifies root causes even with incomplete incident descriptions. Its application will enhance accident prevention and risk management in occupational safety.
    Available online:  December 16, 2024 , DOI: 10.15888/j.cnki.csa.009751
    Abstract:
    The YOLOv8n algorithm performs poorly on complex backgrounds, dense targets, and small objects with limited pixel information, leading to reduced precision, missed detections, and misclassifications. To address these issues, this study proposes LNCE-YOLOv8n, an algorithm for safety equipment detection. It includes a linear multi-scale fusion attention (LMSFA) mechanism, which adaptively focuses on key features to improve the extraction of information from small targets while reducing computational load. An architecture called C2f_New networks (C2f_NewNet) is also introduced, which maintains high performance and reduces depth through an effective parallelization design. Combined with content-aware reassembly of features (CARAFE), a lightweight universal up-sampling operator, the algorithm realizes efficient cross-scale feature fusion and propagation and aggregates contextual information within a large receptive field. Based on the SIoU (symmetric intersection over union) loss function, the study proposes enhanced SIoU (ESIoU) to improve the adaptability and accuracy of the model in complex environments. Tested on a safety equipment dataset, LNCE-YOLOv8n outperforms YOLOv8n with a 5.1% increase in accuracy, a 2.7% rise in mAP50, and a 3.4% boost in mAP50-95, significantly enhancing the detection accuracy of workers' safety equipment in complex construction conditions.
    Available online:  December 16, 2024 , DOI: 10.15888/j.cnki.csa.009752
    Abstract:
    Pneumonia is a prevalent respiratory disease for which early diagnosis is crucial to effective treatment. This study proposes a hybrid model, CTFNet, which combines a convolutional neural network (CNN) and a Transformer to aid in effective and accurate pneumonia diagnosis. The model integrates a convolutional tokenizer and a focused linear attention mechanism. The convolutional tokenizer performs more compact feature extraction through convolution operations, retaining key local image features while reducing computational complexity to enhance model expressiveness. The focused linear attention mechanism reduces the computational demands of the Transformer and optimizes the attention framework, significantly improving model performance. On the Chest X-ray Images dataset, CTFNet demonstrates outstanding performance in pneumonia classification, achieving an accuracy of 99.32%, a precision of 99.55%, a recall of 99.55%, and an F1 score of 99.55%, which highlights its potential for clinical application. To assess its generalization ability, the model is further evaluated on the COVID-19 Radiography Database, where it achieves an accuracy above 98% on multiple binary classification tasks. These results indicate that CTFNet exhibits strong generalization ability and reliability across various pneumonia image classification tasks.
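    The efficiency argument behind linear attention variants such as the focused linear attention mentioned above can be sketched generically: factorizing attention through a kernel feature map lets the key-value summary be computed once, reducing the cost from O(n²) to O(n) in the sequence length. The feature map `phi` below is a simple placeholder, not the paper's focusing function.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard softmax attention: the n x n score matrix is O(n^2)."""
    A = np.exp(Q @ K.T / np.sqrt(Q.shape[-1]))
    return (A / A.sum(-1, keepdims=True)) @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Kernelized linear attention (the general idea behind focused linear
    attention; phi is an assumed placeholder feature map): reordering
    (phi(Q) phi(K)^T) V as phi(Q) (phi(K)^T V) avoids the n x n matrix."""
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                         # d x d summary, independent of n
    Z = Qp @ Kp.sum(0)                    # per-query normalizer
    return (Qp @ KV) / Z[:, None]

rng = np.random.default_rng(1)
n, d = 6, 4
Q, K, V = rng.normal(size=(3, n, d))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```

Because the weights are nonnegative and normalized, each output row of the linear variant is still a convex combination of the value rows, just computed without the quadratic score matrix.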
    Available online:  December 13, 2024 , DOI: 10.15888/j.cnki.csa.009753
    Abstract:
    Traditional knowledge-aware propagation recommendation algorithms face challenges including low correlation of higher-order features, unbalanced information utilization, and noise introduction. To address these challenges, this study proposes multi-level contrastive learning for knowledge-aware propagation recommendation utilizing knowledge enhancement (MCLK-KE). By constructing enhanced views and utilizing mask-reconstruction-based self-supervised pre-training, the algorithm extracts deeper information from key triples to effectively suppress noise signals. It achieves balanced utilization of knowledge and interaction signals while enhancing feature representation by contrasting graphs to globally capture effective node attributes. Multi-task training, incorporating recommendation prediction, contrastive learning, and mask reconstruction, significantly improves model performance. In tests on three publicly available datasets, MCLK-KE achieves a maximum increase of 3.3% in AUC and 5.3% in F1 score compared with the best baseline model.
    Available online:  December 13, 2024 , DOI: 10.15888/j.cnki.csa.009762
    Abstract:
    High-precision 3D object detection is a significant challenge for autonomous vehicles equipped with multiple sensors in dusty wilderness environments: variable terrain aggravates the regional feature differences of detected objects, and dust particles can blur object features. To address these issues, this study proposes a 3D object detection method based on multi-modal feature dynamic fusion, constructing a multi-level feature self-adaptive fusion module and a feature alignment augmentation module. The former dynamically adjusts the model's attention to global-level and regional-level features, leveraging multi-level receptive fields to reduce the impact of regional variance on recognition performance. The latter bolsters the feature representation of regions of interest before multi-modal feature alignment, effectively suppressing interference factors such as dust. Experimental results show that, compared with the baseline, the average precision of this approach is improved by 2.79% on the self-built wilderness dataset and by 1.7% on the hard-level test of the KITTI dataset, demonstrating good robustness and precision.
    Available online:  December 13, 2024 , DOI: 10.15888/j.cnki.csa.009767
    Abstract:
    Cartoon character face detection is more challenging than human face detection because it involves many difficult scenarios. Given the huge differences among cartoon characters' faces, this study proposes a cartoon character face detection algorithm named YOLOv8-DEL. Firstly, the DBBNCSPELAN module is designed based on GELAN fused with BDD to reduce model size and enhance detection performance. Next, a multi-scale attention mechanism called ELA is introduced to improve the SPPF structure and enhance the feature extraction ability of the backbone. Finally, a new shared-convolution detection head is designed to make the network lighter, and the original CIoU loss function is replaced by Shape-IoU to improve the convergence efficiency of the model. Experiments and ablation studies are carried out on the iCartoonFace dataset, and the proposed model is compared with the YOLOv3-tiny, YOLOv5n, and YOLOv6 models. The mAP of YOLOv8-DEL reaches 90.3%, 1.2% higher than that of YOLOv8; its parameter count is 1.69M, 47% lower than YOLOv8's, and its GFLOPs are 44% lower. Experimental results show that the proposed method effectively improves cartoon character face detection precision while compressing the model size, proving its effectiveness.
    Available online:  December 09, 2024 , DOI: 10.15888/j.cnki.csa.009780
    Abstract:
    To address problems in existing image dehazing algorithms such as incomplete dehazing, blurred edges, and loss of detail, this study presents an image dehazing algorithm based on a Transformer and a gated fusion mechanism. Global features of the image are extracted by an improved channel self-attention mechanism to improve processing efficiency. A multi-scale gated fusion block is designed to capture features at different scales, and the gated fusion mechanism improves the model's adaptability to different degrees of haze by dynamically adjusting weights while better preserving image edges and details. Residual connections are used to enhance feature reuse and improve the generalization ability of the model. Experiments show that the proposed algorithm can effectively restore the content of real hazy images; on the synthetic hazy image dataset SOTS, it reaches a peak signal-to-noise ratio of 34.841 dB and a structural similarity of 0.984, and the dehazed images retain complete content without blurred details or residual haze.
    Available online:  December 09, 2024 , DOI: 10.15888/j.cnki.csa.009777
    Abstract:
    In response to challenges in crowd counting such as non-uniform head sizes, uneven crowd density distribution, and complex background interference, a convolutional neural network model that focuses on crowd regions and handles multi-scale changes, the multi-scale feature weighted fusion attention convolutional neural network (MSFANet), is proposed. The front end of the network adopts an improved VGG-16 model to perform coarse-grained feature extraction on the input crowd image. A multi-scale feature extraction module is added in the middle to extract multi-scale feature information, followed by an attention module that weights the multi-scale features. At the back end, a sawtooth-shaped dilated convolution module is adopted to enlarge the receptive field, extract detailed image features, and generate high-quality crowd density maps. Experiments on three public datasets show that the model reduces the mean absolute error (MAE) to 7.8 and the mean squared error (MSE) to 12.5 on Shanghai Tech Part B, the MAE to 64.9 and the MSE to 108.4 on Shanghai Tech Part A, and the MAE to 185.1 and the MSE to 249.8 on UCF_CC_50. These results affirm that the proposed model exhibits strong accuracy and robustness.
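    The receptive-field benefit of a sawtooth dilation schedule can be checked with simple arithmetic; the dilation rates (1, 2, 3) below are illustrative, not necessarily the paper's exact schedule.

```python
def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions; each layer is
    (kernel_size, dilation). Every layer adds (k - 1) * dilation."""
    rf = 1
    for k, dilation in layers:
        rf += (k - 1) * dilation
    return rf

# Three plain 3x3 convolutions vs a sawtooth dilation schedule (rates 1, 2, 3):
# same parameter count, much larger receptive field.
plain    = receptive_field([(3, 1), (3, 1), (3, 1)])
sawtooth = receptive_field([(3, 1), (3, 2), (3, 3)])
print(plain, sawtooth)  # 7 vs 13
```

Cycling the rates (a "sawtooth") rather than doubling them monotonically also avoids the gridding artifacts that plague fixed-rate dilated stacks.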
    Available online:  December 09, 2024 , DOI: 10.15888/j.cnki.csa.009784
    Abstract:
    Faced with insufficient labeled data in the field of video quality assessment, researchers have begun to turn to self-supervised learning methods, aiming to learn video quality assessment models with the help of large amounts of unlabeled data. However, existing self-supervised methods primarily focus on video distortion types and content information while ignoring dynamic information and spatiotemporal features that change over time, which leads to unsatisfactory performance in complex dynamic scenes. To address these issues, a new self-supervised learning method is proposed. By taking playback speed prediction as an auxiliary pretraining task, the model can better capture the dynamic changes and spatiotemporal features of videos; combined with distortion type prediction and contrastive learning, its sensitivity to video quality differences is enhanced. In addition, to capture spatiotemporal features more comprehensively, a multi-scale spatiotemporal feature extraction module is designed to strengthen the model's spatiotemporal modeling capability. Experimental results demonstrate that the proposed method significantly outperforms existing self-supervised approaches on the LIVE, CSIQ, and LIVE-VQC datasets; on LIVE-VQC, it achieves an average improvement of 7.90% and a maximum improvement of 17.70% in the PLCC index, and it also shows considerable competitiveness on the KoNViD-1k dataset. These results indicate that the proposed framework effectively enhances the model's ability to capture dynamic features and exhibits unique advantages in processing complex dynamic videos.
    Available online:  December 06, 2024 , DOI: 10.15888/j.cnki.csa.009747
    Abstract:
    Existing super-resolution reconstruction methods based on convolutional neural networks are limited by their receptive fields, making it difficult to fully exploit the rich contextual information and auto-correlation in remote sensing images and resulting in suboptimal reconstruction. To address this issue, this study proposes MDT, a remote sensing image super-resolution reconstruction network based on multi-distillation and a Transformer. The network first combines multiple distillations with a dual attention mechanism to progressively extract multi-scale features from low-resolution images, reducing feature loss. Next, a convolutional-modulation-based Transformer is constructed to capture global information in the images, recovering more complex texture details and enhancing the visual quality of the reconstructed images. Finally, a global residual path is added during upsampling to improve feature propagation through the network, effectively reducing image distortion and artifacts. Experiments on the AID and UCMerced datasets demonstrate that the proposed method achieves a peak signal-to-noise ratio (PSNR) of 29.10 dB and a structural similarity index (SSIM) of 0.7807 on ×4 super-resolution tasks. The quality of the reconstructed images is significantly improved, with better detail preservation.
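    The PSNR figure quoted above is defined as 10·log10(peak²/MSE); a small sketch makes the metric concrete (the images here are synthetic):

```python
import numpy as np

def psnr(ref, img, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((ref.astype(float) - img.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(2)
ref = rng.integers(0, 256, size=(64, 64)).astype(float)
noisy   = np.clip(ref + rng.normal(0, 10, ref.shape), 0, 255)
noisier = np.clip(ref + rng.normal(0, 25, ref.shape), 0, 255)
print(psnr(ref, noisy) > psnr(ref, noisier))  # less distortion -> higher PSNR
```

PSNR is purely pixel-wise, which is why SSIM, a structural measure, is usually reported alongside it as in the abstract.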
    Available online:  December 06, 2024 , DOI: 10.15888/j.cnki.csa.009748
    Abstract:
    In computation-intensive and latency-sensitive tasks, unmanned aerial vehicle (UAV)-assisted mobile edge computing has been extensively studied thanks to the high mobility and low deployment cost of UAVs. However, UAV energy consumption limits long-duration operation, and the modules within an offloading task often depend on one another. To address these issues, a directed acyclic graph (DAG) is used to model the dependencies among a task's internal modules, and, considering both system latency and energy consumption, an optimal offloading strategy is derived to minimize system cost. For the optimization, a binary grey wolf optimization algorithm based on subpopulations, Gaussian mutation, and reverse learning (BGWOSGR) is proposed. Simulation results show that the proposed algorithm reduces system cost by around 19%, 27%, 16%, and 13% compared with four other methods, with a faster convergence speed.
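    Modeling intra-task module dependencies as a DAG implies that any valid offloading schedule must follow a topological order of the modules. A standard sketch of that constraint (Kahn's algorithm; not tied to the BGWOSGR optimizer itself):

```python
from collections import deque

def topological_order(n, edges):
    """Kahn's algorithm: a valid execution order for task modules whose
    dependency structure forms a DAG."""
    succ = {i: [] for i in range(n)}
    indeg = [0] * n
    for u, v in edges:        # edge u -> v: module v needs u's output
        succ[u].append(v)
        indeg[v] += 1
    ready = deque(i for i in range(n) if indeg[i] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != n:
        raise ValueError("dependency cycle: not a DAG")
    return order

# Module 0 feeds 1 and 2, which both feed 3 (a diamond-shaped task graph).
print(topological_order(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))
```

An offloading decision vector is then evaluated by simulating the modules in such an order, accumulating latency and energy along the dependency chains.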
    Available online:  November 28, 2024 , DOI: 10.15888/j.cnki.csa.009761
    Abstract:
    Distributed storage systems achieve highly reliable, low-overhead data storage through erasure codes. To provide different levels of reliability and access performance, storage systems need to perform redundancy transitions on erasure-coded data by changing coding parameters, and the stripe merging mechanism provides a way to do so. However, stripe merging based on traditional erasure codes can incur heavy I/O overhead from data block redistribution and checksum block re-computation. Worse still, these I/Os are amplified across multiple merging operations. In response, this study proposes new Tree Reed-Solomon (TRS) codes that eliminate data block redistribution I/Os by decentralizing data blocks and save checksum block re-computation I/Os through the design of the coding matrices. TRS codes further design storage units to organize the stripes involved in merging into a tree, enabling multiple merging operations to be completed efficiently from bottom to top along the tree structure. To test the performance of TRS codes, this study designs and implements a distributed storage prototype. Experiments show that, compared with other erasure codes, TRS codes greatly reduce stripe merging time.
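    The redundancy idea TRS codes build on can be illustrated with the simplest erasure code, a single XOR parity block. This sketch shows recovery from one erasure only; it is not the Reed-Solomon or TRS construction, which tolerate more failures via coding matrices over finite fields.

```python
import functools

def xor_parity(blocks):
    """Single-parity erasure coding over equal-length byte blocks: XOR all
    blocks column-wise. Any one lost block equals the XOR of the survivors."""
    return bytes(functools.reduce(lambda a, b: a ^ b, col)
                 for col in zip(*blocks))

data = [b"abcd", b"efgh", b"ijkl"]       # k = 3 data blocks
parity = xor_parity(data)                # m = 1 parity block
lost = data[1]                           # erase one data block
recovered = xor_parity([data[0], data[2], parity])
print(recovered == lost)                 # XOR of survivors restores the block
```

Re-computing parity is exactly the I/O cost the abstract refers to: merging stripes under a conventional code forces such recomputation, which the TRS coding-matrix design avoids.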
    Available online:  November 28, 2024 , DOI: 10.15888/j.cnki.csa.009757
    Abstract:
    The uncertain execution order of asynchronous messages in Android applications is the main cause of test flakiness. Most existing flaky test studies trigger flaky behavior by randomly determining the execution order of asynchronous messages, which is ineffective and inefficient. This study proposes concurrent flaky test detection based on the happens-before (HB) relationship for Android applications. The method analyzes the HB relationships among asynchronous messages in the execution trace of a test case to determine the scheduling scope of each asynchronous message. It then designs a maximum-differentiation scheduling strategy that guides the execution order of asynchronous messages so that the scheduled trace differs as much as possible from the original test execution trace, attempting to change the test outcome and thus expose flakiness. Experiments on 50 test cases from 40 Android applications show that the method detects all the flaky tests, improving the detection effect by 6% and shortening the average detection time by 31.78% compared with the current state-of-the-art techniques.
    Available online:  November 28, 2024 , DOI: 10.15888/j.cnki.csa.009771
    Abstract:
    Existing hierarchical text classification models suffer from two problems: underutilization of label information across hierarchical instances and a lack of handling for unbalanced label distributions. To solve these problems, this study proposes a hierarchical text classification method for label co-occurrence and long-tail distribution (LC-LTD), which learns the global semantics of text based on shared labels and a balanced loss function for long-tail distributions. First, a contrastive learning objective based on shared labels is devised to narrow the semantic distance between text representations with more shared labels in feature space and to guide the model toward discriminative semantic representations. Second, a distribution-balanced loss function replaces binary cross-entropy to alleviate the long-tail distribution problem inherent in hierarchical classification, improving the generalization ability of the model. LC-LTD is compared with mainstream models on the WOS and BGC public datasets, and the results show that it achieves better classification performance and is more suitable for hierarchical text classification.
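    The effect of a distribution-balanced loss can be illustrated with a simplified frequency-weighted binary cross-entropy; this is an illustrative stand-in, not the exact loss used in LC-LTD.

```python
import numpy as np

def weighted_bce(probs, targets, label_freq):
    """Frequency-weighted binary cross-entropy (a simplified stand-in for a
    distribution-balanced loss): rare tail labels get weights inversely
    proportional to their frequency, so mistakes on them dominate."""
    w = 1.0 / np.asarray(label_freq, float)
    w = w / w.sum() * len(w)                 # normalize to mean weight 1
    p = np.clip(probs, 1e-7, 1 - 1e-7)
    bce = -(targets * np.log(p) + (1 - targets) * np.log(1 - p))
    return float((w * bce).mean())

freq = [0.9, 0.1]                            # head label vs tail label
wrong_on_head = weighted_bce(np.array([0.1, 0.9]), np.array([1.0, 1.0]), freq)
wrong_on_tail = weighted_bce(np.array([0.9, 0.1]), np.array([1.0, 1.0]), freq)
print(wrong_on_tail > wrong_on_head)  # the tail mistake is penalized harder
```

Plain BCE would score both mistakes identically; re-weighting is what keeps abundant head labels from dominating the gradient in long-tailed hierarchies.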
    Available online:  November 28, 2024 , DOI: 10.15888/j.cnki.csa.009772
    Abstract:
    Image steganalysis aims to detect whether an image has undergone steganography and thus carries secret information. Steganalysis algorithms based on Siamese networks judge whether an image carries secret information by computing the dissimilarity between its left and right partitions, and they currently achieve relatively high accuracy among deep learning steganalysis algorithms. However, they still have certain limitations. First, the convolutional blocks stacked in the preprocessing and feature extraction layers overlook the fact that steganographic signals are easily lost as they pass from shallow to deep layers. Second, the SRM filters used in existing Siamese networks still borrow high-pass filters from other networks to suppress image content, generating residual maps at only a single size. To address these problems, this study proposes a Siamese network image steganalysis method based on enhanced residual features. It designs an attention-based inverted residual module, placed after the convolutional blocks in the preprocessing and feature extraction layers, which reuses image features and introduces an attention mechanism so that the network assigns greater weight to feature maps from image regions with complex texture. Meanwhile, to better suppress image content, a multi-scale filter is proposed that adapts the residual types to convolutional kernels of different sizes, enriching the residual features. Experimental results show that the proposed attention-based inverted residual module and multi-scale filter provide better classification performance than existing methods.
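    The residual-map preprocessing the abstract refers to can be sketched with one classic SRM-family high-pass kernel; the kernel and toy images below are illustrative only.

```python
import numpy as np

def highpass_residual(img, kernel):
    """'Valid' 2D cross-correlation of an image with a high-pass kernel.
    Steganalysis preprocessing computes such residual maps to suppress
    image content and expose the weak stego signal."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

# A second-order horizontal residual kernel from the SRM family:
# on a constant region the residual is exactly zero (content suppressed),
# while intensity changes (edges, embedding noise) survive.
k = np.array([[0, 0, 0], [1, -2, 1], [0, 0, 0]], float)
flat = np.full((5, 5), 7.0)
print(highpass_residual(flat, k))        # all zeros
```

A multi-scale filter bank, as proposed above, would apply kernels of several sizes so that residuals are captured at more than one scale.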
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009758
    Abstract:
    In autonomous driving, 3D object detection from a bird's eye view (BEV) has attracted significant attention. Existing camera-to-BEV transformation methods face challenges of insufficient real-time performance and high deployment complexity. To address these issues, this study proposes a simple and efficient view transformation method that can be deployed without any special engineering operations. First, to address the redundancy in full image features, a width feature extractor, supplemented by a monocular 3D detection task, is introduced to refine the key features of the image so that minimal information is lost in the process. Second, a feature-guided polar coordinate positional encoding method is proposed to enhance the mapping relationship between the camera view and the BEV representation, as well as the model's spatial understanding. Lastly, learnable BEV embeddings interact with width image features through a single-layer cross-attention mechanism, generating high-quality BEV features. Experimental results show that, compared with lift, splat, shoot (LSS) on the nuScenes validation set, this network improves mAP from 29.5% to 32.0%, a relative increase of 8.5%, and NDS from 37.1% to 38.0%, a relative increase of 2.4%, demonstrating the model's effectiveness for 3D object detection in autonomous driving scenarios. It also reduces latency by 41.12% compared with LSS.
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009755
    Abstract:
    An unmanned aerial vehicle (UAV) equipped with an edge server constitutes a mobile edge server, which can provide computing services for user equipment (UE) in scenarios where base stations are difficult to deploy. With an agent trained by deep reinforcement learning, reasonable offloading decisions can be made in a continuous, complex state space, and part of the computation-intensive tasks produced by users can be offloaded to edge servers, improving the working and response time of the system. However, the fully connected neural networks used by current deep reinforcement learning algorithms cannot handle the time-series data arising in UAV-assisted mobile edge computing (MEC) scenarios, and the algorithms suffer from low training efficiency and poor decision-making performance. To address these problems, this study proposes a twin delayed deep deterministic policy gradient algorithm based on long short-term memory (LSTM-TD3), using LSTM to improve the Actor-Critic network structure of the TD3 algorithm. The network is thereby divided into three parts: a memory extraction unit containing LSTM, a current feature extraction unit, and a perceptual integration unit. Besides, the sample data in the experience pool are improved and historical data are defined, providing the memory extraction unit with a better training effect. Simulation results show that, compared with the AC, DQN, and DDPG algorithms, the LSTM-TD3 algorithm performs best when optimizing the offloading strategy with the minimum total system delay as the objective.
    Available online:  November 15, 2024 , DOI: 10.15888/j.cnki.csa.009739
    Abstract:
    To solve the vehicle routing problem with time windows (VRPTW), this study establishes a mixed-integer programming model aimed at minimizing total distance and proposes a hybrid ant colony optimization algorithm with relaxed time window constraints. Firstly, an improved ant colony algorithm, combined with TSP-Split encoding and decoding, is proposed to construct routing solutions that may violate time window constraints, improving the global optimization ability of the algorithm. Then, a repair strategy based on variable neighborhood search is proposed to repair infeasible solutions using the principle of returning in time and the penalty function method. Finally, 56 Solomon and 12 Homberger benchmark instances are tested. The results show that the proposed algorithm outperforms the comparison algorithms from the literature: it obtains the known optimal solutions on 50 instances and quasi-optimal solutions on the remaining instances within acceptable computing time, proving its effectiveness.
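    The relaxed time-window idea, allowing violations during search but penalizing them, can be sketched as a route-evaluation function. The instance, penalty weight, and waiting rule shown here are illustrative assumptions, not the paper's exact model.

```python
def route_cost(route, dist, windows, service=0.0, penalty=100.0):
    """Evaluate one VRPTW route starting and ending at depot 0: travel
    distance plus a penalty per unit of lateness (early arrivals wait
    until the window opens). The penalty weight 100 is illustrative."""
    t, cost, prev = 0.0, 0.0, 0
    for c in list(route) + [0]:
        t += dist[prev][c]
        cost += dist[prev][c]
        a, b = windows[c]
        if t < a:
            t = a                         # wait for the window to open
        elif t > b:
            cost += penalty * (t - b)     # relaxed constraint: penalize
        t += service
        prev = c
    return cost

dist = [[0, 2, 4], [2, 0, 3], [4, 3, 0]]
windows = [(0, 100), (0, 5), (0, 5)]
print(route_cost([1, 2], dist, windows))   # feasible: pure distance 2+3+4 = 9
print(route_cost([2, 1], dist, windows))   # customer 1 reached late: penalized
```

Under such an evaluation, ants may build infeasible tours cheaply, and a repair step (the variable neighborhood search above) then tries to eliminate the penalty terms.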
  • Full-text download ranking (overall / yearly / per issue)
    Abstract click ranking (overall / yearly / per issue)

    2000,9(2):38-41, DOI:
    [Abstract] (12747) [HTML] (0) [PDF ] (22864)
    Abstract:
    This paper discusses in detail how VRML technology can be combined with other data access technologies to achieve real-time interaction with databases, and briefly describes the syntax and technical requirements of the relevant technical specifications. The techniques used are safe and reliable, perform well in practical applications, and facilitate system porting.
    1993,2(8):41-42, DOI:
    [Abstract] (9795) [HTML] (0) [PDF ] (32612)
    Abstract:
    This paper presents the author's recent experience in using the utility software NU to remove viruses from floppy disk boot sectors and hard disk master boot records and to repair disks with damaged boot sectors; the approach has proved simple and effective in practice.
    1995,4(5):2-5, DOI:
    [Abstract] (9354) [HTML] (0) [PDF ] (14957)
    Abstract:
    This paper briefly introduces the definition, overview, and significance of the customs EDI automated clearance system, and analyzes, in the light of practice, the legal issues involved in the business operation mode under this EDI application system, the adoption of the EDIFACT international standard, network and software technology issues, and project management issues.
    2016,25(8):1-7, DOI: 10.15888/j.cnki.csa.005283
    [Abstract] (9006) [HTML] () [PDF 1167952] (39729)
    Abstract:
    Since 2006, deep neural networks have achieved great success in big data processing and artificial intelligence fields such as image/speech recognition and autonomous driving, and unsupervised learning, as a pre-training method for deep neural networks, has played a very important role in this success. This paper introduces and analyzes unsupervised learning methods in deep learning, mainly summarizing two commonly used classes: deterministic autoencoder methods and probabilistic methods such as contrastive divergence based on restricted Boltzmann machines. It also introduces the applications of these two classes of methods in deep learning systems, and finally summarizes the problems and challenges faced by unsupervised learning and gives an outlook.
    2008,17(5):122-126, DOI:
    [Abstract] (8011) [HTML] (0) [PDF ] (49426)
    Abstract:
    With the rapid development of the Internet, network resources are becoming increasingly abundant, and extracting information from the network has become crucial; in particular, Deep Web information retrieval, covering the 80% of network resources in the Deep Web, is a difficult problem deserving close attention. To better study Deep Web crawler technology, this paper gives a comprehensive and detailed introduction to Deep Web crawlers. It first expounds their definition and research goals, then introduces and analyzes recent research progress at home and abroad, and on this basis looks ahead to research trends, laying a foundation for further research.
    2011,20(11):80-85, DOI:
    [Abstract] (7709) [HTML] () [PDF 863160] (43506)
    Abstract:
    Based on a study of current mainstream video transcoding schemes, a distributed transcoding system is proposed. The system uses HDFS (Hadoop distributed file system) for video storage and performs distributed transcoding with the MapReduce paradigm and FFMPEG. The segmentation strategy for distributed video storage and the influence of segment size on access time are discussed in detail, and metadata formats for video storage and conversion are defined. A distributed transcoding scheme based on the MapReduce programming framework is proposed, in which the Mapper side performs transcoding and the Reducer side merges the video. Experimental data show how transcoding time varies with video segment size and the number of transcoding machines.
    1999,8(7):43-46, DOI:
    [Abstract] (7382) [HTML] (0) [PDF ] (24523)
    Abstract:
    Representing a larger color space with fewer colors has long been a research topic. This paper discusses halftoning and dithering techniques in detail, extends them to the practical true-color space, and gives implementation algorithms.
    2022,31(5):1-20, DOI: 10.15888/j.cnki.csa.008463
    [Abstract] (6956) [HTML] (4363) [PDF 2584043] (6512)
    Abstract:
    The advent of deep learning brought a major breakthrough to machine learning research, but it requires large amounts of manually annotated data. In practical problems, constrained by labor costs, many applications need to reason about instance classes never seen before, which gave rise to zero-shot learning (ZSL). As a natural data structure for representing relationships between things, graphs have received increasing attention in zero-shot learning. This paper systematically reviews zero-shot graph learning methods. It first outlines the definitions of zero-shot learning and graph learning and summarizes the ideas behind existing zero-shot learning solutions; then it classifies current zero-shot graph learning methods according to how graphs are used; next it discusses the evaluation criteria and datasets involved; finally it points out problems to be solved in further research and possible future directions.
    2012,21(3):260-264, DOI:
    [Abstract] (6603) [HTML] () [PDF 336300] (45792)
    Abstract:
    The core issues of open platforms are user authentication and authorization. OAuth is the internationally accepted authorization scheme; its distinguishing feature is that a third-party application can request access to a user's protected resources without requiring the user to enter a username and password in that application. The latest version is OAuth 2.0, whose authentication and authorization flow is simpler and more secure. This paper studies the working principle of OAuth 2.0, analyzes the workflow of refreshing access tokens, and presents a server-side design scheme for OAuth 2.0 together with a concrete application example.
    2007,16(9):22-25, DOI:
    [Abstract] (6555) [HTML] (0) [PDF ] (7717)
    Abstract:
    Based on the actual security state of a legacy logistics system, this paper analyzes the shortcomings of object-oriented programming in handling crosscutting and core concerns, points out the advantages of aspect-oriented programming in separating concerns, analyzes AspectJ as a concrete aspect-oriented implementation, and proposes a method of evolving IC-card security in the legacy logistics system with AspectJ.
    2011,20(7):184-187,120, DOI:
    [Abstract] (6454) [HTML] () [PDF 731903] (34531)
    Abstract:
    To meet the practical requirements of smart homes and environmental monitoring, a long-range wireless sensor node is designed. The system uses the second-generation system-on-chip CC2530, which integrates RF and a controller, as its core module, with an external CC2591 RF front-end power amplifier module; the software is based on the ZigBee 2006 protocol stack, with application-layer functions implemented on top of the ZStack common modules. The paper describes building a wireless data acquisition network on the ZigBee protocol and gives the hardware schematics and software flowcharts of the sensor and coordinator nodes. Experiments show that the nodes perform well and communicate reliably, with a communication range noticeably greater than that of TI's first-generation products.
    2019,28(6):1-12, DOI: 10.15888/j.cnki.csa.006915
    [Abstract] (6166) [HTML] (19691) [PDF 672566] (27067)
    Abstract:
    A knowledge graph is a knowledge base that represents, in graph form, concepts and entities in the objective world and the relationships between them, and it is one of the fundamental technologies for intelligent services such as semantic search, intelligent question answering, and decision support. At present, the connotation of knowledge graphs is not yet clear, and because of incomplete documentation, existing knowledge graphs see low usage and reuse. This paper therefore gives a definition of knowledge graphs and distinguishes them from related concepts such as ontologies: an ontology is the schema layer and logical foundation of a knowledge graph, while a knowledge graph is an instantiation of an ontology, so ontology research can serve as the basis for knowledge graph research and promote its faster development and wider application. The paper lists and analyzes the main existing general-purpose and domain knowledge graphs at home and abroad, together with their construction, storage, and retrieval methods, to improve their usage and reuse, and finally points out future research directions.
    2004,13(10):7-9, DOI:
    [Abstract] (6095) [HTML] (0) [PDF ] (12770)
    Abstract:
    This paper introduces the composition of a vehicle monitoring system and studies the software and hardware design of the mobile unit using a Rockwell GPS OEM board and a WISMO QUIK Q2406B module, as well as the design of the GIS software at the monitoring center. It focuses on how the Q2406B module, with embedded TCP/IP protocol handling, accesses the Internet through AT commands and exchanges TCP data with the monitoring center.
    2008,17(1):113-116, DOI:
    [Abstract] (6034) [HTML] (0) [PDF ] (50868)
    Abstract:
    Sorting is an important operation in computer programming. This paper discusses an improvement of the quicksort algorithm in C: combining quicksort with direct insertion sort. When implementing large-scale internal sorting in C programs, the goal is a simple, effective, and fast algorithm. The paper elaborates the process of improving quicksort, from basic performance characteristics to algorithmic improvements, and through continuous analysis and experiments arrives at the best improved algorithm.
    2008,17(8):87-89, DOI:
    [Abstract] (5959) [HTML] (0) [PDF ] (42799)
    Abstract:
    With the wide application of object-oriented software development and the demand for test automation, model-based software testing has gradually been accepted by developers and testers. It is one of the main testing methods in the coding stage, characterized by high testing efficiency and good results in exposing complex logic faults, although false positives, false negatives, and fault mechanisms require further study. This paper analyzes and classifies the main testing models, makes a preliminary analysis of parameters such as fault density, and finally proposes a model-based software testing process.
    2008,17(8):2-5, DOI:
    [Abstract] (5826) [HTML] (0) [PDF ] (33501)
    Abstract:
    This paper describes the design and implementation of a single sign-on system for an enterprise information portal. The system, built on the Java EE architecture and combining credential encryption with Web Services, provides unified authentication and access control for portal users. The paper details the overall structure, design ideas, working principle, and implementation scheme; the system has been successfully deployed in radio and television industry information portals in several provinces and cities.
    2004,13(8):58-59, DOI:
    [Abstract] (5786) [HTML] (0) [PDF ] (29043)
    Abstract:
    This paper introduces several methods of moving the focus between multiple text boxes in a Visual C++ 6.0 dialog via the Enter key and proposes an improved method.
    2009,18(5):182-185, DOI:
    [Abstract] (5745) [HTML] (0) [PDF ] (35445)
    Abstract:
    DICOM is the international standard for medical image storage and transmission, and DCMTK is a free, open-source development toolkit for the DICOM standard. Parsing the DICOM file format and solving the display of DICOM medical images are fundamental to medical image processing and significant for research on medical imaging technology. This paper interprets the DICOM file format, introduces the principle of window-level adjustment, and implements medical image display and windowing functions with VC++ and DCMTK.

    2007,16(10):48-51, DOI:
    [Abstract] (4887) [HTML] (0) [PDF 0.00 Byte] (89607)
    Abstract:
    This paper studies the HDF data format and its function library, taking raster images as an example to discuss in detail how to read and process raster data with VC++.NET and VC#.NET and then display the image from the resulting pixel matrix by point plotting. The work was carried out in the context of the National Meteorological Center's development of MICAPS 3.0 (Meteorological Information Comprehensive Analysis and Processing System).
    2002,11(12):67-68, DOI:
    [Abstract] (4200) [HTML] (0) [PDF 0.00 Byte] (60527)
    Abstract:
    This paper introduces a method of real-time data acquisition under the non-real-time operating system Windows 2000 using Visual C++ 6.0 with the Advantech PCL-818L data acquisition card. Using the API functions in the PCL-818L DLLs, three methods of high-speed real-time data acquisition are presented along with their advantages and disadvantages.


Copyright: Institute of Software, Chinese Academy of Sciences. Beijing ICP No. 05046678-3
Address: 4# South Fourth Street, Zhongguancun, Haidian, Beijing. Postal code: 100190
Phone: 010-62661041  Fax:  Email: csa (a) iscas.ac.cn
Technical Support: Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063