AIPUB Guizhi Journal Alliance
ZHAO Ya , GAO Ming-Chao , YAO Wen-Da , XU Feng
2025, 34(4):1-17. DOI: 10.15888/j.cnki.csa.009839 CSTR: 32024.14.csa.009839
Abstract:In recent years, face forgery technology has developed rapidly, and synthesized faces have become extremely hard for the human eye to identify. The abuse of this technology by criminals severely threatens social stability and personal privacy, so forged face detection technology has become increasingly important. This review systematically discusses the current status of forged face detection, mainly from two aspects: forged face image detection and forged face video detection. For forged face image detection, methods based on the image spatial and frequency domains, identity consistency detection, and the application of face region localization are discussed. For forged face video detection, the research focuses on the integration of spatio-temporal features, the utilization of physiological features, and the combination of audiovisual information. In addition, the commonly used evaluation metrics are introduced, and a variety of important datasets are systematically analyzed, including their characteristics and application scenarios. The review also points out limitations in the current literature, such as weak robustness to adversarial samples and poor adaptability of detection methods to new forgery techniques. Based on these analyses, possible future research directions are put forward, including the optimization of cross-domain detection, the exploration of new algorithms, and the study of model interpretability. This review not only provides researchers with a comprehensive understanding of forged face detection technology but also points out directions for subsequent research, possessing high theoretical value and practical significance.
YU Wei-Dong , LU Jing , CHENG Han-Lei
2025, 34(4):18-33. DOI: 10.15888/j.cnki.csa.009865 CSTR: 32024.14.csa.009865
Abstract:The neural radiance field (NeRF) has significant advantages in generating high-fidelity maps thanks to its neural implicit scene representation. Applying NeRF to simultaneous localization and mapping (SLAM), namely the NeRF-based SLAM method, enables continuous 3D modeling while achieving high-precision localization, and enhances the quality and detail of scene reconstruction by rendering new perspectives and predicting unknown regions. To track the latest research results in this field, this study reviews and summarizes the key NeRF-based SLAM algorithms of recent years. Firstly, the core principle of NeRF is introduced and a comprehensive overview of the framework of NeRF-based SLAM methods is given, followed by a focus on the improvements and optimizations of NeRF-based SLAM, including improving the efficiency of neural implicit representation, solving the large-scale scene mapping problem, adding loop closure and global optimization to achieve global consistency, and solving the dynamic interference problem. Finally, an outlook on NeRF-based SLAM methods is presented to provide valuable references for related researchers and promote more innovative research.
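As background for the abstract above, the core NeRF principle it mentions is differentiable volume rendering: a ray's color is accumulated from densities and colors sampled along the ray. This is the standard NeRF formulation, not any particular SLAM system's variant:

```latex
\hat{C}(\mathbf{r}) \;=\; \sum_{i=1}^{N} T_i \,\bigl(1 - e^{-\sigma_i \delta_i}\bigr)\,\mathbf{c}_i,
\qquad
T_i \;=\; \exp\!\Bigl(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Bigr),
```

where $\sigma_i$ and $\mathbf{c}_i$ are the predicted density and color of sample $i$ and $\delta_i = t_{i+1} - t_i$ is the spacing between adjacent samples; NeRF-based SLAM systems optimize pose and map jointly by minimizing the photometric error between $\hat{C}(\mathbf{r})$ and observed pixels.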
2025, 34(4):34-44. DOI: 10.15888/j.cnki.csa.009819 CSTR: 32024.14.csa.009819
Abstract:To address the poor accuracy of monocular 3D object detection caused by the scale differences of objects at different depths in monocular images, a detection algorithm based on fused sampling and depth-scale constraints is proposed. Firstly, to enhance the ability of the sampled features to represent objects at different scales, a multi-scale fusion module (MFM) is constructed; it fuses the sampled features across levels and scales through hierarchical and iterative aggregation, thereby improving the extraction of the objects' implicit scale features. In addition, a depth-scale correlation module (DSCM) is constructed; it uses the linear projection constraint between depth and scale to compensatorily rescale objects at different scales to the same feature level, balancing the model's attention to objects at different distances. Quantitative results on the KITTI and Waymo datasets show that the proposed algorithm improves the overall average precision AP3D by 1.56 and 3.07 percentage points, respectively, over similar algorithms across multiple difficulty levels, which verifies the effectiveness and generalization of the algorithm. Meanwhile, qualitative results on the two datasets validate that the algorithm significantly mitigates the impact of object scale differences on detection performance.
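The "linear projection constraint between depth and scale" in the abstract above follows from the pinhole camera model: image size is inversely proportional to depth. A minimal sketch, in which the focal length, object height, and reference depth are illustrative assumptions rather than the paper's actual values:

```python
# Sketch of the depth-scale constraint under a pinhole camera model.
# f, object_height, and ref_depth are made-up illustrative values.

def image_height(f: float, object_height: float, depth: float) -> float:
    """Pinhole projection: image size is inversely proportional to depth."""
    return f * object_height / depth

def compensatory_scale(depth: float, ref_depth: float) -> float:
    """Factor that rescales an object at `depth` to the size it would
    have at `ref_depth`, mapping all objects to one feature level."""
    return depth / ref_depth

# A 1.5 m-tall object seen at 10 m and 20 m with f = 700 px:
near = image_height(700, 1.5, 10.0)   # 105.0 px
far = image_height(700, 1.5, 20.0)    # 52.5 px
# Scaling the far object by compensatory_scale(20, 10) = 2.0 restores parity:
assert far * compensatory_scale(20.0, 10.0) == near
```

This is the intuition behind compensatorily rescaling distant (small) objects so that one feature level can represent all depths.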
XI Ren-Na , ZHANG Tai-Hong , YAO Zhi-Xin
2025, 34(4):45-54. DOI: 10.15888/j.cnki.csa.009838 CSTR: 32024.14.csa.009838
Abstract:To address the issues of limited sample size and imbalanced categories in existing rural road image datasets, a data augmentation method based on an improved StyleGAN is proposed. This approach introduces a decoupled mapping network into the original StyleGAN framework to reduce the coupling degree of the W-space latent code. By integrating the advantages of convolution and Transformer, this study designs a convolution-coupled transfer block (CCTB). The core cross-window self-attention mechanism within this module enhances the network’s ability to capture complex context and spatial layouts. These two improvements significantly boost network performance. Ablation experiments comparing the original and improved StyleGAN networks show that the IS index increases from 42.38 to 77.31, and the FID value decreases from 25.09 to 12.42, demonstrating a substantial improvement in data generation quality and authenticity. To verify the impact of data augmentation on model performance, two classic and mainstream object detection algorithms are used for testing. Performance differences between the original and augmented datasets are compared, further confirming the effectiveness of the improved methods.
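The FID values quoted above compare Gaussian fits of real and generated feature distributions. A hedged sketch of the metric, restricted to the diagonal-covariance case so it runs with the standard library only (real evaluations use full Inception-feature covariance matrices):

```python
# FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2*(S1*S2)^(1/2)); for diagonal
# covariances the trace term collapses elementwise. Inputs are toy values.
import math

def fid_diagonal(mu1, var1, mu2, var2):
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    # With diagonal S1, S2: Tr term = sum_i (sqrt(v1_i) - sqrt(v2_i))^2
    cov_term = sum((math.sqrt(v) - math.sqrt(w)) ** 2
                   for v, w in zip(var1, var2))
    return mean_term + cov_term

# Identical distributions give FID 0; shifting one mean raises it.
assert fid_diagonal([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]) == 0.0
assert fid_diagonal([1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]) == 1.0
```

Lower FID means the generated distribution is closer to the real one, which is why the drop from 25.09 to 12.42 indicates improved generation quality.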
CHEN Can-Peng , WU Gui-Xing , GUO Yan , LI Chun-Jie
2025, 34(4):55-63. DOI: 10.15888/j.cnki.csa.009825 CSTR: 32024.14.csa.009825
Abstract:Currently, there are various methods for identifying lies, including lie detectors. However, these methods are of limited practicality: they require physical contact with the subject and demand professional expertise from the operators, making them inconvenient and less effective. Psychological research shows that micro-expressions are subtle facial muscle movements of extremely short duration that can reveal a person's true inner state when they occur. Related studies show that micro-expression features can serve as cues for deception recognition. This study focuses on deception recognition based on micro-expression features. Firstly, a dataset called MED, which contains micro-expression data collected while people are lying, is constructed. Secondly, a micro-expression feature learning model named MEDR, based on a multi-layer self-attention mechanism, is designed; it can recognize lies from the learned micro-expression features in both lying and non-lying situations. Finally, experimental comparisons between the proposed model and some existing models are conducted on the newly constructed dataset. Experimental results show that the proposed model achieves an accuracy of 94.33% on the self-made high-quality dataset, indicating excellent performance in deception recognition.
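The multi-layer self-attention mechanism named above builds on scaled dot-product attention. A minimal pure-Python, single-head sketch with toy 2-dimensional inputs (this illustrates the building block, not the MEDR architecture itself):

```python
# Single-head scaled dot-product attention on toy inputs.
import math

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Each row of Q attends over the rows of K; the output mixes rows of V
    with weights softmax(Q K^T / sqrt(d))."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        w = softmax(scores)          # attention weights sum to 1
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out

# Two timesteps (e.g., two frames of a micro-expression clip), 2-d features.
Q = K = V = [[1.0, 0.0], [0.0, 1.0]]
O = attention(Q, K, V)
assert len(O) == 2 and all(len(row) == 2 for row in O)
```

Stacking several such layers lets each frame's representation aggregate evidence from the whole short clip, which is the idea behind learning micro-expression features with multi-layer self-attention.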
YAO Jia-Peng , ZI Ling-Ling , XIE Yi-Sha
2025, 34(4):64-75. DOI: 10.15888/j.cnki.csa.009808 CSTR: 32024.14.csa.009808
Abstract:With the growth of network video platforms (NVP), network videos often face copyright infringement and cross-platform copyright detection issues when shared across different platforms. Therefore, this study proposes a blockchain-based cross-platform network video copyright protection scheme (BCVCP), which protects network video copyrights across platforms by means of blockchain together with ownership sequence (OS) generation and detection. The scheme includes identity authentication, keyframe extraction, ownership sequence generation and detection, and network video control management. Specifically, before operations such as video uploading or access, identity authentication is carried out to ensure identity information security. Secondly, during network video uploading, an ownership sequence is generated and stored in distributed nodes. Then, the keyframes of the video are extracted and the generated ownership sequence is embedded into them. Finally, smart contracts are invoked for cross-platform ownership sequence detection and network video dissemination management to prevent infringement. In the experiments, the robustness of ownership encoding quality and ownership recognition during cross-platform network video transmission is verified, thereby protecting the copyright of network videos.
SHI Jing-Ye , LUO Ya-Lu , ZHANG Meng-Ge , ZHI Rui-Cong , LIU Ji-Qiang
2025, 34(4):76-89. DOI: 10.15888/j.cnki.csa.009816 CSTR: 32024.14.csa.009816
Abstract:Most research methods neglect the significant style variations exhibited by bi-temporal remote sensing images acquired at different times for the same area, leading to unsatisfactory model performance and visualization on stylistically diverse datasets. To address this issue, a style transfer module is used in this article to generate, from the original image at one moment, an image whose style resembles that of the other moment. Subsequently, a symmetrical difference feature pyramid network (SDFPNet) based on bi-directional style transfer is proposed to determine how different style transfer directions affect the improvement of change detection accuracy. Specifically, two lightweight Siamese networks and a difference feature pyramid network (DFPNet) are combined into SDFPNet, and its parameters are optimized on the input original and stylized images, producing the change maps predicted by two parallel branches. To reduce the misclassification of changed pixels, the two prediction results are merged to improve the accuracy of change detection. Experiments on three datasets, LEVIR-CD, CDD, and SYSU-CD, demonstrate that the proposed SDFPNet based on bi-directional style transfer outperforms state-of-the-art (SOTA) methods in the remote sensing change detection task; the results on the CDD and SYSU-CD datasets, which have large style differences due to seasonal changes, are especially convincing. The detection accuracy reaches 99.37% and the F2 score 94.19% on the CDD dataset, and the detection accuracy reaches 92.31% on the SYSU-CD dataset. The proposed method effectively solves the problem of poor change detection accuracy caused by large style differences in bi-temporal images.
2025, 34(4):90-103. DOI: 10.15888/j.cnki.csa.009804 CSTR: 32024.14.csa.009804
Abstract:Considering the insufficient extraction and utilization of higher-order features and the data sparsity in current graph neural network-based session recommendation methods, a self-supervised session recommendation incorporating a dynamic multi-level gated graph neural network (GGNN) and hypergraph convolution (SDMHC-GNN) is proposed. Firstly, different graph structures are used to model the session sequence into three views: a session view, a hypergraph view, and a relational view. The session view uses dynamic multi-level gated graph neural networks, sparse self-attention, and sparse global attention mechanisms to generate local sequential session representations. The hypergraph view uses hypergraph convolution and soft attention mechanisms to generate higher-order session representations. The relational view uses graph convolution and sparse cross-attention mechanisms to generate session relational representations. Secondly, the mutual information among the different session representations is maximized by self-supervised learning. Finally, the current session representation is filtered and enhanced by an intentional neighbor collaboration module. Experiments conducted on two public datasets, Diginetica and Tmall, against advanced baseline models indicate that the proposed model outperforms the baselines, proving its effectiveness.
ZHANG Wei , YIN Yi , LIN Yu-Bin
2025, 34(4):104-114. DOI: 10.15888/j.cnki.csa.009849 CSTR: 32024.14.csa.009849
Abstract:To address the inadequate restoration of textures and edge details in super-resolution reconstruction of rock CT images, along with the high resource consumption of traditional Transformer models, this study proposes a lightweight hybrid architecture, the pixel difference convolution and lightweight Transformer (PDCLT) model. The model integrates a detail-enhancement convolutional neural network (CNN) module based on pixel difference convolution and a lightweight Transformer module to efficiently extract both local and global features. Specifically, the model first introduces a detail enhancement module that combines pixel difference convolution with residual enhanced attention. It also proposes an adaptive path weight scaling method to dynamically adjust the weights of feature extraction paths, which enhances the capture of subtle structures and key features. Secondly, the lightweight Transformer module incorporates efficient multi-head self-attention and a multi-scale feature fusion network to reduce GPU memory demands while extracting global and multi-scale features. Finally, porosity loss is added to the loss function to optimize the preservation of pore structures. Experimental results show that the PDCLT model excels in reconstruction quality and detail restoration, significantly improving the super-resolution reconstruction quality of rock CT images.
SUN Si-Yu , ZHAI Gao-Shou , YU Zhao-Yang
2025, 34(4):115-124. DOI: 10.15888/j.cnki.csa.009814 CSTR: 32024.14.csa.009814
Abstract:Linux and other large-scale software usually use configuration files to adjust system functions. When the number of configuration items is large, the dependencies between them become complex and error-prone. If configuration dependency constraints are not properly defined, under certain conditions a selected configuration item may fail to take effect due to latent dependency problems, or even cause compilation or runtime errors. Existing studies focus on Kconfig files and only consider configuration errors caused by reverse dependencies. This study comprehensively analyzes both Kconfig and Makefile files and investigates four error scenarios: direct dependencies in Kconfig, reverse dependencies in Kconfig, inconsistent dependencies between the two files, and configuration items referenced in Makefile but not defined in Kconfig, in order to find as many potential problems as possible. On this basis, the study designs a configuration error detection tool for the Linux 6.7 kernel source code and identifies 52 configuration errors, which verifies the effectiveness and practicality of the methodology and prototype system.
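One of the cross-file checks described above can be sketched in a few lines: flag symbols that a Makefile conditions on (`CONFIG_FOO`) but that no Kconfig entry defines. The file snippets below are invented for illustration, not real kernel sources, and the real tool must also handle `menuconfig`, `choice`, and `depends on`/`select` semantics:

```python
# Toy cross-check between Kconfig definitions and Makefile references.
import re

KCONFIG = """\
config USB_SUPPORT
\tbool "USB support"
config NET
\tbool "Networking"
\tdepends on USB_SUPPORT
"""

MAKEFILE = """\
obj-$(CONFIG_NET) += net.o
obj-$(CONFIG_GHOST_DRIVER) += ghost.o
"""

# Symbols defined by `config <NAME>` entries in Kconfig.
defined = set(re.findall(r"^config\s+(\w+)", KCONFIG, re.M))
# Symbols the Makefile conditions on via CONFIG_<NAME>.
referenced = set(re.findall(r"CONFIG_(\w+)", MAKEFILE))
undefined = referenced - defined
print(sorted(undefined))  # ['GHOST_DRIVER']
```

Objects guarded by such undefined symbols can never be compiled in, which is exactly the "lack of definition" scenario the study checks for.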
2025, 34(4):125-135. DOI: 10.15888/j.cnki.csa.009818 CSTR: 32024.14.csa.009818
Abstract:In recent years, the deployment rate of resource public key infrastructure (RPKI) has been increasing year by year, which challenges the performance and efficiency of the original monolithic synchronization architecture of relying party software. Hence, its architectural design needs to be reevaluated to adapt to the evolution of RPKI technology. This study sorts out and analyzes RPKI synchronization tasks and, based on this analysis, designs an RPKI relying party synchronization system. Compared with the monolithic architecture, this distributed architecture offers higher synchronization performance and node fault tolerance. The study also designs a variety of scheduling algorithms for the system. To further optimize system performance, controlled comparison experiments are carried out on these scheduling algorithms and task scheduling strategies. The experimental results show that the dynamic scheduling algorithm under the large job first (LJF) task scheduling strategy performs best in this distributed system.
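The large job first (LJF) strategy named above can be sketched as a greedy assignment: sort pending jobs by size descending and always hand the next job to the least-loaded worker. The job sizes and worker count below are invented example values, not measurements from the paper's system:

```python
# Greedy LJF assignment; returns the makespan (time the last worker finishes).
import heapq

def ljf_makespan(job_sizes, n_workers):
    loads = [0.0] * n_workers
    heapq.heapify(loads)                      # min-heap of worker loads
    for size in sorted(job_sizes, reverse=True):  # largest job first
        lightest = heapq.heappop(loads)       # least-loaded worker
        heapq.heappush(loads, lightest + size)
    return max(loads)

jobs = [7, 4, 3, 3, 2, 1]           # e.g., per-repository sync costs
assert ljf_makespan(jobs, 2) == 10  # workers end up with {7,3} and {4,3,2,1}
```

Placing the largest jobs first leaves the small jobs to even out the load at the end, which is why LJF tends to shorten the overall synchronization round.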
XIE Yi-Sha , ZI Ling-Ling , YAO Jia-Peng
2025, 34(4):136-145. DOI: 10.15888/j.cnki.csa.009820 CSTR: 32024.14.csa.009820
Abstract:When using a consensus speed advisory system (CSAS) to recommend speeds for vehicle fleets, challenges often arise regarding the untrustworthiness of the service and the transmission of incorrect data among vehicles. Additionally, existing research mainly focuses on speed advisory scenarios for flat roads. If the speed recommendations for flat roads are applied to sloped roads, vehicles may consume more energy, failing to achieve the optimization goal of minimum energy consumption. To address these issues, this study proposes a blockchain-based consensus speed advisory framework for sloped roads. This framework extends existing CSAS to sloped road scenarios, further solving the problem of optimizing the minimum energy consumption for autonomous vehicles on sloped roads. At the same time, private blockchains and cryptographic primitives are introduced to ensure the trustworthiness of the service and the privacy of data transmission among vehicles. By implementing this framework with Ethereum private blockchains and Truffle, experimental results show that the framework can provide trustworthy consensus speed recommendations in sloped road scenarios and effectively reduce vehicle energy consumption.
2025, 34(4):146-154. DOI: 10.15888/j.cnki.csa.009833 CSTR: 32024.14.csa.009833
Abstract:In the multi-type, small-batch production of small precision devices (diameter 16–40 mm), the main assembly tasks are completed by fixed-station robots, and this assembly mode is costly. For such small devices, automated guided vehicles (AGVs) on the market suffer from poor flexibility and low positioning accuracy. To address these problems, this study designs and develops an autonomous omnidirectional AGV equipped with industrial cameras and dual manipulators to support the dynamic combination of multiple production lines and realize the orderly auxiliary assembly of various types of devices. To improve positioning accuracy, 2D LiDAR and RGB-D data are fused under the Bayesian rule to build a fused grid map and raise the obstacle detection rate, and an extended Kalman filter (EKF) fuses the wheel odometer and IMU data to improve odometry accuracy and reduce motion error. To improve work efficiency and real-time performance, the distance between the precision device to be grasped and the camera's optical center is obtained through RGB-D, and information such as vehicle speed and the pose relationship between the radar and the camera is fused to calculate the optimal movement time of the vehicle-mounted dual manipulators at distance S from the device to be grasped. Finally, to accurately identify multi-type, small-batch precision devices, an improved Yolo-Fastest algorithm is used, which improves recognition accuracy and reduces the computing cost on the AGV. Test results show that the system achieves an identification accuracy greater than 95% for small precision devices (e.g., RF connectors). Within a 70×50×100 cm3 space, it achieves omnidirectional movement with a maximum motion error of 10 cm. Compared with the existing production mode, the AGV improves flexibility, reduces production cost, and nearly doubles work efficiency, making it worthy of practical promotion.
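The EKF-based odometer/IMU fusion mentioned in the abstract above reduces to a simple idea: combining two noisy estimates yields an estimate with lower variance than either input. A minimal 1-D Gaussian fusion step (the real system is a full multi-state EKF; the measurement values and variances here are invented):

```python
# One Kalman-style fusion step for two scalar Gaussian estimates.

def fuse(mean_a, var_a, mean_b, var_b):
    """Fuse estimate A (e.g., wheel odometer) with estimate B (e.g., IMU
    integration). The fused variance is smaller than either input, which
    is why sensor fusion reduces motion error."""
    k = var_a / (var_a + var_b)          # Kalman-style gain
    mean = mean_a + k * (mean_b - mean_a)
    var = (1.0 - k) * var_a
    return mean, var

# Odometer says x = 1.00 m (var 0.04); IMU integration says 1.10 m (var 0.04).
m, v = fuse(1.00, 0.04, 1.10, 0.04)
assert abs(m - 1.05) < 1e-9 and v < 0.04   # fused estimate is tighter
```

With equal variances the fused mean is the average; when one sensor is more trustworthy (smaller variance), the gain `k` pulls the result toward it.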
2025, 34(4):155-165. DOI: 10.15888/j.cnki.csa.009850 CSTR: 32024.14.csa.009850
Abstract:In recommendation systems, data sparsity hampers targeted user interest recommendations, and newly launched items face the cold-start problem due to a lack of user interaction data. To address these problems, this study proposes a user interest recommendation algorithm based on knowledge graphs. First, to tackle data sparsity in users' potential interests, it employs a multi-layer graph neural network (GNN) to capture the direct, indirect, and deeper relationships between users and items through their embedding vectors. Second, for users' explicit interests, it introduces a graph structure enhancement technique that randomly deletes explicit user-item relationships based on rating weights. This method leverages an encoder to analyze the relationships of new user and item nodes, uncovering user-item interactions and thereby addressing the cold-start problem. Finally, a feature cross-compression module combines knowledge graph embeddings with the recommendation task to achieve feature sharing. The shared features further deepen the interaction between items and knowledge graph entities, enhancing recommendation accuracy. Experiments conducted on the Book-Crossing and Last.FM datasets demonstrate that the proposed algorithm significantly outperforms other baseline algorithms in terms of AUC and ACC.
2025, 34(4):166-174. DOI: 10.15888/j.cnki.csa.009832 CSTR: 32024.14.csa.009832
Abstract:Accurate segmentation of pulmonary nodules is of great significance for the early diagnosis of lung cancer. To address the insufficient feature extraction and detail loss caused by the multiple scales and blurred edges of pulmonary nodule images, this study proposes a pulmonary nodule image segmentation network named RAVR-UNet, which incorporates multi-scale features and dual parallel branches. Firstly, given the inability of the U-Net network to fully extract pulmonary nodule features in the encoding stage, a dual-branch parallel feature aggregation network is used to extract feature information from pulmonary nodule images and reduce information loss during feature encoding. Secondly, the Agent_ViT module is introduced to enhance global information modeling while maintaining linear computation. Then, to recover the spatial information lost during downsampling, a multi-scale feature fusion module is added in the decoding stage. Finally, a mixed loss function is designed to alleviate the imbalance between positive and negative samples in the pulmonary nodule segmentation task. Experimental results on the LIDC-IDRI public dataset show that the similarity coefficient and intersection over union (IoU) of the proposed network reach 93.15% and 87.63%, respectively, which is better than mainstream pulmonary nodule segmentation algorithms, and the segmentation results are closer to the ground truth.
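A common form of the mixed loss mentioned above combines Dice loss (robust to class imbalance) with binary cross-entropy. The paper's exact composition is not specified here, so the equal weighting and toy predictions below are illustrative assumptions:

```python
# Hedged sketch of a Dice + BCE mixed segmentation loss on flat pixel lists.
import math

def dice_loss(pred, target, eps=1e-6):
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2 * inter + eps) / (sum(pred) + sum(target) + eps)

def bce_loss(pred, target, eps=1e-7):
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(pred, target)) / len(pred)

def mixed_loss(pred, target, alpha=0.5):
    # alpha balances overlap quality (Dice) against per-pixel calibration (BCE)
    return alpha * dice_loss(pred, target) + (1 - alpha) * bce_loss(pred, target)

perfect = mixed_loss([1.0, 0.0, 1.0], [1, 0, 1])   # near-zero loss
poor = mixed_loss([0.1, 0.9, 0.1], [1, 0, 1])      # large loss
assert perfect < 1e-3 < poor
```

Because the Dice term is normalized by the foreground size, it keeps the tiny nodule region from being swamped by the abundant background pixels.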
JIA Jun-Ying , WU Xing-Yu , YANG Hai-Bo
2025, 34(4):175-183. DOI: 10.15888/j.cnki.csa.009799 CSTR: 32024.14.csa.009799
Abstract:In recent years, with the acceleration of urbanization, urban drainage systems often struggle to cope with extreme weather, and road waterlogging occurs frequently. To solve the road waterlogging detection problem, this paper proposes an improved algorithm based on the DeepLabv3+ model. Firstly, a weighted bidirectional feature pyramid network (BiFPN) module is designed at the decoder side, which fuses the different scales of low-level feature maps obtained from the backbone network, fully exploiting the backbone's multi-scale information. Secondly, a Mamba-improved Transformer module is used to design parallel branches that process high-level feature maps, construct global dependencies, and compensate for the local information loss that dilated convolution in ASPP may cause. Finally, the polarized self-attention (PSA) module is introduced to mitigate the possibly uneven effects of directly adding the two branches' outputs. The experimental results show that on the road waterlogging dataset, the improved algorithm achieves an mIoU of 87.54% and a PA of 96.61%, an improvement of 4.22 percentage points in mIoU and 1.66 percentage points in PA over the original algorithm.
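The two metrics reported above, mIoU and pixel accuracy (PA), are both computed from a per-class confusion matrix. A short sketch with an invented toy matrix (two classes, e.g., background vs. waterlogged road):

```python
# mIoU and pixel accuracy from a confusion matrix; conf[i][j] counts
# pixels of true class i predicted as class j. The matrix is a toy example.

def metrics(conf):
    n = len(conf)
    total = sum(sum(row) for row in conf)
    pa = sum(conf[i][i] for i in range(n)) / total      # pixel accuracy
    ious = []
    for i in range(n):
        tp = conf[i][i]
        fn = sum(conf[i]) - tp                          # missed pixels
        fp = sum(conf[j][i] for j in range(n)) - tp     # false alarms
        ious.append(tp / (tp + fp + fn))
    return sum(ious) / n, pa                            # mIoU, PA

conf = [[90, 10],
        [20, 80]]
miou, pa = metrics(conf)
assert abs(pa - 0.85) < 1e-9
assert abs(miou - (90 / 120 + 80 / 110) / 2) < 1e-9
```

Note that PA can stay high even when a small class is segmented badly, which is why mIoU, averaging over classes, is the stricter of the two metrics.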
2025, 34(4):184-194. DOI: 10.15888/j.cnki.csa.009806 CSTR: 32024.14.csa.009806
Abstract:To address the difficulties of multi-dimensional feature modeling, non-stationary data, and high prediction accuracy requirements in current time series prediction tasks, a non-stationary learning inverted Transformer model combined with causal convolution is proposed. Through inverted embedding, the model exchanges the original roles of the attention mechanism and the feedforward network on time series data: the attention mechanism learns the multivariate correlations of the series, while the feedforward network learns its temporal dependence. Modeling both the time dimension and the variables of multi-dimensional time series enhances the model's generalization over time and over inter-variable relationships, and thus improves its interpretability. Then, a sequence stabilization module is used to mitigate data non-stationarity and improve predictability. Finally, a non-stationary learning attention mechanism combined with causal convolution reintroduces the key features and information removed by the stabilization module, thereby enhancing prediction accuracy. Compared with multiple mainstream benchmark models including PatchTST, iTransformer, and Crossformer, the mean square error of the proposed model on four datasets such as Exchange decreases by 6.2% to 65.0% on average. Ablation experiments show that the inverted embedding module and the non-stationary learning attention module combined with causal convolution effectively improve the accuracy of time series prediction.
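The sequence stabilization idea above is commonly realized as per-series normalization before the model and de-normalization after it. A minimal sketch in that spirit, where the "model" is just a trivial last-value predictor (the paper's actual stabilization module is a learned component):

```python
# Per-series stabilization: normalize, predict, then restore original units.
import statistics

def stabilize(xs):
    mu = statistics.mean(xs)
    sigma = statistics.pstdev(xs) or 1.0   # guard against constant series
    return [(x - mu) / sigma for x in xs], mu, sigma

def destabilize(ys, mu, sigma):
    return [y * sigma + mu for y in ys]

series = [10.0, 12.0, 14.0, 16.0]          # trending, hence non-stationary
norm, mu, sigma = stabilize(series)
assert abs(sum(norm) / len(norm)) < 1e-9   # zero-mean after stabilization
pred = destabilize([norm[-1]], mu, sigma)  # toy forecast in original units
assert abs(pred[0] - 16.0) < 1e-9
```

Normalization removes the shifting mean and scale that make the series hard to learn, but it also discards that information, which is exactly what the non-stationary learning attention described above reintroduces.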
ZHANG Zheng-Xin , ZHANG Du-Zhen
2025, 34(4):195-206. DOI: 10.15888/j.cnki.csa.009807 CSTR: 32024.14.csa.009807
Abstract:With the widespread application of the attention mechanism in object detection, further enhancing feature extraction ability has become a research focus. A novel attention mechanism is proposed to optimize the feature interaction process and enhance detection performance. The mechanism eliminates the query operation in traditional self-attention; it employs depthwise separable convolution to efficiently extract both local and global information and realizes feature aggregation through the weighted fusion of keys and values. The method effectively reduces computational complexity and enhances the model's ability to capture important features. Validation on five different types of datasets demonstrates that the attention mechanism exhibits excellent performance in small target detection, occlusion handling, and complex scenes, significantly improving detection accuracy and efficiency. Visual analysis further verifies its effectiveness in feature extraction.
2025, 34(4):207-217. DOI: 10.15888/j.cnki.csa.009809 CSTR: 32024.14.csa.009809
Abstract:Entity alignment technology aims to identify and match items that refer to the same entity across different knowledge graphs. It plays a crucial role in knowledge graph integration and demonstrates broad application potential and significant practical value in fields such as knowledge completion and social network analysis. With the continuous evolution of entity alignment methods based on knowledge representation learning, researchers have begun to explore multiple information dimensions among entities to calculate the similarity between source and target entities. Nonetheless, some attribute information of entities is not fully exploited by existing methods, especially the thematic information within entity attributes; topic models can uncover more prominent semantic connections between entities. Centering on the thematic information of entity attributes, this study proposes an entity alignment framework called EAGT (knowledge graph entity alignment via graph convolutional networks with biterm topic model), which aligns entities by combining entity topics with graph convolutional neural networks. To verify the effectiveness of the proposed method, experiments are conducted on open-source datasets. The results show that EAGT achieves performance improvements in most cases.
CAO Jie , LI Li-Jing , LIANG Hao-Peng
2025, 34(4):218-227. DOI: 10.15888/j.cnki.csa.009811 CSTR: 32024.14.csa.009811
Abstract:To address the false and missed detections caused by small target size, dense distribution, and occlusion in unmanned aerial vehicle (UAV) aerial images, this study proposes a small target detection algorithm for aerial images that combines reparameterization and multi-level feature fusion. Firstly, a reparameterized convolution module (RCM) is designed using the idea of reparameterization, and the C2f-RCM module is built by combining the RCM with the C2f module; it effectively captures contextual information by enlarging the receptive field and better extracts subtle features in the images. Secondly, to solve the information loss caused by the neck network in the feature fusion stage, this study proposes a multi-level feature fusion module (MFFM), which uses cross-level information fusion to effectively reduce missed detections under occlusion, so that the network can detect large, medium, and small targets with significantly improved accuracy. Finally, an Inner-Shape IoU bounding box regression loss function is proposed to speed up model convergence by constructing auxiliary boxes and focusing on the shape of the bounding box. Compared with the baseline model, the proposed method improves mAP@0.5, precision, and recall by 5.7%, 5.7%, and 2.4% on VisDrone2019 and by 3.7%, 3.9%, and 5.3% on AI-TOD, respectively, which verifies that the proposed method is effective in detecting small targets in aerial images.
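Inner-Shape IoU and the rest of the IoU-loss family all build on plain bounding-box IoU. A short sketch of that base quantity, with boxes given as illustrative `(x1, y1, x2, y2)` corner coordinates (the auxiliary-box and shape terms of the proposed loss are not reproduced here):

```python
# Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).

def box_iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

# Two half-overlapping 2x2 boxes share one third of their union.
assert abs(box_iou((0, 0, 2, 2), (1, 0, 3, 2)) - 1 / 3) < 1e-9
```

For tiny aerial targets a few pixels of regression error collapses this ratio toward zero, which is why IoU-family losses are refined with auxiliary boxes and shape terms as the abstract describes.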
WANG Chun-Dong , ZHANG Hao-Long
2025, 34(4):228-238. DOI: 10.15888/j.cnki.csa.009812 CSTR: 32024.14.csa.009812
Abstract:Multi-domain facial expression transfer entails mutual transformation between different images to generate high-fidelity facial images that carry the source facial expression and the target facial identity, solving the problems of high similarity and low authenticity in images generated by traditional methods. This study proposes a multi-domain facial expression transfer model based on an improved StarGAN-V2. The model consists of a generator, a discriminator, a mapping network, and a style encoder. A spatial attention mechanism is introduced, the cycle consistency loss is upgraded to an adversarial cycle consistency loss, and a new domain feedback discriminator is appended after the generator. The improved StarGAN-V2 model can generate high-fidelity facial images with source facial expressions and target facial identity features from the source and target images. Experimental results show that for the improved model, the FID values of latent-guided and reference-guided synthesis are 11.9 and 17.4, respectively, and the LPIPS values are 0.491 and 0.426, respectively; these values are better than those of the control model. The improved model solves the problem of high image similarity and generates more realistic images.
WANG Ting , SUN Jin-Ze , ZHAO Qian , JING Chang-Qiang
2025, 34(4):239-247. DOI: 10.15888/j.cnki.csa.009851 CSTR: 32024.14.csa.009851
Abstract:To address the premature convergence of the sparrow search algorithm (SSA), which makes it prone to local optima, this study proposes a sparrow search algorithm with multi-strategy improvements (LCSSA). Firstly, nonlinearly decreasing weights and a Lévy flight strategy are introduced to jointly improve the discoverer position update formula, enhancing global search ability and the ability to escape local optima. Secondly, Cauchy mutation is introduced to update the follower positions, that is, the optimal solution is updated and perturbed. Comparative experiments against four algorithms are conducted on 12 benchmark functions. The experimental results show that the improved algorithm achieves effective gains in convergence speed and stability. In disease prediction, LCSSA performs well on four chronic disease datasets, showing higher prediction accuracy than the selected algorithms.
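The two discoverer-update ingredients named in the abstract can be sketched as follows. The exact LCSSA update formula is not given, so the decay schedule and the way the step is applied are assumptions; the Lévy step itself uses the standard Mantegna algorithm.

```python
import numpy as np
from math import gamma, sin, pi

# Hedged sketch of a nonlinearly decreasing weight plus a Levy-flight step
# (Mantegna's algorithm); the combination into an update rule is illustrative.
def nonlinear_weight(t, t_max, w_max=0.9, w_min=0.4):
    """Weight decays nonlinearly from w_max to w_min over the iterations."""
    return w_min + (w_max - w_min) * np.cos(pi * t / (2 * t_max))

def levy_step(dim, beta=1.5, rng=None):
    """Draw a Levy-stable step of index `beta` via Mantegna's algorithm."""
    if rng is None:
        rng = np.random.default_rng(0)
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)

def update_discoverer(x, best, t, t_max, rng=None):
    """Move a discoverer toward the current best with a weighted Levy step."""
    w = nonlinear_weight(t, t_max)
    return x + w * levy_step(x.size, rng=rng) * (best - x)
```

The heavy-tailed Lévy steps occasionally produce large jumps, which is what lets the search escape local optima, while the decaying weight shifts the balance from exploration to exploitation.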
2025, 34(4):248-255. DOI: 10.15888/j.cnki.csa.009817 CSTR: 32024.14.csa.009817
Abstract:To accelerate the solution of computational fluid dynamics (CFD) problems, parallel execution is commonly used. However, the diversity of computing hardware architectures and programming languages poses challenges to program portability. In this study, the Kokkos framework is used to implement heterogeneous parallel CFD computing. Moreover, the reduction method, atomic operations, and the coloring approach are employed to address data conflicts during parallel computing, and a specific algorithmic solution to data conflicts in heterogeneous parallel computing under this framework is proposed. Given the architectural characteristics of the graphics processing unit (GPU), the speedup ratios of single-precision and double-precision calculations on different hardware are analyzed, and optimal parallel strategies for different computing hardware are obtained. The study demonstrates that using atomic operations for single-precision computations on GPUs significantly enhances CFD solving efficiency.
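The coloring approach resolves data conflicts by partitioning the work so that no two items in the same group write to the same cell; each group can then run fully in parallel without atomics. A minimal greedy sketch (an assumption about the scheme, using a face-based flux accumulation as the example; the paper's actual Kokkos kernels are not shown in the abstract):

```python
# Illustrative greedy coloring: faces that update the same cell receive
# different colors, so all faces within one color can accumulate into the
# cell array concurrently without atomic operations.
def color_faces(face_cells):
    """face_cells: list of (left_cell, right_cell) per face -> color per face."""
    colors = []
    for cells in face_cells:
        # colors already used by earlier faces that share a cell with this one
        used = {colors[j] for j, other in enumerate(face_cells[:len(colors)])
                if set(cells) & set(other)}
        c = 0
        while c in used:
            c += 1
        colors.append(c)
    return colors
```

For a 4-cell ring mesh with faces (0,1), (1,2), (2,3), (3,0), two colors suffice; at runtime one would loop over colors serially and over the faces of each color in parallel.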
ZHOU Heng , AI Qing , ZHANG Jing-Hui
2025, 34(4):256-265. DOI: 10.15888/j.cnki.csa.009826 CSTR: 32024.14.csa.009826
Abstract:Accurate integrated energy load forecasting is a key prerequisite for the preliminary planning and subsequent on-demand coordinated operation of regional integrated energy systems. Recent Transformer-based methods have shown significant potential in long-sequence forecasting owing to their excellent global modeling capabilities. However, the permutation-invariant self-attention mechanism in Transformer loses temporal information and ignores the key dependencies between different variables in multi-energy load forecasting. To address these challenges, this study proposes a patch and variable mixing model (PVMM) to achieve accurate multi-energy load forecasting. First, PVMM uses patch embedding technology to convert the input multi-energy load sequence into a 3D tensor, retaining the temporal and variable information of each patch. Second, this study proposes a patch mixing module (PMM) based on depthwise separable convolution to model temporal dependencies. In addition, a variable dynamic projection attention module (VDP-AM) is proposed, which maps query and value variables to a higher dimension and handles interactions among multiple variables through a self-attention mechanism. Finally, the prediction accuracy and generalization ability of this method on the publicly available online system dataset from Arizona State University surpass those of existing methods.
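The patching step described above can be sketched in a few lines. The patch length and stride below are illustrative assumptions; the point is the output layout, a 3D tensor that keeps both the variable axis and local temporal order, which is what the abstract says patch embedding preserves.

```python
import numpy as np

# Sketch of patch construction for a multi-energy load series (T time steps,
# V variables): the series is cut into windows, giving a 3D tensor of shape
# (V, num_patches, patch_len). Parameter values are illustrative.
def patchify(series, patch_len, stride):
    """series: (T, V) -> (V, num_patches, patch_len)."""
    T, V = series.shape
    num_patches = (T - patch_len) // stride + 1
    out = np.empty((V, num_patches, patch_len))
    for p in range(num_patches):
        out[:, p, :] = series[p * stride:p * stride + patch_len].T
    return out
```

A downstream embedding layer would then project the last axis (`patch_len`) to the model dimension, so tokens represent local windows per variable rather than single time steps.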
WANG Fan , ZHANG Fei , SONG Fu-Yuan , YU Jia-Geng
2025, 34(4):266-275. DOI: 10.15888/j.cnki.csa.009827 CSTR: 32024.14.csa.009827
Abstract:The RISC-V software ecosystem is in a stage of accelerated development. International open-source communities are actively contributing, with a focus on adaptation and optimization for RISC-V, driving its software ecosystem forward. PyTorch, an open-source Python machine learning library, has significant advantages in performance, open-source ecosystem, and research adoption, and provides strong support for platforms such as x86, ARM, PowerPC, and CUDA. However, current RISC-V software porting mainly focuses on adapting to the RISC-V standard instruction set and has not yet fully exploited the RISC-V extended instruction sets, leaving a significant gap between the RISC-V software ecosystem and mature ecosystems such as ARM and x86. Because PyTorch lacks support for the RISC-V vector extension (RVV), there is a considerable gap in inference performance between RISC-V platforms and ARM platforms of similar specifications. To address this issue, this study proposes an efficient development scheme for PyTorch with RVV 1.0 and optimizes the depthwise convolution operators in PyTorch using the RVV extended instruction set. A comparative analysis is conducted on the K230 development board, with experimental results showing that the depthwise convolution operators optimized with RVV achieve a speedup of approximately 1.35 to 3.8 times over scalar implementations.
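A scalar reference for the operator being vectorized helps make the optimization target concrete. This sketch assumes the "deep convolution" operators above refer to depthwise convolution (one filter per channel); an RVV kernel would strip-mine the inner loops with vector loads, multiplies, and stores instead of this per-element arithmetic.

```python
import numpy as np

# Scalar reference for depthwise convolution (an assumption about the
# operator being optimized): each input channel is convolved with its own
# single filter; valid padding, stride 1.
def depthwise_conv2d(x, w):
    """x: (C, H, W), w: (C, kH, kW) -> (C, H-kH+1, W-kW+1)."""
    C, H, W = x.shape
    _, kH, kW = w.shape
    out = np.zeros((C, H - kH + 1, W - kW + 1))
    for c in range(C):                      # one filter per channel
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[c, i, j] = np.sum(x[c, i:i + kH, j:j + kW] * w[c])
    return out
```

The innermost loop over `j` is the natural vectorization axis: consecutive output columns read overlapping input windows, which maps well onto RVV's length-agnostic vector loops.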
GUI Xiang-Quan , CAO Lei , LI Li
2025, 34(4):276-285. DOI: 10.15888/j.cnki.csa.009848 CSTR: 32024.14.csa.009848
Abstract:Dunhuang murals are dazzling treasures in the history of human civilization. However, existing algorithmic studies on Dunhuang murals mainly focus on mural restoration and seldom on color style transfer. Therefore, this study proposes a style transfer method for Dunhuang murals that incorporates the CBAM attention mechanism into a cycle-consistent generative adversarial network. Features of the input image are extracted and fed into a generator augmented with the CBAM attention mechanism, which improves the style transfer of focus regions and suppresses boundary artifacts. To better retain the structural information of the image content, a residual network module is added between the down-sampling and up-sampling stages. In addition, a color loss is added to the loss function to constrain the model and improve the stylization of the generated images. Experiments on the self-constructed Dunhuang mural dataset validate the superiority of the proposed model over existing methods in the task of Dunhuang mural art style transfer. The model generates stylized Dunhuang mural images with better visual effects and a stronger artistic flavor, providing a new idea for innovative research on Dunhuang murals.
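The abstract does not define its color loss, so the following is a hypothetical stand-in showing one common way to penalize color-style disagreement: matching per-channel mean and standard deviation between the generated image and a style reference.

```python
import numpy as np

# Hypothetical color loss (the paper's definition is not given): distance
# between per-channel first- and second-order color statistics of the
# generated image and the style reference.
def color_loss(generated, reference):
    """Images as (H, W, 3) float arrays in [0, 1]; returns a scalar loss."""
    g_mean = generated.mean(axis=(0, 1))    # per-channel mean color
    r_mean = reference.mean(axis=(0, 1))
    g_std = generated.std(axis=(0, 1))      # per-channel color spread
    r_std = reference.std(axis=(0, 1))
    return float(np.abs(g_mean - r_mean).sum() + np.abs(g_std - r_std).sum())
```

Added to the adversarial and cycle terms with a small weight, such a term nudges the generator toward the reference palette without dictating spatial structure.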
ZHENG Bai-Chuan , CHEN Kai , LI Sheng-Hui , LI Bing-Qian , ZHANG Ning
2025, 34(4):286-297. DOI: 10.15888/j.cnki.csa.009859 CSTR: 32024.14.csa.009859
Abstract:Entity alignment (EA) is pivotal in the integration of knowledge graphs. The most advanced research introduces external knowledge (attribute texts, timestamps, image information, etc.) and multimodal methods, achieving relatively high accuracy. However, these methods often depend strongly on specific structures, which limits their applicability to entity alignment across knowledge graphs with different structures. Therefore, this study proposes a universal knowledge graph alignment approach that utilizes the entity, relationship, and graph structure information shared by knowledge graphs, called surface information because it can be directly observed in the graphs. The proposed method comprises an embedding generation module and an alignment module: the former uses the Transformer model to capture the inherent semantics of entities and the contributions of their neighbors, while the latter achieves high-performance, stable alignment through a matching algorithm. Experimental results show that the proposed method achieves the best performance in alignment scenarios across multiple mainstream knowledge graphs, demonstrating stability and strong interpretability. The code used in this study is available at https://github.com/zb1tree/TGEA.
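The alignment module's role can be illustrated with a minimal matching sketch. The paper's actual matching algorithm is not described in the abstract; this assumes cosine similarity between learned embeddings and a greedy one-to-one assignment, which shows the interface such a module exposes.

```python
import numpy as np

# Illustrative alignment step (an assumption, not the paper's algorithm):
# entities from two graphs are matched by cosine similarity of their
# embeddings under a greedy one-to-one assignment.
def align(emb_a, emb_b):
    """emb_a: (n, d), emb_b: (m, d) -> list of matched (i, j) index pairs."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sim = a @ b.T                           # cosine similarity matrix (n, m)
    pairs, used_b = [], set()
    # process source entities with the strongest best-match first
    for i in np.argsort(-sim.max(axis=1)):
        j = max((j for j in range(sim.shape[1]) if j not in used_b),
                key=lambda j: sim[i, j], default=None)
        if j is not None:
            pairs.append((int(i), int(j)))
            used_b.add(j)
    return pairs
```

In practice the greedy pass is often replaced by an optimal assignment solver; the greedy version keeps the sketch short while preserving the one-to-one constraint.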