DING Qiang-Long , YE Hui-Zhu , YUAN Hong-Qiang , LI Zhi-Xin
2024, 33(5):1-14. DOI: 10.15888/j.cnki.csa.009517 CSTR: 32024.14.csa.009517
Abstract:This study proposes an algorithm named DPCP-CROSS-JOIN for fast co-spatiotemporal relationship join queries over large-scale trajectory data on clusters with insufficient computing resources. The proposed algorithm discretizes continuous trajectory data by segmenting and cross-coding the temporal fields and grid-coding the spatial fields, and then stores the data in two-level partitions keyed by date and grid region code. It achieves 3-level indexing and 4-level acceleration for spatiotemporal join queries through cross “equivalent” join queries. As a result, the time complexity of the co-spatiotemporal relationship join queries among n × n objects is reduced from O(n²) to O(n log n). It can improve the efficiency of join queries by up to 30.66 times when Hive and TEZ are used on a Hadoop cluster for join queries of large-scale trajectory data. This algorithm uses time-slice and grid codes as the join condition, thereby bypassing the real-time calculation of complex expressions during the join process. Moreover, joins on complex expressions are replaced with “equivalent” joins to improve the parallelism of MapReduce tasks and enhance the utilization rates of cluster storage and computing resources. Similar tasks on larger-scale trajectory data that are almost impossible to accomplish using general optimization methods can still be completed by the proposed algorithm within a few minutes. The experimental results suggest that the proposed algorithm is efficient and stable, and it is especially suitable for the co-spatiotemporal relationship join queries of large-scale trajectory data under insufficient computing resource conditions. It can also be used as an atomic algorithm for searching accompanying spatiotemporal trajectories and determining the intimacy of relationships among objects. It can be widely applied in fields such as national security and social order maintenance, crime prevention and combat, and urban and rural planning support.
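The idea of replacing a distance-and-time comparison with an equality join on discrete codes can be sketched as follows. This is a minimal illustration, not the DPCP-CROSS-JOIN algorithm itself: the function names, the 300-second slice, and the 0.01-unit cell size are assumptions, and the cross-coding step that handles points near slice or cell boundaries is omitted.

```python
from collections import defaultdict
from itertools import combinations


def encode(t, x, y, slice_seconds=300, cell_size=0.01):
    """Map a continuous (time, x, y) point to a discrete (time-slice, grid-cell) key.

    slice_seconds and cell_size are illustrative defaults, not values from the paper.
    """
    return (int(t // slice_seconds), int(x // cell_size), int(y // cell_size))


def co_located_pairs(points, slice_seconds=300, cell_size=0.01):
    """Return the set of object-id pairs that share at least one (slice, cell) key.

    points: iterable of (object_id, t, x, y).
    Bucketing by key plays the role of an equi-join: only objects with equal
    codes are ever compared, instead of every pair of trajectory points.
    """
    buckets = defaultdict(set)
    for obj, t, x, y in points:
        buckets[encode(t, x, y, slice_seconds, cell_size)].add(obj)

    pairs = set()
    for objs in buckets.values():
        pairs.update(combinations(sorted(objs), 2))
    return pairs
```

Because the join condition is a plain equality on the key, a system such as Hive can hash-partition both sides on it, which is what makes the join easy to parallelize.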
YANG Ben-Chen , QU Ye-Tian , JIN Hai-Bo
2024, 33(5):15-27. DOI: 10.15888/j.cnki.csa.009512 CSTR: 32024.14.csa.009512
Abstract:The scenes in high-resolution aerial images are of many highly similar categories. The classic classification method based on deep learning offers low operational efficiency because of the redundant floating-point operations generated in the feature extraction process. FasterNet improves the operational efficiency through partial convolution but reduces the feature extraction ability and hence the classification accuracy of the model. To address the above problems, this study proposes a hybrid structure classification method integrating FasterNet and the attention mechanism. Specifically, the “cross-shaped convolution module” is used to partially extract scene features and thereby improve the operational efficiency of the model. Then, a dual-branch attention mechanism that integrates coordinate attention and channel attention is used to enable the model to better extract features. Finally, a residual connection is made between the “cross-shaped convolution module” and the dual-branch attention module so that more task-related features can be obtained from network training, thereby reducing operational costs and improving operational efficiency in addition to improving classification accuracy. The experimental results show that compared with the existing classification models based on deep learning, the proposed method has a short inference time and high accuracy. Its number of parameters is 19M, and its average inference time for one image is 7.1 ms. The classification accuracy of the proposed method on the public datasets NWPU-RESISC45, EuroSAT, VArcGIS (10%), and VArcGIS (20%) is 96.12%, 98.64%, 95.42%, and 97.87%, respectively, which is 2.06%, 0.77%, 1.34%, and 0.65% higher than that of the FasterNet model, respectively.
2024, 33(5):28-36. DOI: 10.15888/j.cnki.csa.009504 CSTR: 32024.14.csa.009504
Abstract:The statistical inference of network data has become a hot topic in statistical research in recent years. The independence assumption among sample data in traditional models often fails to meet the analytical demands of modern network-linked data. This work studies the independent effect of each network node in network-linked data and, based on the idea of a fusion penalty, makes the independent effects of linked nodes converge. Knockoff variables construct covariates independent of the target variable by imitating the structure of the original variables. With the help of Knockoff variables, this study proposes a general variable selection framework for network-linked data (NLKF). The study proves that NLKF can control the false discovery rate (FDR) at the target level and has higher statistical power than the Lasso variable selection method. When the covariance of the original data is unknown, the method still has good statistical properties with an estimated covariance matrix. Finally, an application example in the field of financial engineering is given, which combines 200 factor samples of more than 4,000 stocks in the A-share market with their network relations constructed from Shenyin Wanguo’s first-level industry classification.
YAO Juan , QIAO Huan , FANG Ling-Ling
2024, 33(5):37-46. DOI: 10.15888/j.cnki.csa.009482 CSTR: 32024.14.csa.009482
Abstract:Optical coherence tomography (OCT) is a new type of ophthalmic diagnosis method with non-contact, high resolution, and other characteristics, which has been used as an important reference for doctors to clinically diagnose ophthalmic diseases. As early detection and clinical diagnosis of retinopathy are crucial, it is necessary to change the time-consuming and laborious status quo of the manual classification of diseases. To this end, this study proposes a multi-classification recognition method for retinal OCT images based on an improved MobileNetV2 neural network. This method uses feature fusion technology to process images and designs an attention increase mechanism to improve the network model, greatly improving the classification accuracy of OCT images. Compared with the original algorithm, the classification effect has been significantly improved, and the classification accuracy, recall, precision, and F1 value of the proposed model reach 98.3%, 98.44%, 98.94%, and 98.69%, respectively, exceeding the accuracy of manual classification. Such methods not only speed up the diagnostic process, reduce the burden on doctors, and improve the quality of diagnosis in actual diagnosis, but also provide a new direction for ophthalmic medical research.
CHEN Wan-Zhi , ZHANG Si-Wei , WANG Tian-Yuan
2024, 33(5):47-56. DOI: 10.15888/j.cnki.csa.009490 CSTR: 32024.14.csa.009490
Abstract:A new method for short-term power load forecasting is proposed to address issues such as complex and non-stationary load data, as well as large prediction errors. Firstly, this study utilizes the maximum information coefficient (MIC) to analyze the correlation of feature variables and selects relevant variables related to power load sequences. At the same time, as the variational mode decomposition (VMD) method is susceptible to subjective factors, the study employs the rime optimization algorithm (RIME) to optimize VMD and decompose the original power load sequence. Then, the long and short-term time series network (LSTNet) is improved as the prediction model by replacing the recursive LSTM layer with BiLSTM and incorporating the convolutional block attention mechanism (CBAM). Comparative experiments and ablation experiments demonstrate that RIME-VMD reduces the root mean square error (RMSE) of the LSTM, GRU, and LSTNet models by more than 20%, significantly improving the prediction accuracy of the models, and can be adapted to different prediction models. Compared with LSTM, GRU, and LSTNet, the proposed BLSTNet-CBAM model reduces the RMSE by 35.54%, 6.78%, and 1.46% respectively, improving the accuracy of short-term power load forecasting.
WU Yong-Qing , ZHU Yue , WANG Yu-Han
2024, 33(5):57-66. DOI: 10.15888/j.cnki.csa.009483 CSTR: 32024.14.csa.009483
Abstract:To address the challenge of data sparsity within session recommendation systems, this study introduces a self-supervised graph convolution session recommendation model based on the attention mechanism (ATSGCN). Firstly, the model constructs the session sequence into three distinct views, namely the hypergraph view, item view, and session view, which show the high-order and low-order connection relationships of the session. Secondly, the hypergraph view employs hypergraph convolutional networks to capture higher-order pairwise relationships among items within a session. The item view and session view employ graph convolutional networks and attention mechanisms respectively to capture lower-order connection details within local session data at both item and session levels. Finally, self-supervised learning is adopted to maximize the mutual information between the session representations learned by the two encoders, thereby effectively improving recommendation performance. Comparative experiments on the Nowplaying and Diginetica public datasets demonstrate the superior performance of the proposed model over the baseline model.
ZHOU Yun-Long , JI Fan-Fan , PAN Ze-Feng
2024, 33(5):67-75. DOI: 10.15888/j.cnki.csa.009502 CSTR: 32024.14.csa.009502
Abstract:The previous methods for precipitation nowcasting based on deep learning try to model the spatiotemporal evolution of radar echoes in a unified architecture. However, these methods may face difficulty in capturing the complex spatiotemporal relationships completely. This study proposes a two-stage precipitation nowcasting network based on the Halo attention mechanism. This network divides the spatiotemporal evolution process of precipitation nowcasting into two stages: motion trend prediction and spatial appearance reconstruction. Firstly, a learnable optical flow module models the motion trend of radar echoes and generates coarse prediction results. Secondly, a feature reconstruction module models the spatial appearance changes in the historical radar echo sequences and refines the spatial appearance of the coarse-grained prediction results, generating fine-grained radar echo maps. The experimental results on the CIKM dataset demonstrate that the proposed method outperforms mainstream methods. The average Heidke skill score and critical success index are improved by 4.60% and 3.63%, reaching 0.48 and 0.45, respectively. The structural similarity index is improved by 4.84%, reaching 0.52, and the mean squared error is reduced by 6.13%, reaching 70.23.
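For readers unfamiliar with the two skill scores reported above, the sketch below computes the critical success index (CSI) and the Heidke skill score (HSS) from a 2×2 contingency table using their standard definitions. The function names and the example counts are illustrative assumptions; the rain-rate threshold used to binarize radar echoes in the paper is not stated in the abstract.

```python
def critical_success_index(hits, false_alarms, misses):
    """CSI = hits / (hits + misses + false alarms)."""
    return hits / (hits + misses + false_alarms)


def heidke_skill_score(hits, false_alarms, misses, correct_negatives):
    """HSS = 2(ad - bc) / ((a + c)(c + d) + (a + b)(b + d)).

    a = hits, b = false alarms, c = misses, d = correct negatives.
    A score of 1 is a perfect forecast; 0 means no skill beyond chance.
    """
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    return 2.0 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))
```

Both functions assume the denominators are non-zero, that is, the event is observed or forecast at least once.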
XU Yan , LIN Yun-Han , MIN Hua-Song
2024, 33(5):76-84. DOI: 10.15888/j.cnki.csa.009500 CSTR: 32024.14.csa.009500
Abstract:GSNet relies on graspness to distinguish graspable areas in cluttered scenes, which significantly improves the accuracy of robot grasping pose detection in cluttered scenes. However, GSNet only uses a fixed-size cylinder to determine the grasping pose parameters and ignores the influence of features of different sizes on grasping pose estimation. To address this problem, this study proposes a multi-scale cylinder attention feature fusion module (Ms-CAFF), which contains two core modules: the attention fusion module and the gating unit. It replaces the original feature extraction method in GSNet and uses an attention mechanism to effectively integrate the geometric features inside four cylinders of different sizes, thereby enhancing the network’s ability to perceive geometric features at different scales. The experimental results on GraspNet-1Billion, a grasping pose detection dataset for large-scale cluttered scenes, show that after the introduction of the modules, the accuracy of the network’s grasping poses is increased by up to 10.30% and 6.65%. At the same time, this study applies the network to actual experiments to verify the effectiveness of the method in real scenes.
SONG Wen-Qi , WU Long , LI Yao
2024, 33(5):85-93. DOI: 10.15888/j.cnki.csa.009506 CSTR: 32024.14.csa.009506
Abstract:To address the problems of few shots and varying sizes in the surface defects on steel strips in industrial scenarios, this study proposes a detection network for surface defects on steel strips readily applicable to few-shot situations. Specifically, the algorithm is based on the you only look once version 5 small (YOLOv5s) framework, and a multi-scale path aggregation network with an attention mechanism is designed to serve as the neck of the model and thereby enhance the ability of the model to predict defect objects on multiple scales. Then, an adaptive coord-decoupled head is proposed to alleviate the conflict between the classification and positioning tasks in few-shot scenarios. Finally, a bounding box regression loss function fused with the Wasserstein distance is presented to improve the accuracy of the model in detecting small defect objects. Experiments show that the proposed model outperforms other few-shot object detection models on the few-shot dataset of surface defects on steel strips, indicating that it is more suitable for few-shot defect detection tasks in industrial environments.
HONG Jun , LIU Xiao-Nan , LIU Zhen-Yu
2024, 33(5):94-102. DOI: 10.15888/j.cnki.csa.009513 CSTR: 32024.14.csa.009513
Abstract:This study aims to solve the problems faced by traditional U-Net network in the semantic segmentation task of street scene images, such as the low accuracy of object segmentation under multi-scale categories and the poor correlation of image context features. To this end, it proposes an improved U-Net semantic segmentation network AS-UNet to achieve accurate segmentation of street scene images. Firstly, the spatial and channel squeeze & excitation block (scSE) attention mechanism module is integrated into the U-Net network to guide the convolutional neural network to focus on semantic categories related to segmentation tasks in both channel and space dimensions, to extract more effective semantic information. Secondly, to obtain the global context information of the image, the multi-scale feature map is aggregated for feature enhancement, and the atrous spatial pyramid pooling (ASPP) multi-scale feature fusion module is embedded into the U-Net network. Finally, the cross-entropy loss function and Dice loss function are combined to solve the problem of unbalanced target categories in street scenes, and the accuracy of segmentation is further improved. The experimental results show that the mean intersection over union (MIoU) of the AS-UNet network model in the Cityscapes and CamVid datasets increases by 3.9% and 3.0%, respectively, compared with the traditional U-Net network. The improved network model significantly improves the segmentation effect of street scene images.
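Combining cross-entropy with Dice loss is a common way to counter class imbalance, since Dice depends on overlap rather than on the raw pixel count of each class. The sketch below shows one such combination for a single foreground class in plain Python. The equal weighting, the smoothing constant, and the function names are assumptions; the abstract does not state how AS-UNet weights the two terms.

```python
import math


def cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; probs are foreground probabilities, targets are 0/1."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """1 - soft Dice coefficient; smooth keeps the ratio defined for empty masks."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def combined_loss(probs, targets, weight=0.5):
    """Weighted sum of the two losses; weight=0.5 is an illustrative choice."""
    return weight * cross_entropy(probs, targets) + (1.0 - weight) * dice_loss(probs, targets)
```

In a real training loop the same formulas would be written with tensor operations so that gradients can flow through them.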
WANG Hong , FENG Jia-Jun , DAI Qi , SHI Yu , LIANG Yu-Hang , ZHANG Hui
2024, 33(5):103-109. DOI: 10.15888/j.cnki.csa.009493 CSTR: 32024.14.csa.009493
Abstract:The traditional prediction models for the corrosion rates of industrial pipelines often have the problems of dependence of feature extraction on artificial experience and insufficient generalization ability. To address this issue, this study combines the convolutional neural network (CNN) with the long short-term memory (LSTM) network and proposes a network model based on the cuckoo search (CS) optimization algorithm, namely, the CNN-LSTM-CS model, to predict the corrosion rates of industrial pipelines. Specifically, the collected pipeline corrosion dataset is pre-processed by normalization. Then, the CNN is used to extract information on the deep features of factors affecting the corrosion rates of the pipelines, and a CNN-LSTM prediction model is constructed by training the LSTM network. Finally, the CS algorithm is used to optimize the parameters of the prediction model, thereby reducing the prediction error and predicting the corrosion rate accurately. The experimental results show that compared with several typical prediction methods for the corrosion rate, the method proposed has higher prediction accuracy and provides a new approach for predicting the corrosion rates of industrial pipelines.
2024, 33(5):110-117. DOI: 10.15888/j.cnki.csa.009510 CSTR: 32024.14.csa.009510
Abstract:Convolutional neural network (CNN), as an important part of U-Net baseline networks in the field of medical image segmentation, is mainly used to deal with the relationships among local feature information. Transformer is a visual model that can effectively strengthen the long-distance dependency among feature information. Previous studies show that Transformer can be combined with CNNs to improve the accuracy of medical image segmentation to a certain extent. However, labeled data in medical images are rarely available while a large amount of data is required to train the Transformer model, exposing the Transformer model to the challenges of high time consumption and a large number of parameters. Due to these considerations, this paper proposes a novel medical image segmentation model based on a hybrid multi-layer perceptron (MLP) network by combining the multi-scale hybrid MLP with a CNN based on the UNeXt model, namely, the LM-UNet model. This model can effectively enhance the connection between local and global information and strengthen the fusion between feature information. Experiments on multiple datasets reveal significantly improved segmentation performance of the LM-UNet model on the International Skin Imaging Collaboration (ISIC) 2018 dataset, manifested as an average Dice coefficient of 92.58% and an average intersection over union (IoU) coefficient of 86.52%, which are 3% and 3.5% higher than those of the UNeXt model, respectively. The segmentation effects of the proposed model on the Osteoarthritis Initiative-Zuse Institute Berlin two-dimensional (OAI-ZIB 2D) and the breast ultrasound image (BUSI) datasets are also substantially superior, represented as average Dice coefficients 2.5% and 1.0% higher than those of the UNeXt counterpart, respectively. In summary, the LM-UNet model not only improves the accuracy of medical image segmentation but also provides better generalization performance.
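The two segmentation metrics quoted above have simple set-based definitions, sketched here for a single binary mask. The function name and the empty-mask convention are assumptions made for illustration; the reported figures are averages over a dataset, so they need not satisfy the per-mask identity Dice = 2·IoU / (1 + IoU) exactly.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equal-length sequences of 0/1 pixel labels."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty: treated as perfect agreement (a convention, not a rule).
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou
```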
CHEN Wan-Zhi , RONG Xin-Xin , WANG Tian-Yuan
2024, 33(5):118-126. DOI: 10.15888/j.cnki.csa.009499 CSTR: 32024.14.csa.009499
Abstract:Accurately predicting wind power is of great significance for improving the efficiency and safety of the power system, while the intermittence and randomness of wind energy make it difficult to predict wind power accurately. Therefore, an improved wind power prediction model based on Informer, namely PCI-Informer (PATCH-CNN-IRFFN-Informer) is proposed. The sequence data is divided into subsequence-level patches for feature extraction and integration, which improves the model’s ability to process sequence data and its effectiveness. Multiple-scale causal convolution self-attention mechanism is used to achieve multi-scale local feature fusion, which enhances the model’s understanding and modeling ability of local information. The inverse residual feedforward network (IRFFN) is introduced to enhance the model’s ability to extract and preserve local structural information. Experiment verification is conducted using data from a wind farm, and the results show that compared with mainstream prediction models, the PCI-Informer model achieves better prediction performance at different prediction time steps, with an average reduction of 11.1% in MAE compared with the Informer model, effectively improving the short-term wind power prediction accuracy.
ZHANG Shi-Qing , HU Wei , ZHAO Xiao-Ming
2024, 33(5):127-135. DOI: 10.15888/j.cnki.csa.009484 CSTR: 32024.14.csa.009484
Abstract:Spatiotemporal forecasting finds extensive applications in domains such as pollution management, transportation, energy, and meteorology. Predicting PM2.5 concentration, as a quintessential spatiotemporal forecasting task, necessitates the analysis and utilization of spatiotemporal dependencies within air quality data. Existing studies on spatiotemporal graph neural networks (ST-GNNs) either employ predefined heuristic rules or trainable parameters for adjacency matrices, posing challenges in accurately representing authentic inter-station relationships. This study introduces the adaptive hierarchical graph convolutional neural network (AHGCNN) to address these issues concerning PM2.5 prediction. Firstly, a hierarchical mapping graph convolutional architecture is introduced, employing distinct self-learning adjacency matrices at different hierarchical levels, efficiently uncovering unique spatiotemporal dependencies among various monitoring stations. Secondly, an attention-based aggregation mechanism is employed to connect adjacency matrices across different hierarchical levels, expediting the convergence process. Finally, the hidden spatial states are fused with gated recurrent unit (GRU), forming a unified predictive framework capable of concurrently capturing multi-level spatial and temporal dependencies, ultimately delivering the prediction results. In the experiments, the proposed model is comparatively analyzed with seven mainstream models. The results indicate that the model can effectively capture the spatiotemporal dependencies between air monitoring stations, improving predictive accuracy.
WANG Chao , SUN Yong-Yong , XU Fei , MA Yuan-Yuan , WEN Wen , WANG Lu
2024, 33(5):136-143. DOI: 10.15888/j.cnki.csa.009497 CSTR: 32024.14.csa.009497
Abstract:In the field of short-text intent recognition, convolutional neural networks (CNN) have garnered considerable attention due to their outstanding performance in extracting local information. Nevertheless, their limitations arise from the difficulty in capturing the global features of short-text corpora. To address this issue, this study combines the strengths of TextCNN and BiGRU-att to propose a dual-channel short-text intent recognition model, aiming to better recognize the intent of short texts by leveraging both local and global features, thereby compensating for the model’s inadequacies in capturing overall text features. The AB-CNN-BGRU-att model initially utilizes an ALBERT multi-layer bidirectional Transformer structure to vectorize the input text and subsequently feeds these vectors separately into TextCNN and BiGRU network models to extract local and global features, respectively. The fusion of these two types of features, followed by passing through fully connected layers and inputting into the Softmax function, yields the intent labels. The experimental results demonstrate that on the THUCNews_Title dataset, the proposed AB-CNN-BGRU-att algorithm achieves an accuracy (Acc) of 96.68% and an F1 score of 96.67%, exhibiting superior performance compared with other commonly used intent recognition models.
GUO Wei , WANG Zhu-Ying , JIN Hai-Bo
2024, 33(5):144-153. DOI: 10.15888/j.cnki.csa.009471 CSTR: 32024.14.csa.009471
Abstract:At present, UAV images contain many small targets against complex backgrounds, which easily leads to a high false detection rate in target detection. To solve these problems, this study proposes a small target detection algorithm for UAV images based on high-order depthwise separable convolution. Firstly, by combining the CSPNet structure and ConvMixer network, the study utilizes the depthwise separable convolution kernel to obtain the gradient binding information and introduces a recursively gated convolution C3 module to improve the higher-order spatial interaction ability of the model and enhance the sensitivity of the network to small targets. Secondly, the detection head is decoupled into two heads that output the classification and position information of the feature map separately, accelerating model convergence. Finally, the bounding box loss function EIoU is leveraged to improve the accuracy of the detection boxes. The experimental results on the VisDrone2019 dataset show that the detection accuracy of the model reaches 35.1%, and the missed and false detection rates of the model are significantly reduced, so the model can be effectively applied to the small target detection task of UAV images. The model generalization ability is tested on the DOTA 1.0 dataset and the HRSID dataset, and the experimental results show that the model has good robustness.
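The EIoU loss mentioned above extends IoU with three penalties: the distance between box centers, the width difference, and the height difference, each normalized by the smallest enclosing box. The sketch below follows that commonly cited formulation for a single pair of boxes. The function name is an assumption, boxes are assumed to have positive area, and the paper may use a variant such as a focal weighting.

```python
def eiou_loss(box, gt):
    """EIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    iou = inter / (w * h + gw * gh - inter)

    # Width and height of the smallest box enclosing both inputs.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)

    # Squared distance between the two box centers.
    center_dist2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2

    return (1.0 - iou
            + center_dist2 / (cw ** 2 + ch ** 2)
            + (w - gw) ** 2 / cw ** 2
            + (h - gh) ** 2 / ch ** 2)
```

Penalizing width and height directly, rather than only the aspect ratio, gives a useful gradient even when the two boxes have the same shape but different sizes.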
NIE Yu-Ming , ZANG Wen-Ke , MA Xue-Hao , LIU Yu-Ru , BAO Zhi-Cheng , ZHANG Zhen , PENG Yi
2024, 33(5):154-161. DOI: 10.15888/j.cnki.csa.009489 CSTR: 32024.14.csa.009489
Abstract:There are challenges in training local models at resource-constrained edges in federated learning systems. The limitations in computing, storage, energy consumption, and other aspects constantly affect the scale and effectiveness of the model. Traditional federated pruning methods prune the model during the federated training process, but they fail to prune models adaptively according to the environment and may remove some important parameters, resulting in poor performance of models. This study proposes a distributed model pruning method based on federated reinforcement learning to solve this problem. Firstly, the model pruning process is abstracted, and a Markov decision process is established. DQN algorithm is used to construct a universal reinforcement pruning model, so as to dynamically adjust the pruning rate and improve model generalization performance. Secondly, an aggregation method for sparse models is designed to reinforce and generalize pruning methods, optimize the structure of the model, and reduce its complexity. Finally, this method is compared with different baselines on multiple publicly available datasets. The experimental results show that the proposed method maintains model effectiveness while reducing model complexity.
QU Hai-Cheng , LI Zhu-Yuan , LIU Wan-Jun
2024, 33(5):162-169. DOI: 10.15888/j.cnki.csa.009505 CSTR: 32024.14.csa.009505
Abstract:As one of the important development directions of artificial intelligence, spiking neural networks have received extensive attention in the fields of neuromorphic engineering and brain-inspired computing. To solve the problems of poor generalization as well as large memory and time consumption in spiking neural networks, this study proposes a classification method based on spiking neural networks for spatio-temporal interactive images. Specifically, a temporal efficient training algorithm is introduced to compensate for the kinetic energy loss in the gradient descent process. Then, the spatial learning through time algorithms are integrated to improve the ability of the network to process information efficiently. Finally, the spatial attention mechanism is added to enable the network to better capture important features in the spatial dimension. The experimental results show that the training memory occupation on the three datasets of CIFAR10, DVS Gesture, and CIFAR10-DVS are reduced by 46.68%, 48.52%, and 10.46%, respectively, and the training speed is increased by 2.80 times, 1.31 times, and 2.76 times, respectively. These results indicate that the proposed method improves network performance effectively on the premise of maintaining accuracy.
GAO Shao-Shu , SONG Shang-Ge , NI Xiao
2024, 33(5):170-177. DOI: 10.15888/j.cnki.csa.009515 CSTR: 32024.14.csa.009515
Abstract:The currently available quality assessment methods for images rarely fully utilize the color coding mechanisms of the retina of human eyes and the visual cortex and fail to fully consider the influence of color information on image quality. In this study, an objective assessment model for the color harmony of visible light (dim-light) and infrared color fused images based on multiple visual features is proposed to address the above problems. This model incorporates more color information into image quality assessment by considering a variety of visual features of human eyes comprehensively, including the feature of visual contrast colors, the feature of color information fluctuation, and the feature of advanced visual content. Through feature fusion and support vector regression training, it achieves the objective assessment of the color harmony of color fused images. Experimental comparisons and analyses are conducted using databases of fused images in three typical scenes. The experimental results show that compared with the existing eight methods of objective image quality assessment, the proposed method is more consistent with the subjective perception of human eyes and has higher prediction accuracy.
XU Jiu-Yun , TUO Ying-Chao , ZHAO Yao-Peng , LI Shi-Bao
2024, 33(5):178-186. DOI: 10.15888/j.cnki.csa.009516 CSTR: 32024.14.csa.009516
Abstract:The emergence of network function virtualization (NFV) technology enables network services instantiated as service function chains (SFCs) to share the underlying network, alleviating the rigidity of traditional network architectures. However, the large number of service requests in the network brings new challenges to multi-domain SFC orchestration. For one thing, the privacy of the intra-domain resource information and internal policies of the network makes multi-domain SFC orchestration more complicated. For another, multi-domain SFC orchestration requires the determination of the optimal set of candidate orchestration domains. Nevertheless, previous studies rarely considered the inter-domain load balance, which negatively affected the service acceptance rate. In addition, the orchestration of service requests across network domains places more stringent requirements on the cost and response time of the service. To address the above challenges, this study proposes a construction method for domain-level graphs to meet the privacy requirement of multi-domain networks. Then, a calculation method for domain weight based on the inter-domain load balance is proposed to select SFC orchestration domains. Finally, the study proposes an orchestration algorithm considering the cost and response time requirements of multi-domain networks. The experimental results show that the proposed algorithm effectively trades off the average service cost and the acceptance rate and also optimizes the average service response time.
2024, 33(5):187-194. DOI: 10.15888/j.cnki.csa.009485 CSTR: 32024.14.csa.009485
Abstract:Unlike appearance-based methods whose input may bring in some background noises, skeleton-based gait representation methods take key joints as input, which can neglect the noise interference. Meanwhile, most of the skeleton-based representation methods ignore the significance of the prior knowledge of human body structure or tend to focus on the local features. This study proposes a skeleton-based gait recognition framework, GaitBody, to capture more distinctive features from the gait sequences. Firstly, the study leverages a temporal multi-scale convolution module with a large kernel size to learn the multi-granularity temporal information. Secondly, it introduces topology information of the human body into a self-attention mechanism to exploit the spatial representations. Moreover, to make full use of temporal information, the most salient temporal information is generated and introduced into the self-attention mechanism. Experiments on the CASIA-B and OUMVLP-Pose datasets show that the method achieves state-of-the-art performance in skeleton-based gait recognition, and ablation studies show the effectiveness of the proposed modules.
2024, 33(5):195-202. DOI: 10.15888/j.cnki.csa.009496 CSTR: 32024.14.csa.009496
Abstract:MonteCloPi is an anytime subgroup discovery algorithm based on Monte Carlo tree search (MCTS). It aims to build an asymmetric best-first search tree to discover a diverse pattern set with high quality by MCTS policies, while it is limited to a binary target. To this end, this study combines the characteristics of the numerical target to extend the MonteCloPi algorithm to the numerical target. The study selects the appropriate C value for the upper confidence bound (UCB) formula, adjusts the expansion weight of each sample dynamically as well as prunes the search tree, and uses the adaptive top-k-mean-update policy. Finally, the experimental results on the UCI datasets and the National Health and Nutrition Examination Survey (NHANES) audiometry datasets show that the proposed algorithm outperforms other algorithms in terms of discovering diverse pattern sets with high quality and the interpretability of the best subgroup.
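The role of the C value in the upper confidence bound can be illustrated with the common UCB1-style score, mean reward plus C times an exploration bonus. This is a generic sketch, not the paper's formula: the exact UCB variant, the reward scaling for numerical targets, and the names below are assumptions.

```python
import math


def ucb_score(total_reward, visits, parent_visits, c=1.0):
    """Mean reward plus an exploration bonus that shrinks as a node is visited."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, parent_visits, c=1.0):
    """Pick the child with the highest UCB score.

    children: dict mapping a child name to (total_reward, visits).
    """
    return max(children, key=lambda name: ucb_score(*children[name], parent_visits, c))
```

A small C favors exploiting the best-known branch, while a large C pushes the search toward rarely visited branches, which is why the choice of C affects the diversity of the discovered pattern set.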
MENG Meng-Meng , HUANG Rui-Rui , WU Lin , HUANG Ya-Bo
2024, 33(5):203-209. DOI: 10.15888/j.cnki.csa.009518 CSTR: 32024.14.csa.009518
Abstract:Synthetic aperture radar (SAR) images provide an important time-series data source for land cover classification. The existing time-series matching algorithms can fully exploit the similarity among time-series features to obtain satisfactory classification results. In this study, the classic time-series matching algorithm named time-weighted dynamic time warping (TWDTW), which comprehensively considers shape similarity and phenological differences, is introduced to guide SAR-based land cover classification. To solve the problem that the traditional TWDTW algorithm only considers the similarity matching of a single feature on the time series, this study proposes a multi-feature fusion-based TWDTW (Mult-TWDTW) algorithm. In the proposed method, three features, namely, the backscattering coefficient, interferometric coherence, and the dual-polarization radar vegetation index (DpRVI), are extracted, and the Mult-TWDTW model is designed by fusing multiple features based on the TWDTW algorithm. To verify the effectiveness of the proposed method, the study implements land cover classification in the Danjiangkou area using time-series data obtained from the Sentinel-1A satellite. Then, the Mult-TWDTW algorithm is compared with the multi-layer perceptron (MLP), one-dimensional convolutional neural network (1D-CNN), K-means, and support vector machine (SVM) algorithms as well as the TWDTW algorithm using a single feature. The experimental results show that the Mult-TWDTW algorithm obtains the best classification results, manifested as its overall accuracy and Kappa coefficient reaching 95.09% and 91.76, respectively. In summary, the Mult-TWDTW algorithm effectively fuses the information of multiple features and can enhance the potential of time-series matching algorithms in the classification of multiple types of land covers.
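The idea of combining a shape distance with a time penalty across several features can be sketched as follows. This is an illustration under stated assumptions, not the paper's Mult-TWDTW: the fusion rule (a weighted sum of per-feature absolute differences), the logistic time weight with parameters `alpha` and `beta`, the closed-end alignment, and all function names are assumptions.

```python
import math

def logistic_time_weight(dt, alpha=0.1, beta=50.0):
    """Time penalty that grows with the date gap dt (in days); logistic form is an assumption."""
    return 1.0 / (1.0 + math.exp(-alpha * (dt - beta)))

def multi_feature_twdtw(pattern, series, feature_weights, alpha=0.1, beta=50.0):
    """pattern, series: lists of (day, [f1, f2, ...]). Returns the accumulated alignment cost."""
    n, m = len(pattern), len(series)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        day_p, feats_p = pattern[i - 1]
        for j in range(1, m + 1):
            day_s, feats_s = series[j - 1]
            # Assumed fusion: weighted sum of per-feature absolute differences.
            shape = sum(w * abs(p - s)
                        for w, p, s in zip(feature_weights, feats_p, feats_s))
            local = shape + logistic_time_weight(abs(day_p - day_s), alpha, beta)
            # Standard DTW recursion over the three admissible predecessors.
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]
```

In a classification setting, each pixel's time series would be compared against one reference pattern per land cover class and assigned to the class with the lowest cost.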
ZHOU Zi-Li , GAO Shi-Liang , AN Run-Lu , BAO Xin-Yue
2024, 33(5):210-217. DOI: 10.15888/j.cnki.csa.009508 CSTR: 32024.14.csa.009508
Abstract:Abstractive neural networks have made significant progress and demonstrated remarkable achievements in the field of text summarization. However, abstractive summarization is highly likely to generate summaries of poor fidelity and even deviate from the semantic essence of the source documents due to its flexibility. To address this issue, this study proposes two methods to improve the fidelity of summaries. For Method 1, since entities play an important role in summaries and are usually derived from the original documents, the paper suggests allowing the model to copy entities from the source document to ensure that the generated entities match those in the source document and thereby prevent the generation of inconsistent entities. For Method 2, to better prevent the generated summary from deviating from the original text semantically, the study uses key entities and key tokens as two types of guiding information at different levels of granularity in the summary generation process. The performance of the proposed methods is evaluated using the ROUGE metric on two widely used text summarization datasets, namely, CNNDM and XSum. The experimental results demonstrate that both methods have significantly improved the performance of the model. Furthermore, the experiments also prove that the entity copy mechanism can, to some extent, use guiding information to correct introduced semantic noise.
YANG Qun , ZI Ling-Ling , CONG Xin
2024, 33(5):218-227. DOI: 10.15888/j.cnki.csa.009494 CSTR: 32024.14.csa.009494
Abstract:During peer evaluation, evaluators may give inaccurate evaluation scores as a result of strategic evaluation. Taking into account the evaluators’ social interest (SI) relations, this study proposes a prediction method named graph attention network-social interest relation-oriented attention network (GAT-SIROAN) that integrates SI and the GAT. This method consists of a weighted network SIROAN that represents the evaluators’ relations with the solutions and a GAT that is used to predict peer evaluation scores. In the SIROAN, the interrupted time-series analysis (ITSA) method is applied to define the evaluators’ two characteristics: the self-evaluation ability and the peer evaluation ability, and these two characteristics are compared to obtain the SI factors and relations among the evaluators. In the score prediction stage, considering the importance of each node, this study uses a self-attention mechanism to calculate the attention coefficients at the nodes, thereby improving the prediction ability. Network parameters are learned by minimizing the root mean square error (RMSE) to obtain more accurate predicted peer evaluation scores. The GAT-SIROAN method is compared experimentally with five baseline methods, namely, the mean, median, PeerRank, RankwithTA, and GCN-SOAN methods, on real datasets. The results show that the GAT-SIROAN method outperforms all the above baseline methods in the RMSE.
2024, 33(5):228-238. DOI: 10.15888/j.cnki.csa.009511 CSTR: 32024.14.csa.009511
Abstract:Selecting appropriate optimizers for a federated learning environment is an effective way to improve model performance, especially in situations where the data is highly heterogeneous. In this study, the FedAvg and FedALA algorithms are mainly investigated, and an improved version called pFedALA is proposed. pFedALA effectively reduces resource waste caused by synchronization demands by allowing clients to continue local training during waiting periods. Then, the roles of the optimizers in these three algorithms are analyzed in detail, and the performance of various optimizers such as stochastic gradient descent (SGD), Adam, averaged SGD (ASGD), and AdaGrad in handling non-independent and identically distributed (Non-IID) and imbalanced data is compared by testing them on the MNIST and CIFAR-10 datasets. Special attention is given to practical heterogeneity based on the Dirichlet distribution and extreme heterogeneity in terms of data setting. The experimental results suggest the following observations: 1) The pFedALA algorithm outperforms the FedALA algorithm, with an average test accuracy approximately 1% higher than that of FedALA; 2) Optimizers commonly used in traditional single-machine deep learning environments deliver significantly different performance in a federated learning environment. Compared with other mainstream optimizers, the SGD, ASGD, and AdaGrad optimizers appear to be more adaptable and robust in the federated learning environment.
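For orientation, the server-side aggregation step of FedAvg can be sketched in a few lines. The sketch shows only the sample-size-weighted averaging of client parameters; it does not cover FedALA's adaptive local aggregation or the waiting-period training of pFedALA, and the function name `fedavg` and the flat-list parameter layout are assumptions.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighting each client by its number of samples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(dim)
    ]
```

Under Non-IID data the client vectors can differ substantially, so the optimizer used for local training influences how far each client drifts before this average is taken.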
2024, 33(5):239-245. DOI: 10.15888/j.cnki.csa.009487 CSTR: 32024.14.csa.009487
Abstract:Existing Siamese network object tracking techniques perform only one fusion operation of template features and search features, which makes the object features on the fused feature map relatively coarse and unfavorable to the tracker’s precise positioning. In this study, a serial mutual correlation module is designed. It aims to use the existing mutual correlation method to enhance the object features on the fused feature map by performing multiple mutual correlation operations on the template features and the search features, so as to improve the accuracy of the subsequent classification and regression results and strike a balance between speed and accuracy with fewer parameters. The experimental results show that the proposed method achieves good results on four mainstream tracking datasets.
2024, 33(5):246-253. DOI: 10.15888/j.cnki.csa.009488 CSTR: 32024.14.csa.009488
Abstract:This study is dedicated to exploring the complex process of opinion formation in social networks, with a particular focus on the mechanisms of consensus achievement in decentralized environments. A novel opinion classification strategy, termed “the second confidence interval,” is proposed to improve the traditional DeGroot consensus model, and two distinct opinion dynamics models are developed: the far attack inbreeding (FAI) model and the outbred recent attack (ORA) model. These models comprehensively consider the degree of individual acceptance and emphasis on surrounding opinions. In addition, through an in-depth analysis of neighborhood opinions in social networks, a comprehensive setup of the individual model is carried out, covering multiple factors such as private opinions, expressed opinions, obstinacy, and preferences. The results indicate that under specific parameter settings, both the FAI and ORA models can reach a consensus more rapidly than the original DeGroot model. Specifically, the ORA model converges at around 700 steps, while the convergence speed of the FAI model gradually approaches that of the ORA model with increasing parameter values. Compared with the baseline model, the ORA model exhibits smaller variations in converged opinion values, no more than 3.5%, whereas the FAI model demonstrates greater volatility. These findings not only deepen people’s understanding of the public opinion formation mechanisms in social networks but also highlight the significance of opinion dynamics within individual neighborhoods in the consensus formation process, offering new perspectives and research directions for future studies in this field.
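The baseline that the FAI and ORA models are compared against, the classic DeGroot update, can be sketched as repeated weighted averaging of neighbors' opinions. The sketch covers only that baseline; the second confidence interval, obstinacy, and preference terms are not modeled, and the function names and the consensus tolerance `tol` are assumptions.

```python
def degroot_step(weights, opinions):
    """One DeGroot update: each agent adopts a weighted average of all opinions.

    weights[i][j] is the trust agent i places in agent j; each row should sum to 1.
    """
    return [sum(w * x for w, x in zip(row, opinions)) for row in weights]

def run_degroot(weights, opinions, tol=1e-6, max_steps=10000):
    """Iterate until the spread of opinions falls below tol; returns (opinions, steps)."""
    for step in range(1, max_steps + 1):
        opinions = degroot_step(weights, opinions)
        if max(opinions) - min(opinions) < tol:
            return opinions, step
    return opinions, max_steps
```

The returned step count is the kind of quantity a convergence-speed comparison between models would report.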
JIN Xu-Ming , LIN Yun-Han , ZHANG Lei , MIN Hua-Song
2024, 33(5):254-261. DOI: 10.15888/j.cnki.csa.009495 CSTR: 32024.14.csa.009495
Abstract:This study proposes a two-stage path planning method for the inner wall operation of a mobile robot in multi-room environments. In the first stage, to address the sensor failure caused by dust or fog in the environment during wall operation and the incomplete path planning that occurs when a room has many exits, the study proposes a wall following path planning method with automatic start-point selection, which is based on grid maps and generates the wall following paths offline. In the second stage, for the dynamic obstacle avoidance problem during point-to-point path planning, it proposes a point-to-point path planning method based on the prioritized experience replay soft actor critic (PSAC) algorithm, which introduces the prioritized experience replay strategy in the soft actor critic (SAC) to achieve dynamic obstacle avoidance. The comparison experiments of wall following path planning and dynamic obstacle avoidance are designed to verify the effectiveness of the proposed method in the indoor wall following path planning and point-to-point path planning.
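The prioritized experience replay strategy mentioned above can be sketched as proportional sampling on temporal-difference errors. This is a generic illustration of the technique, not the paper's PSAC implementation: the exponent `alpha`, the offset `eps`, the fixed random seed, and the function names are assumptions, and importance-sampling corrections are omitted.

```python
import random

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Sampling probability of each transition, proportional to (|TD error| + eps) ** alpha."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]

def sample_indices(td_errors, batch_size, alpha=0.6, rng=None):
    """Draw a batch of transition indices according to their priorities."""
    rng = rng or random.Random(0)  # fixed seed only to keep the sketch reproducible
    probs = per_probabilities(td_errors, alpha)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)
```

Setting `alpha` to 0 recovers uniform replay, while larger values replay surprising transitions (for example, near-collisions with a moving obstacle) more often.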
ZHANG Meng-Jie , CHEN Yao-Jie , DENG Jiang
2024, 33(5):262-270. DOI: 10.15888/j.cnki.csa.009498 CSTR: 32024.14.csa.009498
Abstract:This study analyzes the multivariate, nonlinear, and strong coupling characteristics of permanent magnet synchronous motors (PMSM) in industrial applications, as well as the difficult parameter adjustment, response delay, poor robustness, and limited adaptability encountered with traditional PID control. A novel approach combining a twin delayed deep deterministic policy gradient (TD3) algorithm with PID control is proposed to optimize PID parameter adjustment for more accurate motor speed control. In this method, bidirectional long short-term memory networks (BiLSTM) are integrated into the Actor and Critic networks, significantly enhancing the processing capability for time-series data of PMSM’s dynamic behavior. This enables the system to accurately capture the current state and predict future trends, achieving more precise and adaptive self-tuning of PID parameters. Moreover, the integration of entropy regularization and curiosity-driven exploration methods further enhances the diversity of the strategy, preventing premature convergence to suboptimal strategies and encouraging in-depth exploration of unknown environments. To validate the effectiveness of the proposed method, a simulation model of a PMSM is designed, and the proposed BiLSTM-TD3-ICE method is compared with the traditional TD3 and the classical Ziegler-Nichols (Z-N) method. The experimental results demonstrate the significant advantages of the proposed strategy in control performance.
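To show where a learned policy would enter such a loop, a discrete PID controller with adjustable gains can be sketched as follows. The class `PID` and its `set_gains` hook are hypothetical; the abstract does not describe the interface between the TD3 agent and the controller, so the sketch only marks the point at which an agent could overwrite the gains at each step.

```python
class PID:
    """Discrete PID controller whose gains can be overwritten between steps."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def set_gains(self, kp, ki, kd):
        # Hypothetical hook: an RL agent's action could supply these values.
        self.kp, self.ki, self.kd = kp, ki, kd

    def update(self, setpoint, measured):
        """Return the control output for the current speed error."""
        error = setpoint - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

A fixed-gain method such as Ziegler-Nichols would call `set_gains` once, whereas a self-tuning scheme would call it repeatedly as the motor state changes.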
BAI Xue , WANG Xia-Guang , JIN Ji-Xin , SONG Chun-Mei , ZHAO Si-Tong
2024, 33(5):271-279. DOI: 10.15888/j.cnki.csa.009519 CSTR: 32024.14.csa.009519
Abstract:In the digital era, an increasing number of people prefer shopping on e-commerce platforms. With the development of agricultural product e-commerce platforms, consumers find it challenging to discover suitable products among numerous choices. To enhance user satisfaction and purchase intent, agricultural product e-commerce platforms need to recommend appropriate products based on user preferences. Considering various agricultural features such as season, region, user interests, and product attributes, feature interactions can better capture user demands. This study introduces a new model, fine-grained feature interaction selection networks (FgFisNet). The model effectively learns feature interactions using both the inner product and Hadamard product by introducing fine-grained interaction layers and feature interaction selection layers. During the training process, it automatically identifies important feature interactions, eliminates redundant ones, and feeds the significant feature interactions and first-order features into a deep neural network to obtain the final click-through rate (CTR) prediction. Extensive experiments on a real dataset from agricultural e-commerce demonstrate significant economic benefits achieved by the proposed FgFisNet method.
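The two interaction operators named above can be illustrated on plain embedding vectors. The sketch computes both products for every pair of feature fields; it does not include FgFisNet's selection layers or training, and the function names are assumptions.

```python
from itertools import combinations

def inner_product(u, v):
    """Scalar interaction: sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))

def hadamard_product(u, v):
    """Vector interaction: element-wise product, keeping one value per embedding dimension."""
    return [a * b for a, b in zip(u, v)]

def pairwise_interactions(embeddings):
    """Map each field pair (i, j), i < j, to its (inner product, Hadamard product)."""
    return {
        (i, j): (inner_product(embeddings[i], embeddings[j]),
                 hadamard_product(embeddings[i], embeddings[j]))
        for i, j in combinations(range(len(embeddings)), 2)
    }
```

The inner product compresses a pair into a single number, while the Hadamard product preserves per-dimension detail, which is one reason a model might use both before selecting which pairs to keep.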
LIU Ruo-Chen , FENG Guang , LUO Liang-Yu , LIN Hao-Ze
2024, 33(5):280-287. DOI: 10.15888/j.cnki.csa.009492 CSTR: 32024.14.csa.009492
Abstract:In the context of current multi-modal emotion analysis in videos, the influence of modality representation learning on modality fusion and final classification results has not been adequately considered. To this end, this study proposes a multi-modal emotion analysis model that integrates cross-modal representation learning. Firstly, the study utilizes BERT and LSTM to extract internal information from text, audio, and visual modalities separately, followed by cross-modal representation learning to obtain more information-rich unimodal features. In the modal fusion stage, the study incorporates a gating mechanism and improves the traditional Transformer fusion mechanism to control the information flow more accurately. Experimental results on the publicly available CMU-MOSI and CMU-MOSEI datasets demonstrate that the accuracy and F1 score of this model are improved compared with the traditional models, validating the effectiveness of this model.