2024, 33(1):1-10. DOI: 10.15888/j.cnki.csa.009346
Abstract: Federated learning is a distributed machine learning approach that enables model delivery and aggregation without compromising the privacy and security of local data. However, federated learning faces a major challenge: models are large, and their parameters must be exchanged between the client and the server many times, which is difficult for small devices with limited communication capability. Therefore, this study has the client and the server communicate only once. Another challenge in federated learning is data imbalance among clients, which makes server-side model aggregation inefficient. To overcome these challenges, the study proposes a lightweight federated learning framework that requires only one-shot communication between the client and the server. The framework also introduces an aggregation strategy algorithm, FBL-LD, which selects the most reliable, dominant model from the client models received in the one-shot communication and adjusts the weights of the other models based on a validation set to obtain a generalized federated model. FBL-LD reduces communication overhead and improves aggregation efficiency. Experimental results show that FBL-LD outperforms existing federated learning algorithms in terms of accuracy and robustness to data imbalance.
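The one-shot aggregation step can be illustrated with a minimal sketch. The exact FBL-LD weighting rule is not given in this abstract, so the rule below is a hypothetical stand-in: the client model with the highest validation accuracy is treated as the dominant model and receives a fixed share of the total weight (`dominant_share`, an assumed parameter), while the remaining share is split among the other models in proportion to their validation accuracy. Models are represented as flat parameter lists.

```python
from typing import List, Sequence, Tuple


def one_shot_aggregate(
    client_params: Sequence[Sequence[float]],
    val_acc: Sequence[float],
    dominant_share: float = 0.5,
) -> Tuple[List[float], List[float]]:
    """Hypothetical one-shot aggregation (not the published FBL-LD rule).

    Each client uploads its flattened parameters once; the server scores every
    model on a validation set (val_acc) and returns the weighted average
    together with the weights that were used.
    """
    if not client_params or len(client_params) != len(val_acc):
        raise ValueError("need exactly one validation score per client model")
    n = len(client_params)
    dominant = max(range(n), key=lambda i: val_acc[i])  # most reliable model
    others = [i for i in range(n) if i != dominant]

    weights = [0.0] * n
    if not others:
        weights[dominant] = 1.0
    else:
        weights[dominant] = dominant_share
        total = sum(val_acc[i] for i in others)
        for i in others:
            # split the remaining weight by validation accuracy (evenly if all are 0)
            share = val_acc[i] / total if total > 0 else 1.0 / len(others)
            weights[i] = (1.0 - dominant_share) * share

    dim = len(client_params[0])
    merged = [sum(w * p[k] for w, p in zip(weights, client_params)) for k in range(dim)]
    return merged, weights


# Three clients, each with a two-parameter model.
merged, weights = one_shot_aggregate([[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]], [0.9, 0.6, 0.3])
print(weights, merged)
```

Because the server sees each model only once, every weighting decision in this sketch relies on the validation set rather than on further communication rounds.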
LI Shun-Yong, XU Rui, LI Shi-Yi
2024, 33(1):11-21. DOI: 10.15888/j.cnki.csa.009348
Abstract: Most existing deep clustering algorithms adopt symmetric autoencoders to extract low-dimensional features of high-dimensional data. However, as autoencoder training proceeds, the low-dimensional feature space of the data becomes distorted to some extent and can no longer reflect the latent clustering structure of the original data space. To this end, this study proposes a new deep embedded K-means algorithm (SDEKC). First, during low-dimensional feature extraction, two weighted skip connections are added between corresponding encoder and decoder layers of the symmetric convolutional autoencoder. This reduces the decoder's demands on the encoder and highlights the encoding ability of the convolutional autoencoder, so that the clustering structure of the original data space is better retained. Second, in the clustering stage, an orthogonal transformation matrix converts the low-dimensional data space into a new space that reveals clustering structure information. Finally, a greedy algorithm iteratively optimizes the low-dimensional representation of the data and its clustering in an end-to-end way. The effectiveness of the proposed algorithm is verified on six real datasets.
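The clustering step that SDEKC alternates with representation learning builds on K-means. The sketch below shows only plain Lloyd-style K-means on already-extracted low-dimensional features; the skip-connected autoencoder, the orthogonal transformation, and the end-to-end greedy optimization are not reproduced, and the toy 2D points are assumed stand-ins for encoder outputs.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: Sequence[Point], init_centers: Sequence[Point],
           iters: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    centers = [list(c) for c in init_centers]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each embedded point joins its nearest center.
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append([sum(x) / len(members) for x in zip(*members)] if members else c)
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return labels, centers


# Toy 2D "embeddings" standing in for autoencoder features.
labels, centers = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], init_centers=[(0, 0), (10, 10)])
print(labels, centers)  # [0, 0, 1, 1] [[0.0, 0.5], [10.0, 10.5]]
```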
KE Ao, WANG Yu-Cong, HU Bo-Yu, LIN Qi, LI Yong, SHUANG Feng
2024, 33(1):22-36. DOI: 10.15888/j.cnki.csa.009369
Abstract: Wildlife monitoring is essential for wildlife conservation and ecosystem maintenance, and wildlife detection and identification is the core technology for achieving it. In recent years, with the rapid development and widespread application of computer vision, image-based non-contact methods have attracted extensive attention in wildlife monitoring, and researchers have proposed various methods to solve different problems in this field. However, the complexity of wild environments still makes accurate detection and identification of wildlife challenging. To promote research in this field, this study reviews existing image-based wildlife monitoring methods in three parts: wildlife image acquisition methods, wildlife image preprocessing methods, and wildlife detection and recognition algorithms. These methods are discussed and classified according to the processing mechanisms of the image datasets and of the detection and recognition algorithms. Finally, the research hotspots and open problems of deep learning-based wildlife monitoring are analyzed and summarized, and priorities for future research are proposed.
LUO Min, GAO Jun-Tao, YAN Ting
2024, 33(1):37-48. DOI: 10.15888/j.cnki.csa.009377
Abstract: With the continuous evolution of computer technology, process simulation, which uses simulation models to mimic business process behavior, is increasingly employed across industries. It can also be used to predict and optimize system performance, assess the impact of decisions, provide a basis for managerial decision-making, and reduce experimental cost and time. How to efficiently develop a trustworthy simulation model has attracted widespread attention. This study traces, summarizes, and analyzes the literature on methods for building business process simulation models and presents the processes, advantages, disadvantages, and progress of process model-based, system dynamics-based, and deep learning-based simulation modeling approaches. Finally, the challenges and future directions of process simulation are discussed to provide references for future research in this field.
YANG Shi-Jie, SHUAI Yang, HAN Chao, ZHANG Wei-Ping
2024, 33(1):49-57. DOI: 10.15888/j.cnki.csa.009360
Abstract: Community detection in directed networks is an important topic in network science. This study proposes a semi-supervised community detection algorithm for directed networks based on non-negative matrix factorization (NMF). First, prior information is used to reconstruct the adjacency matrix and to penalize the community memberships of nodes. In addition, row normalization is applied to eliminate the influence of node degree heterogeneity. Finally, the objective function is solved with alternating iterative updates. Comparative experiments on real network datasets demonstrate the effectiveness of the proposed algorithm: compared with existing NMF-based methods, it significantly improves community detection accuracy.
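As a reference point for the NMF formulation, the sketch below factorizes a small adjacency matrix with the standard Lee–Seung multiplicative updates for the Frobenius loss, then row-normalizes the node factor so that each row reads as community-membership proportions. It is an unsupervised baseline only: the prior-information reconstruction and membership penalty of the proposed semi-supervised method are not modeled. The toy graph (two densely linked groups, with self-loops added so each block is exactly rank one) is invented for illustration, and the random initialization is fixed by an assumed `seed`.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def nmf(A: Matrix, k: int, iters: int = 500, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Approximate non-negative A (n x m) by W (n x k) @ H (k x m), multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H


def memberships(W: Matrix) -> Tuple[Matrix, List[int]]:
    """Row-normalize W and assign each node to its strongest community."""
    rows = []
    for row in W:
        s = sum(row) or 1.0
        rows.append([x / s for x in row])
    labels = [max(range(len(r)), key=r.__getitem__) for r in rows]
    return rows, labels


# Toy graph: nodes 0-2 and 3-5 form two densely linked groups.
A = [[1.0 if (i < 3) == (j < 3) else 0.0 for j in range(6)] for i in range(6)]
W, H = nmf(A, k=2)
props, labels = memberships(W)
print(labels)
```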
LIN Fei-Fan, LI Ling, XU Qiang
2024, 33(1):58-67. DOI: 10.15888/j.cnki.csa.009343
Abstract: To address blurred key information in images and the poor adaptability of gastrointestinal endoscopy diagnosis and treatment systems, this study proposes a cycle generative adversarial network (CycleGAN) with an improved attention mechanism to accurately estimate the depth information of the digestive tract. Built on CycleGAN, the network combines a dual attention mechanism with a residual gate mechanism and a non-local module to comprehensively capture the feature structure and global correlations of the input data, thereby improving the quality and adaptability of the generated depth images. Meanwhile, a dual-scale feature fusion network serves as the discriminator to improve discrimination ability and balance the performance of the generator and the discriminator. Experimental results show that the proposed method yields good prediction performance in gastrointestinal endoscopy scenes: compared with other unsupervised methods, its average accuracy on the stomach, small intestine, and colon datasets is improved by 7.39%, 10.17%, and 10.27%, respectively. In addition, it accurately estimates relative depth and provides accurate boundary information on a laboratory model of the human stomach.
LIAO Long-Long, ZHENG Zhi-Wei, ZHANG Yu-Peng, FANG Xin, ZHENG Yu-Qiang, XIONG Ning, YU Yuan-Long
2024, 33(1):68-75. DOI: 10.15888/j.cnki.csa.009372
Abstract: To support the analysis of members' working hours and efficiency and the assessment of task allocation in the scientific research management of labs, this study develops MASRE, a multi-mode analysis system of research efficiency in labs based on camera videos, attendance machines, and a Web system. By comparing and presenting researchers' actual working time, the invalid working hours caused by excessive phone use, and their research efficiency, the system can motivate researchers to invest more time in academic work. In addition, based on the research efficiency trends calculated by the system, lab leaders can analyze whether research tasks are allocated reasonably, and researchers can explore the factors influencing their efficiency. MASRE comprises two core modules: the Web system module, which computes working-hour and efficiency statistics, and the AI analysis module, which automatically identifies invalid working hours. The system is implemented with PyTorch, Vue 3, and MySQL. The working hours and efficiency of the team that developed the system and wrote this report are taken as an example for experimental analysis. The results show that MASRE can identify invalid working hours and perform working-hour statistics and efficiency analysis. MASRE is available at https://icnc-fskd.fzu.edu.cn/htower/, and research labs can apply to use it free of charge.
ZHU Xin-Jie, XIONG Feng-Guang, XIE Shuai-Kang, SONG Ning-Dong, LI Wen-Qing
2024, 33(1):76-86. DOI: 10.15888/j.cnki.csa.009358
Abstract: This study proposes a scene segmentation method driven by cross feature fusion and RASPP to address the edge segmentation errors and feature discontinuity caused by target diversity and scale inconsistency in scenes. The method combines the multi-scale features output by the encoder through cross feature fusion and uses a compound convolution attention module to fuse high-level semantic information. This avoids the loss of feature information caused by upsampling and the influence of noise, and refines the segmentation of target edges. In addition, the study proposes a depthwise separable convolution with residual connections and, based on it, designs and implements RASPP, a residual pyramid pooling module that processes the cross-fused features, obtains contextual information at different scales, and enhances the semantic expression of features. Finally, the features processed by RASPP are merged to improve segmentation. Experimental results on the Cityscapes and CamVid datasets show that the proposed method outperforms existing methods and segments target edges in scenes better.
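The efficiency argument for depthwise separable convolution can be checked with simple parameter arithmetic. The helper functions below are illustrative names, not the paper's code, and they ignore biases and normalization layers. An identity residual connection adds no parameters when the input and output channel counts match, so the residual variant has the same count as the plain separable convolution.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a k x k convolution mapping c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k conv (one filter per input channel) + pointwise 1 x 1 conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


c, k = 256, 3
std = standard_conv_params(c, c, k)
sep = depthwise_separable_params(c, c, k)
print(std, sep, round(sep / std, 3))  # 589824 67840 0.115
```

The ratio simplifies to 1/c_out + 1/k^2, so for 3 x 3 kernels the separable form needs roughly a ninth of the weights once the channel count is large.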
WANG Zhi-Ge, LI Wang-Gen, XIA Yi-Chun, GAO Kun, SHU Yang, GE Ying-Kui
2024, 33(1):87-98. DOI: 10.15888/j.cnki.csa.009353
Abstract: Click-through rate (CTR) prediction is a fundamental task in online advertising and recommendation systems. Mainstream models often improve performance and generalization by modeling interactions among high-order and low-order features. However, many models learn only a fixed representation for each feature, neglecting how the importance of a feature varies across contexts, and their structures are overly simple. To address these issues, this study proposes the feature refinement convolutional neural network-fusion matrix factorization (FRCNN-F) model. First, it integrates the feature generation module of convolutional neural networks into the feature refinement network (FRNet), using the module's ability to generate new features by recombining local patterns to strengthen the selection of important features. Second, it designs a fusion matrix factorization mechanism that lets the model perceive context and explicitly model interactions across different scenarios, thereby improving how the submodels are combined. Finally, comparative experiments on the public Frappe and MovieLens datasets show that FRCNN-F outperforms the FRNet baseline, improving AUC by 0.32% and 0.40% and reducing cross-entropy loss (Logloss) by 1.50% and 1.11%, respectively. This research has practical value for precise advertising and personalized recommendation.
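The matrix-factorization side of CTR models typically builds on the factorization machine (FM) second-order term, in which each feature i has a latent vector v_i and each pairwise interaction is weighted by an inner product. The sketch below is the generic FM identity, not the paper's fusion mechanism: it checks the O(kn) reformulation against the naive pairwise sum and also includes the Logloss metric used in the evaluation. The feature values and latent vectors are made up.

```python
import math
from itertools import combinations
from typing import Sequence


def fm_pairwise_naive(x: Sequence[float], V: Sequence[Sequence[float]]) -> float:
    """Sum over feature pairs i < j of <v_i, v_j> * x_i * x_j, in O(k n^2)."""
    return sum(
        x[i] * x[j] * sum(a * b for a, b in zip(V[i], V[j]))
        for i, j in combinations(range(len(x)), 2)
    )


def fm_pairwise_fast(x: Sequence[float], V: Sequence[Sequence[float]]) -> float:
    """Same quantity via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2], in O(k n)."""
    total = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - s_sq
    return 0.5 * total


def logloss(y_true: Sequence[int], p_pred: Sequence[float], eps: float = 1e-15) -> float:
    """Binary cross-entropy averaged over samples (the 'Logloss' CTR metric)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


x = [1.0, 0.0, 2.0]
V = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4]]
print(fm_pairwise_naive(x, V), fm_pairwise_fast(x, V))
print(logloss([1, 0], [0.9, 0.1]))
```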
CHEN Qing-Yu, ZHANG Yan-Yan, ZHAO Wei-Yu
2024, 33(1):99-109. DOI: 10.15888/j.cnki.csa.009366
Abstract: Traffic data loss is common in network systems and is usually caused by sensor failures, transmission errors, and storage loss. Existing data repair methods cannot learn the multi-dimensional characteristics of traffic data. Therefore, this study proposes ST-MFCN, a dual-channel parallel architecture that combines a bidirectional long short-term memory (LSTM) network with multi-scale convolutional networks, to fill missing values in traffic data. A novel adversarial loss function is also designed to further improve prediction accuracy, enabling the model to effectively learn the temporal and dynamic spatial features of traffic data. The model is tested on the Web traffic time series dataset and compared with existing repair methods. Experimental results demonstrate that ST-MFCN reduces data recovery errors and improves repair accuracy, providing a robust and efficient solution for traffic data repair in network systems.
2024, 33(1):110-118. DOI: 10.15888/j.cnki.csa.009371
Abstract: Among Transformer models, the convolutional vision Transformer (CvT) has attracted attention for its ability to extract local and global image features simultaneously. However, in abdominal organ segmentation, the blurry object boundaries produced by CNN models remain a problem. This study therefore proposes DBLNet, a novel dual-branch closed-loop segmentation model based on CvT and CNN, which guides network learning through explicit supervision of the segmented contours using shape priors and predicted results. DBLNet consists of a contour extraction encoding (CEE) module, a boundary shape segmentation network (BSSN), and a closed-loop structure. The CEE module uses a modified 3D CvT and 3D gated convolutional layers (GCL) to capture multi-level contour features that assist BSSN training. The BSSN contains a shape feature fusion (SFF) module that captures both object region and contour features to promote the convergence of CEE training. The closed-loop structure lets the two branches feed their segmentation results back to each other, so that each assists the other's training. On the BTCV benchmark, DBLNet achieves an average Dice score of 0.878, ranking 13th, and application tests on clinical hospital data demonstrate its strong performance.
2024, 33(1):119-126. DOI: 10.15888/j.cnki.csa.009361
Abstract: This study proposes a cross-modal fusion dual attention network (CFDA-Net) for brain tumor image segmentation to address insufficient fusion of multi-modal brain tumor information and the loss of detail in tumor regions. Built on an encoder-decoder architecture, the encoder branch first adopts a new convolutional block that runs dense blocks and large kernel attention in parallel, which effectively fuses global and local information and prevents gradient vanishing during backpropagation. Second, a multi-modal deep fusion module is added on the left side of the second, third, and fourth encoder layers to exploit the complementary information among modalities. Then, in the decoder branch, Shuffle Attention groups and aggregates the feature maps, splitting the sub-features of each group into two parts to obtain important spatial and channel attention features. Finally, binary cross entropy (BCE), Dice Loss, and L2 Loss are combined into a new hybrid loss function, which alleviates the class imbalance of brain tumor data and further improves segmentation performance. On the BraTS2019 brain tumor dataset, the model achieves average Dice coefficients of 0.887, 0.892, and 0.815 in the whole tumor, tumor core, and enhancing tumor regions, respectively. In the tumor core and enhancing tumor regions, it segments better than other advanced methods such as ADHDC-Net and SDS-MSA-Net.
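A hybrid loss of this kind can be sketched as a weighted sum of the three terms. The abstract does not give the weights or say exactly what the L2 term is computed on, so this version assumes equal weights and takes L2 as the mean squared error between predicted probabilities and the binary mask; predictions and targets are flattened voxel lists.

```python
import math
from typing import Sequence


def hybrid_loss(pred: Sequence[float], target: Sequence[float],
                w_bce: float = 1.0, w_dice: float = 1.0, w_l2: float = 1.0,
                eps: float = 1e-7) -> float:
    """BCE + soft Dice loss + mean squared error over flattened probabilities."""
    n = len(pred)
    clipped = [min(max(p, eps), 1 - eps) for p in pred]  # keep log() finite
    bce = -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
               for p, t in zip(clipped, target)) / n
    inter = sum(p * t for p, t in zip(pred, target))
    dice = 1 - (2 * inter + eps) / (sum(pred) + sum(target) + eps)
    l2 = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    return w_bce * bce + w_dice * dice + w_l2 * l2


mask = [1, 0, 1, 0]
print(hybrid_loss([0.9, 0.1, 0.8, 0.2], mask), hybrid_loss([0.5] * 4, mask))
```

The Dice term depends on the overlap relative to the total foreground mass rather than on per-voxel counts, which is why it is commonly paired with BCE when tumor voxels are a small minority.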
SUN Yi-Yang, CHEN Zhi-De, FENG Chen, ZHU Ke-Xin
2024, 33(1):127-133. DOI: 10.15888/j.cnki.csa.009347
Abstract: Anomaly detection in multivariate time series is a challenging problem: models must learn informative representations from complex temporal dynamics and derive a distinguishable criterion that identifies a small number of outliers among a large number of normal time points. However, the complex temporal correlations and high dimensionality of multivariate time series lead to poor anomaly detection performance. To this end, this study proposes UMTS-Mixer, a model based on the multi-layer perceptron (MLP) architecture. Because the linear structure of an MLP is sensitive to order, it is used to capture both temporal correlation and cross-channel correlation. Extensive experiments show that UMTS-Mixer detects time series anomalies effectively and performs better on four benchmark datasets, with the highest F1 scores of 91.35% and 92.93% on the MSL and PSM datasets, respectively.
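The core MLP-Mixer idea, applying linear maps along the time axis and then along the channel axis, can be sketched as follows. This is a simplified block (one linear layer per mixing step, residual connections, no nonlinearity or normalization), not the UMTS-Mixer architecture; X is an assumed T x C window of a multivariate series stored as nested lists. Because the time-mixing weights index absolute positions, shuffling the time steps generally changes the output, which is the order sensitivity the abstract relies on.

```python
from typing import List

Matrix = List[List[float]]


def mix_time(X: Matrix, Wt: Matrix) -> Matrix:
    """Apply a T x T linear map along the time axis, separately for each channel."""
    T, C = len(X), len(X[0])
    return [[sum(Wt[t][s] * X[s][c] for s in range(T)) for c in range(C)] for t in range(T)]


def mix_channels(X: Matrix, Wc: Matrix) -> Matrix:
    """Apply a C x C linear map along the channel axis, separately for each time step."""
    C = len(X[0])
    return [[sum(Wc[c][d] * row[d] for d in range(C)) for c in range(C)] for row in X]


def add(A: Matrix, B: Matrix) -> Matrix:
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mixer_block(X: Matrix, Wt: Matrix, Wc: Matrix) -> Matrix:
    Y = add(X, mix_time(X, Wt))         # temporal mixing + residual
    return add(Y, mix_channels(Y, Wc))  # cross-channel mixing + residual


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # T = 3 time steps, C = 2 channels
I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
I2 = [[1.0 if i == j else 0.0 for j in range(2)] for i in range(2)]
print(mixer_block(X, I3, I2))  # identity weights: each residual doubles, so 4 * X
```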
LIU Ji-Chi, LYU Hou-Kun, LI Wei
2024, 33(1):134-140. DOI: 10.15888/j.cnki.csa.009364
Abstract: To address the limited resources of detection equipment and the long detection time for surface damage of steel cables, this study applies deep learning and convolutional neural networks (CNNs) to cable surface damage detection. It proposes a YOLO-based defect detection network that integrates GhostNet into the backbone, introduces a new feature extraction module (ShuffleC3) based on ShuffleNet and an attention mechanism, and prunes and improves the Head. Experimental results show that, compared with the YOLOv5s baseline, the improved network increases average accuracy by 1.1% and reduces the number of parameters, the amount of computation, and the model size by 43.4%, 31%, and 42.3%, respectively. The proposed model thus reduces network computing cost while maintaining high identification accuracy, better meeting the requirements of surface damage detection for steel cables.
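Part of the parameter reduction from integrating GhostNet comes from Ghost modules, which produce some output channels with a regular convolution and the rest with cheap depthwise operations. The counts below follow the usual Ghost module layout (a primary convolution producing ceil(c_out / ratio) intrinsic maps, then a dw_k x dw_k depthwise convolution for the ghost maps); biases and batch normalization are ignored, and the channel sizes are arbitrary examples rather than values from the paper.

```python
import math


def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of an ordinary k x k convolution (no bias)."""
    return k * k * c_in * c_out


def ghost_module_params(c_in: int, c_out: int, k: int = 1, ratio: int = 2, dw_k: int = 3) -> int:
    """Weights of a Ghost module producing c_out channels (no bias, no BN)."""
    intrinsic = math.ceil(c_out / ratio)           # maps from the primary convolution
    primary = k * k * c_in * intrinsic
    cheap = dw_k * dw_k * intrinsic * (ratio - 1)  # depthwise ops creating ghost maps
    return primary + cheap


print(conv_params(64, 128, 3), ghost_module_params(64, 128, k=3))  # 73728 37440
```

With ratio = 2 the module needs roughly half the weights of the ordinary convolution it replaces, since the cheap depthwise term is small.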
ZHU Geng-Xin, CHENG Yuan-Zhi, LIU Hao
2024, 33(1):141-147. DOI: 10.15888/j.cnki.csa.009374
Abstract: This study designs a dynamic fully connected layer (DyFC) to enhance feature fusion. DyFC redefines the weights and biases by representing them with base vectors whose coefficients are learned from each input feature, so the weights and biases are no longer shared but unique to each input, giving each feature more directional expressiveness. The study also proposes IUINet, a dual-stream mapping architecture that combines the 3DShift operation with spatially separable convolution for medical image segmentation, balancing accuracy and efficiency. IUINet follows an encoder-decoder structure in which the encoder has two parts: one comprises the Shift operation and a pointwise 1×1 convolution (Conv1×1), and the other uses spatially separable convolution. IUINet uses multi-scale inputs and multi-scale feature mapping layers to speed up backpropagation and shorten the average backpropagation distance. Together, these designs improve model accuracy and generalization and reduce overfitting.
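One way to read the DyFC description is as a fully connected layer whose weight matrix and bias are input-dependent mixtures of a few learned bases. The sketch below assumes softmax-normalized mixing coefficients produced by a linear gate `G`; the number of bases, the gate, and the normalization are illustrative assumptions, since the abstract does not specify them, and each "base vector" is stored here as a full weight matrix.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def softmax(v: Sequence[float]) -> Vector:
    m = max(v)
    e = [math.exp(x - m) for x in v]  # subtract max for numerical stability
    s = sum(e)
    return [x / s for x in e]


def dyfc(x: Sequence[float], base_W: Sequence[Matrix], base_b: Sequence[Vector],
         G: Matrix) -> Vector:
    """Dynamic FC layer: y = W(x) x + b(x), with W(x), b(x) mixtures of bases."""
    alpha = softmax([sum(g * xi for g, xi in zip(row, x)) for row in G])  # per-input coefficients
    n_out, n_in = len(base_b[0]), len(x)
    W = [[sum(a * B[o][i] for a, B in zip(alpha, base_W)) for i in range(n_in)]
         for o in range(n_out)]
    b = [sum(a * bb[o] for a, bb in zip(alpha, base_b)) for o in range(n_out)]
    return [sum(W[o][i] * x[i] for i in range(n_in)) + b[o] for o in range(n_out)]


I = [[1.0, 0.0], [0.0, 1.0]]
two_I = [[2.0, 0.0], [0.0, 2.0]]
zero_gate = [[0.0, 0.0], [0.0, 0.0]]  # equal coefficients for both bases
print(dyfc([2.0, 4.0], [I, two_I], [[0.0, 0.0], [0.0, 0.0]], zero_gate))  # [3.0, 6.0]
```

With a single basis this reduces to an ordinary fully connected layer; with several, a different effective weight matrix is assembled for every input.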
WANG Zheng-Hong, WANG Dan, HU Rong-Jun
2024, 33(1):148-156. DOI: 10.15888/j.cnki.csa.009357
Abstract: Microscopic breast cancer histopathological images have complex structures and blurred cell boundaries, which makes it difficult for traditional threshold-based segmentation to accurately separate lesion areas. To address this issue, this study proposes a multi-threshold segmentation method for breast cancer images based on an improved dandelion optimization algorithm (IDO). The method uses IDO to search for the optimal thresholds, with the maximum between-class variance (Otsu) as the objective function. IDO incorporates a defensive strategy that prevents the search of the traditional dandelion optimization algorithm (DO) from going beyond the valid pixel range, and introduces opposition-based learning (OBL) to keep the algorithm from getting trapped in local optima. Experimental results indicate that, compared with Harris Hawks optimization (HHO), gorilla troop optimization (GTO), traditional DO, and the marine predators algorithm (MPA), IDO achieves the highest fitness value and the fastest convergence for the same number of thresholds. It also outperforms the other algorithms in peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and feature similarity index (FSIM).
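The objective and the two search aids named in the abstract can be sketched independently of the dandelion optimizer itself. Below, `between_class_variance` is the multi-level Otsu criterion on a gray-level histogram, `defend` is an assumed form of the boundary strategy (round and clip a candidate threshold to the valid pixel range), and `opposite` is the standard opposition-based learning point lb + ub - x. A brute-force search over a single threshold stands in for IDO on a toy bimodal histogram.

```python
from typing import Sequence


def between_class_variance(hist: Sequence[int], thresholds: Sequence[int]) -> float:
    """Multi-level Otsu objective: sum_c w_c * (mu_c - mu_T)^2 over the classes
    [0, t1), [t1, t2), ..., [tk, L) of a histogram with L gray levels."""
    total = sum(hist)
    mu_t = sum(i * h for i, h in enumerate(hist)) / total
    bounds = [0] + sorted(thresholds) + [len(hist)]
    var = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        w = sum(hist[lo:hi])
        if w == 0:
            continue  # an empty class contributes nothing
        mu = sum(i * hist[i] for i in range(lo, hi)) / w
        var += (w / total) * (mu - mu_t) ** 2
    return var


def defend(x: float, lb: int = 0, ub: int = 255) -> int:
    """Assumed boundary handling: round and clip a candidate threshold to [lb, ub]."""
    return min(max(round(x), lb), ub)


def opposite(x: float, lb: int = 0, ub: int = 255) -> float:
    """Opposition-based learning: reflect a candidate across the search interval."""
    return lb + ub - x


hist = [0] * 256
hist[50], hist[200] = 10, 10  # two equally sized intensity modes
best = max(range(1, 256), key=lambda t: between_class_variance(hist, [t]))
print(best, between_class_variance(hist, [best]))  # 51 5625.0
```

For k thresholds the search space grows combinatorially, which is why metaheuristics such as IDO replace this brute-force loop in practice.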
LIN Cheng-Zhang, WU Tao, ZHOU Qi-Zhao, CHEN Xi
2024, 33(1):157-166. DOI: 10.15888/j.cnki.csa.009373
Abstract: To address insufficient user fairness in UAV-assisted mobile edge computing systems, this study proposes a user fairness-oriented 3D deployment and offloading optimization algorithm. The algorithm jointly considers how user matching, 3D UAV deployment, computing resource allocation, and offloading factors affect the total system delay and user fairness. A multivariate optimization problem is formulated to minimize the total system delay, and a two-stage joint optimization algorithm is proposed to solve it. In the first stage, a clustering algorithm with balance constraints solves user matching and horizontal UAV deployment. In the second stage, convex optimization is used to iteratively solve UAV altitude deployment, resource allocation, and offloading factor optimization. Experimental results show that the proposed algorithm outperforms four benchmark algorithms in both total system delay and user fairness.
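The abstract does not say how user fairness is measured. A common choice in wireless and edge-computing work is Jain's fairness index, J(x) = (sum of x_i)^2 / (n * sum of x_i^2), which equals 1 when all n users receive the same value (for example, the same delay) and 1/n when a single user receives everything. The sketch below computes it as an assumed, illustrative metric, not necessarily the one used in the paper.

```python
from typing import Sequence


def jain_index(x: Sequence[float]) -> float:
    """Jain's fairness index of per-user values (e.g. per-user delays)."""
    if not x:
        raise ValueError("need at least one user")
    sq = sum(v * v for v in x)
    if sq == 0:
        raise ValueError("index is undefined when every value is zero")
    return sum(x) ** 2 / (len(x) * sq)


print(jain_index([2.0, 2.0, 2.0, 2.0]), jain_index([1.0, 0.0, 0.0, 0.0]))  # 1.0 0.25
```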
WANG Yu-Biao, TAO Ba-Mei, LI Heng, TAO Zhi-Hong
2024, 33(1):167-176. DOI: 10.15888/j.cnki.csa.009310
Abstract: The massive data accumulated as “campus big data” suffer from problems such as discreteness and sparsity, and campus student populations are large, active across wide ranges, and highly individual. Detecting students with potential abnormal behavior in such populations has therefore become an urgent issue in the analysis of student abnormal behavior. This study proposes an early warning method for abnormal behavior of college students based on multi-modal fusion in a big data environment (EWMAB). First, to address the insufficient representation of student behavior portraits and the timeliness and dynamics of behavior labels, a cross-modal student behavior portrait model based on multi-modal deep feature learning is established. Second, to address the poor timeliness of prediction and the tendency of warnings to be issued only after the fact (post-alarm), a multi-modal fusion-based early warning method for student abnormal behavior is built on the student behavior portrait and student behavior classification prediction; a long short-term memory (LSTM) network combines multi-index student behavior data with text information to provide early warning of abnormal behavior. Finally, the model is verified on an example of early warning for abnormal academic performance. Compared with other early warning algorithms, EWMAB improves early warning accuracy and provides timely, advance warning (pre-alarm) of students' abnormal behavior, making student education more targeted, personalized, and predictable.
MENG Fan-Lin, HE Xiao-Xi, LIU Ying-Hu, LI Jia-Ru, ZHU Qun
2024, 33(1):177-184. DOI: 10.15888/j.cnki.csa.009341
Abstract: Because 3D point clouds are unordered and lack topological information, their classification and segmentation remain challenging. To this end, this study designs a 3D point cloud classification algorithm that uses the self-attention mechanism to learn point cloud features for object classification and segmentation. First, a self-attention module suited to point clouds is designed for feature extraction: a neighborhood graph is constructed to enhance the input embedding, and local features are extracted and aggregated with self-attention. Finally, the local features are combined via multi-layer perceptron and encoder-decoder approaches to achieve 3D point cloud classification and segmentation. The method considers the local context of each point during input embedding, builds a network structure that captures long-distance relations within local regions, and ultimately produces more discriminative results. Experiments on datasets such as ShapeNetPart and RoofN3D demonstrate that the proposed method performs better in classification and segmentation.
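The neighborhood-graph plus self-attention idea can be sketched as attention restricted to each point's k nearest neighbors. For brevity this assumes identity query, key, and value projections applied to raw feature vectors; a real module would learn these projections and typically add a positional encoding, and the abstract does not specify those details.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn(points: Sequence[Point], k: int) -> List[List[int]]:
    """Indices of the k nearest points (including the point itself) for every point."""
    return [sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))[:k]
            for p in points]


def local_self_attention(points: Sequence[Point], feats: Sequence[Sequence[float]],
                         k: int) -> List[List[float]]:
    """Scaled dot-product attention over each point's kNN graph (identity projections)."""
    d = len(feats[0])
    out = []
    for i, nbrs in enumerate(knn(points, k)):
        scores = [sum(a * b for a, b in zip(feats[i], feats[j])) / math.sqrt(d) for j in nbrs]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]  # softmax numerators, max-shifted
        z = sum(w)
        out.append([sum(wj * feats[j][c] for wj, j in zip(w, nbrs)) / z for c in range(d)])
    return out


pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
print(knn(pts, 2))  # [[0, 1], [1, 0], [2, 1]]
print(local_self_attention(pts, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], k=2))
```

Because attention weights are computed from features and summed over a neighbor set, the result does not depend on the order in which points are stored, which suits unordered point clouds.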
CAO Chen-Guang, XU Xiao-Zhong
2024, 33(1):185-191. DOI: 10.15888/j.cnki.csa.009365
Abstract: Gas load forecasting is important for cities to dispatch gas safely and economically. Seq2Seq models based on the attention mechanism are increasingly used for gas data forecasting and are effective for gas load forecasting. However, gas load data change abruptly, frequently, and by large amounts, and a Seq2Seq model with a general attention mechanism struggles to extract the multivariate temporal patterns in the data and to cope with random abrupt changes. Gas load prediction under complex influencing factors therefore still needs improvement. This study proposes a Seq2Seq model with a multi-dimensional attention mechanism. On the one hand, a multi-level temporal attention module integrates single-time-step and multi-time-step attention calculations to extract different temporal patterns from the data. On the other hand, a local history attention module is added; by remedying the model's weakness in distinguishing important historical information, it makes the model rely more on important history when predicting. The improved model therefore predicts the distinctive characteristics of gas load better. Experiments on gas consumption data from an urban area in China and electric load data from the 2016 electrical mathematical modeling competition show that the MAE of the improved model is 17% and 9% lower, respectively, than that of the Seq2Seq model with a general attention mechanism.
FENG Xing-Sheng, LIU Yong, TANG Lei, LIU Wen-Xing
2024, 33(1):192-198. DOI: 10.15888/j.cnki.csa.009367
Abstract: Instance segmentation of 3D point clouds is a critical preprocessing step in industrial automation. However, industrial grasping scenarios often contain many occlusions, which make it difficult for 3D point cloud instance segmentation networks to distinguish between similar objects. To this end, this study proposes an improved algorithm based on FPCC. The algorithm has two branches: a center point branch that infers the center points of instances and an embedded feature branch that describes point features. Segmentation results are then obtained by clustering. A feature enhancement (FEH) module improves the accuracy of center point prediction, and the loss function for center point prediction is further modified. Experimental results show that, compared with FPCC, the improved algorithm increases Precision and Recall by 10% and 15%, respectively.
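The final clustering step of a center-point approach can be sketched as follows: points with high predicted center scores are greedily kept as instance centers if they are far enough apart in embedding space, and every point then joins its nearest center. The selection rule, the `radius` parameter, and the toy 1D embeddings are assumptions for illustration, not FPCC's published settings.

```python
import math
from typing import List, Sequence, Tuple


def cluster_by_centers(emb: Sequence[Sequence[float]], center_scores: Sequence[float],
                       radius: float) -> Tuple[List[int], List[int]]:
    """Greedy center selection followed by nearest-center assignment.

    Returns the indices of the chosen center points and, for every point, the
    instance id it joins (-1 if no center lies within `radius`).
    """
    order = sorted(range(len(emb)), key=lambda i: -center_scores[i])  # most center-like first
    centers: List[int] = []
    for i in order:
        # keep a candidate only if it is far from every center already kept
        if all(math.dist(emb[i], emb[c]) > radius for c in centers):
            centers.append(i)
    labels = []
    for e in emb:
        d, inst = min((math.dist(e, emb[c]), k) for k, c in enumerate(centers))
        labels.append(inst if d <= radius else -1)
    return centers, labels


emb = [[0.0], [0.1], [5.0], [5.2]]  # toy 1D embeddings of four points
scores = [0.9, 0.5, 0.8, 0.3]       # predicted "center-ness" of each point
print(cluster_by_centers(emb, scores, radius=1.0))  # ([0, 2], [0, 0, 1, 1])
```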
WANG Hao, XIONG Shu-Hua, HE Hai-Bo, WU Xiao-Hong, TENG Qi-Zhi
2024, 33(1):199-205. DOI: 10.15888/j.cnki.csa.009375
Abstract: In petroleum exploration, core particles are effective data for studying geological sequences, evaluating oil and gas content, and understanding geological structures, and extracting core particles from images facilitates further analysis by geologists. Core particle images usually have blurred particle edges, complex backgrounds, and complex particle colors. To improve core particle extraction, this study designs a core image particle extraction algorithm based on an improved UNet3+. The algorithm adds a receptive field block (RFB) after each encoding layer of UNet3+ to expand the network's receptive field, addressing the low segmentation accuracy caused by its limited receptive field. In addition, a convolutional block attention module (CBAM) is embedded after the RFB module so that the network focuses more accurately on the target region and increases the feature weight of that region. Experimental results show that, compared with the original UNet3+, the improved algorithm segments core particle images well, improving mIoU, mPA, and FWIoU by 5.43%, 2.99%, and 5.34%, respectively.
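The three reported metrics are all computed from a per-pixel confusion matrix. The sketch below uses the standard definitions, with rows as ground-truth classes and columns as predictions: IoU_i = C_ii / (row_i + col_i - C_ii), class pixel accuracy PA_i = C_ii / row_i, and FWIoU weights each class IoU by that class's pixel frequency. Classes with an empty denominator are skipped in the means, which is one assumed convention among several.

```python
from typing import Dict, List, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], n_classes: int) -> List[List[int]]:
    cm = [[0] * n_classes for _ in range(n_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def seg_metrics(cm: List[List[int]]) -> Dict[str, float]:
    n = len(cm)
    total = sum(map(sum, cm))
    rows = [sum(cm[i]) for i in range(n)]                       # ground-truth pixels per class
    cols = [sum(cm[r][i] for r in range(n)) for i in range(n)]  # predicted pixels per class
    ious, pas, fw = [], [], 0.0
    for i in range(n):
        union = rows[i] + cols[i] - cm[i][i]
        if union > 0:
            iou = cm[i][i] / union
            ious.append(iou)
            fw += rows[i] / total * iou  # frequency-weighted IoU accumulator
        if rows[i] > 0:
            pas.append(cm[i][i] / rows[i])
    return {"mIoU": sum(ious) / len(ious), "mPA": sum(pas) / len(pas), "FWIoU": fw}


gt = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
cm = confusion_matrix(gt, pred, 2)
print(cm, seg_metrics(cm))
```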
MENG Xiu-Jian, QIAO Huan-Huan, WANG Ya, CHENG Xiao
2024, 33(1):206-212. DOI: 10.15888/j.cnki.csa.009378
Abstract: Existing image dehazing algorithms cannot balance dehazing quality and real-time performance when processing road traffic images. To address this problem, this study improves the fast All-in-One Dehazing Network (AOD-Net). First, SE channel attention is added to AOD-Net to adaptively allocate channel weights and focus on important features. Second, a pyramid pooling module is introduced to enlarge the network's receptive field and fuse features at different scales, so as to better capture image information. Finally, a composite loss function is used to attend to both pixel information and structural texture information. Experimental results show that, after dehazing, the improved AOD-Net increases the peak signal-to-noise ratio (PSNR) of road traffic images by 2.52 dB, and the structural similarity reaches 91.2%. The algorithm complexity and dehazing time increase slightly but still meet real-time requirements.
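Two pieces of this pipeline are standard enough to sketch: a squeeze-and-excitation (SE) channel attention step and the PSNR metric, PSNR = 10 * log10(MAX^2 / MSE). The SE weights `W1` and `W2` would normally be learned (with a reduction ratio r); here they are passed in explicitly, feature maps are nested lists of shape C x H x W, and the AOD-Net backbone itself is not reproduced.

```python
import math
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # C x H x W


def se_attention(x: FeatureMap, W1: Sequence[Sequence[float]],
                 W2: Sequence[Sequence[float]]) -> FeatureMap:
    """Squeeze (global average pool), excite (FC-ReLU-FC-sigmoid), then rescale channels."""
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]      # squeeze: C values
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in W1]  # C/r hidden units
    s = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, h)))) for row in W2]  # C gates
    return [[[v * s[c] for v in r] for r in ch] for c, ch in enumerate(x)]


def psnr(ref: Sequence[float], test: Sequence[float], max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB over flattened images."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10 * math.log10(max_val ** 2 / mse)


x = [[[2.0, 4.0], [6.0, 8.0]], [[1.0, 1.0], [1.0, 1.0]]]  # C = 2, H = W = 2
zero_W1, zero_W2 = [[0.0, 0.0]], [[0.0], [0.0]]            # r = 2: one hidden unit
print(se_attention(x, zero_W1, zero_W2))  # all gates are sigmoid(0) = 0.5
print(psnr([0.0] * 4, [1.0] * 4))         # about 48.13 dB
```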
CHEN Guo-Jun, LI Zi-Xiang, FU Yun-Peng, LI Zhen-Shuo
2024, 33(1):213-218. DOI: 10.15888/j.cnki.csa.009363
Abstract: When the number of sampling points is large, Delaunay triangulation can be used to build a triangulation network, and Kriging interpolation is then performed with local neighborhood sampling points. However, this algorithm fits a semi-variogram for every interpolation point, which incurs significant overhead when there are many interpolation points. Therefore, this study proposes a Kriging interpolation method that fits the semi-variogram per triangle. It also uses CPU-GPU load balancing to optimize part of the computation and fully considers the influence of non-uniform samples on the Kriging interpolation result. The results show that the proposed algorithm preserves interpolation quality on non-uniform sample sets, improves computational performance, and maintains high accuracy.
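For reference, the ordinary Kriging estimate at one target point solves a small linear system built from the semi-variogram γ(h): the weights λ satisfy Σ_j λ_j γ(h_ij) + μ = γ(h_i0) for each sample i, together with Σ_j λ_j = 1. The sketch below assumes an exponential variogram with fixed sill and range (in the proposed method the variogram would be fitted per triangle, which is not reproduced), uses only the neighborhood samples passed in, and solves the system by Gaussian elimination.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def exp_variogram(h: float, sill: float = 1.0, rng: float = 1.0) -> float:
    """Exponential semi-variogram without nugget: gamma(0) = 0."""
    return 0.0 if h == 0 else sill * (1.0 - math.exp(-h / rng))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ordinary_kriging(samples: Sequence[Point], values: Sequence[float],
                     target: Point) -> Tuple[float, List[float]]:
    n = len(samples)
    A = [[exp_variogram(math.dist(samples[i], samples[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    A.append([1.0] * n + [0.0])  # unbiasedness constraint: weights sum to 1
    b = [exp_variogram(math.dist(s, target)) for s in samples] + [1.0]
    sol = solve(A, b)
    weights = sol[:n]  # sol[n] is the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, values)), weights


samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
values = [1.0, 3.0, 5.0]
print(ordinary_kriging(samples, values, (1.0 / 3, 1.0 / 3)))
```

Because this system must be solved for every interpolation point, sharing one fitted variogram across all points inside a triangle removes the repeated fitting cost the abstract identifies.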
LI Hong-Yu, ZHANG Yi-Fei, YANG Dong-Bao
2024, 33(1):219-230. DOI: 10.15888/j.cnki.csa.009362
Abstract: Self-supervised learning on RGB-D datasets has attracted extensive attention. However, most methods focus on global-level representation learning, which tends to lose the local details that are crucial for recognizing objects. The geometric consistency between image and depth in RGB-D data can serve as a clue to guide self-supervised feature learning. This study proposes ArbRot, which rotates images by unrestricted angles to generate multiple pseudo-labels for pretext tasks and also establishes the relationship between global and local context. ArbRot can be trained jointly with contrastive learning methods to build a multi-modal, multi-pretext-task self-supervised learning framework that enforces feature consistency within the image and depth views, thereby providing an effective initialization for RGB-D semantic segmentation. Experimental results on the SUN RGB-D and NYU Depth Dataset V2 datasets show that the feature representations learned by multi-modal, arbitrary-orientation rotation self-supervised learning are of higher quality than those of the baseline models.
ZHANG Zhi-Qiang , ZHAO Ke-Hui , NIU Hui-Fang , ZHANG Zi-Yu , ZHOU Lian-Tian
2024, 33(1):231-244. DOI: 10.15888/j.cnki.csa.009368 CSTR:
Abstract:In recent years, diabetic retinopathy (DR) has become the main cause of the increase in the global blind population. Early classification of DR severity is particularly important for preventing vision loss in DR patients. As the number of diabetes patients grows year by year, the demand for DR grading is also rising. However, traditional manual grading is time-consuming and laborious and cannot meet this growing demand. The development of deep learning technology provides a more efficient and reliable means for DR detection and grading. Although binary DR detection has achieved good results, DR severity grading remains challenging owing to the complexity of DR and the subtle differences between lesion grades. This work reviews and summarizes recent DR grading methods. It introduces six deep learning classification methods based on the VGG, InceptionNet, ResNet, EfficientNet, DenseNet, and CapsNet models. In addition, it presents DR grading methods based on multi-network fusion. Finally, it summarizes the field and discusses future research trends in deep learning-based DR grading.
MA Yu-Xin , XU Yin-Long , LI Cheng , ZHONG Jin
2024, 33(1):245-253. DOI: 10.15888/j.cnki.csa.009283 CSTR:
Abstract:Graph neural networks (GNNs) have become an important method for handling graph data. Because of the high computational complexity and large volume of graph data, training GNNs on large-scale graphs relies on CPU-GPU cooperation and graph sampling: the graph structure and feature data are stored in CPU memory, and sampled subgraphs and their features are transferred to the GPU for training. However, this approach suffers from a serious bottleneck in loading graph feature data, which significantly degrades end-to-end training performance; moreover, because graph features occupy too much memory, the scale of graphs that can be trained is severely limited. To address these challenges, this study proposes a data loading approach based on input feature sparsification, which greatly reduces CPU memory usage and data transfer over the PCIe bus, substantially shortens data loading time, accelerates GNN training, and makes full use of GPU resources. Considering the characteristics of graph features and GNN computation, the study proposes a sparsification method suited to graph feature data that balances compression ratio against model accuracy. The study also evaluates the approach on three common GNN models and three datasets of different sizes, including MAG240M, one of the largest publicly available datasets. The results show that this method reduces the feature size by more than an order of magnitude and achieves a 1.6× to 6.7× end-to-end training speedup, while reducing model accuracy by less than 1%. In addition, with only four GPUs, the GraphSAGE model can be trained on MAG240M to the expected accuracy in just 40 minutes.
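The memory and transfer savings of input feature sparsification come from storing only a few (index, value) pairs per node instead of a full dense row. The sketch below uses per-row top-k magnitude selection as a hypothetical stand-in, since the abstract does not specify the paper's sparsification rule; `sparsify_topk`, `densify`, and `compression_ratio` are illustrative names, and the byte sizes are assumptions.

```python
def sparsify_topk(features, k):
    """Keep only the k largest-magnitude entries of each feature row.

    Returns one (indices, values) pair per row. Top-k is a hypothetical
    stand-in for the paper's sparsification rule.
    """
    out = []
    for row in features:
        idx = sorted(range(len(row)), key=lambda i: abs(row[i]), reverse=True)[:k]
        idx.sort()  # sorted indices keep the compressed row easy to scan
        out.append((idx, [row[i] for i in idx]))
    return out

def densify(sparse_rows, dim):
    """Rebuild dense rows on the consumer side (e.g. on the GPU after transfer)."""
    dense = []
    for idx, vals in sparse_rows:
        row = [0.0] * dim
        for i, v in zip(idx, vals):
            row[i] = v
        dense.append(row)
    return dense

def compression_ratio(dim, k, value_bytes=4, index_bytes=2):
    """Dense bytes per row divided by sparse bytes per row.

    2-byte indices assume dim <= 65536.
    """
    return (dim * value_bytes) / (k * (value_bytes + index_bytes))
```

For example, keeping 32 of 768 float32 entries with 2-byte indices gives 3072 / 192 = 16 times less data per row, which is the kind of order-of-magnitude reduction the abstract reports; the accuracy side of the trade-off has to be measured, since dropped entries are lost.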
GUO Sheng , CAI Shan , ZOU Xue , ZHOU Zhen-Sheng , WANG Lin
2024, 33(1):254-262. DOI: 10.15888/j.cnki.csa.009352 CSTR:
Abstract:Facial expression recognition (FER) is widely applied in many fields, but local occlusion makes it difficult to extract effective expression features. FER under local occlusion may require expression features from multiple facial regions, and a single attention mechanism cannot focus on multiple facial regions simultaneously. To this end, this study proposes a FER model for local occlusion based on weighted multi-head parallel attention. The model uses parallel spatial attention over multiple channels to extract expression features from multiple unoccluded facial regions, alleviating the interference of occlusion with expression recognition. Extensive experiments show that the proposed method achieves the best performance among many state-of-the-art methods, with accuracies of 89.54% on RAF-DB and 89.13% on FERPlus, and 87.47% and 86.28% on the occluded datasets Occlusion-RAF-DB and Occlusion-FERPlus, respectively. These results indicate that the method is highly robust to occlusion.
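Why several attention heads help under occlusion can be seen in a toy example: each head produces its own spatial attention map and can therefore lock onto a different visible region, and a learned weighting fuses the heads. The sketch below is a simplified assumption-laden illustration, not the paper's architecture: it covers spatial attention only, replaces learned parameters with fixed scoring vectors, and `attention_head` and `weighted_multi_head` are hypothetical names.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def attention_head(feats, w):
    """One spatial attention head.

    feats: P spatial positions, each a length-C feature vector.
    w: length-C scoring vector (a learned parameter in a real model).
    """
    scores = [sum(f_j * w_j for f_j, w_j in zip(f, w)) for f in feats]
    attn = softmax(scores)  # one weight per spatial position
    c = len(feats[0])
    pooled = [sum(a * f[j] for a, f in zip(attn, feats)) for j in range(c)]
    return pooled, attn

def weighted_multi_head(feats, head_ws, head_logits):
    """Run heads in parallel and fuse them with softmax-normalized head weights."""
    alphas = softmax(head_logits)
    outs = [attention_head(feats, w)[0] for w in head_ws]
    c = len(feats[0])
    return [sum(a * o[j] for a, o in zip(alphas, outs)) for j in range(c)]
```

If one region is occluded, a head focused elsewhere still contributes clean features, and training can shift `head_logits` toward the more reliable heads.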
2024, 33(1):263-271. DOI: 10.15888/j.cnki.csa.009382 CSTR:
Abstract:High-speed rail (HSR) has gradually become a popular travel option, and passengers have a high demand for streaming media services during HSR travel. However, in high-speed mobile scenarios, user bandwidth fluctuates severely, so the user's media experience cannot be guaranteed. To this end, this study proposes a cross-layer optimization method for DASH-based adaptive cloud collaborative streaming media transmission. First, a cross-layer architecture for DASH-based adaptive cloud collaborative streaming is designed, and a QoE model for users in the HSR environment is established. Next, on this basis, a cross-layer optimization model for DASH-based adaptive cloud collaborative streaming is constructed, and a cross-layer adaptive bitrate selection algorithm is proposed to improve the user's media experience. Finally, simulation results show that the proposed method can significantly improve the media experience of HSR passengers and can inform research on optimizing streaming media transmission in high-speed mobile scenarios.
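QoE-driven bitrate selection generally trades video quality against stall time and quality switches. The sketch below shows one greedy, one-segment-ahead version of that trade-off. It is a generic illustration under stated assumptions, not the paper's QoE model or algorithm: the QoE formula, the penalty weights, the 2-second segment length, and the harmonic-mean throughput estimate are all hypothetical choices. In a cross-layer design, the bandwidth estimate could instead come from lower-layer measurements.

```python
import math

def harmonic_mean(xs):
    """Harmonic mean damps the effect of short throughput spikes."""
    return len(xs) / sum(1.0 / x for x in xs)

def select_bitrate(bitrates_kbps, past_tput_kbps, buffer_s, last_kbps,
                   seg_s=2.0, rebuf_penalty=4.3, switch_penalty=1.0):
    """Pick the bitrate for the next segment by maximizing a one-step QoE score.

    QoE = quality (Mbps) - rebuf_penalty * stall time (s)
          - switch_penalty * |bitrate change| (Mbps).
    The model, weights and segment length are illustrative assumptions.
    """
    bw = harmonic_mean(past_tput_kbps)  # predicted throughput for the next segment
    best, best_q = None, -math.inf
    for r in bitrates_kbps:
        download_s = r * seg_s / bw                  # time to fetch one segment
        stall_s = max(download_s - buffer_s, 0.0)    # playback stalls if buffer drains
        q = r / 1000 - rebuf_penalty * stall_s - switch_penalty * abs(r - last_kbps) / 1000
        if q > best_q:
            best, best_q = r, q
    return best
```

With ample bandwidth and a full buffer, the highest rung wins; when throughput collapses (for example, during a handover at high speed) and the buffer is empty, the stall penalty pushes the choice down to the lowest rung.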
YANG Pan , SU Bo , LIU Min-Xian , YE Chuan-Tao , HU Yi-Ling , ZHANG Wei
2024, 33(1):272-279. DOI: 10.15888/j.cnki.csa.009354 CSTR:
Abstract:This study proposes an improved delegated proof of stake (DPoS) consensus mechanism based on dynamic weighted election to mitigate issues such as low participation of user nodes, collusion among nodes, difficulty in suppressing malicious nodes, and increased centralization risk. Firstly, a system of rewards and penalties is established for user nodes to encourage users to participate in the election process. Moreover, an address clustering algorithm is introduced to identify user nodes that exhibit similar voting behavior, effectively curbing undesirable voting by user nodes. An improved entropy weight method is used to dynamically calculate the weight of each candidate node feature in every election round. The voting results of user nodes are combined with the performance distance algorithm to rank the candidate nodes, yielding more rational election results. Subsequently, in the block production process, the production order of the producing nodes is dynamically adjusted to avoid centralization risk. Finally, the feasibility and effectiveness of the proposed scheme are validated through simulation. The results demonstrate that the proposed scheme not only incentivizes user nodes but also limits the misbehavior of nodes, effectively reducing the probability of malicious nodes being elected and avoiding centralization risk.
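The weighting and ranking step can be illustrated with the classic entropy weight method followed by an ideal-solution distance ranking. This sketch assumes that the "performance distance algorithm" behaves like TOPSIS (closeness to the best and worst candidates); the abstract does not confirm this, and the paper's "improved" entropy method may differ from the textbook version shown. Each column is a benefit criterion such as vote totals or uptime; the names and data are hypothetical.

```python
import math

def entropy_weights(matrix):
    """Classic entropy weight method (requires at least two candidates).

    matrix[i][j]: non-negative score of candidate i on benefit criterion j.
    Criteria whose values barely differ across candidates get weights near 0.
    """
    n, m = len(matrix), len(matrix[0])
    weights = []
    for j in range(m):
        col = [row[j] for row in matrix]
        total = sum(col)
        ps = [v / total for v in col] if total > 0 else [1.0 / n] * n
        e = -sum(p * math.log(p) for p in ps if p > 0) / math.log(n)
        weights.append(max(1.0 - e, 0.0))  # clamp tiny negative rounding errors
    s = sum(weights)
    return [w / s for w in weights] if s > 0 else [1.0 / m] * m

def topsis_scores(matrix, weights):
    """Closeness of each candidate to the ideal solution (1 = best, 0 = worst)."""
    m = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(m)]
    v = [[weights[j] * row[j] / norms[j] for j in range(m)] for row in matrix]
    best = [max(r[j] for r in v) for j in range(m)]
    worst = [min(r[j] for r in v) for j in range(m)]
    scores = []
    for r in v:
        d_pos, d_neg = math.dist(r, best), math.dist(r, worst)
        scores.append(d_neg / (d_pos + d_neg) if d_pos + d_neg > 0 else 0.0)
    return scores
```

Recomputing `entropy_weights` every election round is what makes the weighting dynamic: a feature on which all candidates currently look alike stops influencing the ranking.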
ZHOU Jing , CUI Can-Can , WANG Meng-Di , WANG Ze-Min
2024, 33(1):280-288. DOI: 10.15888/j.cnki.csa.009370 CSTR:
Abstract:Medical terminology standardization, an important means of eliminating entity ambiguity, is widely used in building knowledge graphs. Because the medical field involves a large number of technical terms with complex expressions, traditional matching models often struggle to achieve high accuracy. To address this problem, a two-stage model of semantic recall and precise ranking is proposed to improve medical terminology standardization. First, in the semantic recall stage, a semantic representation model, CL-BERT, is proposed based on improved supervised contrastive learning and RoBERTa-wwm. CL-BERT generates a semantic representation vector for each entity, and recall is performed according to the cosine similarity between vectors to obtain a candidate set of standard terms. Second, in the precise ranking stage, T5 combined with prompt tuning is used to build a precise semantic matching model, and FGM adversarial training is applied during model training; the precise matching model then ranks the candidate standard terms against the original term to obtain the final standard term. Experiments on the public CCKS2019 dataset achieve an F1 score of 0.9206. The experimental results show that the proposed two-stage model performs well and provides a new idea for medical terminology standardization.
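The recall stage reduces to a nearest-neighbour search by cosine similarity over the embeddings of all standard terms. The sketch below shows that step with plain lists. The CL-BERT encoder is not reproduced: the vectors in `standard_vecs` are hypothetical toy embeddings, and a production system would typically use a vector index rather than a full sort.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def recall_candidates(query_vec, standard_vecs, k):
    """Return the k standard terms whose vectors are most cosine-similar to the query.

    standard_vecs maps each standard term to its embedding (produced by CL-BERT
    in the paper; toy vectors here).
    """
    ranked = sorted(standard_vecs,
                    key=lambda t: cosine(query_vec, standard_vecs[t]),
                    reverse=True)
    return ranked[:k]
```

Only the k recalled candidates are passed to the more expensive T5 matching model, which is what keeps the second, precise stage affordable.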
2024, 33(1):289-296. DOI: 10.15888/j.cnki.csa.009383 CSTR:
Abstract:Existing knowledge graph-based recommendation models extract features from only one side, either users or items, and miss features from the other side. To address this problem, a bipartite knowledge-aware graph convolutional recommendation model is proposed. First, the initial feature representations are obtained by randomly initializing the embeddings of users, items, and entities in the knowledge graph. Then, a user- and item-based knowledge-aware attention mechanism is used to extract features from both users and items in the knowledge graph simultaneously. Next, a graph convolutional network aggregates feature information during knowledge graph propagation using different aggregation methods and predicts the click-through rate. Finally, the effectiveness of the model is verified by comparing it with four baseline models on two publicly available datasets, Last.FM and Book-Crossing. Compared with the best baseline model, AUC, F1, and ACC improve by 4.4%, 3.8%, and 1.1%, respectively, on Last.FM, and by 1.5%, 2.2%, and 1.4%, respectively, on Book-Crossing. The experimental results show that the proposed model is more robust than the baseline models in terms of AUC, F1, and ACC.
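The "bipartite" idea, where the user attends over the item's knowledge-graph neighbourhood and the item attends over the user's, can be sketched in a few lines. This is a hypothetical simplification, loosely in the style of knowledge-aware attention models: relation scores are plain dot products, the aggregator is a weight-free "sum" with tanh, there is only one propagation hop, and all names are illustrative rather than taken from the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def knowledge_aware_aggregate(center, neighbors, query):
    """Aggregate KG neighbours of `center`, weighting each by how much `query`
    attends to the connecting relation.

    neighbors: non-empty list of (relation_vec, entity_vec) pairs.
    Uses a weight-free 'sum' aggregator with tanh; a real model learns W and b.
    """
    attn = softmax([dot(query, rel) for rel, _ in neighbors])
    agg = [sum(a * ent[j] for a, (_, ent) in zip(attn, neighbors))
           for j in range(len(center))]
    return [math.tanh(c + g) for c, g in zip(center, agg)]

def predict_ctr(user, user_neighbors, item, item_neighbors):
    """Refine both sides: the user queries the item's KG neighbourhood and vice versa."""
    item_repr = knowledge_aware_aggregate(item, item_neighbors, query=user)
    user_repr = knowledge_aware_aggregate(user, user_neighbors, query=item)
    return 1.0 / (1.0 + math.exp(-dot(user_repr, item_repr)))
```

A one-sided model would call `knowledge_aware_aggregate` only for the item; refining the user representation as well is the extra step the abstract argues for.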
2024, 33(1):297-303. DOI: 10.15888/j.cnki.csa.009381 CSTR:
Abstract:Water pollution seriously affects water landscapes and water ecology. In this study, a depthwise convolution and cross attention (DCCA) module is proposed to address complex water surface scenes and the difficulty of extracting features of small pollutant targets when identifying water surface pollution. Depthwise convolution reduces the number of parameters and the computational complexity of the model, while cross attention establishes relationships between feature maps at different scales, enabling the model to better understand contextual information and improving its ability to recognize complex scenes and small targets. The experimental results show that adding the DCCA module improves the average accuracy by 1.8%, to 88.7%, and improves water surface pollution detection while occupying less memory.
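The parameter saving from depthwise convolution is simple arithmetic. The sketch below compares a standard k×k convolution with a depthwise convolution followed by a 1×1 pointwise convolution. Pairing depthwise with pointwise convolution (a depthwise separable convolution) is an assumption here, because the abstract does not say whether DCCA includes the pointwise step; biases are ignored and the 3×3, 256-channel layer is a hypothetical example.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution mapping c_in to c_out channels (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """k x k depthwise conv (one filter per input channel) + 1 x 1 pointwise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

std = standard_conv_params(3, 256, 256)
sep = depthwise_separable_params(3, 256, 256)
print(std, sep, round(std / sep, 2))
```

For this layer, the standard convolution needs 589,824 weights versus 67,840 for the depthwise separable version, roughly 8.7 times fewer, and the multiply-accumulate count per output pixel shrinks by the same factor.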