LIU Tian-Yu , LIU Jing , MA Jin-Gang , CHEN Tian-Zhen , LI Ming
2024, 33(12):1-15. DOI: 10.15888/j.cnki.csa.009708 CSTR: 32024.14.csa.009708
Abstract:Skin cancer is one of the most common and deadliest types of cancer, with its incidence rapidly increasing worldwide. Failure to diagnose it in its early stages can lead to metastasis and high mortality rates. This study provides a systematic review of recent literature on the application of traditional machine learning and deep learning in the diagnosis of skin cancer lesions, providing a valuable reference for further research in skin cancer diagnosis. Firstly, several publicly available datasets of skin diseases are compiled. Secondly, the application of different machine learning algorithms in the classification of skin cancer lesions is analyzed and compared to better understand their advantages and limitations in practical applications, with a focus on convolutional neural networks in diagnosis classification. With a thorough understanding of these algorithms, their performance differences and improvement strategies in dealing with skin diseases are discussed. Finally, through discussions on current challenges and future directions, insights and recommendations are provided to further enhance the performance and reliability of early skin cancer diagnosis systems.
2024, 33(12):16-29. DOI: 10.15888/j.cnki.csa.009711 CSTR: 32024.14.csa.009711
Abstract:In mobile edge computing (MEC), load imbalance among edge servers occurs due to irrational task offloading strategies and resource allocation, as well as a sharp increase in the number of multi-type tasks. To address the above-mentioned issues, this study proposes a load prediction and balanced assignment scheme for multi-type tasks (LBMT) in a multi-user, multi-MEC edge environment. The LBMT scheme includes three components: task type classification, task load prediction, and task adaptive mapping. Firstly, considering the diversity of task types, a task type model is designed to classify tasks. Secondly, a task load prediction model is developed, considering the varying loads imposed by different tasks on servers, and employs an improved K-nearest neighbor (KNN) algorithm for load prediction. Thirdly, taking into account the heterogeneity of MEC servers and the limitation of resources, a task allocation model is designed in conjunction with a server load balancing model. Additionally, a task allocation method based on an adaptive task mapping algorithm is proposed. Finally, the LBMT scheme optimizes resource utilization and task processing rates for MEC servers to achieve the optimal load-balanced task offloading strategy. Simulation experiments compare LBMT with improved min-min offloading, intermediate node-based offloading, and weighted bipartite graph-based offloading schemes. The results show that LBMT improves the resource utilization rate by more than 12.5% and the task processing rate by more than 20.3%. Additionally, LBMT significantly reduces the standard deviation of load balancing, more effectively achieving load balance among servers.
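The load prediction and load-balance metric described above can be illustrated with a minimal sketch. It uses plain K-nearest-neighbor regression, not the improved KNN variant of the study (whose details the abstract does not give), and the function names, feature vectors, and sample values are all hypothetical.

```python
import math
import statistics


def knn_predict_load(history, query, k=3):
    """Predict a task's load as the mean load of its k nearest historical tasks.

    history: list of (feature_vector, observed_load) pairs (hypothetical layout).
    query:   feature vector of the new task.
    """
    # Sort historical tasks by Euclidean distance to the query and keep the k closest.
    neighbours = sorted(history, key=lambda item: math.dist(item[0], query))[:k]
    return sum(load for _, load in neighbours) / len(neighbours)


def load_std(server_loads):
    """Population standard deviation of server loads; lower means better balance."""
    return statistics.pstdev(server_loads)


if __name__ == "__main__":
    history = [((1.0, 1.0), 10.0), ((5.0, 5.0), 50.0), ((9.0, 9.0), 90.0)]
    print(knn_predict_load(history, (2.0, 2.0), k=2))
    print(load_std([0.6, 0.5, 0.7]))
```

A standard deviation of zero would mean every server carries the same load, which is the sense in which a lower value indicates better balance.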
LIANG Jin-Xin , LI Wei , TANG Zheng-Yi , LI Zuo-Yong
2024, 33(12):30-42. DOI: 10.15888/j.cnki.csa.009727 CSTR: 32024.14.csa.009727
Abstract:Computed tomography (CT) scanning provides valuable material for detecting liver lesions. Manual detection of liver lesions is laborious and relies heavily on the expertise of physicians. Existing algorithms for liver lesion detection exhibit suboptimal performance in detecting subtle lesions. To address this issue, this study proposes a self-supervised liver lesion detection algorithm based on frequency-aware image restoration. Firstly, this algorithm designs a self-supervised task based on synthetic anomalies to generate a broader and more suitable set of pseudo-anomalous images, thereby alleviating the issue of insufficient abnormal data during model training. Secondly, to suppress the sensitivity of the reconstruction network to synthetic liver anomalies, a module is designed to extract high-frequency information from images. By restoring the images from their high-frequency components, the adverse generalization of the reconstruction network to anomalies is mitigated. Lastly, the algorithm adopts weight decay to train the segmentation sub-networks, reducing the occurrence of trivial solutions during the early stages of training and enabling the detection of local and subtle lesions. Extensive experiments conducted on publicly available real datasets demonstrate that the proposed method achieves state-of-the-art performance in liver lesion detection.
CHENG Yong , CHENG Yao , WANG Jun , YANG Ling , XU Xiao-Long , GAO Yuan-Yuan , ZHANG Kai-Hua
2024, 33(12):43-54. DOI: 10.15888/j.cnki.csa.009698 CSTR: 32024.14.csa.009698
Abstract:Group activity recognition (GAR) is a highly researched area in the field of computer vision, aiming to recognize the overall behavior formed by the actions and interactions of multiple individuals. However, due to difficulties in determining individual interaction relationships, the tightness of connections, and the key actor, current methods often focus on individual character features while neglecting connections with scene context. To address this issue, a novel reasoning model for GAR, GIFFNet, is proposed based on global-individual feature fusion (GIFF). To compensate for the lack of scene information in predicting group activity, GIFFNet, on the basis of focusing on key information, effectively integrates scene context and individual character features by constructing the GIFF module, obtaining more representative fusion features. Subsequently, GIFFNet utilizes fusion features to calculate the interaction relationship graph between characters in the scene and uses a graph convolutional network (GCN) for training and predicting group behavior categories. In addition, to address the issue of imbalanced samples in the dataset, GIFFNet adopts a strategy of dynamically assigning weights to optimize the loss function. Experimental results demonstrate that GIFFNet achieves a multi-class classification accuracy (MCA) of 93.8% and 96.1% on the Volleyball and Collective Activity datasets, and the mean per class accuracy (MPCA) is 93.9% and 95.8%, respectively, outperforming other existing deep learning methods. GIFFNet provides features with a more powerful characterization ability for activity classification through feature fusion, which effectively improves GAR accuracy.
JIANG Kui , HUANG Rui-Bin , DENG Zhao-Rui , WU Bo , ZHU Si-Lin
2024, 33(12):55-66. DOI: 10.15888/j.cnki.csa.009691 CSTR: 32024.14.csa.009691
Abstract:As an Internet infrastructure, DNS is rarely subjected to deep monitoring by firewalls, allowing hackers and advanced persistent threat (APT) organizations to exploit DNS covert tunnels for data theft or network control and posing a significant threat to network security. In response to the easily bypassed nature of existing detection methods and their weak generalization capabilities, this study enhances the characterization method of DNS traffic and introduces the pcap features extraction CNN-Transformer (PFEC-Transformer) model. This model uses characterized decimal numerical sequences as input, conducts local feature extraction through CNN modules, and then analyzes long-distance dependency patterns between local features by using the Transformer for classification. The research builds datasets by collecting internet traffic and data packets generated by various DNS covert tunnel tools and conducts generalization testing with publicly available datasets containing traffic from unknown tunneling tools. Experimental results demonstrate that the model achieves an accuracy of 99.97% on the testing dataset and 92.12% on the generalization testing dataset, demonstrating strong performance in detecting unknown DNS covert tunnels.
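As a rough illustration of turning packet bytes into a decimal numerical sequence for a sequence model, the sketch below maps each byte to an integer in 0..255 and pads or truncates to a fixed length. The fixed length, the zero padding, and the function name are assumptions made for illustration; the abstract does not specify the characterization scheme actually used.

```python
def bytes_to_sequence(payload: bytes, length: int = 64) -> list:
    """Convert raw packet bytes to a fixed-length list of decimal values (0..255).

    Truncates payloads longer than `length` and zero-pads shorter ones.
    Both choices are illustrative assumptions, not the study's scheme.
    """
    sequence = list(payload[:length])          # each byte becomes an int in 0..255
    sequence += [0] * (length - len(sequence))  # pad the tail with zeros
    return sequence


if __name__ == "__main__":
    # Hypothetical bytes standing in for the start of a DNS query.
    print(bytes_to_sequence(b"\x12\x34\x01\x00", length=8))
```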
LI Ze-Peng , LUO Yuan-Xin , SUN Jia-Ning , CHEN Hong , LI Cheng
2024, 33(12):67-77. DOI: 10.15888/j.cnki.csa.009713 CSTR: 32024.14.csa.009713
Abstract:Braille conversion technology is crucial for advancing information accessibility for the blind. With the rapid advancement of information globalization, the blind are increasingly exposed to bilingual information in both Chinese and English. While existing braille conversion systems have successfully translated Chinese and English into braille, they fall short in accurately converting punctuation, including poor differentiation of punctuation with multiple uses and lack of error correction for the mixed use of Chinese and English punctuation. Failure to address these issues may lead to misunderstanding of text by the blind. This study delves into these problems, designing and implementing a bilingual braille conversion system capable of distinguishing multipurpose punctuation and correcting the mixed use of punctuation. The performance of the system is evaluated by using a dataset based on BLCU Chinese Corpus. The results demonstrate that the proposed system accurately distinguishes multipurpose punctuation and corrects the mixed use of Chinese and English punctuation according to language types and context, outperforming other braille conversion systems. Overall, this research has significant potential for promoting information accessibility in China.
LIAN Yu-Han , LIAO Sheng-Yang , ZHANG Kun-San , ZOU Wei-Fu , LIN Nan
2024, 33(12):78-88. DOI: 10.15888/j.cnki.csa.009714 CSTR: 32024.14.csa.009714
Abstract:The Hadoop system is widely used as a distributed architecture for big data storage. It generates a large amount of log data during runtime to record device anomalies, which provides important clues for locating and analyzing problems. However, traditional log anomaly detection models typically collect log data on a central server, which introduces the risk of sensitive information leakage during data collection. Federated learning, a novel machine learning paradigm, effectively protects data privacy by training models on local servers and aggregating model parameters only on a central server. This study proposes a log anomaly detection architecture based on federated learning, which combines local and central servers to perform detection tasks, avoiding the risk of leaking sensitive information during network transmission. Additionally, it employs a tree parser to standardize log templates. To effectively capture complex patterns and anomalous behaviors in log data, a BiLSTM model based on the self-attention mechanism is established as a local server model. To validate the effectiveness of the proposed method, simulation experiments are conducted using publicly available datasets of distributed systems. The results demonstrate that the model maintains stable comprehensive evaluation metrics, with an accuracy rate above 93%, indicating high applicability.
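The idea of training locally and aggregating only model parameters on a central server can be sketched with sample-size-weighted parameter averaging (the widely used FedAvg rule). Whether the study uses exactly this aggregation rule is an assumption; the flat parameter lists and function name are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Aggregate client parameters into a global model by weighted averaging.

    client_weights: one flat list of parameters per local server.
    client_sizes:   number of local training samples per server (the weights).
    Only parameters are shared; raw logs never leave the local servers.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


if __name__ == "__main__":
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```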
2024, 33(12):89-96. DOI: 10.15888/j.cnki.csa.009597 CSTR: 32024.14.csa.009597
Abstract:As electric vehicles become more popular, more and more of them are illegally modified with rain shields. However, this modification increases safety hazards. Firstly, rain shields block riders’ view, increasing the risk of accidents. Secondly, rain shields can scratch pedestrians when the modified vehicles travel at excessive speeds, posing a serious threat to traffic safety. This study proposes an improved YOLOv7-tiny algorithm for detecting illegally modified electric vehicles. Firstly, a BiFormer attention mechanism is added to the network structure, enabling the model to capture more details of electric vehicles and focus more on smaller target information. Secondly, an improved feature pyramid structure is combined with the tensor concatenation of a feature fusion network to enhance the detection ability of the model for small- and medium-sized targets. Finally, the ELAN and SPPCSPC modules of the framework are optimized, which improves the detection accuracy of small- and medium-sized targets and enhances the effectiveness of feature extraction without adding too many parameters.
SHI Xin-Yu , LIN Shan-Ling , LIU Ke , LIN Jian-Pu , LYU Shan-Hong , LIN Zhi-Xian , GUO Tai-Liang
2024, 33(12):97-105. DOI: 10.15888/j.cnki.csa.009706 CSTR: 32024.14.csa.009706
Abstract:Most current recommendation models often overlook the importance of features during feature interactions, leading to low accuracy. To address this issue, an enhanced recommendation model combining feature selection and the cross network is proposed. The SENet network is employed to filter out unimportant features before feature interaction, enabling the extraction of more valuable interaction information. On this basis, parallel cross network and deep neural network are utilized to capture explicit and implicit feature interactions. Additionally, low-rank techniques are introduced in the cross network, transforming weight vectors into low-rank matrices to maintain model performance and reduce model training costs. Comparative experiments on the datasets of MovieLens-1M and Criteo demonstrate that the proposed recommendation model is significantly superior to other models in terms of AUC metrics, which proves the effectiveness of the proposed recommendation model.
2024, 33(12):106-114. DOI: 10.15888/j.cnki.csa.009718 CSTR: 32024.14.csa.009718
Abstract:The end-to-end Transformer model based on the self-attention mechanism shows superior performance in speech recognition. However, this model has limitations in capturing local feature information during shallow processing and does not fully consider the interdependence between different blocks. To address these issues, this study proposes Conformer-SE, an improved end-to-end model for speech recognition. The model first adopts the Conformer structure to replace the encoder in the Transformer model, thus enhancing its ability to extract local features. Next, by introducing the SE channel attention mechanism, it integrates the output of each block into the final output through a weighted sum. The experimental results on the Aishell-1 dataset show that the Conformer-SE model reduces the character error rate by 18.18% compared to the original Transformer model.
MA Yu-Bo , ZHOU Chang-Dong , ZHANG Zhi-Wen , YANG Pei-Ze , ZHANG Bo
2024, 33(12):115-122. DOI: 10.15888/j.cnki.csa.009705 CSTR: 32024.14.csa.009705
Abstract:Multi-agent collaboration plays a crucial role in the field of reinforcement learning, focusing on how agents cooperate to achieve common goals. Most collaborative multi-agent algorithms emphasize the construction of collaboration but overlook the reinforcement of individual decision-making. To address this issue, this study proposes an online reinforcement learning model, BiTransformer memory (BTM), which not only considers the collaboration among multiple agents but also uses a memory module to assist individual decision-making. The BTM model is composed of a BiTransformer encoder and a BiTransformer decoder, which are utilized to improve individual decision-making and collaboration within the multi-agent system, respectively. Inspired by human reliance on historical decision-making experience, the BiTransformer encoder introduces a memory attention module to aid current decisions with a library of explicit historical decision-making experience rather than hidden units, differing from the conventional RNN-based method. Additionally, an attention fusion module is proposed to process partial observations with the assistance of historical decision experience, to obtain the most valuable information for decision-making from the environment, thereby enhancing the decision-making capabilities of individual agents. In the BiTransformer decoder, two modules are proposed: a decision attention module and a collaborative attention module. They are used to foster potential cooperation among agents by considering the collaborative benefits between other decision-making agents and the current agent, as well as partial observations with historical decision-making experience. BTM is tested in multiple scenes of StarCraft, achieving an average win rate of 93%.
LI Zhi-Jie , MI De-Yuan , LI Chang-Hua , ZHANG Jie , DONG Wei
2024, 33(12):123-130. DOI: 10.15888/j.cnki.csa.009692 CSTR: 32024.14.csa.009692
Abstract:Currently, super-resolution reconstruction technology is applied in various fields. However, digital elevation model (DEM) reconstruction presents numerous challenges. To address the issues of detail loss and distortion caused by inadequate utilization of complex terrain features in DEM, this study proposes a deep residual frequency-adaptive DEM super-resolution reconstruction model. The model consists of multiple high and low-frequency feature extraction modules forming a residual network structure, enhancing the overall perception of DEM features. Additionally, a frequency selection feature extraction module is integrated to improve the identification and capture of complex terrain features. The model also incorporates atrous spatial pyramid pooling, which merges multi-scale information to enhance reconstruction quality and retain detailed terrain features and structures. Final super-resolution reconstruction is completed under dual constraints in the gradient and height domains. Experimental results demonstrate that using elevation maps of the Qinling Mountains in Shaanxi with two different accuracies as test data, the deep residual frequency-adaptive DEM super-resolution model outperforms other advanced models across various metrics. Reconstructed DEMs exhibit richer details and clearer textures.
ZHANG Lu , WEI Ben-Chang , WEI Hong-Ao , ZHOU Long-Gang
2024, 33(12):131-140. DOI: 10.15888/j.cnki.csa.009684 CSTR: 32024.14.csa.009684
Abstract:Underwater target detection has practical significance in ocean exploration. This study proposes a FERT-DETR network suitable for underwater target detection to address the issues of complex underwater environments and limited target feature extraction due to occlusion and overlap. The proposed model first introduces a feature extraction module, Faster EMA, to replace the BasicBlock of ResNet18 in RT-DETR, which can significantly improve its capability to extract features of underwater targets while effectively reducing the number of parameters and depth of the model. Secondly, a cascaded group attention module, AIFI-CGA, is used in the encoding part to reduce computational redundancy in multi-head attention and improve attention diversity. Finally, a feature pyramid for high-level filtering named HS-FPN is used to replace CCFM, achieving multi-level fusion and improving the accuracy and robustness of detection. The experimental results show that the proposed algorithm, FERT-DETR, improves detection accuracy by 3.1% and 1.7% compared to RT-DETR on the URPC2020 and DUO datasets respectively, compresses the number of parameters by 14.7%, and reduces computational complexity by 9.2%. It can effectively avoid missed and false detection of targets of different sizes in complex underwater environments.
LIU Yun , ZOU Fu-Min , CAI Qi-Qin , LI Jun-Qing , ZHONG Ji-Xiong
2024, 33(12):141-152. DOI: 10.15888/j.cnki.csa.009715 CSTR: 32024.14.csa.009715
Abstract:To address the problem of low QR code reading rates caused by complex environments and changes in shooting angles during QR code detection, this study proposes an algorithm for correcting and recognizing deformed QR codes based on an improved YOLOv8n-Pose algorithm. First, the efficient channel attention (ECA) module is introduced into the backbone network. This module achieves cross-channel interaction without dimensionality reduction, effectively enhancing the feature extraction capabilities and detection accuracy of the network. Secondly, the Slim-neck architecture is adopted to reconstruct the neck network, reducing model complexity and improving the detection capability for QR codes of different scales. Finally, detected QR code corner points are used for correction through inverse perspective transformation, and the corrected QR codes are read using the ZBar algorithm. Experimental results show that, on a public QR code dataset, the improved algorithm increases mAP50 and mAP50-95 by 1.6% and 1.1%, respectively, compared to the original algorithm. Model parameters and computational costs are reduced by 6.5% and 9.5%, respectively. Detection speed on CPU and GPU is improved by 0.3 f/s and 0.7 f/s, reaching 14.2 f/s and 59.6 f/s, respectively, meeting the requirements for efficient detection of QR code corner points. In addition, on a custom-made dataset of deformed QR codes, the proposed method based on the improved YOLOv8n-Pose algorithm enhances the QR code reading rate by 23.66% compared to the standalone ZBar algorithm, achieving a recognition rate of 87.41%. This method only requires one photo to recognize all the information about the goods, which can effectively improve the efficiency of goods management.
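The correction step, in which four detected corner points are mapped back to a square by an inverse perspective transformation, can be sketched as a homography estimated from four point pairs. This is a minimal pure-Python sketch under assumed corner coordinates; image resampling and ZBar decoding are omitted, and all function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """3x3 homography (bottom-right entry fixed to 1) mapping 4 src points to 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def warp_point(h, x, y):
    """Apply the homography to a single point."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v


if __name__ == "__main__":
    corners = [(10, 20), (110, 30), (120, 140), (5, 130)]  # assumed detected corners
    square = [(0, 0), (100, 0), (100, 100), (0, 100)]      # target upright square
    h = homography(corners, square)
    print([warp_point(h, x, y) for x, y in corners])
```

In practice the same matrix would be used to resample every pixel of the code region before it is passed to the decoder.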
JI Tian-Jie , ZHENG Liao-Mo , CAO Ke-Rang , WANG Shi-Yu , ZHOU Song-Jie
2024, 33(12):153-160. DOI: 10.15888/j.cnki.csa.009710 CSTR: 32024.14.csa.009710
Abstract:With the continuous development of industrial automation, the three-dimensional reconstruction technology of workpieces is playing an increasingly important role in the manufacturing industry. In actual working environments, there is a common problem of stacking workpieces, which significantly impacts subsequent work including robot recognition and grasping. Currently, it is hard for 3D reconstruction to extract image feature points and achieve accurate feature registration in workpieces with weak textures. To address the above issues, this study proposes a 3D reconstruction method for stacked workpieces based on deep learning with multi-view stereo matching. Firstly, multiple images from different perspectives are input through a DCNv2-based feature pyramid network for feature extraction. Then, homography transformation is performed to construct cost volumes, and a unified cost volume is obtained through variance aggregation. In the regularization section of the cost volume, an SE channel attention module is introduced to improve the feature expression ability of the network and enhance the performance and generalization ability of the model. This method exhibits good performance on the Technical University of Denmark (DTU) dataset. The point cloud model of stacked workpieces generated by this method is of great significance for future applications of industrial automation.
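Variance aggregation, as commonly used in learning-based multi-view stereo, merges the per-view feature volumes into one cost volume by taking the element-wise variance across views: a low variance means the views agree at that depth hypothesis. The sketch below works on flattened volumes with made-up values and is an illustration of the general technique, not the study's code.

```python
def variance_cost(volumes):
    """Element-wise variance across N warped feature volumes (flattened lists).

    volumes: list of N equally sized lists, one per view.
    Returns one list of the same size: the unified cost volume.
    """
    n = len(volumes)
    cost = []
    for values in zip(*volumes):
        mean = sum(values) / n
        cost.append(sum((v - mean) ** 2 for v in values) / n)
    return cost


if __name__ == "__main__":
    # Two hypothetical views: they disagree at the first element, agree at the second.
    print(variance_cost([[1.0, 2.0], [3.0, 2.0]]))
```

Because the variance is symmetric in its inputs, the aggregation accepts any number of views without changing the output shape.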
ZHANG Rui-Xuan , ZHAO Yu-Feng , XU Fei , YU Ting-Ting , ZHANG Le-Yi
2024, 33(12):161-169. DOI: 10.15888/j.cnki.csa.009720 CSTR: 32024.14.csa.009720
Abstract:Model quantization is widely used for fast inference and deployment of deep neural network models. Post-training quantization has attracted much attention from researchers due to its reduced retraining time and low performance loss. However, most existing post-training quantization methods rely on theoretical assumptions or use fixed bit-width allocations for network layers during the quantization process, which results in significant performance loss in the quantized network, especially in low-bit scenarios. To improve the accuracy of post-training quantized network models, this study proposes a novel post-training mixed-precision quantization method (MSQ). This method estimates the accuracy of each layer of the network by inserting a task predictor module, which incorporates the pyramid pooling module and weight imprinting, after each layer of the network model. With the estimations, it assesses the importance of each layer of the network and determines the quantization bit-width of each layer based on the assessment. Experiments show that the MSQ algorithm proposed in this study outperforms some existing mixed-precision quantization methods on several popular network architectures, and the quantized network model tested on edge hardware devices shows better performance and lower latency.
LI Chun-Lei , RUAN Yi-Ming , ZHANG Xiao-Ming , WANG Hong-Miao , WANG Ming-Jie
2024, 33(12):170-176. DOI: 10.15888/j.cnki.csa.009694 CSTR: 32024.14.csa.009694
Abstract:This study proposes a method for pointer instrument reading recognition based on YOLOv8 and an improved UNet++ to solve the problem of low reading recognition accuracy caused by complex backgrounds and multiple rotational angles in images of substation meters. YOLOv8 is utilized to detect the instrument area, and perspective transformation is used for rotation correction. The improved UNet++, enhanced by a polarized self-attention module, is utilized to segment dial images to extract scales and pointer regions. After the pointer line is extracted, the instrument reading is computed using the angle method. Experimental results indicate that the proposed method achieves an average fiducial error of 1.82% in identifying instrument readings. The method has superior recognition accuracy and is feasible for application in the intelligent inspection of pointer instruments in substations.
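The angle method can be sketched as linear interpolation between the angles of the minimum and maximum scale marks. The sketch assumes a linear dial and angles measured in one consistent direction, and it treats the reported error as a percentage of the full-scale range; these are assumptions, and the function names and sample values are hypothetical.

```python
def reading_from_angle(pointer_deg, start_deg, end_deg, min_value, max_value):
    """Interpolate the reading from the pointer angle on a linear dial.

    start_deg / end_deg: angles of the minimum and maximum scale marks.
    """
    fraction = (pointer_deg - start_deg) / (end_deg - start_deg)
    return min_value + fraction * (max_value - min_value)


def full_scale_error_percent(measured, true_value, min_value, max_value):
    """Absolute error expressed as a percentage of the instrument's range."""
    return abs(measured - true_value) / (max_value - min_value) * 100.0


if __name__ == "__main__":
    # Hypothetical gauge: 0 to 1.6 spread over a 270-degree sweep starting at 45 degrees.
    value = reading_from_angle(135.0, 45.0, 315.0, 0.0, 1.6)
    print(value)
    print(full_scale_error_percent(value, 0.55, 0.0, 1.6))
```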
QI Jing , LI Zi-Rong , LIU Xiu-Ting , MA Lu , CHEN Jun-Hao
2024, 33(12):177-184. DOI: 10.15888/j.cnki.csa.009704 CSTR: 32024.14.csa.009704
Abstract:The AI diagnostic model based on deep learning relies heavily on high-quality, detailed annotated data for algorithm training, but is affected by label noise. To enhance the robustness of the model and prevent the memorization of noisy labels, a noise label sample selection (NLSS) model is proposed to fully mine the hidden information of noisy samples and alleviate model overfitting. Firstly, distributed feature representations of the image are extracted by taking hybrid enhanced images as input. Secondly, the contrastive loss function is introduced to compare the similarity between the predicted label distribution of the sample and the real label distribution for sample evaluation and selection. Finally, based on sample selection, supervised information of the noisy label is re-corrected by the pseudo-label promotion strategy of the label redistribution module. Taking the PET/CT dataset of non-small cell lung cancer (NSCLC) patients as an example, results show that the proposed model outperforms comparison models, reducing the interference of label noise in the diagnosis of lymph node metastasis.
2024, 33(12):185-196. DOI: 10.15888/j.cnki.csa.009675 CSTR: 32024.14.csa.009675
Abstract:As the demand for unmanned aerial vehicle (UAV) applications continues to expand, the design of disturbance rejection controllers, which aim to ensure that UAVs can complete designated tasks as required, has received significant attention. Traditional control algorithms in wide use today exhibit good stability but poor disturbance rejection capability. To address this issue, a hybrid disturbance rejection controller based on an improved twin delayed deep deterministic policy gradient (TD3) algorithm is proposed. This method utilizes nonlinear model predictive control (NMPC) as the base controller and introduces a disturbance compensator based on improved TD3 for hybrid control. This approach retains the advantages of the NMPC controller while addressing the shortcomings in disturbance rejection of traditional control algorithms. This study introduces a multi-head attention (MA) mechanism and long short-term memory (LSTM) network into the Actor network of TD3, enhancing TD3’s ability to capture spatial management information and temporal correlation information. Additionally, a continuous logarithmic reward function is introduced to improve training stability and convergence speed, and training is conducted using random task scenarios with random disturbances to enhance model generalization. In experiments, the NMPC-MALSTM-TD3 architecture is compared with architectures using DDPG, SAC, TD3, and PPO algorithms as disturbance compensators. Experimental results demonstrate that the NMPC-MALSTM-TD3 architecture exhibits the strongest disturbance rejection capabilities and a smaller influence on the stability and real-time performance of NMPC.
2024, 33(12):197-209. DOI: 10.15888/j.cnki.csa.009687 CSTR: 32024.14.csa.009687
Abstract:Deep reinforcement learning algorithms are more and more widely used in UAV trajectory planning tasks, but many studies do not consider complex scenarios of random changes. To address the above problems, this study proposes an improved PP-CMNTD3 algorithm based on TD3, which puts forward a simple and effective prior strategy and draws on the idea of artificial potential fields to design dense rewards. UAVs are better guided to effectively avoid obstacles and swiftly approach target points. Simulation results show that the algorithm improvement can effectively improve the training efficiency of the network and the trajectory planning performance in complex scenarios. At the same time, the strategy can be flexibly adjusted under different initial power levels, achieving an effective balance between energy consumption and rapid arrival at the destination.
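A dense reward inspired by artificial potential fields can be sketched as an attractive term that grows as the UAV approaches the target plus a repulsive term that penalizes proximity to obstacles. The functional form below is the textbook potential-field shape, chosen for illustration; the gains, influence radius, and function name are assumptions and not the reward of the PP-CMNTD3 algorithm.

```python
import math


def dense_reward(pos, goal, obstacles, influence_radius=2.0, k_att=1.0, k_rep=1.0):
    """Potential-field style reward: closer to the goal is better, near obstacles is worse."""
    # Attractive term: negative distance to the goal.
    reward = -k_att * math.dist(pos, goal)
    # Repulsive term: only obstacles inside the influence radius contribute.
    for obstacle in obstacles:
        d = math.dist(pos, obstacle)
        if 0.0 < d < influence_radius:
            reward -= k_rep * (1.0 / d - 1.0 / influence_radius) ** 2
    return reward


if __name__ == "__main__":
    print(dense_reward((0.0, 0.0), (3.0, 4.0), [(1.0, 0.0), (10.0, 0.0)]))
```

Unlike a sparse reward given only on arrival or collision, this signal is non-zero at every step, which is what makes it dense.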
WANG Hui-Jing , YUAN Peng-Cheng
2024, 33(12):210-221. DOI: 10.15888/j.cnki.csa.009726 CSTR: 32024.14.csa.009726
Abstract:In crowdsourcing platforms, orders have different types (takeaway and express orders), while delivery riders are typically responsible for only one type of order (either takeaway or express delivery). Additionally, the existing delivery mechanism rarely satisfies merchants and customers. Therefore, considering the heterogeneity of riders in a dispatch mode, this study introduces the concept of all-round riders, dividing riders into three categories: takeaway riders, express riders, and all-round riders. According to the differences in the types of orders that riders can serve, a cost function based on a fuzzy time window is constructed to represent the satisfaction of merchants and customers with the time when riders arrive at pick-up and delivery points. The satisfaction is then transformed into a time penalty function. A model is constructed to minimize time penalty costs, route driving costs, and personnel operation costs. Considering the characteristics of the model and the limitations of traditional algorithms, this study designs a hybrid algorithm combining a genetic algorithm with large neighborhood search. Then, the simulated annealing algorithm, the genetic algorithm, and the hybrid algorithm are each used to solve the problem on concrete examples. The analysis of the optimization results of different algorithms validates the feasibility and effectiveness of the proposed model and the improved algorithm. Experimental results show that considering the heterogeneity of riders and the satisfaction of merchants and customers during crowdsourcing delivery not only effectively improves their satisfaction but also reduces delivery costs and improves delivery efficiency for crowdsourcing platforms. This strategy offers a reference for crowdsourcing platforms in formulating delivery strategies.
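A fuzzy time window is often modeled with a trapezoidal membership function: satisfaction is 1 inside the preferred window, falls linearly to 0 toward the tolerable bounds, and is 0 outside them. The sketch below uses that common form and converts dissatisfaction into a penalty cost; the exact function and cost coefficient used in the study are not given in the abstract, so the shape, parameter names, and values are assumptions.

```python
def satisfaction(t, earliest_ok, preferred_start, preferred_end, latest_ok):
    """Trapezoidal satisfaction with arrival time t (assumed shape)."""
    if preferred_start <= t <= preferred_end:
        return 1.0
    if earliest_ok < t < preferred_start:
        return (t - earliest_ok) / (preferred_start - earliest_ok)
    if preferred_end < t < latest_ok:
        return (latest_ok - t) / (latest_ok - preferred_end)
    return 0.0


def time_penalty(t, earliest_ok, preferred_start, preferred_end, latest_ok, unit_cost=1.0):
    """Penalty cost grows as satisfaction drops; unit_cost is a hypothetical coefficient."""
    s = satisfaction(t, earliest_ok, preferred_start, preferred_end, latest_ok)
    return unit_cost * (1.0 - s)


if __name__ == "__main__":
    # Times in minutes: preferred window 10..20, tolerable window 0..30.
    for arrival in (15, 25, 35):
        print(arrival, time_penalty(arrival, 0, 10, 20, 30, unit_cost=4.0))
```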
PAN Xian-Shan , WANG Zheng-Yong , LUO Bin-Bin , TENG Qi-Zhi , HE Xiao-Hai
2024, 33(12):222-230. DOI: 10.15888/j.cnki.csa.009716 CSTR: 32024.14.csa.009716
Abstract:Rock debris recognition is an important tool in geological exploration and logging. To improve the efficiency of traditional manual lithology identification and overcome the challenges of slow inference and high computational complexity in common deep learning networks, this study proposes DAF-STDC, a real-time semantic segmentation network for rock debris images based on a well-performing STDC network model. The network uses dilated convolution to maintain resolution while extracting features and utilizes an attention mechanism to help the model acquire global information from the feature map, thus refining the edge information of rock debris particles. It also uses a feature fusion module to enhance the fusion of low-level detail features and high-level semantic features, improving feature representation. Experiments have proved that the improved network model significantly enhances accuracy. The mean intersection over union of DAF-STDC reaches 83.12% on the RC_Dataset which consists of six types of rock debris images collected from exploratory wells. While maintaining the number of parameters, DAF-STDC significantly improves its inference speed and segmentation accuracy, providing an effective reference for the digitization of rock debris logging.
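The mean intersection over union metric reported above is the per-class ratio of correctly labeled pixels to the union of predicted and ground-truth pixels, averaged over classes. A minimal sketch on flattened label maps follows; skipping classes absent from both maps is one common convention and an assumption here, as are the function name and sample labels.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for flattened label maps of equal length."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes that appear in neither map
            ious.append(inter / union)
    return sum(ious) / len(ious)


if __name__ == "__main__":
    predicted = [0, 0, 1, 1]
    ground_truth = [0, 1, 1, 1]
    print(mean_iou(predicted, ground_truth, num_classes=2))
```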
2024, 33(12):231-239. DOI: 10.15888/j.cnki.csa.009728 CSTR: 32024.14.csa.009728
Abstract:Remaining time prediction helps enterprises improve the quality and efficiency of business process execution. Although existing deep learning methods have shown improvement in remaining time prediction, they still face challenges when dealing with complex business processes. These challenges include insufficient utilization of time features and limited ability to extract local features, leaving room for improvement in prediction accuracy. This study proposes a remaining time prediction method based on the improved Transformer encoder model. Existing methods ignore event time features and struggle to capture local dependencies. To address these limitations, this study introduces a time feature encoding module and a local dependency enhancement module into the model. The time encoding module constructs a semantically rich and discriminative event time representation by embedding learning and multi-granularity concatenation. The local dependency enhancement module uses convolutional neural networks to extract fine-grained local features from the trajectory prefix after processing with the Transformer encoder. Experiments show that integrating time features and local dependency enhancement improves the prediction accuracy of the remaining time for complex business processes.
CHEN Guan-Hao , PAN Guang-Zhen
2024, 33(12):240-247. DOI: 10.15888/j.cnki.csa.009696 CSTR: 32024.14.csa.009696
Abstract:The rapid growth of security inspection demand drives the development of intelligent security inspection technology. Due to the unique characteristics of X-ray images, detecting small contraband items is challenging. This study proposes an improved YOLOv8s network for contraband recognition to address this issue. Firstly, the Focal L1 Loss function is introduced to enhance CIoU and optimize the position and aspect ratio of prediction boxes to improve the network’s ability to identify contraband items. Improved deformable convolution is added to the shallow backbone network to capture features of contraband items in different directions. LSKA is incorporated into the SPPF module to expand the network’s receptive field, while the Swin-CS module captures global information and supplements dimensional interaction. Finally, three stacked attention blocks are used for processing, enhancing the network’s sensitivity to small targets. The improved network achieves a mean average precision (mAP) of 96.1% on the SIXray dataset, a 5.4% improvement over YOLOv8s, with mAP50-95 reaching 0.682, a 4.5% increase. Experimental results indicate that the proposed model accurately generates prediction boxes and effectively handles contraband detection in complex scenarios, validating the effectiveness of the algorithm.
LIU Yuan-Yuan , CHEN Lu , LU Feng , YE Yang , AN Yu-Tong , JIN Ming-Hui , XING Kai-Yuan , ZENG Guang
2024, 33(12):248-255. DOI: 10.15888/j.cnki.csa.009690 CSTR: 32024.14.csa.009690
Abstract:This study proposes a model called E2E-DRNet to address issues in manual diabetic retinopathy (DR) diagnosis, including poor classification performance, laborious processes, minimal differences in grades of retinal images, and inconspicuous lesions. This model is based on EfficientNetV2 and incorporates the efficient channel attention (ECA) module. By processing and optimizing a DR dataset, the Focal Loss function is introduced to address sample imbalance. The model achieves refined DR classification through two stages. Experimental results demonstrate that the proposed model performs well on both public and clinical datasets. Additionally, it enhances the interpretability of lesion regions in fundus images, thereby improving the efficiency of DR lesion screening and overcoming the limitations of manual diagnosis.
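As background on how a focal loss counters sample imbalance, the sketch below implements the standard per-sample form, FL = -alpha * (1 - p_t)^gamma * log(p_t), in plain Python. The values of `gamma` and `alpha` are illustrative defaults, not settings reported in the abstract, and the model's actual training code may differ.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample.

    probs:  predicted class probabilities (assumed to sum to 1, with probs[target] > 0)
    target: index of the true class
    """
    p_t = probs[target]
    # The (1 - p_t)^gamma factor shrinks the loss of well-classified samples,
    # so hard or minority-class samples contribute relatively more.
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss([0.9, 0.1], target=0)  # confident and correct
hard = focal_loss([0.3, 0.7], target=0)  # wrong-leaning prediction
```

With `gamma=0` and `alpha=1` the expression reduces to ordinary cross-entropy, which makes the down-weighting effect easy to compare.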
MENG Xian , TIAN Yong , LI Jiang-Chen
2024, 33(12):256-263. DOI: 10.15888/j.cnki.csa.009707 CSTR: 32024.14.csa.009707
Abstract:Flight segments and waypoints are crucial for the normal operation of a route network. A network’s resistance to disruptions can be enhanced by correctly identifying key flight segments and waypoints and analyzing the correlation between various indicators and the importance of these segments or waypoints. To address the weak resistance of the route network to unexpected situations, both static and dynamic indicators are considered in this study. Using the entropy weight method, the weights of these indicators are determined based on their intrinsic fluctuations. Then, the technique for order preference by similarity to ideal solution (TOPSIS) is applied to calculate the optimal and worst solutions for the edges, so as to obtain comprehensive scores for each flight segment and waypoint. Further, the correlation among indicators, as well as the correlation between indicators and the comprehensive scores of flight segments or waypoints, is analyzed. The results show that while the indicators are independent of each other, their correlation with the scores of the flight segments or waypoints is high. This conclusion provides a basis for improving the route network structure.
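The two-step scoring described above can be sketched as follows. This is a textbook version of the entropy weight method followed by TOPSIS, not the authors' implementation. It assumes that every indicator is non-negative and benefit-type (larger is better), that there are at least two alternatives, and that indicator columns are neither all zero nor all identical. The function names and the toy matrix are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy weights; rows are alternatives, columns are indicators."""
    n, m = len(matrix), len(matrix[0])
    k = 1.0 / math.log(n)  # scales entropy into [0, 1]; requires n >= 2
    divergences = []
    for j in range(m):
        col = [row[j] for row in matrix]
        total = sum(col)
        entropy = 0.0
        for x in col:
            p = x / total
            if p > 0:
                entropy -= k * p * math.log(p)
        # Indicators that fluctuate more have lower entropy and higher weight.
        divergences.append(1.0 - entropy)
    s = sum(divergences)
    return [d / s for d in divergences]


def topsis_scores(matrix, weights):
    """Relative closeness of each alternative to the ideal solution."""
    m = len(weights)
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(m)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(m)] for row in matrix]
    best = [max(r[j] for r in weighted) for j in range(m)]   # optimal solution
    worst = [min(r[j] for r in weighted) for j in range(m)]  # worst solution
    scores = []
    for r in weighted:
        d_best = math.sqrt(sum((r[j] - best[j]) ** 2 for j in range(m)))
        d_worst = math.sqrt(sum((r[j] - worst[j]) ** 2 for j in range(m)))
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Toy data: three flight segments scored on two hypothetical indicators.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
w = entropy_weights(data)
scores = topsis_scores(data, w)
```

A score close to 1 marks a segment close to the optimal solution and far from the worst one. Cost-type indicators would need to be inverted before applying this sketch.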