Abstract: To avoid eye image disappearance and inaccurate head pose estimation during image capture, a non-contact method for acquiring eye information is employed to collect facial images, determining the pilot’s current gaze direction from a single image frame. Because current networks classify poorly when they neglect the visual obstruction caused by head movements, a multimodal data fusion network that combines facial images with head poses is proposed for the pilot’s gaze region classification, using an improved MobileViT model. Firstly, a multimodal data fusion module is introduced to address the problem of overfitting resulting from size imbalances during feature concatenation. Additionally, an inverted residual block based on a parallel branch SE mechanism is proposed to fully leverage spatial and channel feature information in the shallow layers of the network. Moreover, multi-scale features are captured by integrating the global attention mechanism from the Transformer. Finally, the Mobile Block structure is redesigned and depthwise separable convolution is utilized to reduce model complexity. Experimental comparisons with mainstream baseline models are conducted using a self-made dataset, FlyGaze. The results demonstrate that the PilotT model achieves classification accuracies exceeding 92% for gaze regions 0, 3, 4, and 5, with robust adaptability to facial deflection. These findings hold practical significance for enhancing flight training quality and facilitating pilot intention recognition and fatigue assessment.
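The complexity reduction attributed to depthwise separable convolution can be illustrated with a simple parameter count. The following is a minimal sketch, assuming bias-free layers; the function names and the layer sizes (3×3 kernel, 32 input channels, 64 output channels) are hypothetical and are not taken from the paper.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out             # 1x1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    std = standard_conv_params(3, 32, 64)
    dws = depthwise_separable_params(3, 32, 64)
    print(f"standard: {std}, depthwise separable: {dws}, ratio: {std / dws:.2f}")
```

The saving grows with the number of output channels, because the k×k spatial filtering is no longer repeated for every output channel.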
Abstract: In multi-object tracking tasks, the interference of external noise can lead to unreliable system modeling of traditional methods, thus reducing the accuracy of object position prediction; and the congestion and obstruction caused by dense crowds seriously affect the reliability of the object appearance, resulting in incorrect identity association. To address these issues, this study proposes a multi-object tracking algorithm Ecsort. This algorithm improves position prediction accuracy by introducing a noise compensation module based on traditional motion prediction to reduce errors caused by noise interference. Secondly, this algorithm introduces a feature similarity matching module. It can achieve accurate identity association by learning discriminative appearance features of objects and combining the advantages of motion cues and discriminative appearance features. Extensive experimental results on multi-object tracking benchmark datasets demonstrate that, compared to the baseline model, this method improves ID F1 score (IDF1), higher order tracking accuracy (HOTA), association accuracy (AssA), and detection accuracy (DetA) by 1.1%, 0.5%, 0.6%, and 0.3% respectively on the MOT17 test set, and by 2.3%, 1.9%, 3.4%, and 0.2% respectively on the MOT20 test set.
Abstract: An optimized bilinear structure based on ResNet34, termed OBSR-Net, is proposed for more accurate and quick facial expression recognition. OBSR-Net adopts a bilinear network structure as its overall framework and incorporates ResNet34 as the backbone network to model the local paired feature interaction by translation invariance, to extract more complete and effective features. At the same time, transfer learning mitigates the limitations imposed by small sample image data sets of facial expressions on deep learning. In addition, gradient concentration, a new general optimization technique, is utilized during the training process. This technique operates directly on gradients by concentrating gradient vectors to zero mean, which can be regarded as a projected gradient descent method with a constrained loss function. Experiments on two public datasets, namely Fer2013 and CK+, reveal that OBSR-Net achieves recognition accuracy of 77.65% and 98.82%, respectively. The experimental results show that OBSR-Net is more competitive than other advanced facial expression recognition methods.
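The zero-mean gradient operation described above can be sketched as follows. This is an illustration only, assuming a 2-D weight gradient in which each row belongs to one output unit; the function names and the plain SGD update are hypothetical and do not reproduce the OBSR-Net training code.

```python
def centralize_gradient(grad):
    """Subtract the row mean from each row so every row of the gradient has zero mean."""
    centered = []
    for row in grad:
        mean = sum(row) / len(row)
        centered.append([g - mean for g in row])
    return centered


def sgd_step(weights, grad, lr=0.1):
    """One plain SGD update that uses the centralized gradient."""
    centered = centralize_gradient(grad)
    return [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, centered)
    ]


if __name__ == "__main__":
    weights = [[1.0, 2.0, 3.0]]
    grad = [[0.5, -1.0, 2.0]]
    print(sgd_step(weights, grad))
```

Because each centralized row sums to zero, the update leaves the sum of each weight row unchanged, which is one way to read the "projected gradient descent with a constraint" interpretation given in the abstract.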
Abstract: Compared with centralized cloud computing frameworks, edge computing deploys additional “edge servers” between a cloud center and on-site intelligent devices to support those devices to quickly and efficiently complete computing tasks and event processing. In an edge computing system, there are a large number of on-site intelligent devices and heterogeneous edge computing servers. Also, stored data is sensitive and requires high privacy. These characteristics of edge computing systems make it difficult to ensure network security. Solving information and network security of edge computing systems is the key to the large-scale industrialization of edge computing technology. However, due to the limitations of computing capacity, network capacity, and storage capacity of edge server devices and on-site intelligent devices, traditional computer network security technology may not fully meet the requirements. Analyzing effective sensitive data protection technologies suitable for edge computing systems, such as federated learning, lightweight encryption, confused and virtual location information, and anonymous identity authentication, and exploring new technologies such as artificial intelligence and blockchain to prevent malicious attacks in edge computing will greatly promote the industrial development of edge computing.
Abstract: With the development of deep learning technology, most studies treat short-term precipitation nowcasting as a prediction task over radar echo sequences. Because precipitation involves complex nonlinear spatiotemporal transformations, existing short-term nowcasting methods suffer from low accuracy, short extrapolation time, and difficulty in modeling these transformations. To address these issues, this study proposes an S-UNet short-term precipitation nowcasting network based on U-Net and LSTM. Firstly, the study introduces the S-UNet layer (SL) module to help the network better extract radar sequence features and construct the overall trend of spatiotemporal changes, thereby improving the network efficiency and increasing the extrapolation duration. Secondly, to better address the complexity of radar echo deformation, accumulation, and dissipation, and to enhance the network’s ability to capture complex spatial relationships and simulate movement trajectories, this study constructs the radar feature (RF) module based on LSTM. Finally, by combining the SL module and the RF module with the U-Net framework, the S-UNet short-term precipitation nowcasting network is proposed, achieving remarkable performance on the KNMI dataset. Experimental results show that, compared with the mainstream methods, on the KNMI’s NL-50 and NL-20 datasets, the proposed method improves the Heidke skill score (HSS) and critical success index (CSI) by 5.25% (6.57%) and 2.17% (4.75%) respectively, reaching 0.30 (0.29) and 0.72 (0.58); the accuracy increases by 2.10% (1.35%), reaching 0.80 (0.80); and the false acceptance rate decreases by 4.27% (1.80%), dropping to 0.24 (0.38). Additionally, the effectiveness of the proposed modules and their combination methods is verified through ablation experiments.
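For readers unfamiliar with the reported metrics, the sketch below computes them from a binary contingency table using their standard forecast-verification definitions. It assumes that the rate reported above corresponds to the false alarm ratio FP / (TP + FP); that mapping is an assumption, and the function name and the counts are hypothetical.

```python
def nowcast_scores(tp, fp, fn, tn):
    """Standard verification scores from a 2x2 contingency table.

    tp: hits, fp: false alarms, fn: misses, tn: correct negatives.
    """
    total = tp + fp + fn + tn
    csi = tp / (tp + fp + fn)            # critical success index
    far = fp / (tp + fp)                 # false alarm ratio (assumed meaning of the reported rate)
    acc = (tp + tn) / total              # accuracy
    hss = 2 * (tp * tn - fp * fn) / (
        (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    )                                    # Heidke skill score
    return {"CSI": csi, "FAR": far, "ACC": acc, "HSS": hss}


if __name__ == "__main__":
    # Hypothetical counts for one rain-rate threshold.
    print(nowcast_scores(tp=50, fp=10, fn=20, tn=120))
```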
Abstract: The quality of steel surface defect inspection directly affects industrial production safety and machine performance. However, in real factories, steel quality control is limited by equipment conditions, making it challenging to achieve high-precision and real-time inspection. To solve this problem, a lightweight YOLOv8n detection algorithm with multi-scale fusion is proposed. Firstly, a lightweight multi-scale fusion backbone network (RepHGnetv2) is introduced, combining HGnetv2 and RepConv to improve the feature extraction and generalization capabilities of Backbone and reduce the complexity of the model. In the Head part, the ordinary convolution of the original algorithm is replaced with the ADown downsampling module, which reduces computational complexity and improves semantic retention. Finally, the loss function of the original algorithm is replaced by SlideLoss to address sample imbalance. Ablation and comparison experiments are conducted on the NEU-DET dataset. Compared with the original algorithm, the improved algorithm increases precision by 9.3%, reduces the model size by 25.5%, decreases computational complexity by 17.2%, and improves FPS to a certain extent. Comparative experiments are conducted on the VOC2012 dataset to evaluate the generalizability of the improved algorithm, and the results show that the improved algorithm exhibits strong generalizability and effectively improves the accuracy and efficiency of defect detection.
Abstract: The rapid development and application of Artificial Intelligence and the Internet of Things (AIoT) pose new challenges to network lifetime, reliability, and coverage. A wireless sensor network (WSN) consists of a large number of self-organizing sensor nodes deployed in monitoring areas and offers advantages such as low cost, energy efficiency, self-organization, and large-scale deployment. However, further extending the network lifetime and enhancing the coverage reliability of wireless sensor networks remain primary challenges in current research. To address these challenges, a coverage reliability assessment model is proposed by integrating the backbone network with coverage models, collaborative sensing of sensor nodes, and spatial correlation. Subsequently, a coverage reliability optimization algorithm based on the confident information coverage model is proposed. On the one hand, the algorithm utilizes the confident information coverage model to ensure collaborative sensing of data, enhancing network service quality. On the other hand, it employs backbone network optimization for routing to reduce energy consumption. Furthermore, to validate the superiority of the proposed algorithm, sensor multi-states and coverage rate are taken as evaluation metrics, with RMSE threshold and energy consumption as performance indicators. The proposed algorithm is compared with the ACR and CICR algorithms. Finally, a verification model is built in Matlab. Simulation results demonstrate that the proposed algorithm significantly improves coverage reliability.
Abstract: Acute ischemic stroke is the most common type of stroke in clinical practice. Due to its sudden onset and short treatment time window, it is one of the important factors leading to disability and death worldwide. With the rapid development of artificial intelligence, deep learning technology shows great potential in the diagnosis and treatment of acute ischemic stroke. Deep learning models can quickly and efficiently segment and detect lesions based on patients’ brain images. This study introduces the development history of deep learning models and commonly used public datasets for stroke research. For various modalities and scanning sequences derived from computerized tomography (CT) and magnetic resonance imaging (MRI), it elaborates on the research progress of deep learning technology in the field of lesion segmentation and detection in acute ischemic stroke and summarizes and analyzes the improvement ideas of related research. Finally, it points out existing challenges of deep learning in this field and proposes possible solutions.
Abstract: Existing few-shot relational triple extraction methods often struggle with handling multiple triples in a single sentence and fail to consider the semantic similarity between the support set and the query set. To address these issues, this study proposes a few-shot relational triple extraction method based on module transfer and semantic similarity inference. The method uses a mechanism that constantly transfers among three modules, namely relation extraction, entity recognition, and triple discrimination, to extract multiple relational triples efficiently from a query instance. In the relation extraction module, BiLSTM and a self-attention mechanism are integrated to better capture the sequence information of the emergency plan text. In addition, a method based on semantic similarity inference is designed to recognize emergency organizational entities in sentences. Finally, extensive experiments are conducted on ERPs+, a dataset for emergency response plans. Experimental results show that the proposed model is more suitable for relational triple extraction in the field of emergency plans compared with other baseline models.
Abstract: In the current electricity market, the volume of daily spot market clearing data has reached millions or tens of millions. With the increase in trading activities and the complexity of the market structure, ensuring the integrity, transparency, and traceability of trading data has become a key issue to be studied in the field of market clearing in China. Therefore, this study proposes a data provenance method for power market clearing based on the PROV model and smart contracts, aiming to automate the storage and updating of provenance information through smart contracts to improve the transparency of the clearing process and the trust of the participants. The proposed method utilizes the elements of entities, activities, and agents in the PROV model, combined with the hierarchical storage and immutability of blockchain technology, to record and track trading activities and rule changes in the electricity market. The method not only enhances data transparency and trust among market participants but also optimizes data management and storage strategies, reducing operational costs. In addition, the method provides proof of compliance for power market clearing, helping market participants meet increasing regulatory requirements.
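The combination of PROV elements with tamper-evident storage described above can be sketched as a hash-linked list of provenance records. This is a simplified illustration, not the paper's smart contract: the `ProvRecord` class, its fields, and the helper functions are hypothetical, and an in-memory Python list stands in for the blockchain.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ProvRecord:
    entity: str      # e.g. a clearing result data set
    activity: str    # e.g. the clearing run that generated the entity
    agent: str       # e.g. the market operator responsible for the activity
    prev_hash: str   # digest of the previous record, linking the chain

    def digest(self):
        """SHA-256 over a canonical JSON encoding of the record."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


GENESIS = "0" * 64  # placeholder hash for the first record


def append_record(chain, entity, activity, agent):
    """Append a record that points at the digest of the current chain tail."""
    prev_hash = chain[-1].digest() if chain else GENESIS
    chain.append(ProvRecord(entity, activity, agent, prev_hash))
    return chain


def verify_chain(chain):
    """Return True if every record links to the digest of its predecessor."""
    expected = GENESIS
    for record in chain:
        if record.prev_hash != expected:
            return False
        expected = record.digest()
    return True


if __name__ == "__main__":
    chain = []
    append_record(chain, "bids_2024_01_01", "collect_bids", "market_operator")
    append_record(chain, "clearing_price_2024_01_01", "run_clearing", "market_operator")
    print(verify_chain(chain))
```

Changing any earlier record alters its digest, so the link stored in the following record no longer matches and verification fails. This is the property that makes the provenance trail traceable.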
Abstract: In recent years, with the development of deep learning techniques, convolutional neural network (CNN) and Transformers have made significant progress in image super-resolution. However, for the extraction of global features of an image, it is common to stack individual operators and repeat the computation to gradually expand the receptive field. To better utilize global information, this study proposes that local, regional, and global features should be explicitly modeled. Specifically, local information, regional-local information, and global-regional information of an image are extracted and fused hierarchically and progressively through channel attention-enhanced convolution, a dual-branch parallel architecture consisting of a window-based Transformer and CNN, and a dual-branch parallel architecture consisting of a standard Transformer and a window-based Transformer. In addition, a hierarchical feature fusion method is designed to fuse the local information extracted from the CNN branch and the regional information extracted from the window-based Transformer. Extensive experiments show that the proposed network achieves better results in lightweight SR. For example, in the 4× upscaling experiments on the Manga109 dataset, the peak signal-to-noise ratio (PSNR) of the proposed network is improved by 0.51 dB compared to SwinIR.
Abstract: Cigarette laser code recognition is an important tool for tobacco inspection. This study proposes a method for recognizing cigarette codes based on a dual-state asymmetric network. Insufficient training on samples of distorted cigarette codes leads to weak generalization ability of the model. To address this issue, a nonlinear local augmentation (NLA) method is designed, which generates effective distorted training samples through spatial transformation using controllable datums at the edges of cigarette codes, thereby enhancing the generalization ability of the model. To address the problem of low recognition accuracy due to the similarity between cigarette codes and their background patterns, a dual-state asymmetric network (DSANet) is proposed, which divides the convolutional layers of the CRNN into training and deployment modes. The training mode enhances the key feature extraction capability of the model by introducing asymmetric convolution to optimize the feature weight distribution. For real-time performance, the deployment mode applies BN fusion and branch fusion methods. By calculating fusion weights and initializing convolutional kernels, convolutional layers are equivalently converted back to their original structures, which reduces user-side inference time. Finally, a self-attention mechanism is introduced into the recurrent layer to enhance the extraction capability of the model for cigarette code features by dynamically adjusting the weights of sequence features. Comparative experiments show that this method has higher recognition accuracy and speed, with the recognition accuracy reaching 87.34%.
Abstract: A lesion of the sacroiliac joint is one of the primary signs for the early warning of ankylosing spondylitis. Accurate and efficient automatic segmentation of the sacroiliac joint is crucial for assisting doctors in clinical diagnosis and treatment. The limitations in feature extraction in sacroiliac joint CT images, due to diverse gray levels, complex backgrounds, and volume effects resulting from the narrow sacroiliac joint gap, hinder the improvement of segmentation accuracy. To address these problems, this study proposes the first U-shaped network for sacroiliac joint segmentation diagnosis, utilizing the concept of hierarchical cascade compensation for downsampling information loss and parallel attention preservation of cross-dimensional information features. Moreover, to enhance the efficiency of clinical diagnosis, the traditional convolutions in the U-shaped network are replaced with efficient partial convolution blocks. The experiment, conducted on a sacroiliac joint CT dataset provided by Shanxi Bethune Hospital, validates the effectiveness of the proposed network in balancing segmentation accuracy and efficiency. The network achieves a DICE value of 91.52% and an IoU of 84.41%. The results indicate that the improved U-shaped segmentation network effectively enhances the accuracy of sacroiliac joint segmentation and reduces the workload of medical professionals.
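The two reported segmentation metrics follow the standard definitions Dice = 2|A∩B| / (|A| + |B|) and IoU = |A∩B| / |A∪B|. The sketch below computes both for flattened binary masks; the function name and the toy masks are hypothetical, and the convention for two empty masks is an assumption.

```python
def dice_and_iou(pred, target):
    """Dice coefficient and IoU for two flattened binary masks (values 0 or 1)."""
    inter = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    pred_area = sum(pred)
    target_area = sum(target)
    union = pred_area + target_area - inter
    if union == 0:
        # Both masks empty: treated as a perfect match (assumed convention).
        return 1.0, 1.0
    dice = 2 * inter / (pred_area + target_area)
    iou = inter / union
    return dice, iou


if __name__ == "__main__":
    pred = [1, 1, 0, 0, 1]
    target = [1, 0, 0, 1, 1]
    print(dice_and_iou(pred, target))
```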
Abstract: This study proposes a lightweight apple detection algorithm based on an improved YOLOv8n model for apple fruit recognition in natural orchard environments. Firstly, the study uses a combination of DSConv and FEM feature extraction modules to replace some regular convolutions in the backbone network for lightweight improvements. In this way, the number of floating-point operations and the computational cost of the convolution process can be reduced. To maintain performance while reducing model size, a structured state space model is introduced to construct the CBAMamba module, which efficiently processes features through the Mamba structure. Subsequently, the convolutions in the detection head are replaced with RepConv and the number of convolution layers is reduced. Finally, the bounding box loss function is changed to WIoU, which uses a dynamic non-monotonic focusing mechanism, to accelerate model convergence and further enhance detection performance. The experiments show that, on a public dataset, the improved algorithm outperforms the original YOLOv8n algorithm by 1.6% in mAP@0.5 and 1.2% in mAP@0.5:0.95. Meanwhile, it also increases FPS by 8.0% and reduces model parameters by 13.3%. The lightweight design makes it highly practical for deployment in robotics and embedded systems.
Abstract: Aiming at the problems that mechanical equipment signals in actual operation are susceptible to noise interference, making it difficult to accurately extract fault features, and that the information from a single position of the equipment cannot fully reflect operational status, this study proposes an improved spatio-temporal fault classification method of signal adaptive decomposition and multi-source data fusion. Firstly, an improved signal adaptive decomposition algorithm named signal adaptive variational mode decomposition (SAVMD) is proposed, and a weighted kurtosis sparsity index named weighted kurtosis sparsity (WKS) is constructed to filter out intrinsic mode function (IMF) components rich in feature information for signal reconstruction. Secondly, multi-source data from different position sensors are fused, and the data set obtained by periodic sampling is used as the input of the model. Finally, a spatio-temporal fault classification model is built to process multi-source data, which reduces noise interference through an improved sparse self-attention mechanism and effectively processes time step and spatial channel information by using a dual-encoder mechanism. Experiments on three public mechanical equipment fault datasets achieve average accuracy rates of 99.1%, 98.5%, and 99.4% respectively. Compared with other fault classification methods, it has better performance, good adaptability and robustness, and provides a feasible method for fault diagnosis of mechanical equipment.
Abstract: An unmanned aerial vehicle (UAV) equipped with an edge server constitutes a mobile edge server, which can provide computing services for user equipment (UE) in scenarios where base stations are difficult to deploy. An agent trained with deep reinforcement learning can formulate reasonable offloading decisions in a continuous and complex state space and offload part of the computing-intensive tasks produced by users to edge servers for execution, thus improving the processing and response time of the system. However, the fully connected neural networks currently used by deep reinforcement learning algorithms are unable to handle the time-series data in UAV-assisted mobile edge computing (MEC) scenarios. In addition, the training efficiency of these algorithms is low, and their decision-making performance is poor. To address the above problems, this study proposes a twin delayed deep deterministic policy gradient algorithm based on long short-term memory (LSTM-TD3), using LSTM to improve the Actor-Critic network structure of the TD3 algorithm. In this way, the network is divided into three parts: the memory extraction unit containing LSTM, the current feature extraction unit, and the perceptual integration unit. Besides, the sample data in the experience pool are improved, and the historical data are defined, which provides the memory extraction unit with a better training effect. Simulation results show that, compared with the AC algorithm, the DQN algorithm, and the DDPG algorithm, the LSTM-TD3 algorithm has the best performance when optimizing the offloading strategy with the minimum total delay of the system as the target.
Abstract: In autonomous driving, the task of using bird’s eye view (BEV) for 3D object detection has attracted significant attention. Existing camera-to-BEV transformation methods are facing challenges of insufficient real-time performance and high deployment complexity. To address these issues, this study proposes a simple and efficient view transformation method that can be deployed without any special engineering operations. First, to address the redundancy in complete image features, a width feature extractor is introduced and supplemented by a monocular 3D detection task to refine the key features of the image. In this way, the minimal information loss in the process can be ensured. Second, a feature-guided polar coordinate positional encoding method is proposed to enhance the mapping relationship between the camera view and the BEV representation, as well as the spatial understanding of the model. Lastly, the study has achieved the interaction between learnable BEV embeddings and width image features through a single-layer cross-attention mechanism, thus generating high-quality BEV features. Experimental results show that, compared to lift, splat, shoot (LSS), on the nuScenes validation set, this network structure improves mAP from 29.5% to 32.0%, an increase of 8.5%, and NDS from 37.1% to 38.0%, an increase of 2.4%. This demonstrates the effectiveness of the model in 3D object detection tasks in autonomous driving scenarios. Additionally, compared to LSS, it reduces latency by 41.12%.
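The percentage increases quoted above are relative gains over the LSS baseline, not differences in percentage points. A small sketch of the arithmetic, with a hypothetical function name:

```python
def relative_gain(baseline, improved):
    """Relative improvement over the baseline, in percent."""
    return (improved - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Values taken from the abstract: mAP 29.5 -> 32.0 and NDS 37.1 -> 38.0.
    print(f"mAP: {relative_gain(29.5, 32.0):.1f}%")
    print(f"NDS: {relative_gain(37.1, 38.0):.1f}%")
```

In absolute terms the same results are gains of 2.5 and 0.9 percentage points, respectively.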
Abstract: This study introduces a knee cartilage segmentation method based on semi-supervised learning and conditional probability, to address the scarcity and quality issues of annotated samples in medical image segmentation. As it is difficult for existing embedded deep learning models to effectively model the hierarchical relationships among network outputs, the study proposes an approach combining conditional-to-unconditional mixed training and task-level consistency. In this way, the hierarchical relationships and relevance among labels are efficiently utilized, and the segmentation accuracy is enhanced. Specifically, the study employs a dual-task deep network predicting both pixel-level segmentation images and geometric perception level set representations of the target. The level set is shifted into an approximate segmentation map through a differentiable task transformation layer. Meanwhile, the study also introduces task-level consistency regularization between level line-based and directly predicted segmentation maps on labeled and unlabeled data. Extensive experiments on two public datasets demonstrate that this approach can significantly improve performance through the incorporation of unlabeled data.
Abstract: Deformable 3D medical image registration remains challenging due to irregular deformations of human organs. This study proposes a multi-scale deformable 3D medical image registration method based on Transformer. Firstly, the method adopts a multi-scale strategy to realize multi-level connections to capture different levels of information. A self-attention mechanism is employed to extract global features, and dilated convolution is used to capture broader context information and more detailed local features, so as to enhance the registration network’s fusion capacity for global and local features. Secondly, according to the sparse prior of the image gradient, the normalized total gradient is introduced as a loss function, effectively reducing the interference of noise and artifacts on the registration process and better adapting to different modalities of medical images. The performance of the proposed method is evaluated on publicly available brain MRI datasets (OASIS and LPBA). The results show that the proposed method not only maintains the run-time advantages of learning-based methods but also performs well in mean square error and structural similarity. In addition, ablation experiment results further prove the validity of the method and of the normalized total gradient loss function design proposed in this study.
Abstract: Prompt engineering plays a crucial role in unlocking the potential of large language models. This method guides the model’s response by designing prompt instructions to ensure the relevance, coherence, and accuracy of the response. Prompt engineering does not require fine-tuning model parameters and can be seamlessly connected with downstream tasks. Therefore, various prompt engineering techniques have become a research hotspot in recent years. Accordingly, this study introduces the key steps for creating effective prompts, summarizes basic and advanced prompt engineering techniques, such as chain of thought and tree of thought, and explores in depth the advantages and limitations of each method. At the same time, it discusses how to evaluate the effectiveness of prompt methods from different perspectives and with different methods. The rapid development of these technologies enables large language models to succeed in a variety of applications, ranging from education and healthcare to code generation. Finally, future research directions for prompt engineering are outlined.
Abstract: This study aims to delve into the joint detection of traffic signs and signals under complex and variable traffic conditions, analyzing and resolving the detrimental effects of harsh weather, low lighting, and image background interference on detection accuracy. To this end, an improved RT-DETR network is proposed. Based on a resource-limited operating environment, this study introduces a network, ResNet with PConv and efficient multi-scale attention (PE-ResNet), as the backbone to enhance the model’s capability to detect occlusions and small targets. To augment the feature fusion capability, a new cross-scale feature-fusion module (NCFM) is introduced, which facilitates better integration of semantic and detailed information within images, offering a more comprehensive understanding of complex scenes. Additionally, the MPDIoU loss function is introduced to more accurately measure the positional relationships among target boxes. The improved network reduces the parameter count by approximately 14% compared to the baseline model. On the CCTSDB 2021 dataset, S2TLD dataset, and the self-developed multi-scene traffic signs (MTST) dataset, the mAP50:95 increases by 1.9%, 2.2%, and 3.7%, respectively. Experimental results demonstrate that the enhanced RT-DETR model effectively improves target detection accuracy in complex scenarios.
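To make the role of the MPDIoU loss concrete, the sketch below follows the formulation commonly given for MPDIoU: IoU minus the squared distances between the two boxes' top-left corners and between their bottom-right corners, each normalized by the squared diagonal of the input image. That formulation is an assumption here and is not stated in the abstract; the function names and the (x1, y1, x2, y2) box format are hypothetical.

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU between two boxes given as (x1, y1, x2, y2) in pixels."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared corner distances, normalized by the squared image diagonal.
    norm = img_w ** 2 + img_h ** 2
    d_top_left = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2
    d_bottom_right = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2
    return iou - d_top_left / norm - d_bottom_right / norm


def mpdiou_loss(box_a, box_b, img_w, img_h):
    """Loss that is zero for identical boxes and grows as the boxes diverge."""
    return 1.0 - mpdiou(box_a, box_b, img_w, img_h)


if __name__ == "__main__":
    print(mpdiou_loss((0, 0, 2, 2), (1, 1, 3, 3), img_w=10, img_h=10))
```

Unlike plain IoU, the corner-distance terms still change when the boxes do not overlap, so the loss keeps distinguishing near misses from distant ones.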
Abstract: This study proposes an algorithm for road damage detection based on an improved YOLOv8 to address challenges in road damage detection, including multi-scale targets, complex target structures, uneven sample distribution, and the impact of hard and easy samples on bounding box regression. The algorithm introduces dynamic snake convolution (DSConv) to replace some of the Conv modules in the original faster implementation of CSP bottleneck with 2 convolutions (C2f) module, aiming to adaptively focus on small and intricate local features, thereby enhancing the perception of geometric structures. By incorporating an efficient multi-scale attention (EMA) module before each detection head, the algorithm achieves cross-dimensional interaction and captures pixel-level relationships, improving its generalization capability for complex global features. Additionally, an extra small object detection layer is added to enhance the precision of small object detection. Finally, a strategy termed Flex-PIoUv2 is proposed, which alleviates sample distribution imbalance and anchor box inflation through linear interval mapping and size-adaptive penalty factors. Experimental results demonstrate that the improved model increases the F1 score, mAP50, and mAP50-95 on the RDD2022 dataset by 1.5%, 2.1%, and 1.2%, respectively. Additionally, results on the GRDDC2020 and China road damage datasets validate the strong generalization of the proposed algorithm.
Abstract: To solve the vehicle routing problem with time windows (VRPTW), this study establishes a mixed-integer programming model aimed at minimizing total distance and proposes a hybrid ant colony optimization algorithm with relaxed time window constraints. Firstly, an improved ant colony algorithm, combined with TSP-Split encoding and decoding, is proposed to construct a routing solution that allows time-window constraints to be violated, to improve the global optimization ability of the algorithm. Then, a repair strategy based on variable neighborhood search is proposed to repair infeasible solutions using the principle of return in time and the penalty function method. Finally, 56 Solomon and 12 Homberger benchmark instances are tested. The results show that the proposed algorithm is superior to the comparative algorithms from references. The known optimal solution can be obtained in 50 instances, and quasi-optimal solutions can be obtained in the remaining instances within acceptable computing time. The results prove the effectiveness of the proposed algorithm.
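The idea of allowing time-window violations and penalizing them can be sketched as a route evaluation function. This is a simplified illustration rather than the paper's algorithm: it assumes that travel time equals distance, that a vehicle arriving early waits until the window opens, and that lateness is penalized linearly. The function name, the penalty weight, and the toy instance are hypothetical.

```python
def relaxed_route_cost(route, dist, windows, service, penalty):
    """Distance plus a linear penalty on total lateness for one route.

    route:   node indices, starting and ending at the depot
    dist:    distance matrix (travel time is assumed equal to distance)
    windows: (earliest, latest) service start time per node
    service: service duration per node
    """
    time = 0.0
    distance = 0.0
    lateness = 0.0
    for prev, node in zip(route, route[1:]):
        distance += dist[prev][node]
        time += dist[prev][node]
        earliest, latest = windows[node]
        time = max(time, earliest)            # wait if the vehicle arrives early
        lateness += max(0.0, time - latest)   # violation is allowed but penalized
        time += service[node]
    return distance + penalty * lateness, distance, lateness


if __name__ == "__main__":
    dist = [[0, 4, 6], [4, 0, 3], [6, 3, 0]]
    windows = [(0, 100), (0, 5), (0, 6)]
    service = [0, 1, 1]
    print(relaxed_route_cost([0, 1, 2, 0], dist, windows, service, penalty=10))
```

A repair step, such as the variable neighborhood search mentioned above, would then try to drive the lateness term to zero so that the final solution is feasible.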
Abstract: Embodied AI requires the ability to interact with and perceive the environment, as well as capabilities such as autonomous planning, decision making, and action taking. Behavior trees (BTs) have become a widely used approach in robotics due to their modularity and efficient control. However, existing behavior tree generation techniques still face certain challenges when dealing with complex tasks. These methods typically rely on domain expertise and have a limited capacity to generate behavior trees. In addition, many existing methods have language comprehension deficiencies or are theoretically unable to guarantee the success of the behavior tree, leading to difficulties in practical robotic applications. In this study, a new method for automatic behavior tree generation is proposed, which generates an initial behavior tree with task goals based on large language models (LLMs) and scene semantic perception. The method designs robot action primitives and related condition nodes based on the robot’s capabilities. It then uses these to design prompts that make the LLMs output a behavior plan (generated plan), which is then transformed into an initial behavior tree. This study applies the method to robot tasks and gives specific implementation methods and examples; although these tasks serve as the example, the method has wide applicability and can be applied to other types of robotic tasks according to different needs. While the robot performs a task, the behavior tree can be dynamically updated in response to the robot’s operation errors and environmental changes, giving it a certain degree of robustness to changes in the external environment. In this study, the first validation experiments on behavior tree generation are carried out in a simulated robot environment, which demonstrates the effectiveness of the proposed method.
Abstract: In the field of visual tracking, most deep learning-based trackers overemphasize accuracy while overlooking efficiency, thereby hindering their deployment on mobile platforms such as drones. In this study, a deep cross guidance Siamese network (SiamDCG) is put forward. To better deploy on edge computing devices, a unique backbone structure based on MobileNetV3-small is devised. Given the complexity of drone scenarios, the traditional method of regressing target boxes using Dirac δ distribution has significant drawbacks. To overcome the blurring effects inherent in bounding boxes, the regression branch is converted into predicting offset distribution, and the learned distribution is used to guide classification accuracy. Excellent performances on multiple aerial tracking benchmarks demonstrate the proposed approach’s robustness and efficiency. On an Intel i5 12th generation CPU, SiamDCG runs 167 times faster than SiamRPN++, while using 98 times fewer parameters and 410 times fewer FLOPs.
Abstract: When firefighting robots are deployed for medium to long-distance emergency tasks in urban areas, they often struggle with the inability to obtain a global prior map of the environment in advance. Consequently, they require manual remote control to reach the fire location, which involves cumbersome operations and significantly reduces firefighting efficiency. To address these issues, this study designs a new autonomous navigation system for firefighting robots in urban areas. This system is based on commercial electronic maps (such as Amap, Baidu Maps, and other 2D electronic maps) and effectively integrates the global navigation satellite system (GNSS) with local laser-based environmental sensing technologies. Firstly, commercial electronic maps are used to plan rough global sub-goal points. The sequence of global goal points is then registered with the actual positioning information and sent to the local planner. Subsequently, local planning tasks are performed within the local grid map established by laser sensing, following the sequence of sub-goal points. The improved local planner updates the sub-goal points dynamically based on real-time environmental changes during movement. Multiple simulations are conducted in a simulated environment, and validation is performed using a tracked vehicle in real-world scenarios. The results indicate that the designed system can accurately execute long-distance outdoor navigation tasks without a global prior map of the environment, providing an efficient and safe solution for the outdoor navigation of firefighting robots.
Abstract: The temperature in knowledge distillation (KD) is set as a fixed value during the distillation process in most previous work. However, when the temperature is reexamined, it is found that the fixed temperature restricts inherent knowledge utilization in each sample. This study divides the dataset into low-energy and high-energy samples based on energy scores. Through experiments, it is confirmed that the confidence score of low-energy samples is high, indicating that predictions are deterministic, while the confidence score of high-energy samples is low, indicating that predictions are uncertain. To extract the best knowledge by adjusting non-target class predictions, this study applies higher temperatures to low-energy samples to generate smoother distributions and applies lower temperatures to high-energy samples to obtain clearer distributions. In addition, to address the imbalanced dependence of students on prominent features and their neglect of dark knowledge, this study introduces entropy-reweighted knowledge distillation, which utilizes the entropy predicted by teachers to reweight the energy distillation loss on a sample basis. This method can be easily applied to other logit-based knowledge distillation methods and improves their performance, which can come close to or even exceed that of feature-based methods. This study conducts extensive experiments on image classification datasets (CIFAR-100, ImageNet) to validate the effectiveness of this method.
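The per-sample temperature idea can be sketched as follows. This is a minimal illustration that assumes the common energy score E(x) = -log Σ exp(z_k) over teacher logits and a quantile-based split into low-energy and high-energy samples; the base temperature, the offset `delta`, and the quantile thresholds are hypothetical parameters, not values from the paper.

```python
import math


def energy(logits):
    """Energy score -logsumexp(logits); lower energy means higher confidence."""
    m = max(logits)
    return -(m + math.log(sum(math.exp(z - m) for z in logits)))


def softmax(logits, temperature):
    """Temperature-scaled softmax, computed stably."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assign_temperatures(batch_logits, base_t=4.0, delta=1.0,
                        low_q=0.25, high_q=0.75):
    """Raise the temperature for low-energy samples, lower it for high-energy ones."""
    energies = [energy(z) for z in batch_logits]
    ranked = sorted(energies)
    low_cut = ranked[int(low_q * (len(ranked) - 1))]
    high_cut = ranked[int(high_q * (len(ranked) - 1))]
    temps = []
    for e in energies:
        if e <= low_cut:
            temps.append(base_t + delta)   # confident sample: smooth it more
        elif e >= high_cut:
            temps.append(base_t - delta)   # uncertain sample: sharpen it
        else:
            temps.append(base_t)
    return temps


batch = [[10.0, 0.0, 0.0], [1.0, 0.9, 1.1], [3.0, 1.0, 0.0], [0.1, 0.0, 0.0]]
temps = assign_temperatures(batch)
soft_targets = [softmax(z, t) for z, t in zip(batch, temps)]
```

The resulting `soft_targets` would replace the fixed-temperature teacher distribution in a standard KD loss; the entropy reweighting described in the abstract is not shown.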
Abstract: The piecewise linear representation algorithm of the time series represents the whole series with fewer points according to trend changes in the series. However, most of these algorithms focus on the information of local sequence points and rarely pay attention to global data. Some algorithms only focus on fitting on datasets instead of being applied to classification. To solve these problems, this study proposes an algorithm for extracting trend features from time series based on angle key points and inflection points. The algorithm selects angle key points according to the angle change values of the sequence data and then extracts inflection points based on these key points. It determines whether interpolation is needed according to segmentation requirements, so as to obtain a segmentation sequence meeting the requirements. Fitting and classification experiments are conducted on simulated data and 40 public datasets. Experimental results show that the proposed algorithm exhibits better fitting on the simulated data, compared with other algorithms such as piecewise aggregate approximation (PAA), the TD algorithm, the BU algorithm, the FFTO algorithm based on inflection points, the Trend algorithm based on turning points and trend segments, and the ITTP algorithm based on trend turning points. On the UCR public datasets, the proposed algorithm achieves an average fitting error of 1.165. Its classification accuracy is 2.8% higher than the DTW-1NN algorithm published by Keogh.
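A rough sketch of angle-based key point selection is given below. The abstract does not give the exact angle definition, so the code assumes unit time steps and measures the change in slope angle between the incoming and outgoing segments at each point; the function name and the threshold are illustrative only, and the later inflection-point extraction and interpolation steps are omitted.

```python
import math


def angle_key_points(series, threshold):
    """Return indices whose slope-angle change exceeds `threshold` (radians).

    The two endpoints are always kept so the segmentation spans the series.
    """
    n = len(series)
    if n < 3:
        return list(range(n))
    keep = [0]
    for i in range(1, n - 1):
        incoming = math.atan2(series[i] - series[i - 1], 1.0)
        outgoing = math.atan2(series[i + 1] - series[i], 1.0)
        if abs(outgoing - incoming) > threshold:
            keep.append(i)
    keep.append(n - 1)
    return keep


# A single peak at index 3 is the only interior key point.
key_points = angle_key_points([0, 1, 2, 3, 2, 1, 0], threshold=0.5)
```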
Abstract: Existing methods for binary fuzzing are difficult to dive into programs to find vulnerabilities. To address this problem, this study proposes a multi-angle optimization method integrating hardware-assisted program tracing, static analysis, and concolic execution. Firstly, static analysis and hardware-assisted tracing are used to calculate program path complexity and execution probability. Then, seed selection and mutation energy allocation are performed according to the path complexity and execution probability. Meanwhile, concolic execution is leveraged to assist seed generation and record key bytes for targeted variations. Experimental results show that this method finds more program paths as well as crashes in most cases, compared to other fuzzing methods.
Abstract: Traditional object detection algorithms often face challenges such as poor detection performance and low detection efficiency. To address these problems, this study proposes a method for detecting small objects based on an improved YOLOv7 network. This method adds more paths to the efficient layer aggregation module (ELAN) of the original network and effectively integrates the feature information from different paths before introducing the selective kernel network (SKNet). This allows the model to pay more attention to features of different scales in the network and extract more useful information. To enhance the model’s perception of spatial information for small objects, an eSE module is designed and connected to the end of ELAN, thus forming a new efficient layer aggregation network module (EF-ELAN). This module preserves image feature information more completely and improves the generalization ability of the network. Additionally, a cross stage-adaptively spatial feature fusion module (CS-ASFF) is designed to address the issue of inconsistent feature scales in small object detection. This module is improved based on the ASFF network and the Nest connection method. It extracts weights through operations such as convolution and pooling on each image of the feature pyramid, applies the feature information to a specific layer, and utilizes other feature layers to enhance the network’s feature processing capabilities. Experimental results show that the proposed algorithm improves the average precision rate by 1.5% and 2.1% on the DIOR and DOTA datasets, respectively, validating its effectiveness in enhancing the detection performance of small objects.
Abstract: Deep reinforcement learning algorithms are more and more widely used in UAV trajectory planning tasks, but many studies do not consider complex scenarios of random changes. To address the above problems, this study proposes an improved PP-CMNTD3 algorithm based on TD3, which puts forward a simple and effective prior strategy and draws on the idea of artificial potential fields to design dense rewards. UAVs are better guided to effectively avoid obstacles and swiftly approach target points. Simulation results show that the algorithm improvement can effectively improve the training efficiency of the network and the trajectory planning performance in complex scenarios. At the same time, the strategy can be flexibly adjusted under different initial power levels, achieving an effective balance between energy consumption and rapid arrival at the destination.
Abstract: Currently, super-resolution reconstruction technology is applied in various fields. However, digital elevation model (DEM) reconstruction presents numerous challenges. To address the issues of detail loss and distortion caused by inadequate utilization of complex terrain features in DEM, this study proposes a deep residual frequency-adaptive DEM super-resolution reconstruction model. The model consists of multiple high and low-frequency feature extraction modules forming a residual network structure, enhancing the overall perception of DEM features. Additionally, a frequency selection feature extraction module is integrated to improve the identification and capture of complex terrain features. The model also incorporates atrous spatial pyramid pooling, which merges multi-scale information to enhance reconstruction quality and retain detailed terrain features and structures. Final super-resolution reconstruction is completed under dual constraints in the gradient and height domains. Experimental results demonstrate that using elevation maps of the Qinling Mountains in Shaanxi with two different accuracies as test data, the deep residual frequency-adaptive DEM super-resolution model outperforms other advanced models across various metrics. Reconstructed DEMs exhibit richer details and clearer textures.
Abstract: The end-to-end Transformer model based on the self-attention mechanism shows superior performance in speech recognition. However, this model has limitations in capturing local feature information during shallow processing and does not fully consider the interdependence between different blocks. To address these issues, this study proposes Conformer-SE, an improved end-to-end model for speech recognition. The model first adopts the Conformer structure to replace the encoder in the Transformer model, thus enhancing its ability to extract local features. Next, by introducing the SE channel attention mechanism, it integrates the output of each block into the final output through a weighted sum. The experimental results on the Aishell-1 dataset show that the Conformer-SE model reduces the character error rate by 18.18% compared to the original Transformer model.
Abstract: The Hadoop system is widely used as a distributed architecture for big data storage. It generates a large amount of log data during runtime to record device anomalies, which provides important clues for locating and analyzing problems. However, traditional log anomaly detection models typically collect log data on a central server, which introduces the risk of sensitive information leakage during data collection. Federated learning, a novel machine learning paradigm, effectively protects data privacy by training models on local servers and aggregating model parameters only on a central server. This study proposes a log anomaly detection architecture based on federated learning, which combines local and central servers to perform detection tasks, avoiding the risk of leaking sensitive information during network transmission. Additionally, it employs a tree parser to standardize log templates. To effectively capture complex patterns and anomalous behaviors in log data, a BiLSTM model based on the self-attention mechanism is established as a local server model. To validate the effectiveness of the proposed method, simulation experiments are conducted using publicly available datasets of distributed systems. The results demonstrate that the model maintains stable comprehensive evaluation metrics, with an accuracy rate above 93%, indicating high applicability.
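The aggregation step on the central server can be illustrated with a short sketch. The abstract does not name the aggregation rule, so this example assumes sample-size-weighted parameter averaging in the style of FedAvg; the parameter layout (a dict of flat lists) and all names are hypothetical.

```python
def federated_average(client_params, client_sizes):
    """Average client parameters, weighting each client by its sample count.

    client_params: list of dicts mapping a parameter name to a flat list of floats.
    client_sizes:  number of local training samples per client.
    """
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("client_sizes must sum to a positive number")
    averaged = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        averaged[name] = [
            sum(p[name][k] * n for p, n in zip(client_params, client_sizes)) / total
            for k in range(length)
        ]
    return averaged


# Two hypothetical local servers; only parameters leave them, never raw logs.
global_params = federated_average(
    [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}],
    client_sizes=[1, 3],
)
```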
Abstract: To address the problem of low QR code reading rates caused by complex environments and changes in shooting angles during QR code detection, this study proposes an algorithm for correcting and recognizing deformed QR codes based on an improved YOLOv8n-Pose algorithm. First, the efficient channel attention (ECA) module is introduced into the backbone network. This module achieves cross-channel interaction without dimensionality reduction, effectively enhancing the feature extraction capabilities and detection accuracy of the network. Secondly, the Slim-neck architecture is adopted to reconstruct the neck network, reducing model complexity and improving the detection capability for QR codes of different scales. Finally, detected QR code corner points are used for correction through inverse perspective transformation, and the corrected QR codes are read using the ZBar algorithm. Experimental results show that, on a public QR code dataset, the improved algorithm increases mAP50 and mAP50-95 by 1.6% and 1.1%, respectively, compared to the original algorithm. Model parameters and computational costs are reduced by 6.5% and 9.5%, respectively. Detection speed on CPU and GPU is improved by 0.3 f/s and 0.7 f/s, reaching 14.2 f/s and 59.6 f/s, respectively, meeting the requirements for efficient detection of QR code corner points. In addition, on a custom-made dataset of deformed QR codes, the proposed method based on the improved YOLOv8n-Pose algorithm enhances the QR code reading rate by 23.66% compared to the standalone ZBar algorithm, achieving a recognition rate of 87.41%. This method only requires one photo to recognize all the information about the goods, which can effectively improve the efficiency of goods management.
Abstract: Model quantization is widely used for fast inference and deployment of deep neural network models. Post-training quantization has attracted much attention from researchers due to its reduced retraining time and low performance loss. However, most existing post-training quantization methods rely on theoretical assumptions or use fixed bit-width allocations for network layers during the quantization process, which results in significant performance loss in the quantized network, especially in low-bit scenarios. To improve the accuracy of post-training quantized network models, this study proposes a novel post-training mixed-precision quantization method (MSQ). This method estimates the accuracy of each layer of the network by inserting a task predictor module, which incorporates the pyramid pooling module and weight imprinting, after each layer of the network model. With the estimations, it assesses the importance of each layer of the network and determines the quantization bit-width of each layer based on the assessment. Experiments show that the MSQ algorithm proposed in this study outperforms some existing mixed-precision quantization methods on several popular network architectures, and the quantized network model tested on edge hardware devices shows better performance and lower latency.
Abstract: Group activity recognition (GAR) is one of the highly researched areas in the field of computer vision, aiming to detect the overall behavior performed by multiple individual actions and interactions. However, due to difficulties in determining individual interaction relationships, the tightness of connections, and the key actor, current methods often focus on individual character features while neglecting connections with scene context. To address this issue, a novel reasoning model for GAR, GIFFNet, is proposed based on global-individual feature fusion (GIFF). To compensate for the lack of scene information in predicting group activity, GIFFNet focuses on key information and integrates scene context with individual character features through the GIFF module, obtaining more representative fusion features. Subsequently, GIFFNet utilizes the fusion features to calculate the interaction relationship graph between characters in the scene and uses a graph convolutional network (GCN) for training and predicting group behavior categories. In addition, to address the issue of imbalanced samples in the dataset, GIFFNet adopts a strategy of dynamically assigning weights to optimize the loss function. Experimental results demonstrate that GIFFNet achieves a multi-class classification accuracy (MCA) of 93.8% and 96.1% on Volleyball and Collective Activity datasets, and the mean per class accuracy (MPCA) is 93.9% and 95.8%, respectively, outperforming other existing deep learning methods. GIFFNet provides features with a more powerful characterization ability for activity classification through feature fusion, which effectively improves GAR accuracy.
Abstract: Remaining time prediction helps enterprises improve the quality and efficiency of business process execution. Although existing deep learning methods have shown improvement in remaining time prediction, they still face challenges when dealing with complex business processes. These challenges include insufficient utilization of time features and limited ability to extract local features, leaving room for improvement in prediction accuracy. This study proposes a remaining time prediction method based on the improved Transformer encoder model. Existing methods ignore event time features and struggle to capture local dependencies. To address these limitations, this study introduces a time feature encoding module and a local dependency enhancement module into the model. The time encoding module constructs a semantically rich and discriminative event time representation by embedding learning and multi-granularity concatenation. The local dependency enhancement module uses convolutional neural networks to extract fine-grained local features from the trajectory prefix after processing with the Transformer encoder. Experiments show that integrating time features and local dependency enhancement improves the prediction accuracy of the remaining time for complex business processes.
Abstract: In crowdsourcing platforms, orders have different types (takeaway and express orders), while delivery riders are typically responsible for only one type of order (either takeaway or express delivery). Additionally, the existing delivery mechanism rarely meets the satisfaction of merchants and customers. Therefore, considering the heterogeneity of riders in a dispatch mode, this study introduces the concept of all-round riders, dividing riders into three categories: takeaway riders, express riders, and all-round riders. According to the differences in the types of orders that riders can serve, a cost function based on a fuzzy time window is constructed to represent the satisfaction of merchants and customers with the time when riders arrive at pick-up and delivery points. The satisfaction is then transformed into a time penalty function. A model is constructed to minimize time penalty costs, route driving costs, and personnel operation costs. Considering the characteristics of the model and the limitations of traditional algorithms, this study designs a hybrid algorithm combining genetic algorithms and large neighborhood search. Then, the simulated annealing algorithm, the genetic algorithm, and the hybrid algorithm are each used to solve the problem on concrete examples. The analysis of the optimization results of different algorithms validates the feasibility and effectiveness of the proposed model and the improved algorithm. Experimental results show that considering the heterogeneity of riders and the satisfaction of merchants and customers during crowdsourcing delivery not only effectively improves their satisfaction but also reduces delivery costs and improves delivery efficiency for crowdsourcing platforms. This strategy offers a reference for crowdsourcing platforms in formulating delivery strategies.
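A fuzzy time window of the kind described above is often modeled with a trapezoidal membership function. The sketch below assumes that shape: full satisfaction inside a desired window, linearly decreasing satisfaction inside a wider tolerable window, and zero outside. The parameter names and the linear penalty conversion are illustrative assumptions, not the paper's exact formulation.

```python
def satisfaction(t, tolerable_start, desired_start, desired_end, tolerable_end):
    """Trapezoidal satisfaction in [0, 1] for an arrival time t."""
    if t < tolerable_start or t > tolerable_end:
        return 0.0
    if t < desired_start:
        return (t - tolerable_start) / (desired_start - tolerable_start)
    if t > desired_end:
        return (tolerable_end - t) / (tolerable_end - desired_end)
    return 1.0


def time_penalty(t, window, unit_cost=1.0):
    """Convert lost satisfaction into a penalty cost (hypothetical linear form)."""
    return unit_cost * (1.0 - satisfaction(t, *window))


window = (0.0, 10.0, 20.0, 30.0)  # minutes, made-up example values
penalties = [time_penalty(t, window, unit_cost=2.0) for t in (5.0, 15.0, 25.0, 35.0)]
```

A routing objective would then sum these penalties over pick-up and delivery points together with the driving and personnel costs.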
Abstract: Computed tomography (CT) scanning provides valuable material for detecting hepatic lesions. Manual detection of hepatic lesions is laborious and heavily relies on the expertise of physicians. Existing algorithms for liver lesion detection exhibit suboptimal performance in detecting subtle lesions. To address this issue, this study proposes a self-supervised liver lesion detection algorithm based on frequency-aware image restoration. Firstly, this algorithm designs a self-supervised task based on synthetic anomalies to generate a broader and more suitable set of pseudo-anomalous images, thereby alleviating the issue of insufficient abnormal data during model training. Secondly, to suppress the sensitivity of the reconstruction network to synthetic liver anomalies, a module is designed to extract high-frequency information from images. By restoring the images from their high-frequency components, the adverse generalization of the reconstruction network to anomalies is mitigated. Lastly, the algorithm adopts weight decay to train the segmentation sub-networks, reducing the occurrence of trivial solutions during the early stages of training and enabling the detection of local and subtle lesions. Extensive experiments conducted on publicly available real datasets demonstrate that the proposed method achieves state-of-the-art performance in liver lesion detection.
Abstract: Rock debris recognition is an important tool in geological exploration and logging. To improve the efficiency of traditional manual lithology identification and overcome the challenges of slow inference and high computational complexity in common deep learning networks, this study proposes DAF-STDC, a real-time semantic segmentation network for rock debris images based on a well-performing STDC network model. The network uses dilated convolution to maintain resolution while extracting features and utilizes an attention mechanism to help the model acquire global information from the feature map, thus refining the edge information of rock debris particles. It also uses a feature fusion module to enhance the fusion of low-level detail features and high-level semantic features, improving feature representation. Experiments have proved that the improved network model significantly enhances accuracy. The mean intersection over union of DAF-STDC reaches 83.12% on the RC_Dataset which consists of six types of rock debris images collected from exploratory wells. While maintaining the number of parameters, DAF-STDC significantly improves its inference speed and segmentation accuracy, providing an effective reference for the digitization of rock debris logging.
Abstract: In mobile edge computing (MEC), load imbalance among edge servers occurs due to irrational task offloading strategies and resource allocation, as well as a sharp increase in the number of multi-type tasks. To address the above-mentioned issues, this study proposes a load prediction and balanced assignment scheme for multi-type tasks (LBMT) in a multi-user, multi-MEC edge environment. The LBMT scheme includes three components: task type classification, task load prediction, and task adaptive mapping. Firstly, considering the diversity of task types, a task type model is designed to classify tasks. Secondly, a task load prediction model is developed, considering the varying loads imposed by different tasks on servers, and employs an improved K-nearest neighbor (KNN) algorithm for load prediction. Thirdly, taking into account the heterogeneity of MEC servers and the limitation of resources, a task allocation model is designed in conjunction with a server load balancing model. Additionally, a task allocation method based on an adaptive task mapping algorithm is proposed. Finally, the LBMT scheme optimizes resource utilization and task processing rates for MEC servers to achieve the optimal load-balanced task offloading strategy. Simulation experiments compare LBMT with improved min-min offloading, intermediate node-based offloading, and weighted bipartite graph-based offloading schemes. The results show that LBMT improves the resource utilization rate by more than 12.5% and the task processing rate by more than 20.3%. Additionally, LBMT significantly reduces the standard deviation of load balancing, more effectively achieving load balance among servers.
Abstract: This study proposes a method for pointer instrument reading recognition based on YOLOv8 and an improved UNet++ to solve the problem of low reading recognition accuracy caused by complex backgrounds and multiple rotational angles in images of substation meters. YOLOv8 is utilized to detect the instrument area, and perspective transformation is used for rotation correction. The improved UNet++, enhanced by a polarized self-attention module, is utilized to segment dial images to extract scales and pointer regions. After the pointer line is extracted, the instrument reading is computed using the angle method. Experimental results indicate that the proposed method achieves an average fiducial (reference) error of 1.82% in identifying instrument readings. The method has superior recognition accuracy and is feasible for application in the intelligent inspection of pointer instruments in substations.
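The angle method can be sketched as a linear interpolation over the dial sweep. The example assumes a linear scale, image coordinates with the y axis pointing down, and a pointer that rotates clockwise from the zero mark to the full-scale mark; the point names, the made-up dial values, and the full-scale error helper are illustrative assumptions.

```python
import math


def sweep(center, start, end):
    """Clockwise angle (radians) from `start` to `end` around `center`.

    With y pointing down, increasing atan2 corresponds to clockwise rotation.
    """
    a0 = math.atan2(start[1] - center[1], start[0] - center[0])
    a1 = math.atan2(end[1] - center[1], end[0] - center[0])
    return (a1 - a0) % (2.0 * math.pi)


def read_dial(center, zero_mark, full_mark, pointer_tip, v_min, v_max):
    """Interpolate the reading from the pointer angle (linear scale assumed)."""
    ratio = sweep(center, zero_mark, pointer_tip) / sweep(center, zero_mark, full_mark)
    return v_min + ratio * (v_max - v_min)


def full_scale_error(reading, truth, v_min, v_max):
    """Error as a percentage of the instrument span."""
    return abs(reading - truth) / (v_max - v_min) * 100.0


# Zero mark at lower left, full-scale mark at lower right, pointer straight up.
reading = read_dial((0, 0), (-1, 1), (1, 1), (0, -1), v_min=0.0, v_max=1.6)
```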
Abstract: Multi-agent collaboration plays a crucial role in the field of reinforcement learning, focusing on how agents cooperate to achieve common goals. Most collaborative multi-agent algorithms emphasize the construction of collaboration but overlook the reinforcement of individual decision-making. To address this issue, this study proposes an online reinforcement learning model, BiTransformer memory (BTM), which not only considers the collaboration among multiple agents but also uses a memory module to assist individual decision-making. The BTM model is composed of a BiTransformer encoder and a BiTransformer decoder, which are utilized to improve individual decision-making and collaboration within the multi-agent system, respectively. Inspired by human reliance on historical decision-making experience, the BiTransformer encoder introduces a memory attention module to aid current decisions with a library of explicit historical decision-making experience rather than hidden units, differing from the conventional RNN-based method. Additionally, an attention fusion module is proposed to process partial observations with the assistance of historical decision experience, to obtain the most valuable information for decision-making from the environment, thereby enhancing the decision-making capabilities of individual agents. In the BiTransformer decoder, two modules are proposed: a decision attention module and a collaborative attention module. They are used to foster potential cooperation among agents by considering the collaborative benefits between other decision-making agents and the current agent, as well as partial observations with historical decision-making experience. BTM is tested in multiple scenes of StarCraft, achieving an average win rate of 93%.
Abstract: Flight segments and waypoints are crucial for the normal operation of a route network. A network’s resistance to disruptions can be enhanced by correctly identifying key flight segments and waypoints and analyzing the correlation between various indicators and the importance of these segments or waypoints. To address the weak resistance of the route network to unexpected situations, both static and dynamic indicators are considered in this study. Using the entropy weight method, the weights of these indicators are determined based on their intrinsic fluctuations. Then, the technique for order preference by similarity to ideal solution is applied to calculate the optimal and worst solutions for the edges, so as to obtain comprehensive scores for each flight segment and waypoint. Further, analysis is conducted on the correlation among indicators, as well as the correlation between indicators and the comprehensive scores of flight segments or waypoints. The results show that while the indicators are independent of each other, their correlation with the scores of the flight segments or waypoints is high. This conclusion provides a basis for improving the route network structure.
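The scoring pipeline can be sketched with the standard entropy weight method followed by TOPSIS. The example assumes that all indicators are positive and benefit-type (larger is better) and that at least one indicator varies across alternatives; the toy indicator matrix is made up and does not come from the paper.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method: indicators with more dispersion get more weight."""
    n, m = len(matrix), len(matrix[0])
    divergence = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for x in column:
            p = x / total
            if p > 0:
                entropy -= p * math.log(p)
        entropy /= math.log(n)
        divergence.append(1.0 - entropy)
    scale = sum(divergence)
    return [d / scale for d in divergence]


def topsis_scores(matrix, weights):
    """Relative closeness of each row to the ideal solution, in [0, 1]."""
    m = len(weights)
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(m)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(m)] for row in matrix]
    best = [max(r[j] for r in weighted) for j in range(m)]
    worst = [min(r[j] for r in weighted) for j in range(m)]
    scores = []
    for r in weighted:
        d_best = math.sqrt(sum((r[j] - best[j]) ** 2 for j in range(m)))
        d_worst = math.sqrt(sum((r[j] - worst[j]) ** 2 for j in range(m)))
        denom = d_best + d_worst
        scores.append(d_worst / denom if denom > 0 else 0.0)
    return scores


# Rows: three hypothetical flight segments; columns: two indicators.
indicators = [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
weights = entropy_weights(indicators)
scores = topsis_scores(indicators, weights)
```

Because the second indicator is constant, it carries no information and receives a weight of approximately zero, so the ranking is driven by the first indicator.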
Abstract: Skin cancer is one of the most common and deadliest types of cancer, with its incidence rapidly increasing worldwide. Failure to diagnose it in its early stages can lead to metastasis and high mortality rates. This study provides a systematic review of recent literature on the application of traditional machine learning and deep learning in the diagnosis of skin cancer lesions, providing valuable reference for further research in skin cancer diagnosis. Firstly, several publicly available datasets of skin diseases are compiled. Secondly, the application of different machine learning algorithms in the classification of skin cancer lesions is analyzed and compared to better understand their advantages and limitations in practical applications, with a focus on convolutional neural networks in diagnosis classification. With a thorough understanding of these algorithms, their performance differences and improvement strategies in dealing with skin diseases are discussed. Ultimately, through discussions on current challenges and future directions, beneficial insights and recommendations are provided to further enhance the performance and reliability of early skin cancer diagnosis systems.
Abstract: Due to the popularity of electric vehicles, more and more electric vehicles are illegally modified with rain shields. However, this modification increases safety hazards. Firstly, rain shields block riders’ view, increasing the risk of accidents. Secondly, rain shields can inadvertently scratch pedestrians when the modified vehicles are at excessive speeds, posing a great safety hazard and a serious threat to traffic safety. This study proposes an improved YOLOv7-tiny algorithm for detecting illegally modified electric vehicles. Firstly, a BiFormer attention mechanism is added to the network structure, enabling the model to capture more details of electric vehicles and focus more on smaller target information. Secondly, an improved feature pyramid structure is combined with the tensor concatenation of a feature fusion network to enhance the detection ability of the model for small and medium-sized targets. Finally, the ELAN and SPPCSPC modules of the framework are optimized, which improves the detection accuracy of small and medium-sized targets and enhances the effectiveness of feature extraction without adding too many parameters.
Abstract: Underwater target detection has practical significance in ocean exploration. This study proposes a FERT-DETR network suitable for underwater target detection to address the issues of complex underwater environments and limited target feature extraction due to occlusion and overlap. The proposed model first introduces a feature extraction module, Faster EMA, to replace the BasicBlock of ResNet18 in RT-DETR, which can significantly improve its capability to extract features of underwater targets while effectively reducing the number of parameters and depth of the model. Secondly, a cascaded group attention module, AIFI-CGA, is used in the encoding part to reduce computational redundancy in multi-head attention and improve attention diversity. Finally, a feature pyramid for high-level filtering named HS-FPN is used to replace CCFM, achieving multi-level fusion and improving the accuracy and robustness of detection. The experimental results show that the proposed algorithm, FERT-DETR, improves detection accuracy by 3.1% and 1.7% compared to RT-DETR on the URPC2020 and DUO datasets respectively, compresses the number of parameters by 14.7%, and reduces computational complexity by 9.2%. It can effectively avoid missed and false detection of targets of different sizes in complex underwater environments.
Abstract: This study proposes a model called E2E-DRNet to address issues in manual diabetic retinopathy (DR) diagnosis, including poor classification performance, laborious processes, minimal differences in grades of retinal images, and inconspicuous lesions. This model is based on EfficientNetV2 and incorporates the efficient channel attention (ECA) module. By processing and optimizing a DR dataset, the Focal Loss function is introduced to address sample imbalance. The model achieves refined DR classification through two stages. Experimental results demonstrate that the proposed model performs well on both public and clinical datasets. Additionally, it enhances the interpretability of lesion regions in fundus images, thereby improving the efficiency of DR lesion screening and overcoming the limitations of manual diagnosis.
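The Focal Loss mentioned in the abstract above down-weights well-classified samples so that rare or hard classes contribute more to training. The following is a minimal sketch of the standard formulation, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), for a single sample. The function name, the default `gamma=2.0`, and the optional per-class `alpha` weights are illustrative assumptions and are not taken from the paper.

```python
import math

def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Focal loss for one sample (illustrative sketch, not the paper's code).

    probs  -- predicted class probabilities (assumed to sum to 1)
    target -- index of the true class
    gamma  -- focusing parameter; gamma = 0 reduces to cross-entropy
    alpha  -- optional list of per-class weights (assumption)
    """
    # Clamp to avoid log(0) for a completely wrong prediction.
    p_t = min(max(probs[target], 1e-12), 1.0)
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct sample contributes far less than a poorly classified one.
easy = focal_loss([0.05, 0.90, 0.05], target=1)
hard = focal_loss([0.60, 0.30, 0.10], target=1)
```

With `gamma=0` the modulating factor disappears and the function returns the ordinary cross-entropy term, which is a convenient sanity check.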
Abstract: As an Internet infrastructure, DNS is rarely subjected to deep monitoring by firewalls, allowing hackers and advanced persistent threat (APT) organizations to exploit DNS covert tunnels for data theft or network control and posing a significant threat to network security. In response to the easily bypassed nature of existing detection methods and their weak generalization capabilities, this study enhances the characterization method of DNS traffic and introduces the pcap features extraction CNN-Transformer (PFEC-Transformer) model. This model uses characterized decimal numerical sequences as input, conducts local feature extraction through CNN modules, and then analyzes long-distance dependency patterns between local features by using the Transformer for classification. The research builds datasets by collecting internet traffic and data packets generated by various DNS covert tunnel tools and conducts generalization testing with publicly available datasets containing traffic from unknown tunneling tools. Experimental results demonstrate that the model achieves an accuracy of 99.97% on the testing dataset and 92.12% on the generalization testing dataset, demonstrating strong performance in detecting unknown DNS covert tunnels.
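The abstract above describes feeding "characterized decimal numerical sequences" to the model. One plausible reading, sketched below, is that each packet's raw bytes are mapped to integers in the range 0 to 255 and then truncated or padded to a fixed length. The function name, the length of 64, and zero padding are assumptions made for illustration; the abstract does not specify them.

```python
from typing import List

def packet_to_sequence(payload: bytes, length: int = 64, pad_value: int = 0) -> List[int]:
    """Map raw packet bytes to a fixed-length list of decimal values (0-255).

    The fixed length and the padding value are hypothetical choices.
    """
    seq = list(payload[:length])                     # each byte becomes an int
    seq.extend([pad_value] * (length - len(seq)))    # right-pad short packets
    return seq

# Example with a made-up byte string standing in for a DNS packet.
sequence = packet_to_sequence(b"\x12\x34\x01\x00", length=8)
```

Fixed-length integer sequences of this kind can be passed to an embedding or 1D convolution layer, which matches the CNN-then-Transformer pipeline the abstract outlines.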
Abstract: Most current recommendation models overlook the importance of features during feature interactions, leading to low accuracy. To address this issue, an enhanced recommendation model combining feature selection and the cross network is proposed. The SENet network is employed to filter out unimportant features before feature interaction, enabling the extraction of more valuable interaction information. On this basis, a parallel cross network and deep neural network are utilized to capture explicit and implicit feature interactions. Additionally, low-rank techniques are introduced in the cross network, transforming weight vectors into low-rank matrices to maintain model performance and reduce model training costs. Comparative experiments on the MovieLens-1M and Criteo datasets demonstrate that the proposed recommendation model is significantly superior to other models in terms of AUC, which confirms the effectiveness of the proposed recommendation model.
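To illustrate the low-rank idea in the abstract above, the sketch below implements one cross layer in the low-rank form commonly used for cross networks, x_{l+1} = x_0 * (U (V^T x_l) + b) + x_l, where `*` is element-wise multiplication and U, V have shape d x r with r much smaller than d. This formulation is an assumption; the abstract does not state the exact layer the paper uses, and all names here are hypothetical.

```python
def low_rank_cross_layer(x0, xl, U, V, b):
    """One low-rank cross layer: x0 * (U (V^T xl) + b) + xl (illustrative).

    x0, xl, b -- lists of length d
    U, V      -- d x r matrices stored as lists of rows
    """
    d, r = len(V), len(V[0])
    # Project the d-dimensional input down to r dimensions: z = V^T xl.
    z = [sum(V[i][k] * xl[i] for i in range(d)) for k in range(r)]
    # Project back up to d dimensions: uz = U z.
    uz = [sum(U[i][k] * z[k] for k in range(r)) for i in range(d)]
    # Element-wise interaction with the original input plus a residual connection.
    return [x0[i] * (uz[i] + b[i]) + xl[i] for i in range(d)]

x0 = [1.0, 2.0, 3.0]
U = [[1.0], [0.0], [2.0]]   # d = 3, r = 1
V = [[1.0], [1.0], [1.0]]
out = low_rank_cross_layer(x0, x0, U, V, b=[0.0, 0.0, 0.0])
```

The two factors hold 2dr weights instead of the d^2 weights of a full matrix, which is where the reduction in training cost would come from.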
Abstract: The rapid growth of security inspection demand drives the development of intelligent security inspection technology. Due to the unique characteristics of X-ray images, detecting small contraband items is challenging. This study proposes an improved YOLOv8s network for contraband recognition to address this issue. Firstly, the Focal L1 Loss function is introduced to enhance CIoU and optimize the position and aspect ratio of prediction boxes to improve the network’s ability to identify contraband items. Improved deformable convolution is added to the shallow backbone network to capture features of contraband items in different directions. LSKA is incorporated into the SPPF module to expand the network’s receptive field, while the Swin-CS module captures global information and supplements dimensional interaction. Finally, three stacked attention blocks are used for processing, enhancing the network’s sensitivity towards small targets. The improved network achieves a mean average precision (mAP) of 96.1% on the SIXray dataset, a 5.4% improvement over YOLOv8s, with mAP50-95 reaching 0.682, a 4.5% increase. Experimental results indicate that the proposed model can accurately generate prediction boxes and effectively handle contraband detection in complex scenarios, validating the effectiveness of the algorithm.
Abstract: The AI diagnostic model based on deep learning relies heavily on high-quality, finely annotated data for algorithm training, but is affected by label noise. To enhance the robustness of the model and prevent the memorization of noisy labels, a noise label sample selection (NLSS) model is proposed to fully mine the hidden information of noise samples and alleviate model overfitting. Firstly, distributed feature representations of the image are extracted by taking hybrid enhanced images as input. Secondly, the contrastive loss function is introduced to compare the similarity between the predicted label distribution of the sample and the real label distribution for sample evaluation and selection. Finally, based on sample selection, supervised information of the noisy label is re-corrected by the pseudo-label promotion strategy of the label redistribution module. Taking the PET/CT dataset of non-small cell lung cancer (NSCLC) patients as an example, results show that the proposed model outperforms comparison models, reducing the interference of label noise in the diagnosis of lymph node metastasis.
Abstract: Braille conversion technology is crucial for advancing information accessibility for the blind. With the rapid advancement of information globalization, the blind are increasingly exposed to bilingual information in both Chinese and English. While existing braille conversion systems have successfully translated Chinese and English into braille, they fall short in accurately converting punctuation, including poor differentiation of punctuation with multiple uses and lack of error correction for the mixed use of Chinese and English punctuation. Failure to address these issues may lead to misunderstanding of text by the blind. This study delves into these problems, designing and implementing a bilingual braille conversion system capable of distinguishing multipurpose punctuation and correcting the mixed use of punctuation. The performance of the system is evaluated by using a dataset based on BLCU Chinese Corpus. The results demonstrate that the proposed system accurately distinguishes multipurpose punctuation and corrects the mixed use of Chinese and English punctuation according to language types and context, outperforming other braille conversion systems. Overall, this research has significant potential for promoting information accessibility in China.
Abstract: With the continuous development of industrial automation, the three-dimensional reconstruction technology of workpieces is playing an increasingly important role in the manufacturing industry. In actual working environments, workpieces are commonly stacked, which significantly impacts subsequent work including robot recognition and grasping. Currently, it is hard for 3D reconstruction to extract image feature points and achieve accurate feature registration in workpieces with weak textures. To address the above issues, this study proposes a 3D reconstruction method for stacked workpieces based on deep learning with multi-view stereo matching. Firstly, multiple images from different perspectives are input through a DCNv2-based feature pyramid network for feature extraction. Then, homography transformation is performed to construct cost volumes, and a unified cost volume is obtained through variance aggregation. In the regularization section of the cost volume, an SE channel attention module is introduced to improve the feature expression ability of the network and enhance the performance and generalization ability of the model. This method exhibits good performance on the Technical University of Denmark (DTU) dataset. The point cloud model of stacked workpieces generated by this method is of great significance for future applications of industrial automation.
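The variance aggregation step in the abstract above merges the per-view feature volumes into one cost volume by taking, at every position, the variance across views. A low variance means the views agree at that depth hypothesis. The sketch below uses flattened lists in place of real feature volumes; the function name and the toy data are assumptions for illustration only.

```python
def variance_aggregate(volumes):
    """Element-wise variance across N per-view volumes (illustrative sketch).

    volumes -- list of N equally sized flat lists, one per view
    returns -- one flat list: mean squared deviation from the per-element mean
    """
    n = len(volumes)
    size = len(volumes[0])
    cost = []
    for j in range(size):
        values = [v[j] for v in volumes]
        mean = sum(values) / n
        cost.append(sum((x - mean) ** 2 for x in values) / n)
    return cost

# Three toy "views": they agree at position 0 and disagree at position 1.
cost_volume = variance_aggregate([[1.0, 2.0], [1.0, 4.0], [1.0, 6.0]])
```

Because the variance is symmetric in its inputs, this aggregation works for any number of views and does not depend on their order.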
Abstract: As the demand for unmanned aerial vehicle (UAV) applications continues to expand, the design of disturbance rejection controllers, which aim to ensure that UAVs can complete designated tasks as required, has received significant attention. The traditional control algorithms in wide use today exhibit good stability but poor disturbance rejection capability. To address this issue, a hybrid disturbance rejection controller based on an improved twin delayed deep deterministic policy gradient (TD3) algorithm is proposed. This method utilizes nonlinear model predictive control (NMPC) as the base controller and introduces a disturbance compensator based on improved TD3 for hybrid control. This approach retains the advantages of the NMPC controller while addressing the shortcomings in disturbance rejection of traditional control algorithms. This study introduces a multi-head attention (MA) mechanism and long short-term memory (LSTM) network into the Actor network of TD3, enhancing TD3’s ability to capture spatial management information and temporal correlation information. Additionally, a continuous logarithmic reward function is introduced to improve training stability and convergence speed, and training is conducted using random task scenarios with random disturbances to enhance model generalization. In experiments, the NMPC-MALSTM-TD3 architecture is compared with architectures using DDPG, SAC, TD3, and PPO algorithms as disturbance compensators. Experimental results demonstrate that the NMPC-MALSTM-TD3 architecture exhibits the strongest disturbance rejection capability and a smaller influence on the stability and real-time performance of NMPC.
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3
Address: 4# South Fourth Street, Zhongguancun, Haidian, Beijing, Postal Code: 100190
Phone: 010-62661041 Fax: Email: csa (a) iscas.ac.cn