Abstract: As digital twin and VR technology becomes more widely applied, point cloud semantic segmentation of large-scale indoor buildings still suffers from limited overall accuracy, low recognition accuracy for small objects, and blurred boundary segmentation. To address these problems, a large-scale indoor point cloud semantic segmentation method named RandLA-CGNet is proposed. In the encoder layer, a local-global context fusion (LGCF) module is constructed, which preserves local neighborhood information while incorporating global contextual semantics. In the decoder layer, a norm-gated channel feature (NGCF) module is designed, which adaptively recalibrates feature maps along the channel dimension to enhance useful information and suppress redundant noise, thereby increasing sensitivity to details and boundaries and improving the model’s fine-grained recognition capability. Finally, a hybrid loss function, focused cross-entropy loss (FCE loss), is adopted. It ensures stable convergence on the majority of samples and so maintains overall accuracy, while placing more weight on hard samples and minority-class samples, which improves segmentation in boundary regions and for rare classes. Experimental results with 6-fold cross-validation on the S3DIS dataset show that the proposed model achieves an overall accuracy (OA), mean class accuracy (mAcc), and mean intersection over union (mIoU) of 88.8%, 83.4%, and 71.9%, respectively, improvements of 0.8%, 1.4%, and 1.9% over the baseline. Compared with mainstream algorithms, it exceeds LG-Net by 0.5%, 1.0%, and 1.1% on the three metrics, and its OA and mIoU are 0.2% and 0.7% higher than those of FGC-AF. While maintaining these overall performance advantages, RandLA-CGNet improves the IoU for small objects and boundary-detail classes by 1%–6%, significantly enhancing the recognition of low-frequency classes and complex boundaries.
The method thus offers an effective solution for precisely modeling few-sample classes and fine boundary details in point cloud semantic segmentation tasks.
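For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how OA, mAcc, and mIoU are conventionally computed from per-point labels. It is not taken from the paper: the function name `segmentation_metrics` is hypothetical, and averaging only over classes that appear in the ground truth is an assumption made here to avoid division by zero.

```python
import numpy as np

def segmentation_metrics(y_true, y_pred, num_classes):
    """Return (OA, mAcc, mIoU) for integer per-point class labels.

    Hypothetical helper for illustration; classes absent from the
    ground truth are skipped when averaging (an assumption).
    """
    y_true = np.asarray(y_true).ravel().astype(np.int64)
    y_pred = np.asarray(y_pred).ravel().astype(np.int64)

    # conf[i, j] = number of points of true class i predicted as class j
    conf = np.bincount(
        y_true * num_classes + y_pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    tp = np.diag(conf).astype(float)   # correctly labeled points per class
    gt = conf.sum(axis=1)              # ground-truth points per class
    pred = conf.sum(axis=0)            # predicted points per class
    union = gt + pred - tp             # |ground truth ∪ prediction| per class

    present = gt > 0                   # classes that occur in the ground truth
    oa = tp.sum() / conf.sum()                     # overall accuracy
    macc = np.mean(tp[present] / gt[present])      # mean per-class accuracy
    miou = np.mean(tp[present] / union[present])   # mean intersection over union
    return oa, macc, miou

if __name__ == "__main__":
    oa, macc, miou = segmentation_metrics(
        [0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3
    )
    print(f"OA={oa:.3f}, mAcc={macc:.3f}, mIoU={miou:.3f}")
```

Because OA pools all points while mAcc and mIoU average per class, a gain on rare or small-object classes moves mAcc and mIoU more than OA, which is consistent with the pattern of improvements reported above.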