Abstract: With the widespread adoption of electronic health records (EHRs), retrieving similar cases has become a critical task in supporting clinical decision-making, such as auxiliary diagnosis and treatment planning. However, EHR data are characterized by high dimensionality, heterogeneity, and large volume. To effectively integrate multimodal clinical data and achieve efficient retrieval, this study proposes MCDF, a deep-hashing-based multimodal clinical data retrieval model for similar cases. The model applies modality-specific feature extraction, using a multi-layer perceptron (MLP) for structured text data, BioBERT for unstructured text data, and BioMedCLIP for image data, and then fuses the resulting features through a self-attention mechanism. A triplet loss function guides the model to directly generate hash codes that effectively represent the samples, enabling rapid comparison during retrieval and thereby improving both retrieval accuracy and efficiency. On the publicly available MIMIC-III dataset, MCDF is evaluated against traditional hashing methods (such as spectral hashing) and advanced deep hashing methods (such as the deep hashing network) using mean normalized discounted cumulative gain (MNDCG) and mean average precision (MAP). Experimental results show that MCDF outperforms all baseline models, validating the superiority of the proposed approach.
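To make the described pipeline concrete, the following is a minimal PyTorch sketch of the idea outlined above: pre-extracted modality features are projected into a shared space, fused with self-attention, mapped to continuous hash codes, and trained with a triplet loss. The class name `MultimodalDeepHasher`, all layer sizes, the 48-bit code length, and the use of `nn.MultiheadAttention` and `nn.TripletMarginLoss` are illustrative assumptions, not the authors' exact implementation; the MLP, BioBERT, and BioMedCLIP encoders are assumed to have produced the input feature vectors and are not reproduced here.

```python
# Minimal sketch of an MCDF-style fusion-and-hashing pipeline (assumed details, not the paper's code).
import torch
import torch.nn as nn


class MultimodalDeepHasher(nn.Module):
    def __init__(self, struct_dim=64, text_dim=768, img_dim=512,
                 embed_dim=256, hash_bits=48):
        super().__init__()
        # Project each modality's feature vector into a shared embedding space.
        self.proj_struct = nn.Linear(struct_dim, embed_dim)  # structured EHR features (MLP output)
        self.proj_text = nn.Linear(text_dim, embed_dim)      # BioBERT text embedding
        self.proj_img = nn.Linear(img_dim, embed_dim)        # BioMedCLIP image embedding
        # Self-attention fuses the three modality tokens.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
        # Hash head: tanh keeps outputs in (-1, 1) so sign() yields binary codes.
        self.hash_head = nn.Sequential(nn.Linear(embed_dim, hash_bits), nn.Tanh())

    def forward(self, struct_feat, text_feat, img_feat):
        tokens = torch.stack([self.proj_struct(struct_feat),
                              self.proj_text(text_feat),
                              self.proj_img(img_feat)], dim=1)   # (B, 3, embed_dim)
        fused, _ = self.attn(tokens, tokens, tokens)             # self-attention fusion
        return self.hash_head(fused.mean(dim=1))                 # (B, hash_bits), continuous codes

    @torch.no_grad()
    def encode(self, struct_feat, text_feat, img_feat):
        # Binarize for retrieval: Hamming distance on sign codes allows fast comparison.
        return torch.sign(self.forward(struct_feat, text_feat, img_feat))


# Triplet loss pulls similar cases toward nearby hash codes and pushes dissimilar ones apart.
model = MultimodalDeepHasher()
criterion = nn.TripletMarginLoss(margin=1.0)
anchor = model(torch.randn(8, 64), torch.randn(8, 768), torch.randn(8, 512))
positive = model(torch.randn(8, 64), torch.randn(8, 768), torch.randn(8, 512))
negative = model(torch.randn(8, 64), torch.randn(8, 768), torch.randn(8, 512))
loss = criterion(anchor, positive, negative)
loss.backward()
```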