Multimodal mechanical wear fault diagnosis: Fusion of signal characterization and image information
Published in Results in Engineering, 2025
Recommended citation: Zhou, Q., Chai, B., Guo, Y., Li, T., Zhou, S., Wang, K., & Ye, Y.* (2025). "Multimodal mechanical wear fault diagnosis: Fusion of signal characterization and image information." Results in Engineering, 28, 107204. https://doi.org/10.1016/j.rineng.2025.107204
In industrial settings, diagnosing wear faults on bearing metal surfaces presents several challenges, including limited data sources, difficulty in detecting small defects, and redundancy in fault modes. This research aims to improve the detection accuracy of small wear defects, solve the problem of multi-scale defect localization, and achieve effective fusion of signal and image information. The method builds on the YOLOv8 architecture, using the Faster-EMA backbone network and incorporating a multi-scale, lightweight channel-spatial attention mechanism to accurately localize defects of different scales. The KernelWarehouse method is introduced to dynamically optimize convolutional kernels, enabling adaptation to changing industrial conditions and significantly improving feature extraction for wear modes such as cracks, pitting, and scratches. A novel Inner-MPDIoU loss function is proposed to enhance bounding box regression accuracy by jointly optimizing center distance and minimum envelope deviation. For comprehensive failure analysis, parallel Transformer branches process synchronized time-frequency domain signals, and cross-modal feature fusion is achieved through a self-attention mechanism. The method reaches a detection accuracy of 82.8% and a real-time processing speed of 12.2 ms per image. Compared with existing methods, the mean average precision (mAP) improves by 7.1%, and the accuracy of failure mode diagnosis increases by 20.5%. This study offers an effective solution for industrial predictive maintenance, enhancing the reliability and efficiency of wear fault detection in real-world scenarios.