A bearing fault diagnosis method for hydrodynamic transmissions integrating few-shot learning and transfer learning

Testing and analysis of few-shot fault diagnosis models under cold-start conditions
Following the experimental protocol, the target-domain test data are partitioned into four subsets: single-condition TS1 (800 r/min, 30 samples per fault class, 120 total), single-condition TS2 (1200 r/min, 30 samples per fault class, 120 total), single-condition TS3 (1500 r/min, 30 samples per fault class, 120 total), and mixed-condition TS1 + 2 + 3 (combined data from the three conditions, 90 samples per fault class, 360 total). SVM, WDCNN, FSL, WDCNN + TL, and FSL + TL were comparatively analyzed on these subsets. Diagnostic results are illustrated in Fig. 5.
The results demonstrate that the traditional SVM achieves low accuracy across all conditions (16.77%, 16.21%, 15.59%, and 16.19%), indicating its inability to capture complex fault patterns under few-shot constraints. WDCNN performs markedly better (70.13%, 69.60%, 69.01%, and 69.58%), validating the advantage of end-to-end deep learning for automatic feature extraction, although overfitting and robustness issues persist under data scarcity. The FSL method, which applies contrastive learning via a Siamese network, achieves 80.24%, 79.73%, 79.16%, and 79.71% accuracy, outperforming WDCNN by 10.13 percentage points under mixed conditions and confirming the efficacy of few-shot learning in low-sample regimes. However, FSL alone suffers from cross-domain distribution bias, producing accuracy fluctuations across speed conditions. By integrating transfer learning (TL) to fine-tune pre-trained source-domain parameters, WDCNN + TL attains 82.66%, 82.17%, 81.63%, and 82.15% accuracy, surpassing WDCNN and FSL by 12.57 and 2.44 percentage points, respectively, under mixed conditions. In summary, SVM performs poorly because its reliance on linear assumptions prevents it from capturing nonlinear fault patterns under few-shot conditions; WDCNN improves accuracy by learning hierarchical features from raw signals but overfits with limited samples; FSL uses contrastive learning to enhance feature separability but remains exposed to cross-domain bias; and WDCNN + TL addresses this bias through transfer learning, in which fine-tuning preserves generic features while aligning the two domains, balancing data efficiency and generalization.
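To make the FSL component concrete, the following is a minimal sketch of one Siamese contrastive training step with a WDCNN-style 1-D encoder. The layer sizes, margin, and all names are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder1D(nn.Module):
    """Illustrative WDCNN-style encoder: wide first kernel, shallow depth."""
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=16, padding=24),  # wide first kernel
            nn.BatchNorm1d(16), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm1d(32), nn.ReLU(), nn.AdaptiveAvgPool1d(4),
        )
        self.fc = nn.Linear(32 * 4, embed_dim)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

def contrastive_loss(z1, z2, same_class, margin: float = 1.0):
    """Pull same-class embeddings together, push different-class pairs apart."""
    d = F.pairwise_distance(z1, z2)
    return torch.mean(same_class * d.pow(2) +
                      (1 - same_class) * F.relu(margin - d).pow(2))

encoder = Encoder1D()
x1, x2 = torch.randn(8, 1, 2048), torch.randn(8, 1, 2048)  # paired vibration segments
same = torch.randint(0, 2, (8,)).float()                   # 1 = same fault class
loss = contrastive_loss(encoder(x1), encoder(x2), same)
loss.backward()
```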

Diagnostic Results of Different Methods Under Cold-Start Conditions.
To fully exploit the efficacy of few-shot learning under low-sample conditions and the advantage of transfer learning in leveraging pre-trained model priors under data scarcity, the proposed FSL + TL method first pre-trains the Siamese network on the CWRU dataset and then fine-tunes it with limited target-domain samples. Experimental results demonstrate that this method achieves accuracy rates of 85.78%, 85.32%, 84.80%, and 85.30% across the four test sets, outperforming all comparative methods. Specifically, it surpasses the FSL method by 5.54, 5.59, and 5.64 percentage points in TS1, TS2, and TS3, respectively, and exceeds the WDCNN + TL method by 3.12, 3.15, and 3.17 percentage points in the same conditions. The superior performance of FSL + TL stems from its dual-phase knowledge integration: pre-training on the CWRU dataset captures universal fault patterns, while fine-tuning with limited target-domain data aligns domain-specific features. This hybrid approach mitigates the cross-domain bias inherent in pure FSL and improves data efficiency beyond standalone TL, achieving consistent accuracy gains by balancing generalization and adaptation.
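The pre-train-then-fine-tune procedure can be sketched as follows, reusing the illustrative `Encoder1D` and `contrastive_loss` above; the checkpoint path, frozen-layer split, and learning rate are assumptions for illustration, not the paper's released code:

```python
import torch

encoder = Encoder1D()
# Load source-domain (CWRU) weights; the checkpoint path is hypothetical.
encoder.load_state_dict(torch.load("cwru_pretrained.pt"))

# Freeze the convolutional feature extractor and fine-tune only the head
# with the few labelled target-domain (hydrodynamic transmission) samples.
for p in encoder.features.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(
    (p for p in encoder.parameters() if p.requires_grad), lr=1e-4)

# Placeholder few-shot pairs; real data would be paired vibration segments.
target_pairs = [(torch.randn(8, 1, 2048), torch.randn(8, 1, 2048),
                 torch.randint(0, 2, (8,)).float()) for _ in range(5)]

for x1, x2, same in target_pairs:
    loss = contrastive_loss(encoder(x1), encoder(x2), same)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```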
Notably, TS1 (lowest speed) yields the highest accuracy, with slight declines observed at higher speeds (TS2, TS3), as shown in Fig. 6. This aligns with mechanical vibration principles, where low-speed conditions amplify fault-related features.

Influence of Rotational Speed on Diagnostic Accuracy.
For intuitive evaluation, Fig. 7 presents confusion matrices for all five methods under mixed conditions using 360 test samples.

Confusion Matrices of Diagnostic Results.
Further evaluation metrics, namely precision, recall, and F1-score together with their macro-averages, are quantified in Table 4.
Analysis of the confusion matrices and evaluation metrics reveals that SVM performs significantly worse on all metrics. Its prediction rate for the normal state (Class 0#) is merely 36%, with severe misclassification of inner race (Class 1#) and outer race (Class 3#) faults, indicating ineffective vibration-signal feature extraction under cold-start few-shot conditions and leading to widespread fault misidentification. In contrast, WDCNN achieves 100% correct predictions for the normal state, with a macro-average F1-score of 0.70, approximately 4.1 times that of SVM. However, it still struggles to distinguish the subtle feature differences between inner and outer race faults under few-shot constraints. The FSL method substantially reduces inter-fault confusion, raising the macro-average F1-score to 0.80. With transfer learning integrated, WDCNN + TL and FSL + TL further improve the macro-average F1-score to 0.82 and 0.85, respectively, confirming TL's efficacy in mitigating data scarcity. These findings demonstrate that the FSL + TL method, by synergizing few-shot learning and transfer learning, achieves the best overall accuracy and the least inner/outer race fault misclassification. FSL employs contrastive learning to extract discriminative features from limited data, enhancing separability between subtle fault classes, while TL bridges the domain gap by fine-tuning the pre-trained model on hydrodynamic transmission data, selectively preserving domain-invariant features while aligning target-specific characteristics.
Fault-class-specific analysis shows all methods perform best on Class 0#, while Classes 1# and 3# remain challenging, with peak F1-scores of 0.8. Class 2# exhibits consistently lower recall than precision, suggesting under-detection risks for this fault type.
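For reference, per-class and macro-averaged metrics of the kind reported in Table 4 can be computed with scikit-learn. The labels below are synthetic stand-ins for the 360 mixed-condition test samples, used purely to show the computation:

```python
import numpy as np
from sklearn.metrics import classification_report

# Synthetic stand-ins for the 360 mixed-condition test labels and a model's
# predictions (about 85% of predictions kept correct, the rest randomized).
rng = np.random.default_rng(0)
y_true = np.repeat([0, 1, 2, 3], 90)  # 90 samples per fault class
y_pred = np.where(rng.random(360) < 0.85, y_true, rng.integers(0, 4, 360))

# Per-class precision/recall/F1 plus the macro averages, as in Table 4.
print(classification_report(y_true, y_pred,
                            target_names=["0#", "1#", "2#", "3#"], digits=2))
```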
Testing and analysis of few-shot fault diagnosis models under imbalanced fault samples
In real industrial scenarios, fault samples for certain operating conditions may accumulate over time while other fault categories remain few-shot. This experiment evaluates model generalization and robustness under imbalanced data by fine-tuning the CWRU pre-trained model on the combined TR1, TR2, and TR3 datasets and then testing on the mixed-condition TS1 + 2 + 3 set. A comparative analysis is conducted between FSL + TL (Siamese network pre-training + fine-tuning with limited target-domain samples) and FSL + TL + AM (FSL + TL enhanced with an attention mechanism), as shown in Fig. 8. The objective is to validate whether these methods stably improve cross-condition generalization under data imbalance and whether the attention mechanism mitigates inner/outer race fault misclassification. Diagnostic results are shown in Fig. 9.
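As an illustration of this protocol, the sketch below assembles an imbalanced fine-tuning set from three condition-specific subsets; the dataset construction and per-class counts are placeholders, not the paper's actual TR1-TR3 data:

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

def make_subset(n_per_class):
    """Placeholder for TR1/TR2/TR3; real data would be vibration segments."""
    xs, ys = [], []
    for cls, n in enumerate(n_per_class):
        xs.append(torch.randn(n, 1, 2048))
        ys.append(torch.full((n,), cls, dtype=torch.long))
    return TensorDataset(torch.cat(xs), torch.cat(ys))

# Unequal per-class counts emulate fault categories that accumulate samples
# over time while others remain few-shot (counts are illustrative).
tr1 = make_subset([40, 10, 40, 10])
tr2 = make_subset([40, 40, 10, 10])
tr3 = make_subset([10, 40, 40, 10])
finetune_loader = DataLoader(ConcatDataset([tr1, tr2, tr3]),
                             batch_size=32, shuffle=True)
```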

WDCNN with Integrated Attention Mechanism.

Diagnostic Results Under Imbalanced Fault Samples.
Under mixed conditions, the FSL + TL method achieves an overall accuracy of 86.42%, while the FSL + TL + AM method improves accuracy to 88.75%, a gain of 2.33 percentage points, primarily because the attention mechanism (AM) dynamically enhances discriminative features (e.g., impact harmonics) for inner/outer race faults. Figure 10 presents confusion matrices for both methods using the 360 mixed-condition test samples, while Table 5 quantifies their per-class evaluation metrics. Although residual confusion persists between inner and outer race faults due to the limited samples in specific target-domain conditions, both methods achieve high performance. Notably, the attention mechanism markedly improves discrimination: the inner race F1-score increases from 0.82 (FSL + TL) to 0.85 (FSL + TL + AM), and the outer race F1-score rises from 0.82 to 0.84. By prioritizing fault-sensitive frequency bands, AM effectively counteracts data imbalance, demonstrating robustness in complex industrial scenarios.
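The paper's exact AM design is not reproduced here, but a squeeze-and-excitation style channel-attention block, shown below as a hedged sketch, illustrates the general mechanism of re-weighting fault-sensitive feature channels:

```python
import torch
import torch.nn as nn

class ChannelAttention1D(nn.Module):
    """SE-style channel attention for 1-D CNN feature maps (illustrative)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),  # squeeze: global context per channel
            nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):              # x: (batch, channels, length)
        w = self.gate(x).unsqueeze(-1)  # per-channel weights in (0, 1)
        return x * w                    # emphasize fault-sensitive channels

feats = torch.randn(8, 32, 64)          # e.g. WDCNN intermediate feature maps
out = ChannelAttention1D(32)(feats)     # same shape, channels re-weighted
```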

Confusion Matrices of Diagnostic Results.
Furthermore, our analysis reveals that the accuracy of the FSL + TL method improves with increasing fine-tuning sample size, as shown in Fig. 11. The accuracy under target-domain few-shot fine-tuning is 85.30%, which increases to 86.42% (an improvement of 1.12 percentage points) when fine-tuning uses additional samples. More target-domain data reduces overfitting by providing a richer feature distribution for parameter optimization, thereby boosting generalization to unseen industrial scenarios.

Impact of Fine-Tuning Sample Size on Accuracy.
Finally, a comprehensive analysis of the macro-averaged evaluation metrics and model accuracy (Fig. 12) demonstrates that the proposed FSL + TL method, by integrating transfer learning with the few-shot strategy, significantly enhances model robustness and generalization. Incorporating the attention mechanism further improves critical-feature extraction, yielding markedly more stable classification of complex fault patterns.

Macro-Average Metrics and Accuracy Comparison.
Analysis of FSL + TL + AM’s computational efficiency
To substantiate our claims regarding real-time and low-computational demands, we conducted a detailed analysis of model parameters, computational complexity, and inference efficiency. The benchmark comparison results are summarized in Table 6. Our FSL + TL + AM framework achieves efficient industrial deployment through the following optimizations:
(1) Parameter Efficiency: The proposed framework achieves a compact parameter size of 1.49 M, significantly lower than conventional meta-learning models, through two key design choices: the Siamese-WDCNN architecture employs wide first-layer convolutional kernels (64 × 1) to retain low-frequency fault features with reduced layer depth, and selective parameter fine-tuning updates only 35% of the parameters (the fully connected and attention layers) during transfer learning, minimizing memory overhead while preserving pre-trained knowledge.

(2) FLOPs Optimization: With 3.68 G FLOPs per inference, computational efficiency is ensured via attention-guided feature pruning and hardware-aware layer fusion, which cut memory-access latency. These optimizations balance accuracy and computational load for edge devices.

(3) Inference Speed: Tested on an NVIDIA RTX 3060 (PyTorch 1.4.0), the model processes a sample in 14.08 ms (excluding data loading), meeting stringent industrial real-time requirements. This efficiency stems from architectural lightweighting and the deployment optimizations described above; a minimal measurement sketch follows this list.
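Parameter counts and inference latency of the kind reported in Table 6 can be measured with a short PyTorch script such as the one below. The model is a stand-in for the actual FSL + TL + AM network, and FLOPs would additionally require an external profiler (e.g., fvcore or ptflops):

```python
import time
import torch
import torch.nn as nn

# Stand-in network; substitute the trained FSL + TL + AM model to benchmark it.
model = nn.Sequential(nn.Conv1d(1, 16, kernel_size=64, stride=16), nn.ReLU(),
                      nn.Flatten(), nn.Linear(16 * 125, 4))
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).eval()

n_params = sum(p.numel() for p in model.parameters())
x = torch.randn(1, 1, 2048, device=device)  # one vibration segment

with torch.no_grad():
    for _ in range(10):                     # warm-up to stabilize clocks/caches
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()            # exclude queued-kernel skew
    t0 = time.perf_counter()
    for _ in range(100):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
latency_ms = (time.perf_counter() - t0) / 100 * 1e3

print(f"params: {n_params / 1e6:.2f} M, mean latency: {latency_ms:.2f} ms/sample")
```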
These results confirm that FSL + TL + AM balances accuracy and efficiency, making it deployable on edge devices for real-time hydrodynamic transmission monitoring.