Detecting deformation mechanisms of metals from acoustic emission signals through knowledge-driven unsupervised learning
Design of a knowledge-driven unsupervised learning model
Figure 1 illustrates the computational workflow of the proposed unsupervised learning model. The framework evaluates various machine learners as potential base learners, including Gaussian Discriminant Analysis (GDA), Logistic Regression (LR), Support Vector Machine (SVM), Kernel Perceptron, Back-Propagation Neural Network (BPNN), Gradient Boosting Decision Tree47 (GBDT), and Random Forests48. Each base learner processes the power spectrum densities (PSD) of raw AE waveforms (Fig. 1a (1–3)) (see “Methods”). In this framework, each base learner functions as a Gradient-driven Supervised Classifier (GDC), whose parameters are adjusted through classical back-propagation of the gradients of an externally evaluated loss function. A GDC with a given parameter configuration analyzes a collection of PSD data to identify the corresponding deformation mechanisms. A Knowledge-Infused Aggregate Loss Function (KIALF; Fig. 1a (4)) is introduced for autonomous model training and evolution. Unlike traditional supervised learning loss functions, this function computes an aggregate loss based on domain knowledge, enabling unsupervised learning without labeled training samples. After all base learners are trained, the optimal model is selected based on overall performance (Fig. 1a (5)). Following the principle of Ockham’s razor49,50, simpler models with competitive performance are preferred. The base learner that best balances these criteria is chosen as the final GDC model. Once trained, the model can be applied to monitor AE signals from new samples of the same material without further tuning. Integrated with avalanche theory51,52, the approach can dynamically identify deformation mechanisms and provide early failure warnings (Fig. 1a (6–8)).
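As a rough illustration of the PSD input to the base learners, the sketch below computes a plain periodogram of a waveform with a naive discrete Fourier transform. The actual PSD procedure is described in “Methods” and may differ (windowing, averaging, normalization); the function name `periodogram`, the sampling rate, and the test waveform are assumptions made here for illustration only.

```python
import cmath
import math

def periodogram(waveform, fs):
    """Return (frequency, power) pairs for the one-sided periodogram.

    Naive O(n^2) DFT; power is |X_k|^2 / n (one simple normalization,
    assumed here, not taken from the paper).
    """
    n = len(waveform)
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(waveform[j] * cmath.exp(-2j * math.pi * k * j / n)
                  for j in range(n))
        spectrum.append((k * fs / n, abs(x_k) ** 2 / n))
    return spectrum

# Hypothetical waveform: a pure tone at 4 Hz sampled at 32 Hz for 1 s.
fs = 32.0
wave = [math.sin(2 * math.pi * 4 * j / 32) for j in range(32)]
spectrum = periodogram(wave, fs)
peak_freq = max(spectrum, key=lambda fp: fp[1])[0]
```

In this toy case the spectrum peaks at the tone frequency; a real AE waveform would yield a broadband PSD that serves as the feature vector for each GDC.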

a The proposed unsupervised learning strategy framework is illustrated as follows: (1) The Acoustic Emission (AE) spectrum derived from experimental input is divided into three regions based on their statistical features: Region 1 (R1) is dislocation dominant, Region 3 (R3) is crack dominant, and Region 2 (R2) is a mix of both mechanisms. (2) Corresponding acoustic waveforms of the AE spectrum. (3) Power spectrum densities (PSD) are extracted from AE waveforms. (4) The Knowledge-Infused Aggregate Loss Function (KIALF) dispatches base learners (GDC) in an unsupervised fashion. (5) The best overall performing base learner is elected as the ultimate choice for the backbone model. (6) New AE waveforms from subsequent experiments are input into the system. (7) The signals are separated and further analyzed using the proposed approach. (8) Early failure warning is conducted based on the separated signals. b Essential steps of the proposed approach. A set of unlabeled AE signals Z is applied to a base learner GDCn, yielding a probabilistic classification output COn. The ratios of crack-related signals in two randomly sampled intervals ri = [si, ei] and rj = [sj, ej] are calculated as \({{{\mathcal{T}}}}_{n}({r}_{i})\) and \({{{\mathcal{T}}}}_{n}({r}_{j})\), respectively. The aggregate trend metric \({{{\mathcal{L}}}}_{{\mbox{Trend}},n,i,j}\) is computed for the interval pair. The full-period loss KIALF (\({{{\mathcal{L}}}}_{n}\)) is obtained via repeated sampling and backpropagated to optimize the base learner parameters θn. The optimized learner GDCnopt is produced by minimizing \({{{\mathcal{L}}}}_{n}\).
Figure 1b illustrates the essential steps executed internally by the proposed approach. Specifically, when a base learner, GDCn, is applied to an unlabeled AE signal zm, it produces a probabilistic classification output, COn,m, indicating the likelihood that the signal corresponds to the dislocation or cracking mechanism as recognized by the learner. Each base learner (GDCn) is trained independently to minimize its respective loss function, yielding an optimized version denoted as GDCnopt.
The tailor-designed aggregate loss function is constructed from our domain knowledge: the ratio of crack signals during deformation should gradually increase from 0 to 1. To quantify this expected trend, we define \({{{\mathcal{T}}}}_{n}(r)\) as the ratio of crack-related signals identified by learner GDCn over time interval r, given by:
$${{{\mathcal{T}}}}_{n}(r)=\frac{{\sum}_{m:{z}_{m}\in r}{\mathrm{ln}}\;({{{\rm{CO}}}}_{n,m})}{{\sum}_{m:{z}_{m}\in r}1}$$
(1)
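Equation (1) can be read as the mean log-output of the classifier over all signals whose timestamps fall within the interval r. A minimal sketch follows, assuming the timestamps and outputs COn,m are available as plain lists; the function name, the clipping constant `eps`, and the toy numbers are illustrative assumptions.

```python
import math

def interval_log_ratio(times, outputs, start, end, eps=1e-12):
    """Mean of ln(CO) over signals with start <= t <= end, as in Eq. (1).

    eps guards against ln(0); it is an implementation detail assumed here.
    """
    logs = [math.log(max(co, eps))
            for t, co in zip(times, outputs) if start <= t <= end]
    if not logs:
        raise ValueError("interval contains no signals")
    return sum(logs) / len(logs)

# Hypothetical timestamps (s) and classifier outputs CO.
times = [10.0, 20.0, 30.0, 40.0]
outputs = [0.1, 0.2, 0.8, 0.9]
early = interval_log_ratio(times, outputs, 0.0, 25.0)
late = interval_log_ratio(times, outputs, 25.0, 50.0)
```

Because the denominator counts the signals in the interval, the value does not depend on how many signals the interval happens to contain, only on their typical output.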
To quantify alignment with domain knowledge, we define an aggregate trend metric \({{\mathcal{L}}}_{{{\rm{Trend}}},n,i,j}\) based on two randomly sampled intervals (ri = [si, ei] and rj = [sj, ej]), computed as
$${{{\mathcal{L}}}}_{{\mbox{Trend}},n,i,j}=\left({{{\mathcal{T}}}}_{n}\left({r}_{j}\right)-{{{\mathcal{T}}}}_{n}\left({r}_{i}\right)\right)\delta \left({s}_{i},{e}_{i},{s}_{j},{e}_{j}\right)$$
(2)
in which δ is an auxiliary function, defined as follows:
$$\delta ({s}_{i},{e}_{i},{s}_{j},{e}_{j})=\begin{cases}1 & \text{if }{s}_{i} < {e}_{i} < {s}_{j} < {e}_{j}\\ 0 & \text{otherwise}\end{cases}$$
The loss score decreases as the signal ratio trend aligns more consistently with the forecast according to the aforementioned domain knowledge.
Finally, we aggregate the interval-specific loss terms into a comprehensive loss function for the full observation period through repeated sampling. A temporally weighted scheme is applied to account for the greater influence of longer intervals on the overall trend. The tailor-defined loss function for the training stage, \({{{\mathcal{L}}}}_{n}\), can be specified in a normalized form:
$${{{\mathcal{L}}}}_{n}=\frac{{\sum }_{i}^{N}{\sum }_{j}^{N}({e}_{i}-{s}_{i}+{e}_{j}-{s}_{j}){{{\mathcal{L}}}}_{{\mbox{Trend}},n,i,j}}{{\sum }_{i}^{N}{\sum }_{j}^{N}\left({e}_{i}-{s}_{i}+{e}_{j}-{s}_{j}\right)}$$
(3)
Here, N represents the total number of sampling iterations performed across the observation period.
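The following sketch assembles Eqs. (1)–(3) for a fixed set of sampled intervals. It follows the equations as printed, including the sign of the trend term and the fact that pairs with δ = 0 still contribute to the denominator. The helper names, the handling of empty intervals, and the toy data are assumptions; in the actual framework the loss would be evaluated on differentiable classifier outputs so that gradients can be back-propagated.

```python
import math
import random

def mean_log_output(times, outputs, start, end, eps=1e-12):
    """Eq. (1): mean ln(CO) over signals in [start, end]; None if empty."""
    vals = [math.log(max(co, eps))
            for t, co in zip(times, outputs) if start <= t <= end]
    return sum(vals) / len(vals) if vals else None

def delta(s_i, e_i, s_j, e_j):
    """Auxiliary function: 1 only when interval i lies entirely before j."""
    return 1.0 if s_i < e_i < s_j < e_j else 0.0

def trend_term(times, outputs, r_i, r_j):
    """Eq. (2) for one interval pair; empty intervals contribute 0 (assumed)."""
    d = delta(r_i[0], r_i[1], r_j[0], r_j[1])
    if d == 0.0:
        return 0.0
    t_i = mean_log_output(times, outputs, *r_i)
    t_j = mean_log_output(times, outputs, *r_j)
    if t_i is None or t_j is None:
        return 0.0
    return (t_j - t_i) * d

def aggregate_loss(times, outputs, intervals):
    """Eq. (3): length-weighted average of trend terms over all pairs."""
    num = den = 0.0
    for r_i in intervals:
        for r_j in intervals:
            weight = (r_i[1] - r_i[0]) + (r_j[1] - r_j[0])
            num += weight * trend_term(times, outputs, r_i, r_j)
            den += weight
    return num / den

def sample_intervals(t_min, t_max, n, rng):
    """Draw n random intervals [s, e] with s <= e (sampling scheme assumed)."""
    return [tuple(sorted((rng.uniform(t_min, t_max), rng.uniform(t_min, t_max))))
            for _ in range(n)]

# Hypothetical data and two hand-picked intervals for a reproducible value.
times = [1.0, 2.0, 3.0, 4.0]
outputs = [0.5, 0.5, 0.25, 0.25]
fixed = [(0.5, 2.5), (2.6, 4.5)]
loss_fixed = aggregate_loss(times, outputs, fixed)

# The same loss on randomly sampled intervals, as in repeated sampling.
loss_random = aggregate_loss(times, outputs,
                             sample_intervals(0.0, 5.0, 20, random.Random(0)))
```

For the two fixed intervals only the ordered pair (first, second) satisfies δ = 1, so the loss reduces to that single weighted trend term divided by the total weight of all four pairs.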
We note: (1) Our framework employs a domain-informed loss function derived entirely from mathematical encodings of materials science principles governing acoustic emission (AE) signal distributions. Operating without labeled data, the system utilizes only AE timestamp sequences acquired during standard monitoring, enabling unsupervised signal sequence reconstruction and base learner optimization. (2) The architecture is intentionally modular—any machine learning model capable of self-optimization through gradient-based updates can be incorporated. This design creates an adaptive ecosystem where the continuous integration of new compatible algorithms systematically enhances predictive performance, while maintaining rigorous adherence to physical constraints encoded in the loss function. The approach uniquely bridges domain knowledge with flexible machine learning, avoiding the data hunger of purely statistical methods while preserving interpretability.
Furthermore, our framework leverages the intrinsic diversity of base learners, where each specialized variant emphasizes distinct signal features. This design intentionally amplifies discriminative capabilities for identifying characteristic patterns among different deformation mechanisms. To systematically evaluate and harness these complementary strengths, we implement a lightweight scoring model that quantifies the predictive performance of each optimized base learner (GDCnopt). The scoring metric prioritizes detection fidelity for the most physically significant signal differentiators, ensuring the ensemble focuses on mechanistically relevant features rather than incidental correlations. The overall performance of a model, denoted as Perf (where higher values indicate better performance), can be evaluated as follows:
$${{\rm{Perf}}}=\left(1-k\right)/{{{\mathcal{L}}}}_{n}+k/\,{\mathrm{lg}}({{{\rm{Num}}}}_{n})$$
(4)
The objective function comprises two components: (1) a performance term based on the domain-informed loss \({{{\mathcal{L}}}}_{n}\), and (2) a complexity penalty enforcing parsimony through Ockham’s razor principle. This dual structure balances physical fidelity (encoded in \({{{\mathcal{L}}}}_{n}\)) against model simplicity53. Here, Numn represents the number of trainable parameters in the base learner, and the coefficient k is a tradeoff parameter. Since modern neuro-computing models typically entail a large number of trainable parameters, resulting in a relatively large lg(Numn) term, the value of k is chosen to balance the two terms in Eq. (4). In this study, k is empirically set to 0.1. Through this framework, the system automatically selects the highest-performing base learner as the backbone model, i.e., \({{\mathcal{L}}}={{\arg }}{\max }_{n}{{\rm{Perf}}}\).
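The selection rule in Eq. (4) can be sketched as follows, assuming lg denotes the base-10 logarithm and that the loss values are positive. The candidate names and numbers are hypothetical and chosen only to show how the complexity term breaks a tie between two models with equal loss; they are not results from this study.

```python
import math

def perf(loss, num_params, k=0.1):
    """Eq. (4): (1 - k) / L_n + k / lg(Num_n), with lg taken as log10 (assumed)."""
    return (1.0 - k) / loss + k / math.log10(num_params)

# Hypothetical candidates: (aggregate loss, number of trainable parameters).
candidates = {
    "small_net": (0.20, 237),
    "large_ensemble": (0.20, 5000),
    "linear_model": (0.50, 50),
}

scores = {name: perf(loss, n) for name, (loss, n) in candidates.items()}
best = max(scores, key=scores.get)
```

With k = 0.1 the first term dominates, so the parameter count matters mainly when candidates reach similar loss values.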
In summary, as illustrated in Supplementary Fig. S1, the key distinction between the proposed framework and traditional supervised learning methods is its ability to train the model without requiring any labeled data. This capability is achieved by analyzing trends and properties derived from a collection of samples and comparing them with expected trends and properties based on established domain knowledge. By incorporating this knowledge into the training process, the new framework enables effective task learning in an unsupervised manner.
Performance of the proposed approach
Figure 2 shows the results of applying the proposed approach to the experimental dataset for porous 316L stainless steel under uniaxial tension13. Figure 2a illustrates three distinct stages identified using a peer statistical method13. The percentages of AE signals in these stages are 70.48%, 27.96%, and 1.56%, respectively. The figure also shows that the maximum stress of the porous material reaches 45 MPa at 11,800 s, after which the material exhibits continuous fractures. Figure 2b, c present the results obtained using the proposed approach with the optimally identified classifier (a BPNN, details provided later). The model identified 12,867 dislocation signals (Fig. 2b) and 413 crack signals (Fig. 2c) from a total of 13,280 AE signals. Both deformation mechanisms are observed to co-exist throughout the entire tensile process. Even in the early stage (region 1), where dislocation signals predominate, the presence of crack signals is significant. Similarly, in the later stage (region 3), where crack signals dominate, dislocation signals are also present and non-negligible.

a Time series of all AE signals (colored lines) overlaid with the stress-time curve (black). Blue, green, and orange indicate dislocation-dominant (Region 1), mixed (Region 2), and crack-dominant (Region 3), respectively. b, c AE signals separated into dislocation (b) and crack (c) spectra, with 21 and 14 superjerks (dashed lines), respectively. d, e Temporal evolution of power-law energy exponents (ε) for dislocation (d) and crack (e) signals, calculated by maximum likelihood estimation between adjacent superjerks. The sample size between consecutive superjerks in (d) is 147, 287, 313, 2185, 266, 1129, 1705, 123, 21, and 9, respectively. The sample size between consecutive superjerks in (e) is 88, 71, 57, 22, 9, 17, 20, 29, and 5, respectively. The red dashed line indicates the mean field value (ε = 1.33). Data in (d and e) are shown as mean ± standard deviation. Source data are provided as a Source Data file.
To explore the impact of the base learner on the proposed approach, we compared the five best-performing machine learning algorithms as base learners: LR, GDA, SVM, GBDT, and BPNN. The outcomes of each classifier for all AE signals are collectively analyzed using maximum likelihood estimation (MLE)54 to construct a maximum likelihood (ML) curve. MLE directly determines whether a group of identified AE signals follows only one deformation mechanism. A single deformation mechanism leads to a horizontal ML plateau extending from the lowest energy kink to a higher energy kink in the ML curve. In contrast, co-occurrence of multiple deformation mechanisms leads to a decrease of the ML curve at high energy54. Therefore, by measuring the harmonic mean of the plateau lengths in the curves (HMLE, details in “Methods”), we quantitatively assess the avalanche statistics of each recognized deformation mechanism. The left panel of Fig. 3a shows the overall performance and HMLE of each candidate classifier. The classifier’s overall performance, evaluated by Perf (Eq. (4)), is consistent with the statistical performance measured by HMLE. For both criteria, the BPNN with 237 trainable parameters achieves the best overall performance and the highest HMLE among all candidates.
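An ML curve of the kind described above can be sketched with the standard maximum likelihood estimator for a continuous power law, ε(Emin) = 1 + n / Σ ln(Ei/Emin), evaluated over a range of lower cutoffs Emin. This is assumed to correspond to the estimator of ref. 54; the plateau-length measurement behind HMLE is defined in “Methods” and is not reproduced here. The synthetic energies and function names are illustrative.

```python
import math
import random

def ml_exponent(energies, e_min):
    """Power-law exponent estimated from all energies >= e_min."""
    tail = [e for e in energies if e >= e_min]
    log_sum = sum(math.log(e / e_min) for e in tail)
    if len(tail) < 2 or log_sum <= 0.0:
        return None  # too few points above the cutoff
    return 1.0 + len(tail) / log_sum

def ml_curve(energies, cutoffs):
    """Exponent estimate as a function of the lower energy cutoff."""
    return [(c, ml_exponent(energies, c)) for c in cutoffs]

# Synthetic single-mechanism sample: pure power law with exponent 1.5,
# drawn by inverse-transform sampling with minimum energy 1.
rng = random.Random(0)
true_eps = 1.5
sample = [(1.0 - rng.random()) ** (-1.0 / (true_eps - 1.0))
          for _ in range(20000)]
curve = ml_curve(sample, [1.0, 2.0, 4.0, 8.0])
```

For a pure power law the estimates stay close to the true exponent at every cutoff, which is the flat plateau expected for a single mechanism; a mixture of mechanisms would instead make the curve drift with the cutoff.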

a Performance of candidate classifiers and peer methods evaluated by four metrics: HMLE (purple, harmonic mean of plateau lengths by maximum likelihood estimation), number of trainable parameters (green), aggregate loss impact \({{{\mathcal{L}}}}_{n}\) (blue), and overall performance (orange), where \({{{\mathcal{L}}}}_{n}\) is the domain-informed loss, Numn is the number of trainable parameters for candidate classifiers, and k is a tradeoff parameter. Candidate classifiers for comparison include Gaussian Discriminant Analysis (GDA), Logistic Regression (LR), Support Vector Machine (SVM), Gradient Boosting Decision Tree (GBDT), and Back-Propagation Neural Network (BPNN). Peer methods include k-means, ScaNet45 and an Imbalanced Discriminative Neural Network (IDNN) with focal loss55. Circles and error bars indicate mean and standard deviation from 100 resamples. b Temporal evolution of crack signal ratios identified by different methods during tensile loading. Regions are defined by the statistical method13. Curves are computed using a 1200 s sliding window, with shaded areas denoting 90% confidence intervals. c Maximum likelihood estimation (MLE) curves of dislocation (dark blue) and crack (orange) signals in Regions 1 and 3 classified by the statistical method (+) and the proposed method (○). Sample sizes: statistical method: 9360 (dislocations), 204 (cracks); proposed approach: 9198 (dislocations), 93 (cracks). d MLE curves for Region 2 (green) and signals identified as dislocations and cracks by the proposed method. Sample size: 3636 for dislocation, 80 for cracks, and 3716 for mixture. Data in (d and e) are shown as the mean ± standard deviation. e Normalized cumulative AE amplitude of separated crack signals versus ideal fracture model (black line), demonstrating physical consistency. Source data are provided as a Source Data file.
Figure 3b illustrates the ratios of crack signals as a function of load time for different separation methods. The proposed approach identifies less than 10% of crack signals before 10,000 s, with a significant increase in ratio as the load progresses towards the fracture point. Since dislocations dominate the early stages of the tensile process and the ratio of initial crack signals should not exceed 50%, the results from the proposed approach align more closely with the domain knowledge encoded in Eqs. (1)–(3).
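A crack-ratio curve like the one discussed here can be sketched with a simple sliding window over the signal timestamps. The 1200 s window matches the caption of Fig. 3b; the step size, the window alignment, and the toy labels are assumptions, and the confidence bands are not computed.

```python
def sliding_crack_ratio(times, is_crack, t_start, t_stop,
                        window=1200.0, step=1200.0):
    """Fraction of crack-labelled signals in each window [s, s + window).

    Returns (window centre, ratio) pairs; windows without signals are skipped.
    """
    ratios = []
    s = t_start
    while s + window <= t_stop:
        flags = [c for t, c in zip(times, is_crack) if s <= t < s + window]
        if flags:
            ratios.append((s + window / 2.0, sum(flags) / len(flags)))
        s += step
    return ratios

# Hypothetical timestamps (s) and binary crack labels from a classifier.
times = [100.0, 500.0, 900.0, 1500.0, 2000.0, 2300.0]
is_crack = [0, 0, 1, 1, 1, 1]
ratio_curve = sliding_crack_ratio(times, is_crack, 0.0, 2400.0)
```

A smaller `step` than `window` would give overlapping windows and a smoother curve.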
Ablation study of the proposed approach
We performed quantitative ablation studies to evaluate three key components of our framework: classifier selection criteria, loss function formulation, and optimization strategy. Our proposed classifier selection criterion, Eq. (4), combines model performance (first term) with complexity measured by trainable parameters (second term). Individual evaluation of these terms revealed opposing trends: while GBDT showed superior raw performance (GBDT > BPNN > LR > SVM > GDA), LR exhibited optimal simplicity (LR > BPNN > GDA > SVM > GBDT). Crucially, only the combined evaluation yielded results (BPNN > GBDT > LR > SVM > GDA) consistent with HMLE validation. Supplementary analyses confirmed that BPNN’s crack signal identification most closely matched domain knowledge across loading conditions (Supplementary Table S1), validating our dual-term selection approach.
To evaluate the robustness of our loss function formulation, we conducted additional ablation studies comparing alternative formulations: (1) a non-logarithmic version \(\left(\frac{{\sum}_{m:{z}_{m}\in r}{{{\rm{CO}}}}_{n,m}}{{\sum}_{m:{z}_{m}\in r}1}\right)\) and (2) a count-based version using absolute numbers of Type II signals \(\left({\sum}_{m:{z}_{m}\in r,{{{\rm{CO}}}}_{n,m} > 0.5}1\right)\). Using synthetic datasets with known labels (20,000 PSD spectra per dataset, containing two signal types with distinct average spectra and temporally evolving Type II signal ratios following power-law distributions; see Supplementary Note-02 and Fig. S2), we quantified classification accuracy across all candidate classifiers (BPNN, LR, GDA, SVM, GBDT). Results (Supplementary Tables S2–S6) demonstrate that removing the logarithmic transformation consistently degraded performance (lower accuracy and F1 scores, and higher RMSE and Area, across all datasets). Notably, the count-based approach failed to produce meaningful classification, confirming the importance of ratio-based evaluation in our formulation.
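The three interval statistics compared in this ablation can be placed side by side as below. The example illustrates one difference between them: the two ratio-based versions are unchanged when an interval contains more signals of the same composition, whereas the count-based version grows with the number of signals. Function names and data are hypothetical.

```python
import math

def log_ratio(outputs, eps=1e-12):
    """Eq. (1): mean of ln(CO) over the interval."""
    return sum(math.log(max(co, eps)) for co in outputs) / len(outputs)

def plain_ratio(outputs):
    """Ablation (1): mean of CO without the logarithm."""
    return sum(outputs) / len(outputs)

def type2_count(outputs):
    """Ablation (2): absolute number of signals with CO > 0.5."""
    return sum(1 for co in outputs if co > 0.5)

# Two hypothetical intervals with the same composition but different sizes.
sparse = [0.9, 0.1]
dense = sparse * 5
```

Here `log_ratio` and `plain_ratio` agree between `sparse` and `dense`, while `type2_count` differs by a factor of five simply because the second interval holds more signals.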
To assess the effectiveness of our optimization approach, we compared gradient-based optimization against a non-gradient driven optimization method, specifically a genetic algorithm (GA). Performance metrics (Supplementary Fig. S3 and Tables S7–S10) demonstrate that gradient-optimized classifiers (BPNN, LR, GDA, and SVM) consistently surpass their GA-trained counterparts in accuracy, F1 score, RMSE, and Area across all evaluated conditions. This performance advantage is particularly notable given the substantially longer training times required by the GA approach, highlighting both the efficiency and effectiveness of gradient-based optimization for this application.
Comparison with other machine learning methods
The latest unsupervised method for identifying deformation mechanisms from AE signals is ScaNet45, a deep learnable scattering network with Gaussian mixture model clustering. The most efficient supervised method for this task is the neural network with Focal loss55, which is trained using labeled data and a loss function designed to handle imbalanced data56 (denoted as IDNN). IDNN is particularly well-suited for metallic structural materials where the distribution of AE signals is highly skewed (e.g., in the dataset reported in Fig. 2a, 70.48% of pseudo-labels correspond to dislocations, leaving only 1.56% for cracks). In this study, we compare the proposed approach with three top-performing peer methods: k-means, ScaNet, and IDNN.
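For reference, the binary focal loss used by IDNN-type models can be sketched as FL = −αt (1 − pt)^γ ln(pt), where pt is the predicted probability of the true class. The values γ = 2 and α = 0.25 below are common defaults from the focal loss literature, not settings reported in this study.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one sample.

    p: predicted probability of class 1; y: true label (0 or 1).
    alpha=None disables class weighting; gamma=0 recovers cross-entropy.
    """
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0)
    if alpha is None:
        a_t = 1.0
    else:
        a_t = alpha if y == 1 else 1.0 - alpha
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A well-classified sample is down-weighted far more than a hard one.
easy = focal_loss(0.9, 1)
hard = focal_loss(0.1, 1)
```

The factor (1 − pt)^γ shrinks the contribution of confidently classified samples, which is why the loss suits datasets where one class vastly outnumbers the other.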
Using three characteristic synthetic datasets with known labels (Supplementary Note-02 and Fig. S2), we quantitatively evaluated our method against established approaches (k-means, ScaNet, and IDNN). The results (Supplementary Fig. S4 and Tables S11–S14) demonstrate that our framework consistently achieves superior performance across all evaluation metrics, regardless of temporal evolution patterns in Type II signal ratios. Notably, it simultaneously: (1) attains the highest accuracy and F1 scores, and (2) yields the lowest RMSE and Area measures. This robust performance advantage holds across all three synthetic datasets with differing trend characteristics and different regions.
We then compare the proposed approach with the above three existing methods using the dataset shown in Fig. 2a. In this case, the number of clusters in k-means and ScaNet is set to 2, in order to recognize the two deformation mechanisms shown in Fig. 2. The IDNN uses the same network as the proposed approach but is trained via supervised learning with pseudo-labels. The pseudo-labels are derived from previous statistical results13 and assume that all AE signals in region 1 originate from dislocation movement, whereas all AE signals in region 3 originate from cracks. The right panel of Fig. 3a and Supplementary Table S1 show that the proposed approach achieves a higher HMLE than the three existing methods. Figure 3b shows the temporal evolution of the ratio of crack signals. The IDNN method shows a trend similar to that of the proposed approach, as it shares the same neural network and uses a large number of pseudo-labels based on the previously known statistical results13. However, the performance of the proposed approach significantly surpasses that of k-means and ScaNet. Both of these methods indicate approximately 50% crack AE signals at the beginning of the tensile process, which contradicts established materials science knowledge. Furthermore, the AE signals identified by ScaNet for the entire AE spectrum in Fig. 2a are analyzed using MLE, with results shown in Supplementary Fig. S5. The MLE curves for dislocation and crack signals identified by ScaNet exhibit clear mixing behavior and do not reach a plateau. These characteristics suggest that clustering-based methods, such as k-means and ScaNet, result in greater misclassification errors for mixed AE signals than the proposed approach.
Comparison with statistical theory and classic fracture theory
We compared the avalanche statistics of the dislocation and crack AE signals classified by the proposed approach with the results obtained using the statistical method via pseudo-labels. As depicted in Fig. 3c, for region 3 of the stress-time curve (Fig. 2a), where the crack mechanism dominates, the MLE curve obtained with the proposed approach exhibits a comparable plateau length of more than 3 decades. The energy exponent is ε = 1.25 ± 0.03 for the proposed approach, compared with ε = 1.31 ± 0.03 for the statistical method. For region 1 of the stress-time curve (Fig. 2a), where the dislocation mechanism dominates, the MLE curve from the proposed approach shows an energy exponent comparable to that of the statistical method. These results indicate that, where only one deformation mechanism dominates the deformation process, the AE signals separated by the proposed approach do not exhibit noticeably enhanced avalanche statistics compared with the peer statistical method. However, in regions where both mechanisms co-exist with non-negligible fractions (region 2 of the stress-time curve in Fig. 2a), the proposed approach demonstrates superior performance. As shown in Fig. 3d, the MLE curve from the proposed approach shows a meaningful plateau length with the energy exponent ε = 2.18 ± 0.03 for dislocation AE signals and ε = 1.48 ± 0.07 for crack AE signals. In contrast, the statistical method cannot distinguish the mixed AE signals, as there is no plateau in the MLE curve.
We further compare the AE signals with classic fracture theory57, where a log-periodic correction to scaling can effectively fit the cumulative amplitude of crack AE signals. This relationship is derived from the predictions of seismic energy release through a renormalization group approach for the regional fault network with a discrete hierarchy. The relationship between cumulative amplitude of AE events and the fracture evolution can be expressed as follows:
$${{\rm{cumAmp}}}\left(t\right)=A+B{({t}_{f}-t)}^{m}\left[1+C\cos \left(2\pi \frac{\log ({t}_{f}-t)}{\log \lambda }+\psi \right)\right]$$
(5)
where A, B, and C are constants, tf is the failure time, m is the scaling exponent, λ describes the wavelength of the oscillations in log space, and ψ denotes a phase shift. This model has been applied to predicting large earthquakes and the failure of thermal barrier coatings under cyclic loading57,58. Figure 3e compares the cumulative amplitude of the crack signals with the calculated ideal curve; the crack signals identified by the proposed approach align closely with the fracture theory.
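Equation (5) is straightforward to evaluate numerically. The sketch below implements it directly and checks its defining property: scaling the time to failure (tf − t) by λ leaves the oscillatory factor unchanged, so only the power-law factor changes. All parameter values are hypothetical and are not fitted to the data of this study.

```python
import math

def cum_amp(t, t_f, a, b, c, m, lam, psi):
    """Eq. (5): power law in (t_f - t) with a log-periodic correction."""
    dt = t_f - t
    if dt <= 0.0:
        raise ValueError("t must be earlier than the failure time t_f")
    phase = 2.0 * math.pi * math.log(dt) / math.log(lam) + psi
    return a + b * dt ** m * (1.0 + c * math.cos(phase))

# Hypothetical parameters for illustration only.
params = dict(t_f=100.0, a=1.0, b=-0.5, c=0.1, m=0.5, lam=2.0, psi=0.3)

near = cum_amp(96.0, **params)  # t_f - t = 4
far = cum_amp(92.0, **params)   # t_f - t = 8 = 4 * lam
scale = (far - params["a"]) / (near - params["a"])
```

With λ = 2, the ratio `scale` equals λ^m, which reflects the discrete scale invariance that the log-periodic term encodes. In practice the parameters would be obtained by nonlinear least-squares fitting to the measured cumulative amplitude.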
