Identification of biomarkers associated with M1 macrophages in ST-segment elevation myocardial infarction through bioinformatics and machine learning approaches

Inhomogeneity of DEGs in STEMI samples
The GSE59867 dataset was standardized with the normalizeBetweenArrays function, and PCA plots (Fig. 1A) and boxplots (Fig. 1B) confirmed that standardization was successful. We then performed hierarchical clustering with the hclust function to construct a sample dendrogram (Supplementary Fig. 1), using a fixed-height cutoff of 0.025. Sample GSM1620790, which lay above this cutoff height, was removed as an outlier to ensure the robustness of the subsequent filtering results. Differential gene expression analysis was then performed to identify genes implicated in STEMI. This analysis revealed 362 DEGs between the control and STEMI groups in GSE59867, comprising 293 downregulated and 69 upregulated genes (Fig. 1C). The heatmap showed that these DEGs clearly separated the control and STEMI groups, but it also revealed a degree of heterogeneity within the STEMI samples (Fig. 1D).
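The preprocessing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' R pipeline: it uses quantile normalization (one of the methods offered by limma's normalizeBetweenArrays; the method actually used is not stated), and it assumes a 1 − Pearson correlation distance with average linkage for the dendrogram, since neither the distance metric nor the linkage is reported. A cutoff such as 0.025 is only meaningful on the metric that produced the dendrogram. The helper names (`quantile_normalize`, `average_linkage_cut`, `flag_outliers`) are hypothetical.

```python
import numpy as np


def quantile_normalize(expr):
    """Quantile-normalize a genes x samples matrix so that every sample
    (column) ends up with the same value distribution (ties not averaged)."""
    order = np.argsort(expr, axis=0)
    mean_by_rank = np.sort(expr, axis=0).mean(axis=1)  # reference distribution
    normalized = np.empty(expr.shape, dtype=float)
    for j in range(expr.shape[1]):
        normalized[order[:, j], j] = mean_by_rank
    return normalized


def average_linkage_cut(dist, cut_height):
    """Merge clusters by average linkage until the closest pair is farther
    apart than cut_height; equivalent to cutting the dendrogram there."""
    clusters = [[i] for i in range(dist.shape[0])]
    while len(clusters) > 1:
        best_d, best_a, best_b = np.inf, -1, -1
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best_d:
                    best_d, best_a, best_b = d, a, b
        if best_d > cut_height:
            break
        clusters[best_a] = clusters[best_a] + clusters[best_b]
        del clusters[best_b]  # best_b > best_a, so best_a's index is unchanged
    return clusters


def flag_outliers(expr, sample_ids, cut_height=0.025):
    """Return (indices kept, ids flagged), using an ASSUMED 1 - Pearson
    correlation distance between samples (columns of expr)."""
    dist = 1.0 - np.corrcoef(expr.T)
    clusters = average_linkage_cut(dist, cut_height)
    main = set(max(clusters, key=len))  # largest cluster = retained samples
    kept = sorted(main)
    flagged = [sid for i, sid in enumerate(sample_ids) if i not in main]
    return kept, flagged
```

Samples that have not joined the main cluster when the cutoff height is reached are flagged, which mirrors removing a branch that sits above a fixed-height line in the dendrogram.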

PCA plot (A) and boxplots (B) of the GSE59867 dataset after standardization. Volcano plot (C) and heatmap (D) of DEGs between the control and STEMI samples in the GSE59867 dataset.
M1 macrophages are positively correlated with STEMI occurrence
To determine the associations between STEMI and immune cells, we evaluated the proportions of infiltrating immune cells in the control and STEMI samples using the CIBERSORTx algorithm. Compared with the control group, the STEMI group showed higher proportions of six immune cell types, namely, M1 macrophages, M2 macrophages, CD4 memory T cells, mast cells, CD8+ T cells, and activated memory T cells (P < 0.05, Fig. 2A,B). These immune cell types were also significantly intercorrelated (Fig. 2C). To identify gene modules associated with these immune features in STEMI patients, we performed WGCNA. Selecting an appropriate soft threshold before constructing a scale-free gene co-expression network is essential because it affects the stability of network construction. The lowest soft-threshold power at which the R² value exceeded 0.85, namely 9, was selected for WGCNA (Fig. 3A). As the soft threshold increased, the mean connectivity decreased (Fig. 3B), whereas the frequency of connectivity increased rapidly (Fig. 3C). The logarithm of the mean connectivity was linearly correlated with the logarithm of the connectivity frequency (Fig. 3D). In the module–trait relationship analysis, both the blue module (correlation coefficient = −0.32, P < 0.001) and the brown module (correlation coefficient = 0.4, P < 0.001) were significantly correlated with M1 macrophages (Fig. 3E). Intersecting the genes from the brown and blue modules with the 362 DEGs between STEMI patients and controls yielded a total of 295 overlapping genes (Fig. 3F). A PPI network constructed from these overlapping genes retained 82 genes for subsequent consensus clustering analysis (Fig. 3G).
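The soft-threshold choice described above follows the scale-free topology criterion. The sketch below is a simplified re-implementation modeled on the logic of WGCNA's pickSoftThreshold, not the package itself: it assumes an unsigned network (adjacency = |Pearson correlation|^β), bins node connectivity into 10 bins, fits log10 p(k) against log10 k, and returns the lowest power whose signed R² exceeds 0.85. The function names `scale_free_fit` and `pick_soft_threshold` are hypothetical.

```python
import numpy as np


def scale_free_fit(k, n_bins=10):
    """Signed R^2 and slope of log10 p(k) ~ log10 k over binned connectivity."""
    edges = np.linspace(k.min(), k.max(), n_bins + 1)
    bins = np.digitize(k, edges[1:-1], right=True)  # bin index 0..n_bins-1
    mids = (edges[:-1] + edges[1:]) / 2
    # Mean connectivity per bin; empty bins fall back to the bin midpoint.
    mean_k = np.array([k[bins == b].mean() if np.any(bins == b) else mids[b]
                       for b in range(n_bins)])
    p_k = np.bincount(bins, minlength=n_bins) / k.size
    x = np.log10(mean_k)
    y = np.log10(p_k + 1e-9)  # small offset keeps empty bins finite
    slope = np.polyfit(x, y, 1)[0]
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return -np.sign(slope) * r2, slope  # positive only when p(k) decays with k


def pick_soft_threshold(expr, powers=range(1, 21), r2_cut=0.85):
    """expr: samples x genes. Returns the lowest power whose signed R^2
    exceeds r2_cut (or None) and a table of (power, signed R^2, mean k)."""
    abs_cor = np.abs(np.corrcoef(expr, rowvar=False))
    table, chosen = [], None
    for beta in powers:
        k = (abs_cor ** beta).sum(axis=0) - 1.0  # drop self-adjacency (= 1)
        signed_r2, _ = scale_free_fit(k)
        table.append((beta, signed_r2, k.mean()))
        if chosen is None and signed_r2 > r2_cut:
            chosen = beta
    return chosen, table
```

Because every off-diagonal |correlation| is below 1, raising it to a higher power always shrinks it, which is why mean connectivity falls monotonically as the soft threshold grows.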

(A) The distribution of immune cell infiltration proportions in both the control and STEMI samples. (B) Statistical differences in immune cell profiles between the control and STEMI groups. (C) Correlation heatmap of immune cell interactions.

(A–D) Selection of a soft threshold for constructing the WGCNA network. (A) Relationship between the soft threshold and the R² value. (B) Relationship between the soft threshold and mean connectivity. (C) Frequency distribution diagram of the connectivity of gene nodes. (D) Linear correlation between the logarithm of the mean connectivity and the logarithm of the connectivity frequency. (E) Correlation heatmap between immune cell infiltration proportions and module genes. (F) Venn diagram of genes associated with M1 macrophages and DEGs. (G) PPI network of overlapping genes between M1 macrophage-related genes and DEGs.
DEGs related to M1 macrophages can effectively distinguish STEMI patients
We conducted consensus clustering analysis to determine whether M1 macrophage-related DEGs can classify STEMI patients. The delta area plot showed the largest drop in the relative change in the area under the CDF curve between K = 2 and K = 3 (Fig. 4A). When K was set to 2, the consensus CDF curve remained stable over consensus index values from 0.2 to 0.8 (Fig. 4B), and the consensus matrix heatmap revealed robust clustering into two distinct groups of patients (Fig. 4C). Considering all of these results, K = 2 was chosen as the optimal number of clusters for classifying STEMI patients. Furthermore, M1 macrophage-related genes were downregulated in M1-type patients and upregulated in M2-type patients (Fig. 4D). These findings indicate that M1 macrophage-related genes are highly relevant to the classification of STEMI patients. To further delineate differences between the immune subtypes, we identified DEGs between the two subtypes. This analysis revealed 200 DEGs at a fold-change threshold of 1.5, as shown in the volcano plot and heatmap (Fig. 4D,E).
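The consensus clustering and delta-area logic above can be illustrated with a small numpy sketch. It is not ConsensusClusterPlus: it assumes k-means as the base clustering algorithm (the package also supports hierarchical clustering), subsamples 80% of samples per repetition, and defines the consensus index of a pair as the fraction of repetitions in which both samples were drawn and assigned to the same cluster. The delta area is taken as the raw CDF area at the smallest K and the relative change from K − 1 thereafter, which is our understanding of how the delta area plot is constructed. All function names are hypothetical.

```python
import numpy as np


def kmeans(x, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns a cluster label for each row of x."""
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([x[labels == c].mean(axis=0) if np.any(labels == c)
                        else centers[c] for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def consensus_matrix(x, k, reps=50, p_item=0.8, seed=0):
    """Fraction of subsamples in which each pair of samples (rows of x),
    when drawn together, lands in the same cluster."""
    rng = np.random.default_rng(seed)
    n = len(x)
    together, drawn = np.zeros((n, n)), np.zeros((n, n))
    m = int(round(p_item * n))
    for _ in range(reps):
        idx = rng.choice(n, size=m, replace=False)
        labels = kmeans(x[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        drawn[np.ix_(idx, idx)] += 1.0
    return np.divide(together, drawn, out=np.zeros((n, n)), where=drawn > 0)


def cdf_area(cm):
    """Area under the empirical CDF of the off-diagonal consensus indices.
    For values in [0, 1] this integral equals 1 - mean(values)."""
    values = cm[np.triu_indices_from(cm, k=1)]
    return 1.0 - values.mean()


def delta_area(x, ks=range(2, 7), **kwargs):
    """CDF area per K and its relative change from K - 1 (raw area at first K)."""
    areas = {k: cdf_area(consensus_matrix(x, k, **kwargs)) for k in ks}
    ordered = sorted(areas)
    delta = {ordered[0]: areas[ordered[0]]}
    for prev, k in zip(ordered, ordered[1:]):
        delta[k] = (areas[k] - areas[prev]) / areas[prev]
    return areas, delta
```

A flat stretch of the CDF between intermediate consensus values (such as the 0.2 to 0.8 range reported above) means few pairs are ambiguously clustered, which is the property used to favour K = 2.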

Delta area (A), CDF (B), and consensus matrix (C) plots of clustered subtypes. (D) Expression distribution of M1 macrophages in the two subtypes. (E) Volcano plot of DEGs in the two subtypes.
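The subtype DEG filter described above can be sketched with a fold-change threshold of 1.5, that is |log2 fold change| ≥ log2(1.5). The adjusted P-value cutoff of 0.05 and the statistical test are assumptions, as neither is stated: microarray DEGs are commonly called with limma's moderated t-test, whereas this sketch swaps in Welch's t-test from SciPy for simplicity and applies Benjamini–Hochberg adjustment. Input is assumed to be log2-scale expression (genes × samples); `subtype_degs` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted P-values (same definition as R's p.adjust 'BH')."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def subtype_degs(log2_expr, genes, is_m1, fc=1.5, alpha=0.05):
    """log2_expr: genes x samples (log2 scale); is_m1: boolean sample mask.
    Returns (up, down) gene lists for M1-type relative to M2-type samples."""
    m1, m2 = log2_expr[:, is_m1], log2_expr[:, ~is_m1]
    log2fc = m1.mean(axis=1) - m2.mean(axis=1)
    # Welch's t-test stands in for limma's moderated t-test here.
    _, p = stats.ttest_ind(m1, m2, axis=1, equal_var=False)
    padj = benjamini_hochberg(p)
    hit = (np.abs(log2fc) >= np.log2(fc)) & (padj < alpha)
    up = [g for g, h, d in zip(genes, hit, log2fc) if h and d > 0]
    down = [g for g, h, d in zip(genes, hit, log2fc) if h and d < 0]
    return up, down
```

Because the data are on a log2 scale, the difference of group means is the log2 fold change, so a threshold of 1.5 on the linear scale becomes about 0.585 on the log2 scale.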
Feature gene selection between STEMI subtypes using machine learning algorithms
Intersecting the 200 DEGs between STEMI subtypes with the 82 PPI hub genes yielded 31 feature genes (Fig. 5A). Machine learning algorithms were then used to select the most important of these feature genes. First, the XGBoost algorithm was applied; Fig. 5B shows the trend of the Cox negative log-likelihood value during training, and the top 10 important features are shown in Fig. 5C,G. Second, LASSO regression analysis was performed on the 31 feature genes, with cross-validation used for iterative analysis. The cross-validated root mean square error was lowest at λ = 0.09, at which 8 variables were retained (Fig. 5D,E,G). Third, we built a random forest model by constructing decision trees and partitioning genes using the log-rank rule, and variable importance was ranked with the Gini coefficient method. The top 15 feature genes by variable importance are shown in Fig. 5F, and the 10 feature genes with a “Mean Decrease Gini” value greater than 1.6 were selected for further screening (Fig. 5G). By integrating the candidate genes selected by these three machine learning algorithms, four genes (AKT3, GJC2, HMGCL, and RBM17) were identified as feature genes for the immune subtypes (Fig. 5G,H).
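The selection rules reported above can be made explicit with a few set operations. This sketch assumes that "integrating" the three candidate sets means taking their intersection, as the Venn diagram in Fig. 5H suggests, and it takes precomputed importance scores as input rather than fitting XGBoost, LASSO or random forest models. The rules encoded are the ones stated in the text: the XGBoost top 10, genes with nonzero LASSO coefficients at the chosen λ, and a Mean Decrease Gini above 1.6. Function names are hypothetical.

```python
def candidate_genes(subtype_degs, ppi_hubs):
    """Step 1: genes that are both subtype DEGs and PPI hub genes."""
    return sorted(set(subtype_degs) & set(ppi_hubs))


def top_n(scores, n):
    """Genes with the n highest importance scores (XGBoost-style ranking)."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])


def nonzero(coefs, tol=1e-12):
    """Genes with nonzero LASSO coefficients at the chosen lambda."""
    return {g for g, c in coefs.items() if abs(c) > tol}


def above(scores, cutoff):
    """Genes whose importance exceeds a cutoff (e.g. Mean Decrease Gini > 1.6)."""
    return {g for g, s in scores.items() if s > cutoff}


def consensus_feature_genes(xgb_importance, lasso_coefs, rf_gini,
                            n_top=10, gini_cutoff=1.6):
    """ASSUMED integration rule: keep genes selected by all three methods."""
    selected = {
        "XGBoost": top_n(xgb_importance, n_top),
        "LASSO": nonzero(lasso_coefs),
        "RandomForest": above(rf_gini, gini_cutoff),
    }
    shared = set.intersection(*selected.values())
    return sorted(shared), selected
```

Returning the per-method sets alongside the shared genes keeps the inputs of a Venn diagram like Fig. 5H available for inspection.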

(A) Overlapping genes between subtype-related DEGs and immune-related DEGs. (B) Plot of the number of iterations of the training process of the XGBoost algorithm versus the Cox negative log-likelihood value. (C) Bar chart of the top 10 genes and their corresponding importance scores screened by the XGBoost algorithm. (D) Cross-validation curve for LASSO regression analysis. (E) Path diagram of the LASSO coefficients and the 8 feature genes. (F) Bar chart of the top 15 genes and their variable importance scores screened using the random forest model. (G) Key genes obtained using different types of machine learning. (H) Venn diagram of the potential genes selected through three machine learning algorithms.
The expression and predictive performance of the 4 feature genes in GSE59867 and GSE62646
To further validate the reliability of the selected immune subtype feature genes, we assessed their expression and predictive performance in the GSE59867 and GSE62646 datasets. The validation dataset GSE62646 was also standardized, as shown in the boxplot (Supplementary Fig. 2), and cluster analysis identified no samples requiring exclusion (Supplementary Fig. 3). As anticipated, in both GSE59867 and GSE62646, the expression levels of AKT3, GJC2, HMGCL, and RBM17 were lower in STEMI patients than in controls (P < 0.05, Fig. 6A). Furthermore, the expression levels of these four genes were lower in the M1-type STEMI group than in the M2-type STEMI group (P < 0.05, Fig. 6B). Notably, the area under the curve (AUC) values for AKT3, GJC2, HMGCL, and RBM17 were all greater than 0.5, regardless of sex, both between control and STEMI patients and between M1-type and M2-type STEMI patients, indicating that these genes have potential diagnostic value (Fig. 6C,D). Finally, the diagnostic performance of AKT3, GJC2, HMGCL, and RBM17 was evaluated in the validation set GSE62646, where their AUCs for predicting STEMI were 0.758, 0.732, 0.543 and 0.811, respectively (Fig. 6E).
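The AUC values above have a simple interpretation: the probability that a randomly chosen STEMI sample is ranked above a randomly chosen control (the Mann–Whitney formulation), with 0.5 corresponding to chance. Because the four genes are downregulated in STEMI, this sketch scores each sample by negated expression so that lower expression counts as evidence of STEMI; we assume the reported AUCs were oriented this way. The function names are hypothetical, and this pairwise loop is meant for illustration rather than large datasets.

```python
def auc_mann_whitney(case_scores, control_scores):
    """AUC as P(random case scores above random control); ties count 1/2."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def auc_downregulated(case_expr, control_expr):
    """For a gene that is LOWER in cases, score samples by negated expression
    so that lower expression counts as evidence for STEMI (assumed orientation)."""
    return auc_mann_whitney([-v for v in case_expr], [-v for v in control_expr])
```

On this scale, a value such as 0.543 (HMGCL in GSE62646) is only slightly better than chance, whereas 0.811 (RBM17) means that about four in five case–control pairs are ordered correctly.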

(A) The expression levels of AKT3, GJC2, HMGCL, and RBM17 in the STEMI and CAD groups in the GSE59867 dataset. (B) The expression levels of AKT3, GJC2, HMGCL, and RBM17 in the M1 and M2 subgroups and in the CAD group in the GSE59867 dataset. (C) ROC curves of AKT3, GJC2, HMGCL, and RBM17 between control and STEMI patients in the GSE59867 dataset. (D) ROC curves of AKT3, GJC2, HMGCL, and RBM17 between M1-type and M2-type STEMI patients in the GSE62646 dataset. (E) ROC curves of AKT3, GJC2, HMGCL, and RBM17 between control and STEMI patients in the validation dataset GSE62646.
Patient characteristics and real-time PCR verification of the key genes
The STEMI group comprised 14 Chinese men and 2 Chinese women with a mean age of 62.7 ± 14.1 years. The control group comprised 6 Chinese men and 2 Chinese women with a mean age of 65.5 ± 11.7 years. The baseline characteristics of the patients, including blood pressure, heart rate, history of hyperlipidaemia, hypertension, diabetes, and medication history (ACEI/ARB, β-blockers, antiplatelets, calcium antagonists, and statins), are presented in Supplementary Table 1. CRP, cTnT, NT-proBNP, creatinine and IL-6 levels were higher in the STEMI group than in the control group (P < 0.05, Supplementary Table 1). Real-time PCR showed lower expression of AKT3, GJC2, HMGCL and RBM17 in the STEMI group than in the control group, with AKT3, HMGCL and RBM17 significantly downregulated in the STEMI group (Fig. 7, Supplementary Table 2).

Box plots showing the expression levels of AKT3 (A), GJC2 (B), HMGCL (C), and RBM17 (D) in the STEMI group compared with those in the control group (*P < 0.05).
Discussion
In the present study, we first defined the research question and selected the relevant datasets. Using the CIBERSORTx algorithm and WGCNA, we revealed a close association between M1 macrophages and STEMI, consistent with many other studies. Using the ConsensusClusterPlus package, we classified STEMI patients into two distinct molecular subtypes. To screen for important genes, a protein–protein interaction (PPI) network and Venn diagrams were used; the intersection of the PPI hub genes and the DEGs between the STEMI subtypes constituted the main feature genes of interest. Finally, through machine learning and validation in an independent dataset and in patient samples, we obtained four key genes: AKT3, GJC2, HMGCL, and RBM17. Real-time PCR revealed that all four genes were downregulated in STEMI patients, with AKT3, HMGCL and RBM17 downregulated significantly (P < 0.05).
AKT, a serine/threonine protein kinase, comprises three isoforms: Akt1, Akt2, and Akt3. Among these, Akt3 is a pivotal downstream effector of the PI3K signalling pathway, which is highly conserved across eukaryotes. This pathway critically influences cardiac metabolism by fostering myocardial cell growth and survival, promoting coronary neovascularization, maintaining cardiac contractile function, and facilitating autophagy. Akt3 has been shown to specifically inhibit cholesterol ester accumulation and foam cell formation22, a pivotal initial event in the development of atherosclerosis. In rats, AKT3 expression in the spinal cord is notably reduced following myocardial ischaemia–reperfusion injury23. M1 macrophages are activated mainly by bacterial lipopolysaccharide (LPS) and IFNγ. Previous studies have shown that the PI3K–Akt pathway negatively regulates LPS signalling and gene expression in monocytes/macrophages. Activation or overexpression of PI3K or Akt kinases reduces macrophage activation by LPS, thus restricting proinflammatory responses and promoting anti-inflammatory responses24. These findings are consistent with the present study, which revealed decreased AKT3 expression and increased M1 macrophage infiltration during the acute phase of MI. Activation of AKT3-related pathways may therefore benefit myocardial tissue recovery during acute myocardial infarction. In addition, Akt signalling serves as an integrative hub for various extracellular and intracellular signals that orchestrate macrophage biology, including the regulation of pro- and anti-inflammatory cytokine production, phagocytosis, autophagy, apoptosis, and metabolic processes24. The results of this study provide evidence of increased M1 macrophage infiltration and reduced AKT3 expression during the acute phase of STEMI, potentially revealing a novel avenue for the treatment of myocardial infarction.
RBM17, also known as SPF45, is a protein-coding gene closely involved in the RNA spliceosome, an intricate molecular complex crucial for cell survival and overall cellular integrity. RBM17 encodes an RNA-binding protein that is a vital component of the splicing complex and participates in the second catalytic step of alternative splicing25. Notably, RBM17 is frequently overexpressed in many tumours and plays a crucial role in cancer progression, whereas downregulation of RBM17 mRNA is accompanied by the induction of cell cycle arrest and apoptosis26. A recent study identified RBM17 as a novel response biomarker for immunotherapy in bladder cancer, as it was associated with increased cell cycle activity and therapeutic responses27. Targeted silencing of RBM17 impedes cell proliferation25,28. Many studies have investigated the relationship between RBM17 and cellular immunotherapy, but an interaction between RBM17 and macrophages has not yet been reported. In our study, we found for the first time that RBM17 expression was downregulated during the acute phase of STEMI, which might indicate impaired regeneration and increased death of cardiac cells. We therefore speculate that activating RBM17 might help modulate the response to immune activation and regulate the cell cycle of myocardial cells in the acute stage of STEMI.
HMGCL, 3-hydroxy-3-methylglutaryl-CoA lyase, is a vital mitochondrial matrix protein and a key enzyme in both fatty acid and leucine metabolism. HMGCL catalyses the final step of ketogenesis, forming the ketone body acetoacetate, which can then be converted to the other ketone bodies, β-hydroxybutyrate (BHB) and acetone. HMGCL deficiency may result in the aberrant accumulation of precursor substrates, such as 3-hydroxy-3-methylglutaric acid and 3-methylglutaric acid, and a lack of ketone bodies. This metabolic disturbance may contribute to damage in critical organs such as the brain, heart, and liver; moreover, ketone bodies affect a wide range of immune functions29,30,31. Innate immune cell-intrinsic ketogenesis has been reported to be dispensable for organismal metabolism and inflammation32. Studies have shown that HMGCL might be involved in immune escape and macrophage polarization33. Goldberg et al. reported that conditional ablation of HMGCL in neutrophils and macrophages or in bone marrow cells can affect glucose homeostasis in mice32. Accordingly, we infer that low expression of HMGCL in the myocardial infarction region is likely to affect cardiomyocyte and macrophage metabolism and death.
GJC2 spans approximately 9.9 kb and comprises two exons. It encodes connexin 47 (Cx47), a gap junction protein of 439 amino acids. Connexins constitute a family of integral membrane proteins that are crucial for establishing gap junctional intercellular communication34. This communication facilitates the intercellular transfer of nutrients, ions, second messengers, and small molecules, thereby influencing various cellular processes, including cell death35. Cx47 regulates calcium signalling and ERK1/2 phosphorylation and promotes oligodendrocyte precursor cell proliferation; silencing Cx47 by RNA interference reduces calcium ion influx, diminishes ERK1/2 phosphorylation, and decreases oligodendrocyte precursor cell proliferation36. Cx47 ablation induces severe inflammation, with a prominent immune response in mice lacking Cx4737,38. In immune cells such as macrophages, GJC2/Cx47 expression and the resulting intercellular communication may affect macrophage activation and function. It has long been recognized that intense oedematous and inflammatory responses confined to the postischaemic region appear early after STEMI. Reduced connexin expression and an aggravated immune response might be mechanisms underlying these oedematous and inflammatory reactions immediately after myocardial infarction. To date, no study has reported a clear association between GJC2 and macrophage polarization, and we found for the first time an association between the GJC2 gene and STEMI or M1 macrophages. In the real-time PCR analysis, GJC2 levels did not differ significantly between the STEMI and control groups but tended to be lower in the STEMI group, which was consistent with the results of the bioinformatics analysis. A larger sample size may reveal a significant difference between the two groups.
STEMI is a life-threatening condition with high incidence and mortality. Diagnosis traditionally relies on classic symptoms of myocardial ischaemia, electrocardiographic findings, and biomarkers such as creatine kinase, troponin I, cardiac troponin T, and myoglobin. However, these traditional biomarkers offer little guidance for STEMI treatment. Current evidence suggests that various immune cells participate in immune regulation following acute myocardial infarction, working together to clear necrotic tissue and rebuild damaged myocardium39,40. Inadequate resolution of immune responses may contribute to extracellular matrix remodelling and interstitial myocardial fibrosis after STEMI41. Macrophages are crucial immune cells that serve as the primary responding cells after myocardial infarction, and they regulate multiple stages following STEMI5,42. Identifying potential biomarkers associated with immune infiltration during the acute phase of STEMI, especially those related to macrophage function, is therefore of considerable practical importance. Macrophages can differentiate from circulating monocytes after these cells migrate out of blood vessels. They exhibit functional plasticity and can polarize into two activated forms, M1 and M2, each with distinct immune functions. M1 macrophage polarization promotes myocardial cell damage after STEMI, whereas M2 macrophage polarization can inhibit myocardial cell damage, playing a critical role in the occurrence and prognosis of STEMI43.
The view of proinflammatory M1 macrophages and anti-inflammatory M2 macrophages appears to be an oversimplification, because these cells show a very high level of plasticity in response to microenvironmental stimuli and thus represent a broad spectrum of immunophenotypes with overlapping functional properties44. Nevertheless, our WGCNA revealed that the DEGs were strongly correlated with M1 macrophages and weakly correlated with M2 macrophages, consistent with many other studies. We ultimately identified four feature genes that are related to M1 macrophage infiltration and help classify STEMI subtypes. The AUCs of AKT3, GJC2, and RBM17 in predicting STEMI in GSE62646 were 0.758, 0.732, and 0.811, respectively. HMGCL did not show a diagnostic advantage in the GSE62646 dataset (AUC = 0.543); however, it was significantly downregulated in STEMI patients. The strengths of the current analysis are that GSE59867 is a human dataset, that the analysis steps are detailed and logical, and that the findings were validated in a different dataset from the GEO database and in 24 blood samples from Chinese patients. This research provides a more comprehensive understanding of the molecular basis of STEMI pathogenesis and may facilitate the development of novel therapeutic strategies.