Artificial intelligence in polycystic ovary syndrome: a systematic review of diagnostic and predictive applications
Study selection
From 662 records initially identified (SCOPUS: 408, Web of Science: 153, PubMed: 101), 245 unique studies remained after removing duplicates and conference abstracts. Following title and abstract screening, 116 studies underwent full-text review. Ultimately, 80 studies met the inclusion criteria. The selection process is illustrated in the PRISMA flow diagram (Fig. 1).
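The reported counts are internally consistent: the three database totals sum to 662. The sketch below uses simple arithmetic on those totals to derive how many records were removed at each stage. These per-stage exclusion counts are computed values, not figures stated in the text, and they do not recover the reasons for exclusion.

```python
# Record counts reported in the study-selection paragraph.
database_records = {"SCOPUS": 408, "Web of Science": 153, "PubMed": 101}
identified = sum(database_records.values())  # total records identified
unique_after_dedup = 245   # after removing duplicates and conference abstracts
full_text_reviewed = 116   # after title and abstract screening
included = 80              # met the inclusion criteria

# Derived (not separately reported) number of records removed at each stage.
stages = [
    ("removed as duplicates or conference abstracts", identified - unique_after_dedup),
    ("excluded at title/abstract screening", unique_after_dedup - full_text_reviewed),
    ("excluded at full-text review", full_text_reviewed - included),
]

for label, n in stages:
    print(f"{label}: {n}")
```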
Study characteristics
The 80 included studies spanned diverse geographical regions, with the largest contributions from India (n = 19), China (n = 15), and the United States (n = 12). The most common study designs were retrospective cohorts (n = 25), cross-sectional studies (n = 18), case–control studies (n = 14), and randomized controlled trials (n = 9). Sample sizes varied considerably, from fewer than 50 participants to more than 30,000 records. The geographic and study design distributions are summarized in Fig. 2.

Geographic and study design distribution of included studies on AI applications in PCOS
Figure 2 shows that research activity on AI applications in PCOS is concentrated in India, China, and the United States, and that the included studies span retrospective cohorts, cross-sectional studies, case–control studies, and randomized controlled trials. These differences in geography and study design may reflect unequal access to data sources, differences in research infrastructure, and differing national research priorities and health policies. This heterogeneity underscores the need for multicenter collaboration and methodological standardization to strengthen the validity and generalizability of AI research in PCOS.
Overview of AI applications
This section summarizes the reviewed studies on the application of artificial intelligence (AI) in polycystic ovary syndrome (PCOS). Table 1 provides a snapshot of the 80 included studies. For each study it lists the year of publication, country, subject area (imaging, clinical data and electronic health records, or molecular/biochemical data), the volume and nature of the data used, and the main AI methods applied. This overview supports a quick, transparent comparison of approaches, sample sizes, and analytical methods. It also shows the methodological diversity of the field, from machine learning on small clinical cohorts to deep learning on large imaging datasets, and how research activity is distributed across countries and methodological domains. The domain-specific tables that follow build on this overview and describe performance outcomes and key findings in detail. To improve clarity, studies are thematically grouped into three domains:
-
Table 2. Imaging-based AI studies (ultrasound, MRI, segmentation, follicle counting).
-
Table 3. Clinical and EHR-based AI studies (structured health records, survey data, multimodal features).
-
Table 4. Omics and biomarker-based AI studies (genomic, transcriptomic, proteomic, metabolomic, lipidomic datasets).
A detailed extraction of article information by AI algorithm and methodological features is provided in the Supplementary section (Supplementary Table S3).
Imaging-based AI studies
Studies related to the application of artificial intelligence in medical imaging of polycystic ovary syndrome (PCOS) constitute an important part of the research in this field. These studies have enabled more accurate diagnosis and quantification of key features of the disease by analyzing images from ultrasound, MRI, and other modalities. Table 2 provides a summary of the main characteristics and methods used in these imaging studies to clearly visualize the research trends and technologies used.
As summarized in Table 2, imaging-based studies applied deep learning, particularly convolutional neural networks (CNNs), to ultrasound and MRI data for automated diagnosis and follicle segmentation. These models frequently reported very high accuracies, in some cases exceeding 98%, and showed potential to reduce variability in image interpretation. However, many relied on small or single-center datasets, so external validation is needed before clinical adoption.
Figure 3 shows the distribution of medical imaging modalities used in AI studies on the diagnosis and analysis of PCOS. Ultrasound images were used most often, accounting for approximately 67% of studies, which reflects the central role of this non-invasive modality in PCOS diagnosis. MRI followed at approximately 19% and supported advanced image analysis and radiomics. Studies focused on ultrasound image segmentation accounted for about 11%, indicating an emphasis on more accurate feature extraction from images.

Distribution of medical imaging types used in AI studies for PCOS diagnosis
Scleral images and dedicated follicle-detection techniques were used less often, each accounting for less than 4% of studies. Although few in number, these studies illustrate the variety of imaging data under investigation and represent emerging areas for AI research in PCOS.
Clinical and EHR-based AI studies
To examine the role of AI in analyzing clinical data related to PCOS, we reviewed studies that used structured clinical data, including biochemical tests, clinical indices, and electronic health record (EHR) data. These studies applied a range of machine learning algorithms to the diagnosis, prediction, and management of PCOS. Table 3 presents their key characteristics, including sample size, data type, AI models used, and main performance results.
These works typically used structured features such as demographics, hormone levels, and metabolic indicators. Ensemble learning and traditional machine learning methods (e.g., Random Forest, SVM) were the most common. Although many achieved high accuracy, most relied on regional or Kaggle-based datasets, which underscores the need for larger, multicenter validation.
Omics and biomarker studies
AI studies of biomarker and omics data form an important part of molecular research in PCOS. By analyzing genomic, transcriptomic, proteomic, metabolomic, and lipidomic data, these studies have contributed to the identification of novel biomarkers and more accurate prediction models. Table 4 summarizes their main features, methods, and key findings.
These investigations spanned genomics, transcriptomics, proteomics, metabolomics, and lipidomics. They applied machine learning to uncover novel molecular signatures and pathways and identified candidate biomarkers such as HDDC3, SDC2, MAP1LC3A, and OVGP1. These findings shed light on disease mechanisms and point to potential diagnostic tools. However, most studies were limited by small sample sizes and still require experimental validation.
AI models and methodologies
A diverse range of AI algorithms was utilized in the included studies (Fig. 4). The most commonly applied models were:
-
Supervised machine learning: Support Vector Machines (SVM) (n = 20), Random Forest (n = 18), Decision Trees (n = 15), and Logistic Regression (n = 14).
-
Deep learning: Convolutional Neural Networks (CNN) (n = 10) and Recurrent Neural Networks (RNN) (n = 6).
-
Unsupervised learning: K-Means Clustering (n = 5) and Principal Component Analysis (PCA) (n = 4).
-
Explainable AI (XAI): SHAP (Shapley Additive Explanations) (n = 8), LIME (Local Interpretable Model-Agnostic Explanations) (n = 6), and Feature Importance Ranking (n = 10).

AI Model usage in PCOS studies
Studies applied AI models to different aspects of PCOS, including diagnosis (n = 40), phenotype classification (n = 18), disease risk prediction (n = 12), and treatment response modeling (n = 11). The primary data sources were electronic health records (n = 25), imaging datasets (n = 20), genetic profiles (n = 15), and biochemical markers (n = 21) [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16].
Performance metrics
AI model performance varied across studies. Reported accuracies for supervised machine learning models ranged from 78% to 95%, and deep learning models often achieved higher accuracy. Specific performance metrics included:
-
Diagnostic accuracy: CNN-based models reported accuracies of up to 98.2%, compared with 75%–88% for traditional logistic regression models.
-
Sensitivity and specificity: The highest sensitivity (94%) was reported in studies using ensemble learning techniques, while specificity ranged from 80% to 97%.
-
AUC-ROC values: Studies evaluating AI models for PCOS detection reported AUC-ROC values between 0.78 and 0.99, indicating acceptable to excellent discriminative ability.
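For reference, the sketch below shows how these four metrics are conventionally computed for a binary PCOS versus non-PCOS classifier. The labels, scores, and the 0.5 decision threshold are hypothetical. The reviewed studies may have used library routines and different thresholds. Accuracy, sensitivity, and specificity depend on the chosen threshold. AUC-ROC is computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, which does not depend on any threshold.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = PCOS, 0 = non-PCOS)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Threshold-dependent metrics; assumes both classes are present."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a random positive case scores higher than
    a random negative case (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores for ten patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(metrics, round(auc, 3))
```

With these hypothetical values, accuracy is 0.80, sensitivity 0.75, specificity about 0.83, and AUC-ROC about 0.92. A model can therefore show a high AUC while its sensitivity at a particular threshold is only moderate.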

Correlation between AI performance metrics
Figure 5 (Correlations between AI Performance Metrics) shows the relationships between the key metrics used to evaluate AI models: accuracy, sensitivity, specificity, and AUC-ROC. The figure indicates strong positive correlations (values close to 1.00) among these metrics, meaning that studies reporting high values on one metric also tended to report high values on the others. Each metric captures a different aspect of performance. Accuracy is the proportion of all cases classified correctly. Sensitivity is the ability to identify true positives, which is critical in medical diagnosis, where missed cases are costly. Specificity is the ability to identify true negatives and thus limit false positives. AUC-ROC summarizes a model's ability to distinguish between classes across all decision thresholds. Overall, the strong correlations indicate that these metrics give a consistent picture of model quality across the reviewed studies. However, a correlation across studies does not imply that improving one metric will improve the others within a given model. At a fixed decision threshold, sensitivity and specificity typically trade off against each other.
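The correlations in Fig. 5 describe how metrics vary together across studies. Within a single model, sensitivity and specificity depend on the chosen decision threshold and usually move in opposite directions. The minimal sketch below illustrates this with hypothetical scores for ten patients. It is not data from any reviewed study.

```python
# Hypothetical labels (1 = PCOS, 0 = non-PCOS) and model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.05]


def sens_spec_at(threshold):
    """Sensitivity and specificity when cases with score >= threshold are called positive."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Raising the threshold lowers sensitivity and raises specificity for the same scores.
for thr in (0.25, 0.5, 0.75):
    sens, spec = sens_spec_at(thr)
    print(f"threshold={thr:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

Because of this trade-off, a single reported sensitivity or specificity value is only meaningful together with the threshold or operating point at which it was measured.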
Figure 6 (Trends in AI Model Performance for PCOS Diagnosis) shows marked differences in diagnostic performance across AI models. Convolutional neural networks (CNNs) performed best, with approximately 95% accuracy and high sensitivity and specificity. Random Forest models also performed strongly, though slightly below CNNs. Support vector machines (SVMs) and decision trees showed moderate performance, comparable to traditional logistic regression. CNNs were mostly applied to imaging data, whereas the other models were mostly applied to clinical data. These comparisons therefore pool different tasks and datasets and should be interpreted with caution. Nonetheless, the results highlight the importance of choosing appropriate AI methods for PCOS diagnosis. They also suggest that CNNs could help improve early diagnosis and patient outcomes in clinical settings.

Trends in AI model performance for PCOS diagnosis
Explainable AI (XAI) in PCOS research
A persistent challenge in applying artificial intelligence (AI) to medicine is the “black-box” nature of high-performing models: clinicians cannot easily trace their predictions to an understandable rationale. This limitation was also evident in PCOS research. Of the 80 studies included in this review, only about one quarter (n ≈ 20, 25%) incorporated explainable AI (XAI) methods, while the majority (75%) did not, an imbalance illustrated in Fig. 7. For clinicians, this lack of transparency creates a dilemma. They are asked to trust algorithmic predictions without a rationale that can be checked against established medical knowledge.

Adoption of explainable AI (XAI) methods in PCOS AI research (n = 80). Only about one quarter of studies (25%, n ≈ 20) applied XAI methods such as SHAP, LIME, or Grad-CAM, while the majority (75%, n ≈ 60) did not incorporate interpretability. This underutilization highlights a major barrier to clinical adoption, as shown in the distribution of studies.
Where XAI was applied, the most common tool was SHAP (Shapley Additive Explanations), particularly in clinical and EHR-based studies, which consistently identified biologically plausible predictors such as BMI, AMH, testosterone, follicle count, and insulin resistance. In imaging-based studies, Grad-CAM and saliency maps were used to overlay heatmaps on ultrasound or MRI scans, reassuring radiologists that CNNs were focusing on relevant ovarian structures rather than irrelevant features. A smaller number of studies experimented with LIME, DALEX, partial dependence plots (PDPs), or QLattice, offering additional insight into feature interactions, though adoption was inconsistent. Notably, omics-focused studies, despite proposing novel molecular biomarkers, almost never applied systematic XAI—limiting both reproducibility and clinician confidence.
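To show what SHAP values represent, the sketch below computes exact Shapley values for a small, hypothetical risk score over three of the predictors named above. It enumerates every coalition of features, and features outside a coalition are set to a single baseline value. This is a simplification. SHAP implementations typically average over a background dataset and use approximations such as Kernel SHAP or Tree SHAP. The weights, patient values, and baseline are invented for illustration, units are unspecified, and the score is not a clinical model.

```python
from itertools import combinations
from math import factorial

# Hypothetical linear risk score; weights are illustrative only.
FEATURES = ["BMI", "AMH", "testosterone"]
WEIGHTS = {"BMI": 0.04, "AMH": 0.10, "testosterone": 0.50}
BIAS = -2.0


def model(x):
    return BIAS + sum(WEIGHTS[f] * x[f] for f in FEATURES)


def shapley_values(f, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(FEATURES)
    phi = {}
    for feat in FEATURES:
        others = [g for g in FEATURES if g != feat]
        total = 0.0
        for size in range(n):
            for coalition in combinations(others, size):
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_feat = {g: (x[g] if (g in coalition or g == feat) else baseline[g]) for g in FEATURES}
                without_feat = {g: (x[g] if g in coalition else baseline[g]) for g in FEATURES}
                total += weight * (f(with_feat) - f(without_feat))
        phi[feat] = total
    return phi


patient = {"BMI": 31.0, "AMH": 9.0, "testosterone": 2.8}   # hypothetical patient
baseline = {"BMI": 24.0, "AMH": 4.0, "testosterone": 1.6}  # hypothetical reference values
phi = shapley_values(model, patient, baseline)
print({k: round(v, 3) for k, v in phi.items()})
```

For a linear model, each Shapley value reduces to weight × (patient value − baseline value), which makes the result easy to check by hand. The values also sum to the difference between the patient's score and the baseline score. This additivity is what allows SHAP plots to split a single prediction into per-feature contributions.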
Taken together, the underutilization of XAI remains a major barrier to translation: while many models reported accuracies above 95% and AUCs exceeding 0.90, such performance metrics are not sufficient on their own for clinical acceptance. Embedding explainability into model design, rather than treating it as an afterthought, will be essential to build trust, facilitate regulatory approval, and ensure that AI systems for PCOS are not only accurate but also transparent, reliable, and clinically actionable.
Overview of AI applications in PCOS
Studies on the use of artificial intelligence (AI) in polycystic ovary syndrome (PCOS) can be grouped into six thematic categories. First, computer vision models have been used for image recognition and classification, including ovarian follicle segmentation and cyst identification in ultrasound, radiology, and pathology images. These models have improved the accuracy and reproducibility of diagnosis, and some convolutional neural network (CNN)-based models have reported accuracies above 95%. Second, machine learning methods have been used to discover new genetic markers and biomarkers and to identify genetic risk factors for PCOS. They do so by analyzing hormonal patterns, metabolic profiles, inflammatory markers, and gene expression data, which supports improved phenotypic classification and personalized treatment. Third, supervised and unsupervised models have been used for early diagnosis, risk assessment, and prediction of disease progression, which can inform preventive interventions and proactive patient management. Fourth, AI applied to electronic health records (EHR) has supported clinical decision-making and patient management by facilitating diagnosis, optimizing treatment plans, and monitoring treatment response. Fifth, although many models perform very well, their limited interpretability has hindered widespread clinical adoption. Some studies have addressed this with explainable AI (XAI) tools such as SHAP and LIME, which improve transparency, physician trust, and compliance with ethical standards.
Finally, emerging studies have explored the application of generative AI and large language models (LLMs) such as ChatGPT for patient consultation, clinical education, and automated reporting, showing potential in medical text summarization, differential diagnosis, and treatment planning, although concerns remain about the accuracy and reliability of these technologies [10, 24, 50, 65, 69, 77, 87, 98, 106].
Risk of bias and quality assessment
Although many of the included studies reported strikingly high accuracies, often well above 90%, our quality assessment showed that much of this apparent strength rests on fragile ground. The main weaknesses lay in patient selection and applicability. Most investigations drew on very small hospital samples or repeatedly used the same Kaggle dataset, so it is unclear whether the results apply to women with PCOS in other clinical or cultural settings. The AI models themselves were usually described in detail and applied correctly, but the reference standards against which they were tested were often uncertain or inconsistent. This was especially true for omics and biomarker studies, where no universal “gold standard” exists and models are therefore judged against shifting definitions. Another recurring issue was flow and timing: how patients were recruited, how data were split, and whether models were externally validated were not always clearly or adequately reported. Only a handful of larger EHR or imaging studies overcame these limitations by using datasets of several thousand samples and external validation cohorts, which lends more confidence to their findings. The single review-type study assessed with ROBIS also scored poorly, reflecting non-systematic selection and narrative reporting rather than rigorous synthesis. Taken together, these findings show that although the performance metrics are encouraging, the evidence base remains fragile, prone to bias, and at risk of overfitting. The challenge now is to move beyond bright but brittle results and invest in larger, multicenter, and more transparent designs that can carry AI for PCOS into clinical practice.
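To make the point about small samples concrete, the sketch below computes a 95% Wilson score interval for an observed accuracy of 95% on hypothetical test sets of different sizes. The Wilson interval is a standard approximation for a binomial proportion. It assumes that test cases are independent, which may not hold when the same Kaggle dataset is reused or when patients contribute several images. The test-set sizes are illustrative and are not taken from any reviewed study.

```python
from math import sqrt


def wilson_interval(correct, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion such as test-set accuracy."""
    p_hat = correct / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width


# Same observed accuracy (95%) on hypothetical test sets of increasing size.
for correct, n in [(19, 20), (38, 40), (950, 1000)]:
    lower, upper = wilson_interval(correct, n)
    print(f"n={n:5d}  accuracy={correct / n:.2f}  95% CI=({lower:.3f}, {upper:.3f})")
```

On 20 test cases, a reported accuracy of 95% is compatible with a true accuracy well below 80%. On 1,000 cases, the interval narrows to a few percentage points. This is one reason why headline accuracies from small, single-center samples warrant caution.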