Study design
The Avon Longitudinal Study of Parents and Children (ALSPAC) is a longitudinal birth cohort that recruited pregnant women resident in Avon, UK with expected delivery dates between April 1991 and December 1992. The initial number of pregnancies enrolled was 14,541, with 13,988 children alive at age 1. When the oldest children were ~7 years of age, an attempt was made to bolster the initial sample with eligible cases who had not joined originally. Including such children, the total cohort size is 14,901 children who were alive at 1 year of age.
At age 18, participants were sent ‘fair processing’ materials describing ALSPAC’s intended use of their administrative records and were given clear means to consent or object via a written form. Administrative data were not extracted for participants who objected, or who were not sent fair processing materials. Further details on ALSPAC have been published elsewhere23,24. The study website contains details of all data through a fully searchable data dictionary and variable search tool. Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees (NHS Haydock REC: 10/H1010/70). Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time.
Participants
Participants who were assigned female at birth (N = 7225) and had information on exposures and outcomes were eligible for inclusion in this study (N = 2698; 37%) (Fig. 2).

Black boxes represent the sample included in the main analysis and grey boxes show the additional exclusions to establish the sample used in the additional analysis regarding oral contraceptive use.
Exposures
Participants were asked about menstruation in nine questionnaires throughout adolescence (8–17 years)25. We defined menstrual symptoms based on the measure closest in time to, but no more than two years before, January of the final compulsory year of schooling (September–July, age 15/16). Menstrual data were therefore extracted from one of four questionnaires, completed when participants were aged 13 (N = 169), 14 (N = 501), 15 (N = 996), or 16 (N = 1032).
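The selection rule above can be illustrated with a short sketch. This is not the authors' code: the function name, the input layout, and the choice to treat the reference point as a single date (for example, 1 January of the final school year) are assumptions made for illustration.

```python
from datetime import date
from typing import Optional, Sequence, Tuple


def select_measure(
    responses: Sequence[Tuple[date, bool]],
    final_year_january: date,
) -> Optional[Tuple[date, bool]]:
    """Pick the response closest in time before the reference date.

    responses: (questionnaire completion date, symptom reported) pairs.
    Only responses from the two years before the reference date qualify.
    """
    # Start of the two-year window (assumed to be inclusive).
    window_start = final_year_january.replace(year=final_year_january.year - 2)
    eligible = [r for r in responses if window_start <= r[0] < final_year_january]
    # The latest eligible questionnaire is the one closest to the reference date.
    return max(eligible, key=lambda r: r[0]) if eligible else None
```

A participant with no questionnaire inside the window returns `None` and would have no exposure measure under this rule.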
Heavy or prolonged bleeding
In the questionnaires, participants were asked “Have you ever had any of the following symptoms associated with your period: heavy or prolonged bleeding?” and could answer either yes or no. We derived a binary heavy or prolonged bleeding variable (‘yes’ or ‘no’).
Menstrual-related pain
Depending on which of the four questionnaires was used, participants were either asked “Have you ever had any of the following symptoms associated with your period: severe cramps?” (with response options yes or no) or “Have you ever had any of the following symptoms associated with your period: pain with your period” and “if yes, were they mild, moderate, or severe?”. We derived a binary menstrual pain variable; ‘severe cramps or moderate/severe pain’ or ‘no severe cramps or mild/no pain’25.
Outcomes
Outcome data came from linkage to the National Pupil Database (NPD). For each outcome, we used both a continuous definition (for statistical power and granularity) and a binary definition (a recognised metric, to enhance interpretability).
Absences
Absences were defined as the number of half day sessions missed (authorised, e.g. illness reported by parents, or unauthorised, e.g. term-time holidays or truancy) divided by the total number of available sessions in the final year of compulsory schooling when pupils take final exit examinations (age 15/16). Participants with implausible values for the number of available sessions (zero or more than 330, N = 54) were excluded. We also dichotomised absences based on whether participants were persistently absent (10% or more), following the definition monitored at a school level in the UK26.
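As a minimal sketch of the derivation described above (not the authors' code; the function and argument names are hypothetical), the absence proportion, the plausibility exclusion, and the persistent absence indicator could be computed as follows.

```python
from typing import Optional, Tuple

MAX_SESSIONS = 330  # upper plausibility limit on available sessions (from the text)


def derive_absence(
    sessions_missed: int, sessions_available: int
) -> Optional[Tuple[float, bool]]:
    """Return (proportion of sessions absent, persistently absent).

    Returns None when the number of available sessions is implausible
    (zero or more than 330), mirroring the exclusion in the text.
    """
    if sessions_available <= 0 or sessions_available > MAX_SESSIONS:
        return None
    proportion = sessions_missed / sessions_available
    # Persistent absence: 10% or more of sessions missed.
    # Integer arithmetic avoids floating-point edge cases at exactly 10%.
    persistent = 10 * sessions_missed >= sessions_available
    return proportion, persistent
```

For example, a pupil who missed 30 of 300 available half-day sessions has an absence proportion of 0.1 and is classed as persistently absent.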
Educational attainment
Educational attainment was based on GCSE (General Certificate of Secondary Education) qualifications, which are compulsory qualifications in a range of subjects usually taken at age 15/16. At the time GCSEs were completed by this cohort, they were graded A* (highest) to G (grade C reflects a standard pass), and U (unclassified). We used GCSE and equivalent total point score, which is a continuous measure (range 0–540) based on up to the eight highest GCSE grades, where six points are given for each grade increase (grades and score: A* = 58; A = 52; B = 46; C = 40; D = 34; E = 28; F = 22; G = 16; U = 0). Scores above 464 (eight A* grades) reflect pupils who took Advanced Subsidiary level (AS-level) exams early (advanced qualifications usually taken the year after GCSEs for those who continue in education). We also used a binary educational attainment outcome reflecting whether participants achieved five A*-C GCSEs (standard passes), including Maths and English, which was an important performance indicator and a frequently used criterion for further education27.
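The scoring scheme can be illustrated with a simplified sketch. It covers GCSE grades only: the NPD score also counts equivalent qualifications and early AS-levels, which are ignored here, and the function names and subject labels are hypothetical.

```python
from typing import Dict, Iterable

# Points per grade, as listed in the text (six points per grade step).
GRADE_POINTS = {"A*": 58, "A": 52, "B": 46, "C": 40,
                "D": 34, "E": 28, "F": 22, "G": 16, "U": 0}


def capped_point_score(grades: Iterable[str], best_n: int = 8) -> int:
    """Sum the points of the best_n highest grades."""
    points = sorted((GRADE_POINTS[g] for g in grades), reverse=True)
    return sum(points[:best_n])


def five_good_passes(results: Dict[str, str]) -> bool:
    """True if five or more A*-C grades were achieved, including English and Maths.

    results maps a subject name to a grade; the keys "English" and
    "Maths" are assumed labels for the two required subjects.
    """
    passes = {s for s, g in results.items() if GRADE_POINTS[g] >= GRADE_POINTS["C"]}
    return len(passes) >= 5 and {"English", "Maths"} <= passes
```

Under this scheme eight A* grades give 8 × 58 = 464 points, which matches the threshold quoted above.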
Confounders
We selected confounders that could plausibly cause both menstrual symptom exposures (or self-reported values) and educational outcomes, including ethnicity28, socioeconomic position28,29, childhood adversity30,31, child and maternal mental health29,32, body mass index (BMI)1,29,33, intelligence quotient (IQ)34,35, and age at menarche1,29,33,36. This is particularly challenging to determine for menstrual symptom exposures as the literature is somewhat limited; however, we opted to be inclusive in our confounder selection, as including a variable that is only related to the outcome (and not on the causal pathway between exposure and outcome, i.e., a competing exposure) would increase precision and not bias the association37. Therefore, whilst we conceptualise all included variables as confounders, we recognise that they could be competing exposures (see Supplementary Fig. 1).
We included multiple socioeconomic factors reported by mothers via questionnaires during pregnancy, including occupation (dichotomised into ‘manual’ or ‘non-manual’ based on the 1991 British Office of Population and Census Statistics classification), maternal education (‘Certificate of Secondary Education/vocational’, ‘O level’, ‘A level’, or ‘degree’), home ownership (‘owner or private renter’ or ‘renter or non-homeowner’), financial difficulties score (dichotomised to ‘any difficulties (score of 1 or above)’ or ‘no difficulties (score of 0)’), and smoking pre-pregnancy (‘any smoking’ or ‘no smoking’) as confounders. Other confounders include child ethnicity (reported by mothers via questionnaire during pregnancy; ‘white’ or ‘non-white’), age at menarche (mother- or self-reported via questionnaires during puberty), body mass index (BMI (kg/m2); measured at a clinic visit) at 12.8 years, maternal-reported depression in the past 2 years (‘yes’ or ‘no’) collected via questionnaire when the participant was aged 12.1 years, internalising and externalising problems at 9.6 years (each with a score ranging from 0 to 20 and measured with the Strengths and Difficulties Questionnaire; measured at clinic visit)38, and intelligence quotient at age 8 (measured with the Wechsler Intelligence Scale for Children at clinic visit)39. Adverse childhood experiences (ACEs), including parental separation, sexual abuse, and physical abuse before age 11, were either reported prospectively by the participant’s mother or retrospectively self-reported in adulthood. Multiple questionnaire items corresponded to each ACE (see Supplementary Table 8 for details). Binary variables for each ACE construct (‘any’ or ‘none’) were derived for participants who responded to at least 50% of the relevant questions for a given ACE; for participants who responded to fewer than 50% of the relevant questions, the ACE was coded as missing40.
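The 50% response rule for the ACE variables can be sketched as below. This is an illustration only: the function name and the representation of unanswered items as `None` are assumptions, and the actual item lists are given in Supplementary Table 8.

```python
from typing import Optional, Sequence


def derive_ace(
    items: Sequence[Optional[bool]], min_response: float = 0.5
) -> Optional[bool]:
    """Derive a binary ACE indicator from its questionnaire items.

    items: one entry per relevant question; True/False for an answered
    item, None for an unanswered one.
    Returns True ('any'), False ('none'), or None (coded as missing).
    """
    answered = [x for x in items if x is not None]
    # Code as missing when fewer than 50% of the relevant items were answered.
    if not items or len(answered) / len(items) < min_response:
        return None
    return any(answered)
```

Note that, under this rule, a single positive answer does not yield ‘any’ if fewer than half of the items for that ACE were answered.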
Statistical analysis
Analyses were conducted in Stata (version 18.0)41. We present descriptive statistics (prevalence or mean) for outcomes and confounders, separately for those who did and did not experience each menstrual symptom. Multivariable linear regression models were used to estimate the association between each menstrual symptom and the continuous outcomes (school absence and GCSE score), adjusting for all confounders. We report the coefficient, 95% confidence intervals (CIs), and p values. Absences were defined as the percentage of sessions absent. This variable had a positively skewed distribution, so we log-transformed it, meaning the regression coefficients represent differences in mean log absence between those with and without the menstrual symptom. We exponentiate these coefficients to obtain ratios of geometric means, which can be interpreted as the percentage difference in school absences between the exposed and unexposed groups. When GCSE score is the dependent variable, the regression coefficients represent the difference in score between the exposed and unexposed groups. For the binary outcomes (‘persistent absence’ and ‘achieved 5A*-C GCSEs including English and Maths’), multivariable logistic regression models were fitted to estimate the association between the exposures and outcomes, adjusting for all confounders. We report the odds ratios (ORs), 95% CIs, and p values. Logistic regression was selected due to the mathematical properties of ORs compared with risk ratios (RRs; outlined in the Discussion); however, estimates cannot be interpreted as RRs as the outcomes are relatively common42.
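To make the interpretation of the log-transformed outcome concrete, the sketch below uses the unadjusted case, where the coefficient for a binary exposure equals the difference in mean log absence between groups. Exponentiating it gives the ratio of geometric means, and subtracting one gives the percentage difference. The function names are hypothetical, confounder adjustment is omitted, and values must be strictly positive (the text does not state how zero absences were handled).

```python
import math
from typing import Sequence


def unadjusted_log_coef(exposed: Sequence[float], unexposed: Sequence[float]) -> float:
    """Difference in mean log outcome between exposed and unexposed groups.

    With a single binary exposure and no covariates, this equals the
    ordinary least squares coefficient from a model of log(outcome).
    """
    def mean_log(xs: Sequence[float]) -> float:
        return sum(math.log(x) for x in xs) / len(xs)

    return mean_log(exposed) - mean_log(unexposed)


def pct_difference(beta: float) -> float:
    """Convert a log-scale coefficient to a percentage difference."""
    ratio_of_geometric_means = math.exp(beta)
    return (ratio_of_geometric_means - 1.0) * 100.0
```

With hypothetical absence percentages of 4 and 9 in the exposed group (geometric mean 6) and 2 and 8 in the unexposed group (geometric mean 4), the ratio of geometric means is 1.5, so `pct_difference` returns a value of about 50.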
Missing data
We did not impute exposures and outcomes due to uncertainty about the ability to predict missing values of menstrual symptoms, and high data availability in outcomes due to the use of linked data. Of 2698 participants included in our main analyses (with complete data on all exposures and outcomes; Fig. 1), 1424 (52.8%) had missing data on at least one confounder. We used multiple imputation (MI) to address missing confounder data (Table 1). We included all observed variables used in analyses, including exposures and outcomes, in MI equations. Additional auxiliary variables, including maternal depression during pregnancy (measured with the Edinburgh Postnatal Depression Scale)43, BMI at age 7, and SDQ at age 11, were used to impute maternal depression, BMI, and internalising and externalising problems, respectively. Results were calculated across 60 imputed datasets, guided by results of Monte Carlo error tests (which assess statistical reproducibility of the imputation44), and associations were generated by pooling across these datasets using Rubin’s rules45. The prevalence/means of each confounder before and after MI are presented in Supplementary Table 2.
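Pooling with Rubin’s rules can be sketched as follows. This is a generic illustration of the standard formulae rather than the authors' Stata workflow, and the function name is hypothetical: the pooled estimate is the mean of the per-dataset estimates, and the total variance combines the mean within-imputation variance with the between-imputation variance inflated by (1 + 1/m).

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(
    estimates: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Pool one coefficient across m imputed datasets.

    estimates: the coefficient from each imputed dataset.
    variances: the squared standard error from each imputed dataset.
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations (m - 1 denominator)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)
```

In the analysis described above, `m` would be 60 and the function would be applied to each regression coefficient in turn.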
Oral contraception
Oral contraceptives are often prescribed to ameliorate menstrual symptoms46,47 and could potentially impact absences or attainment indirectly through mood, physical, or cognitive side effects. It is challenging to identify if and how contraception should be accounted for in this analysis as we do not have data on the relative timing of contraceptive use and menstrual symptoms; however, it is more likely the menstrual symptom preceded contraceptive use as participants are reporting whether they have ever experienced menstrual symptoms (compared with contraception use in the last 12 months), meaning adjusting for contraceptive use would be inappropriate over-adjustment (adjusting for a variable on the causal pathway between exposure and outcome). However, contraceptive use may modify the effect of menstrual symptoms on educational outcomes, i.e., in people who report ever experiencing the symptom but whose symptoms have improved following contraception initiation, educational consequences may be diminished. Therefore, in a sensitivity analysis, we stratified the analyses by past year oral contraceptive use (‘use’ or ‘no use’), which was self-reported in the same questionnaire as the participants reported their menstrual symptoms. This was conducted in participants with complete data on contraceptive use only (Fig. 1), using a likelihood ratio test to assess statistical evidence of interaction.
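A likelihood ratio test for interaction compares the log-likelihood of a model with an exposure-by-contraceptive-use interaction term against the same model without it. The sketch below assumes a single interaction term (one degree of freedom), for which the chi-squared tail probability has a closed form. The function name is hypothetical and the exact model specification used in the study is not given in the text.

```python
import math


def lrt_interaction_p(loglik_without: float, loglik_with: float) -> float:
    """P value for a one-degree-of-freedom likelihood ratio test.

    loglik_without: log-likelihood of the model with no interaction term.
    loglik_with: log-likelihood of the model including the interaction term.
    """
    # The test statistic is twice the gain in log-likelihood; clamp at zero
    # to guard against tiny negative values from numerical error.
    stat = max(2.0 * (loglik_with - loglik_without), 0.0)
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))
```

A test statistic of about 3.84 corresponds to p = 0.05 under this one-degree-of-freedom assumption.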
Additional analyses
We conducted a series of additional analyses to explore whether our results were influenced by the specific definitions of menstrual symptoms, minimise residual confounding, and exclude unauthorised absences.
Alternative menstrual symptom exposure definitions
Using the imputed data, we conducted confounder-adjusted linear regression to assess the associations between four alternative exposures and the continuous outcomes (percentage absence and GCSE points score):
a. A three-level variable based on whether participants went to the doctor for heavy or prolonged bleeding (‘heavy or prolonged bleeding and went to the doctor’, ‘heavy or prolonged bleeding but did not go to the doctor’, or ‘no heavy or prolonged bleeding’) to explore whether any effects of heavy or prolonged bleeding are more severe for participants who sought medical help.
b. A three-level variable based on whether participants went to the doctor for menstrual pain (‘pain and went to the doctor’, ‘pain but did not go to the doctor’, or ‘no pain’) to explore whether any effects of menstrual pain are more severe for participants who sought medical help.
c. A four-level variable to separate heavy bleeding from prolonged bleeding. Participants were asked how many days of bleeding they usually have during each period and were able to report the exact number of days or, if they were unsure, select one of three categories: 3 days or less, 4–6 days, or 7 days or more. Prolonged bleeding was defined as ‘7 days or more’ if participants reported bleeding for 7 days or more in either the categorical or continuous response option and ‘less than 7 days’ if participants reported 6 days of bleeding or less in either response option (no participants responded to both the categorical and continuous option). This was used alongside responses to the main heavy or prolonged bleeding variable to derive a four-level variable: ‘heavy and prolonged bleeding’ (‘yes’ to heavy or prolonged bleeding AND reporting 7 days or more bleeding), ‘heavy bleeding only’ (‘yes’ to heavy or prolonged bleeding, BUT reporting 6 or fewer days bleeding), ‘prolonged bleeding only’ (‘no’ to heavy or prolonged bleeding, BUT reporting 7 days or more bleeding), or ‘neither heavy nor prolonged’ (‘no’ to heavy or prolonged bleeding AND reporting 6 or fewer days bleeding).
d. A four-level variable to explore the effects of co-occurring menstrual symptoms: ‘heavy bleeding and pain’, ‘heavy bleeding only’, ‘pain only’, and ‘neither heavy bleeding nor pain’.
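The derivation of the four-level heavy/prolonged bleeding variable (definition c) can be sketched as below. The function and argument names are hypothetical, and returning `None` when no duration was reported is an assumption, as the text does not describe how missing duration was handled.

```python
from typing import Optional


def bleeding_category(
    heavy_or_prolonged: bool,
    days_exact: Optional[int] = None,
    days_band: Optional[str] = None,
) -> Optional[str]:
    """Combine the heavy-or-prolonged item with usual bleeding duration.

    days_exact: exact number of days reported, if given.
    days_band: one of '3 days or less', '4-6 days', '7 days or more',
    if the categorical option was used instead.
    """
    # Prolonged bleeding means 7 days or more in whichever option was answered.
    if days_exact is not None:
        prolonged = days_exact >= 7
    elif days_band is not None:
        prolonged = days_band == "7 days or more"
    else:
        return None  # assumption: no duration reported, so not classifiable

    if heavy_or_prolonged and prolonged:
        return "heavy and prolonged bleeding"
    if heavy_or_prolonged:
        return "heavy bleeding only"
    if prolonged:
        return "prolonged bleeding only"
    return "neither heavy nor prolonged"
```

For instance, a participant who answered ‘no’ to the heavy or prolonged bleeding item but reported 8 days of bleeding would be classed as ‘prolonged bleeding only’.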
Prior educational attainment
To minimise residual confounding, we adjusted for attainment in Key Stage 1 (KS1) Standard Assessment Tests (SATs). These were completed at age 6/7 years, which is prior to menarche for all participants. We used SATs summary score, which is a continuous measure (range 0–15) where higher scores reflect higher levels of attainment, derived by summing scores obtained in reading, writing, and mathematics (each ranging from 0 to 5). Further detail is available from the National Pupil Database (NPD). In the complete case sample with KS1 data available (N = 1120), we conducted the main multivariable linear and logistic regression models adjusting for all confounders with and without additional adjustment for KS1 attainment.
Authorised absences
The definition of absences used in our main continuous and binary analyses includes both authorised and unauthorised (including term-time holidays and truancy) absences; however, it is possible that these could be associated differently with menstrual symptoms. As a sensitivity analysis, therefore, we repeated the multivariable linear and logistic regression models for absences, including only authorised absences in the definition (excluding truancy and term-time holidays), in the complete case sample (N = 1274).
We have followed the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) guidelines in the reporting of this study (supplement pp 13–16)48.

