More schooling is associated with lower hemoglobin A1c at the high-risk tail of the distribution: an unconditional quantile regression analysis | BMC Public Health

Data and analytic sample
Data came from the US Health and Retirement Study (HRS), a national longitudinal sample of non-institutionalized adults 50 years and older, and their spouses of any age, that began in 1992 [24]. New cohorts of participants have been added every six years since 1998 to maintain a steady-state population, and participants are surveyed biennially. A diabetes substudy collected HbA1c for a subset of HRS participants in 2003. In addition, HbA1c and other biomarker data were collected in 2006 for a randomly selected half of the sample and in 2008 for the other half; biomarker data were subsequently collected every four years.
The eligible sample included all HRS participants with at least one HbA1c measurement between 2003 and 2016 when they were 50 years or older (N = 21,840). Individuals were excluded for missing exposure (N = 89) or covariate data (N = 18), and one participant was removed because their HbA1c measurement was recorded before their first HRS interview, resulting in an analytic sample of 21,732 participants (99% of eligible).
Exposure
Our exposure, educational attainment, was created using self-reported total years of schooling. Education in HRS ranges from 0 to 17 years of schooling, where 17 years includes those with 17 or more years of education (N = 2,327). Because data were sparse below 5 years of education, we recoded participants with fewer than 5 years of education to 5 years (N = 776) to reduce the influence of these outliers on estimates. To assess possible heterogeneity between participants with different types of credentials, and because educational policies tend to target specific levels of education (e.g., compulsory schooling laws and child labor laws targeted K-12 education, while other policies addressed only college education), education was stratified into two levels: fewer than 12 years of education (N = 4,801; 22%) and 12 or more years of education (N = 16,931; 78%), where completing 12 years of education typically corresponds to earning a high school diploma. Education was modeled linearly within each of these two strata.
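The exposure coding described above can be sketched as follows. This is a minimal illustration rather than the study code: the function name `code_education` and the returned stratum flag are hypothetical, and how the two strata enter the regression (separate models or an interaction term) is not specified here.

```python
import numpy as np

def code_education(years):
    """Bottom-code years of schooling at 5 and flag the education stratum.

    Returns (coded_years, high_stratum), where high_stratum is 1 for
    12 or more years of education and 0 for fewer than 12 years.
    The names are illustrative, not taken from the study code.
    """
    years = np.asarray(years, dtype=float)
    # Values below 5 years are recoded to 5 years.
    coded = np.maximum(years, 5.0)
    # Stratum indicator: 12+ years versus fewer than 12 years.
    high = (coded >= 12).astype(int)
    return coded, high
```

For example, reported values of 0, 4, 11, and 12 years would be coded as 5, 5, 11, and 12 years, with only the last falling in the 12-or-more stratum.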
Outcome
Our outcome was the participant’s first recorded HbA1c value (2003–2016) measured at or after age 50. HbA1c is glycosylated hemoglobin and reflects blood glucose over the prior 2–3 months; HbA1c values between 5.7% and 6.4% are consistent with pre-diabetes and values of 6.5% or greater are consistent with diabetes [25, 26, 27]. HbA1c was measured using automated ion-exchange high-performance liquid chromatography, which recorded the percentage of glycosylated hemoglobin in dried blood spot samples [24].
Covariates
All models were adjusted for sex (female; male), race (Non-Hispanic White; Non-Hispanic Black; Latinx/Hispanic; other), birthplace (non-Southern US; Southern US; foreign), indicators for birth year (1905–1966), indicators for year of HbA1c measurement (2003–2016), mother’s education (5–17(+) years, linear), father’s education (5–17(+) years, linear), and missing indicators for mother’s education and father’s education. Sex was included as an indicator of the socially stratifying effects of gender [28], race as an indicator of the socially stratifying effects of systemic racism [29–30], and parents’ education as a proxy for childhood socioeconomic status. Chi-square tests were used to evaluate whether categorical covariates differed significantly by education level (i.e., fewer than 12 years of education versus 12 or more years of education). See supplemental Table S1 for additional details on covariates.
Race was categorized to include an “other” category in all models for precision, but results were not reported for this group due to the heterogeneous composition and consequent lack of interpretability of estimates.
Birthplace was classified by location within the US (i.e., non-Southern vs. Southern) because studies have found increased risk for adverse later-life health outcomes among those born in the Southern US [31, 32, 33, 34]. A subset of participants (N = 414, 2%) were known to be born in the US but were missing information on the region of birth; participants whose region of birth was unknown were assumed to be born in the non-Southern US.
Birth year was modeled as an indicator variable to capture differences by individual year. Due to a small number of participants falling in the tail ends of the birth year range, values were recoded to facilitate model convergence: those born before 1917 (N = 240, 1%) were recoded as 1917; those born in 1966 (N = 52, 0.2%) were recoded as 1965.
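As an illustration of this recoding, the sketch below truncates birth year to the 1917–1965 range and builds one indicator column per remaining year. The function name `birth_year_indicators` and the choice of the earliest year as the reference category are assumptions made for the example.

```python
import numpy as np

def birth_year_indicators(birth_year):
    """Recode sparse tails of birth year, then build indicator columns.

    Years before 1917 become 1917 and the year 1966 becomes 1965.
    The earliest observed year serves as the reference category
    (an assumption for this sketch).
    """
    recoded = np.clip(np.asarray(birth_year), 1917, 1965)
    levels = np.unique(recoded)
    # One 0/1 column per year, omitting the reference (first) level.
    dummies = (recoded[:, None] == levels[None, 1:]).astype(int)
    return recoded, levels, dummies
```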
Parental education variables (5–17(+) years) were used as a proxy for family socioeconomic status (SES) and modeled continuously. However, for HRS participants in the Asset and Health Dynamics Among the Oldest Old (AHEAD) cohort (born 1900–1923), parents’ education was recorded as a dichotomized measure (less than 8 years of education, 8 or more years of education) rather than a continuous one. Dichotomized measures of parents’ education were replaced with continuous values from a previously validated imputation method using measures of childhood socioeconomic status [35]. Additional missingness in parental education was imputed using the sample mean (mother’s education: NMissing = 2,101 (10%), sample mean = 10 (SD 3.9); father’s education: NMissing = 3,644 (17%), sample mean = 10 (SD 4.2)), and a missing indicator was added for proper model adjustment. This allowed for retention of participants with missing parental education where missingness is informative (e.g., if the parent was not in the household) [36].
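The mean imputation with a missing indicator described above can be sketched as follows, assuming missing values are stored as NaN. The function name `impute_with_indicator` is hypothetical, and the sketch does not cover the separate imputation method used for the AHEAD cohort.

```python
import numpy as np

def impute_with_indicator(x):
    """Replace missing values (NaN) with the observed mean.

    Returns (filled, missing), where missing is a 0/1 indicator to be
    included as an additional covariate alongside the filled variable.
    """
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    # nanmean averages over the observed (non-missing) values only.
    filled = np.where(missing, np.nanmean(x), x)
    return filled, missing.astype(int)
```

Including both columns in the model lets the indicator absorb the average difference between participants with and without a reported value, while the filled column carries the linear term for those with observed data.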
Statistical analysis
We used linear regressions and unconditional quantile regressions (UQR) to model the relationship between education and HbA1c [37, 38, 39]. UQR evaluates changes in quantiles of the outcome’s unconditional distribution for a one-unit change in the mean of the exposure. We estimated parameters of the linear regression model using ordinary least squares (OLS). We fit UQR models at each percentile from the 1st to the 99th of the unconditional HbA1c distribution. We used bootstrapping (500 repetitions) to estimate 95% confidence intervals (CIs) for parameters of the linear regression and UQR models. Education was modeled as a linear term within both education strata, and all models were adjusted for the covariates specified in the preceding section.
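One common way to estimate UQR is to regress the recentered influence function (RIF) of the outcome on the covariates by OLS, where RIF(y; q_tau) = q_tau + (tau − 1{y ≤ q_tau}) / f(q_tau) and f is the outcome density at the quantile q_tau. The sketch below illustrates that approach in Python with numpy only. It is not the analysis code: the Gaussian kernel with a Silverman-type bandwidth, the percentile bootstrap intervals, and all function names are assumptions made for the example.

```python
import numpy as np

def kde_at(y, point):
    """Gaussian kernel density estimate of y at one point.

    Uses a Silverman-type rule-of-thumb bandwidth (an assumption).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    sd = y.std(ddof=1)
    iqr = np.subtract(*np.percentile(y, [75, 25]))
    sigma = min(sd, iqr / 1.349) if iqr > 0 else sd
    h = 0.9 * sigma * n ** (-0.2)
    z = (point - y) / h
    return np.exp(-0.5 * z**2).sum() / (n * h * np.sqrt(2 * np.pi))

def rif_quantile(y, tau):
    """Recentered influence function of y for the tau-th quantile."""
    y = np.asarray(y, dtype=float)
    q = np.quantile(y, tau)
    return q + (tau - (y <= q)) / kde_at(y, q)

def uqr_fit(y, X, tau):
    """OLS of the RIF on X; returns [intercept, slopes...]."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, rif_quantile(y, tau), rcond=None)
    return beta

def bootstrap_ci(y, X, tau, reps=500, seed=0):
    """Percentile bootstrap 95% intervals (rows: lower, upper)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((reps, X.shape[1] + 1))
    for b in range(reps):
        # Resample participants with replacement and refit.
        idx = rng.integers(0, n, size=n)
        draws[b] = uqr_fit(y[idx], X[idx], tau)
    return np.percentile(draws, [2.5, 97.5], axis=0)
```

Calling `uqr_fit` once for each `tau` in 0.01, 0.02, ..., 0.99 would give the quantile-specific coefficients. In a pure location-shift setting the RIF-OLS slope is approximately the same at every quantile, so variation in the slope across quantiles is what distinguishes UQR results from the OLS estimate.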
To visualize the change in the sample distribution of HbA1c implied by UQR results, we created plots to show the counterfactual HbA1c distribution for a one-year increase in the sample’s mean education. First, we binned the factual, or observed, data into quantiles (1st–99th). We then added the UQR estimate of the association between education and HbA1c to the observed data at each quantile, creating a potential counterfactual distribution. Finally, we plotted the observed and counterfactual distributions. Further details about constructing these datasets and plots are provided elsewhere [39].
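The construction of the counterfactual distribution can be sketched as follows. The function name `counterfactual_quantiles` and its arguments are hypothetical, `uqr_coefs` stands for the 99 quantile-specific education coefficients, and the plotting step is omitted.

```python
import numpy as np

def counterfactual_quantiles(y, uqr_coefs):
    """Shift each observed percentile of y by its UQR coefficient.

    uqr_coefs must hold one coefficient per percentile (1st-99th),
    interpreted as the change for a one-year increase in mean education.
    Returns (taus, observed, counterfactual).
    """
    taus = np.arange(1, 100) / 100
    coefs = np.asarray(uqr_coefs, dtype=float)
    if coefs.shape != taus.shape:
        raise ValueError("expected one coefficient per percentile (99)")
    observed = np.quantile(np.asarray(y, dtype=float), taus)
    return taus, observed, observed + coefs
```

Plotting `observed` and `counterfactual` against `taus` then shows where along the distribution the implied shift is largest.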
Sensitivity analysis
We conducted three additional analyses to see if results were robust to different analytic decisions. First, to determine if results were sensitive to how the exposure was operationalized (i.e., coded for analysis), we re-coded education as a three-category variable (< 12 years, 12–15 years, and 16 or more years of education) rather than two. Second, given that the HRS collects data from numerous birth cohorts (from 1905 to 1966) and that educational attainment has tended to increase over time, we conducted analyses stratified by HRS entry cohort (defined by the HRS using participants’ birth year) to test the sensitivity of the association to secular trends in education. Finally, while medication could be an important contributing factor in the education-HbA1c relationship, medication usage is downstream from education and is therefore a potential mediator of that relationship. Adjusting for mediators can bias estimates [40], so the main analyses did not include adjustment for medication.
Software and code
All data cleaning and analysis were performed in R [41]. We used the dineq package to fit UQRs. All code was reviewed by the second author, as is recommended practice [42], and can be found on GitHub.