Large language models predict cognition and education close to or better than genomics or expert assessment
