Large language models predict cognition and education close to or better than genomics or expert assessment
Chandler, D., Levitt, S. D. & List, J. A. Predicting and preventing shootings among at-risk youth. Am. Econ. Rev. 101, 288–92 (2011).
Berk, R., Berk, D. & Drougas. Machine learning risk assessments in criminal justice settings (Springer, 2019).
Watts, D. J. Common sense and sociological explanations. Am. J. Sociol. 120, 313–351 (2014).
Kleinberg, J., Ludwig, J., Mullainathan, S. & Obermeyer, Z. Prediction policy problems. Am. Econ. Rev. 105, 491–95 (2015).
Cranmer, S. J. & Desmarais, B. A. What can we learn from predictive modeling? Political Anal. 25, 145–166 (2017).
Yarkoni, T. & Westfall, J. Choosing prediction over explanation in psychology: Lessons from machine learning. Perspect. Psychol. Sci. 12, 1100–1122 (2017).
Hofman, J. M. et al. Integrating explanation and prediction in computational social science. Nature 595, 181–188 (2021).
Tate, A. E. et al. Predicting mental health problems in adolescence using machine learning techniques. PLoS One 15, 1–13 (2020).
Salganik, M. J. et al. Measuring the predictability of life outcomes with a scientific mass collaboration. Proc. Natl Acad. Sci. 117, 8398–8403 (2020).
Donoho, D. 50 years of data science. J. Comput. Graph. Stat. 26, 745–766 (2017).
Rahal, C. et al. The rise of machine learning in the academic social sciences. Technical Report, Center for Open Science (2021).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017).
Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).
Dale, R. GPT-3: what’s it good for? Nat. Lang. Eng. 27, 113–118 (2021).
van Loon, A. C. Predictability hypotheses: a metatheoretical and methodological introduction. 227–244 (2023).
Yan, J. & Rahal, C. On the unknowable limits to prediction. arXiv Preprint (2024).
Savcisens, G. et al. Using sequences of life-events to predict human lives. Nat. Comput. Sci. 4, 43–56 (2023).
Lundberg, I. et al. The origins of unpredictability in life outcome prediction tasks. Proc. Natl. Acad. Sci. 121, e2322973121 (2024).
Fast, L. A. & Funder, D. C. Personality as manifest in word use: correlations with self-report, acquaintance report, and behavior. J. Personal. Soc. Psychol. 94, 334 (2008).
Rodriguez, A. J., Holleran, S. E. & Mehl, M. R. Reading between the lines: the lay assessment of subclinical depression from written self-descriptions. J. Personal. 78, 575–598 (2010).
Abramov, P. S. & Yampolskiy, R. V. Automatic IQ estimation using stylometric methods. In Handbook of Research on Learning in the Age of Transhumanism, 32–45 (IGI Global, 2019).
Çöltekin, Ç. Predicting educational achievement using linear models. Proc. GermEval 2020 Task. 1, 23–29 (2020).
Jones, S. “Ensure that you stand out from the crowd”: a corpus-based analysis of personal statements according to applicants’ school type. Comp. Educ. Rev. 57, 397–423 (2013).
Alvero, A. et al. Essay content and style are strongly related to household income and SAT scores: evidence from 60,000 undergraduate applications. Sci. Adv. 7, eabi9031 (2021).
Van der Laan, M. J., Polley, E. C. & Hubbard, A. E. Super learner. Stat. Appl. Genet. Mol. Biol. 6, 1–123 (2007).
Polley, E. C. & Van Der Laan, M. J. Super learner in prediction, 226 (2010).
Blau, P. M. & Duncan, O. D. The American occupational structure. 19, 453–458 (1967).
Becker, G. S. Human capital; a theoretical and empirical analysis, with special reference to education (National Bureau of Economic Research; distributed by Columbia University Press, New York, 1964).
Clark, D. & Royer, H. The effect of education on adult mortality and health: evidence from Britain. Am. Econ. Rev. 103, 2087–2120 (2013).
Zajacova, A. & Lawrence, E. M. The relationship between education and health: reducing disparities through a contextual approach. Annu. Rev. Public Health 39, 273–289 (2018).
Filippova, A. et al. Humans in the loop: incorporating expert and crowd-sourced knowledge for predictions using survey data. Socius 5, 2378023118820157 (2019).
Caplan, B. The case against education (Princeton University Press, 2019).
Baeriswyl, F., Wandeler, C. & Trautwein, U. Auf einer anderen schule oder bei einer anderen lehrkraft hätte es für’s gymnasium gereicht: Eine untersuchung zur bedeutung von schulen und lehrkräften für die übertrittsempfehlung. Z. Für. Pädagog. Psychol. 25, 39–47 (2011).
Urhahne, D. & Wijnia, L. A review on the accuracy of teacher judgments. Educ. Res. Rev. 32, 100374 (2021).
Zellner, M., Abbas, A. E., Budescu, D. V. & Galstyan, A. A survey of human judgement and quantitative forecasting methods. R. Soc. Open Sci. 8, 201187 (2021).
Dick, D. M. Gene-environment interaction in psychological traits and disorders. Annu. Rev. Clin. Psychol. 7, 383–409 (2011).
Plomin, R. & Daniels, D. Why are children in the same family so different from one another? Behav. Brain Sci. 10, 1–16 (1987).
Plomin, R. Commentary: Why are children in the same family so different? Non-shared environment three decades later. Int. J. Epidemiol. 40, 582–592 (2011).
Hart, S. A. Precision education initiative: Moving toward personalized education. Mind Brain Educ. 10, 209–211 (2016).
Plomin, R. Blueprint: how DNA makes us who we are (MIT Press, 2018).
Morris, T. T., Davies, N. M. & Smith, G. D. Can education be personalised using pupils’ genetic data? Elife 9, e49962 (2020).
Risi, J., Sharma, A., Shah, R., Connelly, M. & Watts, D. J. Predicting history. Nat. Hum. Behav. 3, 906–912 (2019).
Power, C. & Elliott, J. Cohort profile: 1958 British birth cohort (National Child Development Study). Int. J. Epidemiol. 35, 34–41 (2006).
Lee, J. J. et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat. Genet. 50, 1112–1121 (2018).
Selzam, S. et al. Predicting educational achievement from DNA. Mol. Psychiatry 22, 267–272 (2017).
von Stumm, S. et al. Predicting educational achievement from genomic measures and socioeconomic status. Dev. Sci. 23, e12925 (2020).
Krapohl, E. et al. Multi-polygenic score approach to trait prediction. Mol. Psychiatry 23, 1368–1374 (2018).
Kyle, K., Crossley, S. A. & Jarvis, S. Assessing the validity of lexical diversity indices using direct judgements. Lang. Assess. Q. 18, 154–170 (2021).
Kyle, K., Crossley, S. & Berger, C. The tool for the automatic analysis of lexical sophistication (TAALES): version 2.0. Behav. Res. Methods 50, 1030–1046 (2018).
Crossley, S. A., Kyle, K. & McNamara, D. S. Sentiment analysis and social cognition engine (SEANCE): an automatic tool for sentiment, social cognition, and social-order analysis. Behav. Res. Methods 49, 803–821 (2017).
Choi, S. W. & O’Reilly, P. F. PRSice-2: polygenic risk score software for biobank-scale data. Gigascience 8, giz082 (2019).
Becker, J. et al. Resource profile and user guide of the polygenic index repository. Nat. Hum. Behav. 5, 1744–1758 (2021).
Privé, F., Arbel, J. & Vilhjálmsson, B. J. LDpred2: better, faster, stronger. Bioinformatics 36, 5424–5431 (2020).
Plomin, R. & von Stumm, S. Polygenic scores: prediction versus explanation. Mol. Psychiatry 27, 49–52 (2022).
Chen, T. et al. Xgboost: extreme gradient boosting. R package version 0.4-2 1, 1–4 (2015).
Wright, M. N. & Ziegler, A. Ranger: a fast implementation of random forests for high dimensional data in C++ and R. arXiv (2015).
Bates, S., Hastie, T. & Tibshirani, R. Cross-validation: what does it estimate and how well does it do it? J. Am. Stat. Assoc. 1–12 (2023).
Turkheimer, E. Three laws of behavior genetics and what they mean. Curr. Dir. Psychol. Sci. 9, 160–164 (2000).
Canivez, G. L. & Watkins, M. W. Long-term stability of the Wechsler Intelligence Scale for Children—Third Edition. Psychol. Assess. 10, 285 (1998).
Jones, S. E. Against technology: from the Luddites to neo-Luddism (Routledge, 2013).
Feigenbaum, J. & Gross, D. P. Automation and the fate of young workers: evidence from telephone operation in the early 20th century. Technical Report, National Bureau of Economic Research (2020).
Selwyn, N. Education and technology: key issues and debates (Bloomsbury Academic, London; New York), 3rd edn. (2021).
Foltz, P. W., Yan, D. & Rupp, A. A. The past, present, and future of automated scoring. In Handbook of Automated Scoring, 1–10 (Chapman and Hall/CRC, 2020).
Wainschtein, P. et al. Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data. Nat. Genet. 54, 263–273 (2022).
Chouldechova, A. Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big Data 5, 153–163 (2017).
Fuster, A., Goldsmith-Pinkham, P., Ramadorai, T. & Walther, A. Predictably unequal? The effects of machine learning on credit markets. J. Financ. 77, 5–47 (2022).
Baker, R. S. & Hawn, A. Algorithmic bias in education. Int. J. Artif. Intell. Educ. 32, 1052–1092 (2021).
Zheng, Z. et al. Leveraging functional genomic annotations and genome coverage to improve polygenic prediction of complex traits within and between ancestries. Nat. Genet. 56, 767–777 (2024).
