Large language models predict cognition and education close to or better than genomics or expert assessment

  • Chandler, D., Levitt, S. D. & List, J. A. Predicting and preventing shootings among at-risk youth. Am. Econ. Rev. 101, 288–92 (2011).

  • Berk, R., Berk, D. & Drougas. Machine learning risk assessments in criminal justice settings (Springer, 2019).

  • Watts, D. J. Common sense and sociological explanations. Am. J. Sociol. 120, 313–351 (2014).

  • Kleinberg, J., Ludwig, J., Mullainathan, S. & Obermeyer, Z. Prediction policy problems. Am. Econ. Rev. 105, 491–95 (2015).

  • Cranmer, S. J. & Desmarais, B. A. What can we learn from predictive modeling? Political Anal. 25, 145–166 (2017).

  • Yarkoni, T. & Westfall, J. Choosing prediction over explanation in psychology: Lessons from machine learning. Perspect. Psychol. Sci. 12, 1100–1122 (2017).

  • Hofman, J. M. et al. Integrating explanation and prediction in computational social science. Nature 595, 181–188 (2021).

  • Tate, A. E. et al. Predicting mental health problems in adolescence using machine learning techniques. PLoS One 15, 1–13 (2020).

  • Salganik, M. J. et al. Measuring the predictability of life outcomes with a scientific mass collaboration. Proc. Natl Acad. Sci. 117, 8398–8403 (2020).

  • Donoho, D. 50 years of data science. J. Comput. Graph. Stat. 26, 745–766 (2017).

  • Rahal, C. et al. The rise of machine learning in the academic social sciences. Technical Report, Center for Open Science (2021).

  • Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017).

  • Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).

  • Dale, R. GPT-3: what’s it good for? Nat. Lang. Eng. 27, 113–118 (2021).

  • van Loon, A. C. Predictability hypotheses: a metatheoretical and methodological introduction. 227–244 (2023).

  • Yan, J. & Rahal, C. On the unknowable limits to prediction. arXiv Preprint (2024).

  • Savcisens, G. et al. Using sequences of life-events to predict human lives. Nat. Comput. Sci. 4, 43–56 (2023).

  • Lundberg, I. et al. The origins of unpredictability in life outcome prediction tasks. Proc. Natl Acad. Sci. 121, e2322973121 (2024).

  • Fast, L. A. & Funder, D. C. Personality as manifest in word use: correlations with self-report, acquaintance report, and behavior. J. Personal. Soc. Psychol. 94, 334 (2008).

  • Rodriguez, A. J., Holleran, S. E. & Mehl, M. R. Reading between the lines: the lay assessment of subclinical depression from written self-descriptions. J. Personal. 78, 575–598 (2010).

  • Abramov, P. S. & Yampolskiy, R. V. Automatic IQ estimation using stylometric methods. In Handbook of Research on Learning in the Age of Transhumanism, 32–45 (IGI Global, 2019).

  • Çöltekin, Ç. Predicting educational achievement using linear models. Proc. GermEval 2020 Task 1, 23–29 (2020).

  • Jones, S. “Ensure that you stand out from the crowd”: a corpus-based analysis of personal statements according to applicants’ school type. Comp. Educ. Rev. 57, 397–423 (2013).

  • Alvero, A. et al. Essay content and style are strongly related to household income and SAT scores: evidence from 60,000 undergraduate applications. Sci. Adv. 7, eabi9031 (2021).

  • Van der Laan, M. J., Polley, E. C. & Hubbard, A. E. Super learner. Stat. Appl. Genet. Mol. Biol. 6, 1–123 (2007).

  • Polley, E. C. & Van der Laan, M. J. Super learner in prediction, 226 (2010).

  • Blau, P. M. & Duncan, O. D. The American occupational structure. 19, 453–458 (1967).

  • Becker, G. S. Human capital; a theoretical and empirical analysis, with special reference to education (National Bureau of Economic Research; distributed by Columbia University Press, New York, 1964).

  • Clark, D. & Royer, H. The effect of education on adult mortality and health: evidence from Britain. Am. Econ. Rev. 103, 2087–2120 (2013).

  • Zajacova, A. & Lawrence, E. M. The relationship between education and health: reducing disparities through a contextual approach. Annu. Rev. Public Health 39, 273–289 (2018).

  • Filippova, A. et al. Humans in the loop: incorporating expert and crowd-sourced knowledge for predictions using survey data. Socius 5, 2378023118820157 (2019).

  • Caplan, B. The case against education (Princeton University Press, 2019).

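  • Baeriswyl, F., Wandeler, C. & Trautwein, U. Auf einer anderen Schule oder bei einer anderen Lehrkraft hätte es für’s Gymnasium gereicht: Eine Untersuchung zur Bedeutung von Schulen und Lehrkräften für die Übertrittsempfehlung [At another school or with another teacher, it would have been enough for the Gymnasium: an investigation into the importance of schools and teachers for the transition recommendation]. Z. Pädagog. Psychol. 25, 39–47 (2011).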

  • Urhahne, D. & Wijnia, L. A review on the accuracy of teacher judgments. Educ. Res. Rev. 32, 100374 (2021).

  • Zellner, M., Abbas, A. E., Budescu, D. V. & Galstyan, A. A survey of human judgement and quantitative forecasting methods. R. Soc. Open Sci. 8, 201187 (2021).

  • Dick, D. M. Gene-environment interaction in psychological traits and disorders. Annu. Rev. Clin. Psychol. 7, 383–409 (2011).

  • Plomin, R. & Daniels, D. Why are children in the same family so different from one another? Behav. Brain Sci. 10, 1–16 (1987).

  • Plomin, R. Commentary: Why are children in the same family so different? Non-shared environment three decades later. Int. J. Epidemiol. 40, 582–592 (2011).

  • Hart, S. A. Precision education initiative: Moving toward personalized education. Mind Brain Educ. 10, 209–211 (2016).

  • Plomin, R. Blueprint: how DNA makes us who we are (MIT Press, 2018).

  • Morris, T. T., Davies, N. M. & Smith, G. D. Can education be personalised using pupils’ genetic data? Elife 9, e49962 (2020).

  • Risi, J., Sharma, A., Shah, R., Connelly, M. & Watts, D. J. Predicting history. Nat. Hum. Behav. 3, 906–912 (2019).

  • Power, C. & Elliott, J. Cohort profile: 1958 British birth cohort (national child development study). Int. J. Epidemiol. 35, 34–41 (2006).

  • Lee, J. J. et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat. Genet. 50, 1112–1121 (2018).

  • Selzam, S. et al. Predicting educational achievement from DNA. Mol. Psychiatry 22, 267–272 (2017).

  • von Stumm, S. et al. Predicting educational achievement from genomic measures and socioeconomic status. Dev. Sci. 23, e12925 (2020).

  • Krapohl, E. et al. Multi-polygenic score approach to trait prediction. Mol. Psychiatry 23, 1368–1374 (2018).

  • Kyle, K., Crossley, S. A. & Jarvis, S. Assessing the validity of lexical diversity indices using direct judgements. Lang. Assess. Q. 18, 154–170 (2021).

  • Kyle, K., Crossley, S. & Berger, C. The tool for the automatic analysis of lexical sophistication (TAALES): version 2.0. Behav. Res. Methods 50, 1030–1046 (2018).

  • Crossley, S. A., Kyle, K. & McNamara, D. S. Sentiment analysis and social cognition engine (SEANCE): an automatic tool for sentiment, social cognition, and social-order analysis. Behav. Res. Methods 49, 803–821 (2017).

  • Choi, S. W. & O’Reilly, P. F. PRSice-2: polygenic risk score software for biobank-scale data. Gigascience 8, giz082 (2019).

  • Becker, J. et al. Resource profile and user guide of the polygenic index repository. Nat. Hum. Behav. 5, 1744–1758 (2021).

  • Privé, F., Arbel, J. & Vilhjálmsson, B. J. LDpred2: better, faster, stronger. Bioinformatics 36, 5424–5431 (2020).

  • Plomin, R. & von Stumm, S. Polygenic scores: prediction versus explanation. Mol. Psychiatry 27, 49–52 (2022).

  • Chen, T. et al. XGBoost: extreme gradient boosting. R package version 0.4-2 1, 1–4 (2015).

  • Wright, M. N. & Ziegler, A. Ranger: a fast implementation of random forests for high dimensional data in C++ and R. arXiv (2015).

  • Bates, S., Hastie, T. & Tibshirani, R. Cross-validation: what does it estimate and how well does it do it? J. Am. Stat. Assoc. 1–12 (2023).

  • Turkheimer, E. Three laws of behavior genetics and what they mean. Curr. Dir. Psychol. Sci. 9, 160–164 (2000).

  • Canivez, G. L. & Watkins, M. W. Long-term stability of the Wechsler Intelligence Scale for Children–Third Edition. Psychol. Assess. 10, 285 (1998).

  • Jones, S. E. Against technology: from the Luddites to neo-Luddism (Routledge, 2013).

  • Feigenbaum, J. & Gross, D. P. Automation and the fate of young workers: evidence from telephone operation in the early 20th century. Technical Report, National Bureau of Economic Research (2020).

  • Selwyn, N. Education and technology: key issues and debates, 3rd edn (Bloomsbury Academic, London; New York, 2021).

  • Foltz, P. W., Yan, D. & Rupp, A. A. The past, present, and future of automated scoring. In Handbook of Automated Scoring, 1–10 (Chapman and Hall/CRC, 2020).

  • Wainschtein, P. et al. Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data. Nat. Genet. 54, 263–273 (2022).

  • Chouldechova, A. Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big Data 5, 153–163 (2017).

  • Fuster, A., Goldsmith-Pinkham, P., Ramadorai, T. & Walther, A. Predictably unequal? The effects of machine learning on credit markets. J. Financ. 77, 5–47 (2022).

  • Baker, R. S. & Hawn, A. Algorithmic bias in education. Int. J. Artif. Intell. Educ. 32, 1052–1092 (2021).

  • Zheng, Z. et al. Leveraging functional genomic annotations and genome coverage to improve polygenic prediction of complex traits within and between ancestries. Nat. Genet. 56, 767–777 (2024).
