Fostering an Evidence-Based Ecosystem for AI in Education

The United States faces a critical challenge in addressing the persistent learning opportunity gaps in math and reading, particularly among disadvantaged student subgroups. According to the 2022 National Assessment of Educational Progress (NAEP) data, only 37% of fourth-grade students performed at or above the proficient level in math, and 33% in reading. The rapid advancement of generative AI (GenAI) technologies presents an unprecedented opportunity to bridge these gaps by providing personalized learning experiences and targeted support. However, the current mismatch between the speed of GenAI innovation and the lengthy traditional research pathways hinders the thorough evaluation of these technologies before widespread adoption, potentially leading to unintended negative consequences.

Failure to adapt our research and regulatory processes to keep pace with the development of GenAI technologies could expose students to ineffective or harmful educational tools, exacerbate existing inequities, and hinder our ability to prepare all students for success in an increasingly complex and technology-driven world. The education sector must act with urgency to establish the necessary infrastructure, expertise, and collaborative partnerships to ensure that GenAI-powered tools are rigorously evaluated, continuously improved, and equitably implemented to benefit all students.

To address this challenge, we propose three key recommendations for congressional action:

  1. Establish the GenAI in Education Research Accelerator Program (GenAiRA) within the Institute of Education Sciences (IES) to support and expedite efficacy research on GenAI-powered educational tools.
  2. Adapt IES research and evaluation processes to create a framework for the rapid assessment of GenAI-enabled educational technology, including alternative research designs and evidence standards.
  3. Support the establishment of a GenAI Education Research and Innovation Consortium, bringing together schools, researchers, and education technology (EdTech) developers to participate in rapid cycle studies and continuous improvement of GenAI tools.

By implementing these recommendations, Congress can foster a more responsive and evidence-based ecosystem for GenAI-powered educational tools, ensuring that they are equitable, effective, and safe for all students. This comprehensive approach will help unlock the transformative potential of GenAI to address persistent learning opportunity gaps and improve outcomes for all learners, while maintaining scientific rigor and prioritizing student well-being.

During the preparation of this work, the authors used Claude 3 Opus (by Anthropic) to help clarify, synthesize, and add accessible language around concepts and ideas generated by members of the team. The authors reviewed and edited the content as needed and take full responsibility for the content of this publication.

Challenge and Opportunity

Widening Learning Opportunity Gap 

NAEP data reveals that many U.S. students, especially those from disadvantaged subgroups, are not achieving proficiency in math and reading. In 2022, only 37% of fourth-graders performed at or above the NAEP proficient level in math, and 33% in reading—the lowest levels in over a decade. Disparities are more profound when disaggregated by race, ethnicity, and socioeconomic status; for example, only 17% of Black students and 21% of Hispanic students reached reading proficiency, compared to 42% of white students.

Rapid AI Evolution

GenAI is a transformative technology that enables rapid development and personalization of educational content and tools, addressing unmet needs in education such as scarce resources, limited 1:1 teaching time, and uneven teacher quality. However, that rapid pace also raises concerns about premature adoption of unvetted tools, which could negatively impact students’ educational achievement. Unvetted GenAI tools may introduce misconceptions, provide incorrect guidance, or be misaligned with curriculum standards, leading to gaps in students’ understanding of foundational concepts. If used for an extended period, particularly with vulnerable learners, these tools could have a long-term impact on learning foundations that may be difficult to remedy.

On the other hand, carefully designed, trained, and vetted GenAI models that have undergone rapid cycle studies and design iterations based on data have the potential to effectively address students’ misconceptions, build solid learning foundations, and provide personalized, adaptive support to learners. These tools could accelerate progress and close learning opportunity gaps at an unprecedented scale.

Slow Vetting Processes 

The rapid pace of AI development poses significant challenges for traditional research and evaluation processes in education. Efficacy research, particularly studies sponsored by the IES or other Department of Education entities, is a lengthy, resource-intensive, and often onerous process that can take years to complete. Randomized controlled trials and longitudinal studies struggle to keep up with the speed of AI innovation: by the time a study is completed, the AI-powered tool may have already undergone multiple iterations or been replaced.

It can be difficult to recruit and sustain school and teacher participation in efficacy research due to the significant time and effort required from educators. Moreover, obtaining certifications and approvals for research can be complex and time-consuming, as researchers must navigate institutional review boards, data privacy regulations, and ethical guidelines, which can delay the start of a study by months or even years.

Many EdTech developers find themselves in a catch-22 situation, where their products are already being adopted by schools and educators, yet they are simultaneously expected to participate in lengthy and expensive research studies to prove efficacy. The time and resources required to engage in such research can be a significant burden for EdTech companies, especially start-ups and small businesses, which may prefer to focus on iterating and improving their products based on real-world feedback. As a result, many EdTech developers may be reluctant to participate in traditional efficacy research, further exacerbating the disconnect between the rapid pace of AI innovation and the slow process of evaluating the effectiveness of these tools in educational settings.

Gaps in Existing Efforts and Programs

While federal initiatives like SEERNet and ExpandAI have made strides in supporting AI and education research and development, they may not be fully equipped to address the specific challenges and opportunities presented by GenAI for several reasons:

  • GenAI has the ability to generate novel content and interact with users in unpredictable and personalized ways.
  • GenAI-powered educational technologies involve unique considerations in terms of data training, prompt engineering, and output evaluation, especially when considering the developmental stages of PreK-12 students.
  • GenAI raises specific ethical concerns, such as the potential for biased or inappropriate content generation, ensuring the accuracy and quality of generated responses, and protecting student privacy and agency. 
  • GenAI is evolving at an unprecedented pace. 

Traditional approaches to efficacy research and evaluation may not be well-suited to evaluating the potential benefits and outcomes associated with GenAI-powered tools in the short term, particularly when assessing whether a program shows enough promise to warrant wider deployment with students. 

A New Approach 

To address these challenges and bridge the gap between GenAI innovation and efficacy research, we need a new approach to streamline the research process, reduce the burden on educators and schools, and provide timely and actionable insights into the effectiveness of GenAI-powered tools. This may involve alternative study designs, such as rapid cycle evaluations or single-case research, and developing new incentive structures and support systems to encourage and facilitate the participation of teachers, schools, and product developers in research studies.

GenAiRA aims to tackle these challenges by providing resources, guidance, and infrastructure to support more agile and responsive efficacy research in the education sciences. By fostering collaboration among researchers, developers, and educators, and promoting innovative approaches to evaluation, this program can help ensure that the development and adoption of AI-powered tools in education are guided by rigorous, timely, and actionable evidence—while simultaneously mitigating risks to students.

Learning from Other Sectors 

Valuable lessons can be drawn from other fields that have faced similar balancing acts between innovation, research, and safety. Two notable examples are the U.S. Food and Drug Administration’s (FDA) expedited review pathways for drug development and the National Institutes of Health’s (NIH) Clinical and Translational Science Awards (CTSA) program for accelerating medical research.

Example 1: The FDA Model

The FDA’s expedited review programs, such as Fast Track, Breakthrough Therapy, Accelerated Approval, and Priority Review, are designed to speed up the development and approval of drugs that address unmet medical needs or provide significant improvements over existing treatments. These pathways recognize that, in certain cases, the benefits of bringing a potentially life-saving drug to market quickly may outweigh the risks associated with a more limited evidence base at the time of approval.

Key features include:

  1. Early and frequent communication between the FDA and drug developers to provide guidance and feedback throughout the development process.
  2. Flexibility in clinical trial design and evidence requirements, such as allowing the use of surrogate endpoints or single-arm studies in certain cases.
  3. Rolling review of application materials, allowing drug developers to submit portions of their application as they become available rather than waiting for the entire package to be complete.
  4. Shortened review timelines, with the FDA committing to reviewing and making a decision on an application within a specified timeframe (e.g., six months for Priority Review).

These features can accelerate the development and approval process while still ensuring that drugs meet standards for safety and effectiveness. They also acknowledge that the evidence base for a drug may evolve over time, with post-approval studies and monitoring playing a crucial role in confirming the drug’s benefits and identifying any rare or long-term side effects.

Example 2: The CTSA Program

The NIH’s CTSA program established a national network of academic medical centers, research institutions, and community partners to accelerate the translation of research findings into clinical practice and improve patient outcomes.

Key features include:

  1. Collaborative research infrastructure, consisting of a network of institutions and partners that work together to conduct translational research, share resources and expertise, and disseminate best practices.
  2. Streamlined research processes with standardized protocols, templates, and tools to facilitate the rapid design, approval, and implementation of research studies across the network.
  3. Training and development of researchers and clinicians to build a workforce equipped to conduct innovative and rigorous translational research.
  4. Community engagement in the research process to ensure that studies are responsive to real-world needs and priorities.

By learning from the successes and principles of the FDA’s expedited review pathways and the NIH’s CTSA program, the education sector can develop its own innovative approach to accelerating the responsible development, evaluation, and deployment of GenAI-powered tools, as outlined in the following plan of action.

Plan of Action

To address the challenges and opportunities presented by GenAI in education, we propose the following three key recommendations for congressional action and the evolution of existing programs.

Recommendation 1. Establish the GenAI in Education Research Accelerator Program (GenAiRA).

Congress should establish GenAiRA, housed in the IES, to support and expedite efficacy research on GenAI-powered educational tools and programs. This program will:

  1. Provide funding and resources to researchers and educators to conduct rigorous, timely, and cost-effective efficacy studies on promising AI-based solutions that address achievement gaps.
  2. Create guidelines and offer webinars and technical assistance to researchers, educators, and developers to build expertise in the responsible design, implementation, and evaluation of GenAI-powered tools in education.
  3. Foster collaboration and knowledge-sharing among researchers, educators, and GenAI developers to facilitate the rapid translation of research findings into practice and continuously improve GenAI-powered tools.
  4. Develop and disseminate best practices, guidelines, and ethical frameworks for the responsible development and deployment of GenAI-enabled tools in educational settings, focusing on bias, accuracy, privacy, and student agency.

Recommendation 2. Under the auspices of GenAiRA, adapt IES research and evaluation processes to create a framework to evaluate GenAI-enabled educational technology.

In consultation with experts in educational research and AI, IES will develop a framework that:

  1. Identifies existing research designs and creates alternative research designs (e.g., quasi-experimental studies, rapid cycle evaluations) suitable for generating credible evidence of effectiveness while being more responsive to the rapid pace of AI innovation.
  2. Establishes evidence-quality guidelines for rapid evaluation, including minimum sample sizes, study duration, effect size, and targeted population.
  3. Funds replication studies and expansion studies to determine impact in different contexts or with different populations (e.g., students with IEPs and English learners).
  4. Provides guidance to districts on how to interpret and apply evidence from different types of studies to inform decision-making around adopting and using AI technologies in education.   
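
As an illustration of how a guideline might link minimum sample size to expected effect size, the sketch below uses a standard normal-approximation power calculation for comparing two group means. It is not part of the proposal. The function name, the default significance level (0.05), and the default power (0.80) are assumptions chosen for the example. It also ignores clustering of students within classrooms and schools, which in practice usually raises the required sample.

```python
import math
from statistics import NormalDist


def min_sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate students needed per group to detect a standardized
    mean difference (Cohen's d) in a two-sided, two-group comparison.

    Hypothetical helper for illustration only. Uses the normal
    approximation n = 2 * ((z_alpha + z_power) / d) ** 2 and assumes
    independent students (no classroom or school clustering).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Smaller expected effects require much larger studies.
    for d in (0.2, 0.5):
        print(f"d = {d}: {min_sample_size_per_group(d)} students per group")
```

Under these assumptions, detecting a small effect (d = 0.2) takes roughly six times as many students per group as detecting a medium effect (d = 0.5). This is one reason a rapid evaluation framework would need to state which effect sizes short studies are expected to detect.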

Recommendation 3. Establish a GenAI Education Research and Innovation Consortium.

Congress should provide funding and incentives for IES to establish a GenAI Education Research and Innovation Consortium that brings together a network of “innovation schools,” research institutions, and EdTech developers committed to participating in rapid cycle studies and continuous improvement of GenAI tools in education. This approach will ensure that AI tools are developed and implemented in a way that is responsive to the needs and values of educators, students, and communities.

To support this consortium, Congress should:

  1. Allocate funds for the IES to provide grants and resources to schools, research institutions, and EdTech developers that meet established criteria for participation in the consortium, such as demonstrated commitment to innovation, research capacity, and ethical standards.
  2. Direct IES to work with programs like SEERNet and ExpandAI to identify and match potential consortium members, provide guidance and oversight to ensure that research studies meet rigorous standards for quality and ethics, and disseminate findings and best practices to the broader education community.
  3. Encourage the development of standardized protocols and templates for data sharing, privacy protection, and informed consent within the consortium, to reduce the time and effort required for each individual study and streamline administrative processes.
  4. Incentivize participation in the consortium by offering resources and support for schools, researchers, and developers, such as access to funding opportunities, technical assistance, and professional development resources.
  5. Require the establishment of a central repository of research findings and best practices generated through rapid cycle evaluations conducted within the consortium, to facilitate the broader dissemination and adoption of effective GenAI-powered tools.

Conclusion 

Persistent learning opportunity gaps in math and reading, particularly among disadvantaged students, are a systemic challenge requiring innovative solutions. GenAI-powered educational tools offer potential for personalizing learning, identifying misconceptions, and providing tailored support. However, the mismatch between the pace of GenAI innovation and lengthy traditional research pathways impedes thorough vetting of these technologies to ensure they are equitable, effective, and safe before widespread adoption.

GenAiRA and the development of alternative research frameworks provide a comprehensive approach to bridging the divide between GenAI’s rapid progress and the need for thorough evaluation in education. Leveraging existing partnerships, research infrastructure, and data sources can expedite the research process while maintaining scientific rigor and prioritizing student well-being.

The plan of action creates a roadmap for responsibly harnessing GenAI’s potential in education. Identifying appropriate congressional mechanisms for establishing the accelerator program, such as creating a new bill or incorporating language into upcoming legislation, can ensure this critical initiative receives necessary funding and oversight.

This comprehensive strategy charts a path toward equitable, personalized learning facilitated by GenAI while upholding the highest standards of evidence. Aligning GenAI innovation with rigorous research and prioritizing the needs of underserved student populations can unlock the transformative potential of these technologies to address persistent achievement gaps and improve outcomes for all learners.

This idea is part of our AI Legislation Policy Sprint. To see all of the policy ideas spanning innovation, education, healthcare, and trust, safety, and privacy, head to our sprint landing page.

What makes AI and GenAI-powered educational tools different from traditional educational technologies?

AI and GenAI-powered educational tools differ from traditional educational technologies in their speed of development and deployment, as AI-generated content can be created and deployed extremely quickly, often with little time taken for thorough testing and evaluation. Additionally, AI-powered tools can generate content dynamically based on user inputs and interactions, meaning that the content presented to each student may be different every time, making it inherently more time-consuming to test and evaluate compared to fixed, pre-written content. Also, the ability of AI-powered tools to rapidly generate and disseminate educational content at scale means that any issues or flaws in the technology can have far-reaching consequences, potentially impacting large numbers of students across multiple schools and districts.

How do gaps in early grades impact students’ long-term educational outcomes and opportunities?

Students who fall behind in math and reading in the early years are more likely to struggle academically in later grades, leading to lower graduation rates, reduced college enrollment, and limited career opportunities.

What are some of the limitations of current educational interventions in addressing these learning opportunity gaps?

Current educational interventions often take a one-size-fits-all approach, failing to address the unique learning needs of individual students. They may also lack the ability to provide immediate feedback and adapt instruction in real-time based on student performance.

How has the rapid advancement of AI and GenAI technologies created new opportunities for personalized learning and targeted support?

Advancements such as machine learning and natural language processing have enabled the development of educational tools that can analyze vast amounts of student data, identify patterns in learning behavior, and provide customized recommendations and support. Personalization can include recommendations for what topics to learn and when, but also adjustments to finer details like amount and types of feedback and support provided. Further, content can be adjusted to make it more accessible to students, both from a language standpoint (dynamic translation) and a cultural one (culturally relevant contexts and characters). In the past, these types of adjustments were not feasible due to the labor involved in building them. With GenAI, this level of personalization will become commonplace and expected.

What are the potential risks or unintended consequences of implementing AI-powered educational tools without sufficient evidence of their effectiveness or safety?

Implementing AI and GenAI-powered educational tools without sufficient evidence of their effectiveness or safety could lead to the widespread use of ineffective interventions. If these tools fail to improve student outcomes or even hinder learning progress, they can have long-lasting negative consequences for students’ academic attainment and self-perception as learners.

When students are exposed to ineffective educational tools, they may struggle to grasp key concepts, leading to gaps in their knowledge and skills. Over time, these gaps can compound, leaving students ill-prepared for future learning challenges and limiting their academic and career opportunities. Moreover, repeated experiences of frustration and failure with educational technologies can erode students’ confidence, motivation, and engagement with learning.

This erosion of learner identity can be particularly damaging for students from disadvantaged backgrounds, who may already face additional barriers to academic success. If AI-powered tools fail to provide effective support and personalization, these students may fall even further behind their peers, exacerbating existing educational inequities.

How can we ensure that AI and GenAI-powered educational tools are developed and implemented in an equitable manner, benefiting all students, especially those from disadvantaged backgrounds?

We can do so by prioritizing research and funding for interventions that target the unique needs of disadvantaged student populations. We must also engage diverse stakeholders, including educators, parents, and community members, in the design and evaluation process to ensure that these tools are culturally responsive and address the specific challenges faced by different communities.

How can educators, parents, and policymakers stay informed about the latest developments in AI-powered educational tools and make informed decisions about their adoption and use?

Educators, parents, and policymakers can stay informed by engaging with resources, guidance, and programs on the opportunities and risks of AI/GenAI in education developed by organizations such as the Office of Educational Technology, the Institute of Education Sciences, and the EDSAFE AI Alliance.

Emerging Technology

day one project

GenAI in Education Research Accelerator (GenAiRA)

Congress should foster a more responsive and evidence-based ecosystem for GenAI-powered educational tools, ensuring that they are equitable, effective, and safe for all students.

06.28.24
