Students’ performance dataset for using machine learning technique in physics education research
Ethics statements
The research conducted in this study was reviewed and approved by the ethics committee at the Institute of Research and Community Service of Universitas Negeri Yogyakarta under ethical approval number T/32.1/UN34.9/KP.06.07/2023, dated 1 October 2023. All necessary permissions were obtained systematically and ethically, as outlined in the research project proposal. The research aim, to publish an open dataset of students’ responses to questionnaires on conceptual understanding, scientific ability, and learning attitude in physics, underwent a full review. School heads, physics teachers, and students (with parental consent) were briefed by the researchers on the study, including the practical objectives and methods of the research project, and provided their consent to participate in the survey. The SPHERE dataset can be fully accessed, particularly by physics teachers to evaluate their instruction, and they consented to the use of the demographic data to the extent necessary for the research project. Students’ and schools’ privacy was protected by fully anonymizing their identities in the test administrations and in the records reported in this dataset.
Participants and contexts
A total of 497 students (age: M = 16.2, SD = 0.6; gender: 225 male, 272 female) participated, drawn from three large public high schools and one small public high school located in a suburban district of a densely populated province in Indonesia. Although schools A, B, and D were larger than school C (Table S1), the current policy requires schools to implement a zoning-based enrollment system that distributes students more diversely among the schools15. Consequently, we assumed that all schools had equivalent student intake, even though schools can differ in status, teachers, size, facilities, and location. Before this policy, schools A and D were regarded as the most popular schools in the district. Most parents wished to send their children there because of the schools’ reputation. The parents’ motivation was mostly based on the greater likelihood that graduates of these schools would be accepted by prestigious universities in Indonesia. In recent years, the Indonesian government introduced the zoning-based system to reduce the status gap between the most and least popular schools. Regardless of parents’ preferences, students must enroll in the school closest to their domicile, although small quotas are reserved for students from outside the zoning area who have notable achievements. We represent this context through the domicile variable in Table S1. Most students reported living inside the zoning area.
Variables related to demographics, access to literature resources, and students’ physics identity are summarized in Table S1. We incorporated the literature-accessibility and physics-identity variables because some PER scholars suggest their potential impact on students’ learning in physics23,24. Physics identity is particularly relevant to the intended occupation of students who plan to pursue a STEM field after high school. In summary, our participants were predominantly female students aged 15–16 years living with high-school-educated parents. It is a common value in Indonesian society that the father bears more responsibility than the mother for securing stable employment to cover family expenses. The descriptive statistics reported in Table S1 show that our students had access to digital facilities yet still faced some difficulty accessing printed books. Participants also reported that they studied physics the night before a scheduled class and perceived that their families supported them when they were studying physics.
Data collection
The students were in the eleventh grade, or entering the “F phase” as defined by the Indonesian curriculum. In this learning phase, students were taught high school physics covering measurement, vectors, linear and rotational kinematics, linear dynamics, and projectile motion in the first semester. In the second semester, their physics learning continued with rotational dynamics, fluids, light and mechanical waves, and heat and thermodynamics. Previously, the physics teachers conducted assessments individually in their own schools. Although they agreed on the concepts taught to students under the F phase curriculum, each school might use a separate set of test items or testing platforms. To create the SPHERE dataset, researchers and teachers agreed to standardize the measures across schools. To do this, we adopted RBAs established by the PER community. The RBAs used in the SPHERE dataset were selected according to the content that students had learned in the curriculum. Prior to test administration, the original RBAs were translated into Indonesian. The translation was then validated by five PER experts, each with more than ten years of teaching and research experience in PER.
Data collection was conducted during the 2023/2024 academic year of high school physics (Fig. 1), with specific activities taking place as follows:

Fig. 1. Timeframe of the data collection process to produce the SPHERE dataset.
First semester
The Force Concept Inventory (FCI)13 and the Force and Motion Conceptual Evaluation (FMCE)14 were administered in November 2023, after students had been taught the relevant content.
Second semester
The remaining conceptual assessments, namely the Rotational and Rolling Motion Conceptual Survey (RRMCS)25, the Fluid Mechanics Concept Inventory (FMCI)26, the Mechanical Waves Conceptual Survey (MWCS)27, the Thermal Concept Evaluation (TCE)28, and the Survey of Thermodynamics Processes and First and Second Laws (STPFaSL)29, were administered throughout the second semester (January to June 2024) in different sessions according to the students’ learning schedules. Laboratory work, in the form of a light diffraction experiment, was carried out after the midterm. Students’ activity in the laboratory was observed using the Scientific Abilities Assessment Rubrics (SAAR)20, and their attitude toward physics was measured using the Colorado Learning Attitudes about Science Survey (CLASS)21.
The survey commenced in the first semester (November-December 2023) on the physics teachers’ recommendation, since they first needed to teach the physics content examined in the survey. At this time, the FCI and FMCE were administered first, separately and on different days. The final test of the first semester (FINTEST1) used teacher-developed items, and the researchers collected only the resulting scores from the teachers (Fig. 1). In the second semester (January-June 2024), we continued the survey with the remaining RBAs. We conducted a laboratory experiment on light diffraction, chosen in view of the measuring tools and materials available at the participating schools. The order of the RBAs in the second semester could differ between teachers. Under the curriculum, physics teachers have the right to arrange their learning sequence independently, based on their analysis of students’ needs. For instance, test administration began with the RRMCS at certain schools but with the MWCS at others. Therefore, each RBA was administered on the teachers’ recommendation to ensure that students were ready for the physics content examined. Students were given approximately 60 minutes to complete most conceptual inventories. Exceptions were made for assessments requiring extended responses or reasoning, such as the FMCE and RRMCS, which included items with textual reasoning.
The experiment aimed to measure the wavelengths of visible light. Students observed LED light through a diffraction grating and saw a rainbow pattern. See Fig. 2 for a schematic of the experimental setup. Three diffraction gratings were used in the experiment, with 100, 300, and 600 lines/mm. The same gratings were used in every school. Readers interested in the experimental design can consult Krulj and Nesic30. The researchers provided an experiment cookbook to ensure clear communication of the technical setup and the tasks given to students. In the laboratory, teachers grouped the students, with a maximum of four students per group (Fig. 3). The laboratory experiment and report writing took two weeks: the experiment was conducted in the first week, and the report was due in the second. This light diffraction experiment was the students’ first laboratory activity of the second semester. In the first semester, the physics teachers had conducted a different physics laboratory activity, but this took place before the schools granted us permission to collect the SPHERE dataset.
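A wavelength measurement with a diffraction grating typically rests on the grating equation, d sin θ = mλ, where d is the slit spacing, θ the diffraction angle, and m the diffraction order. The following Python sketch is illustrative only. It assumes normal incidence and that the diffraction angle has already been determined from the setup. The function names and the example angle are hypothetical and are not taken from the experiment cookbook.

```python
import math


def grating_spacing_m(lines_per_mm: float) -> float:
    """Slit spacing d in metres for a grating with the given line density."""
    if lines_per_mm <= 0:
        raise ValueError("lines_per_mm must be positive")
    return 1e-3 / lines_per_mm


def wavelength_nm(lines_per_mm: float, angle_deg: float, order: int = 1) -> float:
    """Wavelength in nanometres from the grating equation d*sin(theta) = m*lambda.

    Assumes normal incidence; angle_deg is the diffraction angle of the
    chosen order measured from the central maximum.
    """
    if order == 0:
        raise ValueError("order must be non-zero")
    d = grating_spacing_m(lines_per_mm)
    return d * math.sin(math.radians(angle_deg)) / order * 1e9


# Hypothetical reading: first-order fringe seen at 20 degrees with each of the
# three gratings used in the experiment (100, 300, and 600 lines/mm).
for density in (100, 300, 600):
    print(density, "lines/mm ->", round(wavelength_nm(density, 20.0), 1), "nm")
```

At a fixed wavelength, a denser grating spreads the colours over larger angles, which is one reason a comparison across the three line densities can be instructive.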

Fig. 2. Visual schema of the light diffraction experiment.

Fig. 3. Students working collaboratively on the light diffraction experiment. Informed consent was obtained from all participants to publish their photograph.
Students’ performance during the experiment was observed using the rubrics provided by SAAR. A week after the experiment day, students were required to write laboratory reports based on the data measured by their group. These reports were then collected and evaluated by the researchers using SAAR. The conceptual assessments in the second semester (RRMCS, FMCI, MWCS, TCE, and STPFaSL) were administered separately on different days, following schedules provided by the physics teachers (Fig. 1). The CLASS was then administered in a single day, at the teachers’ suggestion, to gauge students’ learning attitude toward physics. We used the Google Form platform to record students’ data, which ensured efficient data collection and standardization across schools. Students were therefore allowed to use smartphones to take the online tests. The RBAs used in this study were essentially conceptual, so a calculator was not needed. For data analysis in the light diffraction experiment, however, students could use a calculator. The researchers shared the Google Form links with the physics teachers in each school, and the teachers gave the tests to their students on the days of their physics classes. During the tests, physics teachers were asked to minimize the possibility of cheating.
The conceptual constructs measured in the SPHERE dataset are briefly described here. The origin of the instruments, example items, and response options are detailed in the supplementary information. The first RBA employed in this study was the Force Concept Inventory (FCI) (Fig. S1, Table S2). This instrument is intended to measure students’ conceptual understanding of Newtonian mechanics. Developed by Hestenes, Wells, and Swackhamer13 in 1992, it is still used to this day31. As an alternative conceptual inventory on Newtonian mechanics, the Force and Motion Conceptual Evaluation (FMCE) was also employed14 (Fig. S2, Table S3). The FMCE differs from the FCI in that some items are designed to probe students’ ability to interpret graphs. The FCI and the FMCE were used in the first semester of this study. In the second semester, five conceptual RBAs were administered, along with an attitudinal survey and an assessment of the physics laboratory.
The first test in the second semester was the Rotational and Rolling Motion Conceptual Survey (RRMCS) (Fig. S3, Table S4). This is a two-tier multiple-choice test aimed at probing students’ understanding of rotational motion and associated notions25. The next topic taught to the students was fluid mechanics. We examined their conceptual understanding using the Fluid Mechanics Concept Inventory (FMCI) (Fig. S4, Table S5), which was developed to explore students’ ideas about fluid mechanics concepts26. After that, students took the Mechanical Waves Conceptual Survey (MWCS) (Fig. S5, Table S6). This RBA was designed to evaluate students’ understanding of the main topics in mechanical waves. To assess students’ understanding of heat concepts, the TCE was used28 (Fig. S6, Table S7). The remaining conceptual assessment in the SPHERE dataset examined students’ understanding of thermodynamics, for which we employed the Survey of Thermodynamics Processes and First and Second Laws (STPFaSL)29 (Fig. S7, Table S8). Students’ scientific abilities in the physics laboratory described above and their attitude toward physics were probed using the Scientific Abilities Assessment Rubrics (SAAR) (Fig. S8, Table S9) and the Colorado Learning Attitudes about Science Survey (CLASS) (Fig. S9, Table S10), respectively. These RBAs can be downloaded directly from the PhysPort platform. Interested readers need to sign up as physics educators to gain access, as required by the PhysPort policy.
Labeling process of the students’ performance in physics
The multi-domain assessment in SPHERE measured students’ performance in physics learning in terms of three learning outcomes: conceptual understanding, scientific abilities, and learning attitude toward physics. We argue that these variables can be analyzed to predict students’ performance at the end of the semester from a machine learning perspective4,7,8,9. The predictions provided by machine learning would be useful for giving immediate feedback on students’ learning progress and for helping teachers monitor individual students’ needs for future learning. In SPHERE, students’ performance at the end of the semester was classified dichotomously (1 = high achiever, 0 = low achiever). These labels were obtained in two ways. First, we collected the final test scores from the first and second semesters administered by the physics teachers (FINTEST1 and FINTEST2) and used these scores to classify students into the two groups. The machine learning model was trained to predict these classes. Second, before the second semester of high school physics began, the physics teachers were surveyed and asked to make an early judgment. They predicted each student’s status on the final test of the second semester on the basis of their ongoing observations, assessments from prior learning, and their experience interacting with the students (TEACHPRED). This prediction task is a student-oriented judgment method. It was proposed by Zhu and Urhahne32 and has also been used by researchers in physics education33. We collected the teachers’ predictions because the aims of this study were to train a machine learning model and to compare its performance with human judgment.
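The labeling and comparison steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to build SPHERE: the text does not state the score cutoff, so the median split below is a hypothetical choice, and the function names and all scores and predictions are made-up toy values.

```python
from statistics import median


def label_high_achievers(scores, cutoff=None):
    """Map final test scores to 1 (high achiever) or 0 (low achiever).

    The cutoff rule is an assumption for illustration: when no cutoff is
    given, the median of the scores is used.
    """
    if cutoff is None:
        cutoff = median(scores)
    return [1 if score >= cutoff else 0 for score in scores]


def accuracy(predicted, actual):
    """Fraction of students whose predicted label matches the actual label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy data for six hypothetical students (not taken from the dataset).
fintest2 = [55, 72, 80, 40, 65, 90]   # final test scores, second semester
teacher_pred = [0, 1, 0, 0, 1, 1]     # TEACHPRED-style teacher judgments
model_pred = [0, 1, 1, 0, 1, 1]       # output of some trained classifier

labels = label_high_achievers(fintest2)
print("teacher accuracy:", accuracy(teacher_pred, labels))
print("model accuracy:", accuracy(model_pred, labels))
```

Scoring both sets of predictions against the same labels keeps the human and machine judgments directly comparable. With real data, metrics beyond accuracy may be needed if the two classes are imbalanced.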
