Designing immersive virtual environments for assessing inquiry.

American Educational Research Association (AERA)

This paper discusses preliminary findings from a recently funded project that aims to develop virtual performance assessments of scientific inquiry skills. Building on prior research on Immersive Virtual Environments (IVEs), we believe that virtual performance assessments could mitigate limitations associated with both physical performance assessments and paper-and-pencil item-based tests (Quellmalz & Haertel, 2004; NRC, 2006). The research questions addressed in this paper are: How can IVEs be used to assess scientific inquiry as defined by the NSES? What evidence is there of IVEs' ability to reliably and validly measure students' scientific inquiry skills?

Perspective(s) or theoretical framework
Our results from the past seven years show that IVEs enable students to engage in authentic inquiry tasks (problem finding and experimental design) and increase students' engagement and self-efficacy (Ketelhut, 2007; Nelson, 2007; Clarke & Dede, 2009). This project focuses on assessment; therefore, we are using the Evidence-Centered Design (ECD) framework (Mislevy, Steinberg, & Almond, 2003; Mislevy & Haertel, 2006) to design our assessments.

ECD increases construct validity through a rigorous procedure consisting of four stages: domain analysis, domain modeling, conceptual assessment framework and compilation, and a four-phase delivery architecture. The result is an assessment argument that links learning theories to student performances and to the interpretation of data.

Methods, techniques, or modes of inquiry
Design-based research methods are being used to conduct a series of studies, including iterative pilot testing, alignment analyses, and cognitive analyses. These methods inform the development of the IVE and provide evidence of construct validity for the assessments (Nielsen, 1994; Quellmalz, Kreikemeier, DeBarger, & Haertel, 2007). A mixed-methods approach is being used during pilot testing to collect data on users' performances and preferences. Performance data include measures of users' ability to complete tasks, time to complete tasks, number of errors, and ratio of successes to failures. Preference data include Likert-scale ratings as well as findings from interviews and focus groups.
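
As an illustration only, the performance measures listed above could be computed from pilot logs along the following lines. This is a minimal sketch under stated assumptions, not the project's actual analysis code: the `TaskAttempt` record layout, the `summarize` function, and the task names are hypothetical and do not come from the study.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class TaskAttempt:
    """One user's attempt at one task (hypothetical record layout)."""
    user_id: str
    task: str
    completed: bool   # did the user finish the task?
    seconds: float    # time spent on the attempt
    errors: int       # number of errors logged during the attempt


def summarize(attempts: list[TaskAttempt]) -> dict[str, Optional[float]]:
    """Compute the four performance measures named in the text."""
    successes = sum(1 for a in attempts if a.completed)
    failures = len(attempts) - successes
    completed_times = [a.seconds for a in attempts if a.completed]
    return {
        # Share of attempts that ended in task completion.
        "completion_rate": successes / len(attempts) if attempts else None,
        # Mean time to complete, over completed attempts only.
        "mean_seconds_completed": mean(completed_times) if completed_times else None,
        # Total number of errors across all attempts.
        "total_errors": float(sum(a.errors for a in attempts)),
        # Ratio of successes to failures; undefined when there are no failures.
        "success_failure_ratio": successes / failures if failures else None,
    }


# Made-up example data, for illustration only.
attempts = [
    TaskAttempt("u1", "collect_sample", True, 120.0, 1),
    TaskAttempt("u1", "form_hypothesis", False, 300.0, 4),
    TaskAttempt("u2", "collect_sample", True, 90.0, 0),
    TaskAttempt("u2", "form_hypothesis", True, 210.0, 2),
]
summary = summarize(attempts)
print(summary)
```

Restricting the mean time to completed attempts is one possible design choice; an analysis could equally report time on abandoned attempts separately.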

Data sources, evidence, objects, or materials
The IVE used for our project is still in development; however, we are collecting pilot data from both formal and informal sources. Formal data sources include middle school students, while informal data sources include experts in education, software, and scientific inquiry.

Results and/or substantiated conclusions or warrants for arguments/point of view
Initial pilot results indicate the need for several design changes. First, users requested a more intuitive interface, which is especially salient because the IVE is an assessment and must therefore give all users an equal opportunity for success. Second, users wanted clear pathways and signs for better navigation. Third, users wanted a smaller IVE so that time is not wasted in travel. We are conducting additional pilots, and we will present more detailed results in our paper.

Scientific or scholarly significance of the study or work
There is growing interest in using IVEs for assessment. Researchers and proponents of IVEs must build a case, based on empirical evidence, that virtual performance assessments can measure learning reliably and validly.
