Exploring Data Science Education: From Tutorials to Assessment

Duke Statistical Science | Graduation with Distinction

Authors

Evan Dragich

supervised by Mine Çetinkaya-Rundel, PhD.

Published

April 20, 2023

Abstract

As data science continues to grow in popularity among university course offerings, it is becoming crucial to reliably measure students’ learning outcomes in introductory courses. Doing so requires an assessment that could additionally be used to evaluate pedagogical techniques or curriculum interventions in data science courses.

To develop a blueprint for the assessment, a multi-institutional team of statistics and data science education researchers identified common data science content (e.g., data wrangling, interpreting visualizations), drawing from published guidelines and recommendations as well as introductory data science syllabi. A draft of the assessment was written and used to conduct three think-aloud interviews with field-relevant faculty members. The interviews consisted of both open-ended brainstorming on the assessment’s scope and individual examinations of each item for relevance, clarity, and efficacy in measuring the desired learning objective. Think-aloud interviews were also conducted with TAs of an introductory data science course to gauge item clarity and gain insight into the reasoning behind their responses.

In addition, given the recent rise in popularity of open-source educational software, there is a growing demand for scalable data science pedagogical materials. Based on the Data Science in a Box introductory curriculum, we have developed an R package, dsbox, containing 10 interactive, self-contained, auto-graded tutorials covering concepts from basic data wrangling and visualization to modeling.
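As a minimal sketch of how a student might access these tutorials, assuming they are delivered through the learnr framework commonly used for interactive R tutorials (the tutorial name below is a placeholder, and the CRAN installation anticipates the release described under Goals):

```r
# Once dsbox is available on CRAN (see Goals), installation is a one-liner;
# before then, the development version would be installed from its repository.
install.packages("dsbox")

library(learnr)  # assumption: dsbox tutorials are built on the learnr framework

# List the tutorials bundled with the package
available_tutorials(package = "dsbox")

# Launch one interactively in the browser or RStudio Tutorial pane.
# "data-wrangling" is a placeholder; substitute a name listed above.
run_tutorial("data-wrangling", package = "dsbox")
```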

This work includes descriptions of the blueprints developed for both the assessment and the package, as well as examples of assessment items and tutorials, and results from the faculty and student think-aloud interviews. We also present next steps for the project, including plans for larger-scale piloting and further analyses.

Goals

By sharing this work, we hope that instructors will become familiar with an assessment they may use for designing introductory data science curricula or researching classroom innovations. We also hope that this instrument can serve as inspiration or a starting point to be tailored by future researchers more specifically to their courses or to another discipline (e.g., by adding more programming concepts to better serve a more computing-focused introductory data science class). We hope to officially release dsbox to CRAN by the end of the Spring 2023 semester, facilitating access to data science education for a broader community.