Introduction

In 2020, a multi-institutional group of researchers recognized a gap in data science education: the field lacked a validated introductory-level assessment like those available for other, more concretely defined disciplines. Inspired by the CAOS project, which in 2005 produced a 40-item introductory statistics inventory, the team sought to develop an analog for data science concepts, modeled after CAOS’s balance of a broad, relevant scope with accessibility to learners with a variety of backgrounds and prerequisite knowledge. The assessment needed to be appropriate to administer both as a pre-test at the beginning of an introductory course and as a summative post-test. Writing items therefore required care in the wording and presentation of nuanced, process-oriented topics such as ethics and data wrangling, so that a series of multiple-choice questions could accurately measure a student’s knowledge of those topics. Because data science curricula vary across institutions, and even across departments at the same institution, this was a non-trivial task: it first required a synthesis of which topics generally constitute the learning objectives of an introductory data science course.

The rest of the research team began reviewing curricula and drafting items before I joined the project in January 2022 as an independent study student, and the group will continue to develop the assessment after my graduation from Duke in May 2023. Thus, the work presented in this document reflects only a portion of the life cycle of the data science assessment, through the lens of my contributions as an undergraduate student. The Background is a traditional literature review that motivates a data science concept inventory assessment. The two Development sections detail my contributions and reflections while developing the assessment and the dsbox package, respectively. The Discussion synthesizes the previous sections and reflects on my educational experience while looking ahead to the next steps for the projects. Finally, the Appendices include a prototype of the assessment at the time of writing, the set of ultimately discarded passages and items, and screenshots displaying the dsbox package in use.