In 2020, a group of researchers from multiple institutions recognized a gap in data science education: the field lacked a validated introductory-level assessment of the kind available in other, more concretely defined disciplines. Inspired by the CAOS project, which in 2005 produced a 40-item introductory statistics inventory (Delmas et al. 2007), the team sought to develop an analog for data science concepts, modeled after CAOS’s balance of broad, relevant scope with accessibility to learners of varied backgrounds and prerequisite knowledge. Because the assessment needed to be appropriate both as a pre-test at the beginning of an introductory course and as a summative post-test, writing items required care in the wording and presentation of nuanced, process-oriented topics such as ethics and data wrangling, so that a series of multiple-choice questions could accurately measure a student’s knowledge of those topics. This was a non-trivial task: because data science curricula vary across institutions, and even across departments within the same institution, it first required synthesizing which topics generally constitute the learning objectives of an introductory data science course.
The rest of the team began reviewing curricula and drafting items before I joined the project in January 2022 as an independent study student, and the group will continue to develop the assessment after my graduation from Duke in May 2023. The work presented in this document thus reflects only a portion of the life cycle of the data science assessment, seen through the lens of my contributions as an undergraduate student. The Background is a traditional literature review motivating such a data science concept inventory. The two Development sections detail my contributions and reflections while developing the assessment and the dsbox package, respectively. The Discussion synthesizes the previous sections and reflects on my educational experience while looking ahead to the next steps for the projects. Finally, the Appendices include a prototype of the assessment at the time of writing, the set of ultimately discarded passages and items, and screenshots of the dsbox package in use.
Çetinkaya-Rundel, Mine, and Victoria Ellison. 2021.
“A Fresh Look at Introductory Data Science.” Journal of Statistics and Data Science Education 29 (sup1): S16–26.
https://doi.org/10.1080/10691898.2020.1804497.
De Veaux, Richard D., Mahesh Agarwal, Maia Averett, Benjamin S. Baumer, Andrew Bray, Thomas C. Bressoud, Lance Bryant, et al. 2017.
“Curriculum Guidelines for Undergraduate Programs in Data Science.” Annual Review of Statistics and Its Application 4 (1): 15–30.
https://doi.org/10.1146/annurev-statistics-060116-053930.
Delmas, Robert C., Joan Garfield, Ann Ooms, and Beth L. Chance. 2007.
“Assessing Students’ Conceptual Understanding After a First Course in Statistics.” Statistics Education Research Journal 6 (2): 28–58.
https://doi.org/10.52041/serj.v6i2.483.
Epstein, Jerome. 2013.
“The Calculus Concept Inventory: Measurement of the Effect of Teaching Methodology in Mathematics.” Notices of the American Mathematical Society 60 (8): 1018.
https://doi.org/10.1090/noti1033.
Godfrey, Kelly E., and Sanja Jagesic. 2016.
“Validating College Course Placement Decisions Based on CLEP Exam Scores: CLEP Placement Validity Study Results. Statistical Report.” https://eric.ed.gov/?id=ED574772.
Jorion, Natalie, Brian D. Gane, Katie James, Lianne Schroeder, Louis V. DiBello, and James W. Pellegrino. 2015.
“An Analytic Framework for Evaluating the Validity of Concept Inventory Claims.” Journal of Engineering Education 104 (4): 454–96.
https://doi.org/10.1002/jee.20104.
Jorion, Natalie, Brian Gane, Louis DiBello, and James Pellegrino. 2015.
In 2015 ASEE Annual Conference and Exposition, 26.497.1–12. Seattle, Washington: ASEE Conferences.
https://doi.org/10.18260/p.23836.
Mulford, Douglas R., and William R. Robinson. 2002.
“An Inventory for Alternate Conceptions Among First-Semester General Chemistry Students.” Journal of Chemical Education 79 (6): 739.
https://doi.org/10.1021/ed079p739.
Reinhart, Alex, Ciaran Evans, Amanda Luby, Josue Orellana, Mikaela Meyer, Jerzy Wieczorek, Peter Elliott, Philipp Burckhardt, and Rebecca Nugent. 2022.
“Think-Aloud Interviews: A Tool for Exploring Student Statistical Reasoning.” Journal of Statistics and Data Science Education 30 (2): 100–113.
https://doi.org/10.1080/26939169.2022.2063209.
Schanzer, Emmanuel, Nancy Pfenning, Flannery Denny, Sam Dooman, Joe Gibbs Politz, Benjamin S. Lerner, Kathi Fisler, and Shriram Krishnamurthi. 2022.
In SIGCSE 2022: The 53rd ACM Technical Symposium on Computer Science Education, 22–28. Providence, RI, USA: ACM.
https://doi.org/10.1145/3478431.3499311.
Solomon, Erin D., Julie M. Bugg, Shaina F. Rowell, Mark A. McDaniel, Regina F. Frey, and Paul S. Mattson. 2021.
“Development and Validation of an Introductory Psychology Knowledge Inventory.” Scholarship of Teaching and Learning in Psychology 7 (2): 123–39.
https://doi.org/10.1037/stl0000172.
Study, Nancy, Steven Nozaki, Sheryl Sorby, Mary Sadowski, Heidi Steinhauer, Ronald Miller, and Kaloki Nabutola. 2018.
In 2018 ASEE Annual Conference & Exposition, 30231. Salt Lake City, Utah: ASEE Conferences.
https://doi.org/10.18260/1-2--30231.
Swanstrom, Ryan. n.d.
“Data Science Colleges and Universities.” https://ryanswanstrom.com/colleges/.
Zhang, Zhiyong, and Danyang Zhang. 2021.
“What Is Data Science? An Operational Definition Based on Text Mining of Data Science Curricula.” Journal of Behavioral Data Science 1 (1): 1–16.
https://doi.org/10.35566/jbds/v1n1/p1.