Discussion

While my work on the assessment ended before a large-scale student pilot, we anticipate that the research team will be able to obtain a set of introductory data science student responses by the end of the Spring 2023 semester. As we work to draft a research proposal with Duke’s IRB, we hope to again evaluate the wording and pacing of the assessment on the target population, making additional refinements before a true pilot intended to measure students’ learning and explore the assessment’s validity and possible subscales. As the team prepares for such a step, the focus has shifted towards obtaining funding from an NSF grant, possibly either by expanding the scope of the curriculum to K-12 instructors or by drilling down technically and building the assessment into a more robust format via JavaScript and other advanced tools.

Although the assessment itself deals with introductory topics, I developed my knowledge of many advanced concepts in data science and programming through my work on this project. Broadly, working on the assessment and package gave me two opportunities: to dive deeper into computer science topics such as web development and advanced GitHub usage, and to interact with others’ real-life code beyond the scope of the classroom or research lab. The first outcome is particularly relevant to me because I was not able to take as many computer science or statistical programming courses as I had hoped at Duke. Thus, developing the GitHub Pages site with code styling, alt text, and navigation bars provided me with HTML, CSS, and GitHub Actions experience that I would not have gotten otherwise. In addition, my only previous exposure to GitHub had been on a smaller scale, such as class projects or small research teams. Working on a community-wide package like dsbox was my first time using GitHub’s more advanced features, such as forking and branching from others’ functioning code, creating pull requests, and approving others’ pull requests. It was also rewarding to learn about all that goes into creating an R package, from the file structure and .Rd files to R CMD check. The process of iteratively attempting and failing to submit dsbox gave me a much deeper appreciation for the wide library of open-source software we use daily as statisticians.

Acknowledgements

This work would not have been possible without the incredible support and mentorship of my advisor, Dr. Mine Çetinkaya-Rundel, who has been by my side since the beginning of my Duke Statistical Science journey. I would also like to thank Drs. Alexander Fisher and Amy Herring for forming my committee and helping me make this work the best it can be. I have to acknowledge Dr. Joan Combs Durso, not just for her help with the logistical aspects of the Graduation with Distinction process, but also for her insightful workshops and for everything she does to support the undergraduate student experience in the department. Of course, I am also immensely grateful for the three faculty members and students who offered their time to help improve our assessment. Special thanks go again to Dr. Fisher, as well as Dr. Elijah Meyer, for graciously allowing us to use their STA199 students as subjects for our large-scale release, and specifically for working so well in tandem with us as we iterated on the study design to satisfy the IRB. Finally, I would be remiss not to thank my friends, family, and Statistical Science peers for providing feedback and emotional support, as well as every student for whom I have had the pleasure to be a TA, tutor, or study group facilitator, for helping me develop teaching and mentoring skills and for continuously reigniting my passion for both.