Open Datasets for Data Science Education


Teaching data science often requires instructors to provide example datasets to students who are learning to use tools and methods. Open datasets are increasingly common, but open data prepared for a specific use case, such as teaching a particular tool or method, is less common. The ADSA community has expressed interest in curating a corpus of open datasets that could be used for data science education, capstone projects, and pedagogical research. Although many open datasets already exist for teaching and research, the ADSA community plans to provide a space for curated datasets and related curricula that others can modify and share freely. ADSA is also interested in partnering with organizations, such as The Carpentries, that share its interest in data science instruction and may benefit from such an open data corpus.


Beyond creating an initial corpus of open data, the working group has discussed building infrastructure that would allow users to submit new datasets to the corpus, along with supporting materials such as lesson plans and other curricular elements. Accepting user-submitted content would also require a team of curators, formal or informal, to help manage the corpus.


Ajay Anand (University of Rochester)

Melissa Cragin (San Diego Supercomputer Center)

Peter Freeman (Carnegie Mellon University)

Rachel Hendricks (RECODE)

Stephanie Hicks (Johns Hopkins)

Yekaterina Kharitonova (UC Santa Barbara)

Brian Macdonald (Yale University)

Pamela Reynolds (UC Davis)