Data Science for Social Impact
09 DECEMBER 2021
FROM THE DESK OF CATHERINE CRAMER
ADSA Virtual Series KickOff: "Data science for social impact in university-based programs"
This online event was the first in a series of virtual sessions whose content was originally planned to be part of ADSA’s annual meeting in Savannah, GA in November 2021. The continuing pandemic and the surge of the Delta variant moved most of the content planned for the in-person meeting to a series of virtual events. Additional virtual sessions are scheduled from now until March 2022; see the 2021-2022 ADSA Virtual Meeting page for details. The recording of this session can be found HERE.
A Data for Good growth map
As university data science programs have proliferated, so have “data for good” initiatives and projects. The virtual session “Data science for social impact in university-based programs”, held online on November 10, 2021, assumed that participants would already be steeped in the “why”, the potential positive impact of data for social good projects, and instead was designed to offer a kind of roadmap for “how”. While data science programs must concentrate on data intensive technologies, successful data for good programs have a different set of priorities requiring a different set of skills, notably developing and maintaining robust partnerships and developing a nuanced understanding of the context in question.
The opening presentations gave examples of two kinds of programs: summer programs that offer fellowships for data science students to work on collaborative projects with non-profit and government partners, and programs that partner with research organizations working in (mostly) urban communities that have been collecting and evaluating societal data for many years.
Dharma Dailey (University of Washington eScience Institute) outlined the notion of “data for good” and what designing a data for good program might entail, highlighting excerpts from the just-released Data for Good Growth Map, which is aimed at helping fledgling programs. Dailey described a data for good program as being, at its core, “aspirational, relational, dynamic and contested”, a place to advance practice and understanding of what it takes to “do good” with data. Not surprisingly, there is no one definition of “for good”, as slightly different framings present as “common good”, “social good” and “public good”, each one emphasizing different kinds of social impact.
Dailey’s growth map surveyed 8 university programs in 2020, covering a wide range of topics with social dimensions. The programs have much in common: they are small and intense, run for 10-14 weeks, engage 8-40 students, and are structured as project- and team-based work supported by mentors. These university-based programs bring together the three missions of education, research and service, setting them apart from similar programs in government, non-profits or industry. The programs are augmented by a range of teaching modalities, and there is no one answer for funding these projects: in-kind support, university funds, foundation support, and funding from research grants are all potential sources.
Students build skills
Data for good programs are popular with students with a wide range of interests and, importantly, appeal to those who are underrepresented in STEM and who have varying levels of data science experience. These programs are an excellent place for diverse students to dip a toe, or take a plunge, into the water of data science. They enable student engagement with data science techniques, specifically reproducibility, version control and documentation skills, and bring in non-data-science skills related to core science, public science, and the public interest. Notably, these programs usually support only certain stages of data for good project development, such as developing research questions and data discovery, or making a piece of scientific software more accessible.
What sets these projects apart is the deep engagement with stakeholders, with students often engaging directly with project partners. In fact, the most critical factor for success is developing strong partnerships. A viable project is structured around a well-defined research question and involves analytic tasks, and it is essential to have data in hand when the program starts. Specific needs include a clearly defined rationale for why and how the project outcome will have a strong social impact, attention to ethical concerns, and the viability of the intervention beyond the life of the program. For those interested in starting a university-based data for good program, the eScience Institute is offering a workshop series in January and February 2022.
Amy Hawn Nelson of Actionable Intelligence for Social Policy (AISP), housed at the University of Pennsylvania, described a very different kind of “for good” data science program, one that uses applied research on cross-sector integration to help state and local governments collaborate and responsibly use data to improve lives. AISP projects use only public sector data and focus on community building, working with some of their 36 sites over many years. Nelson described in detail how AISP projects are unique: they are grounded in experience and treat data sharing and integrated data systems as relational work, not a technical platform, integrating not just data but people. She described herself as a “data sharing therapist”, with the admonition that “data sharing is not for the weak.”
AISP’s work is cross-sector data curation, not necessarily “big data”: public utility work answering carefully constructed questions. See AISP’s Integrated Data Systems Map. A quick overview of several projects described high-quality curated data for addressing homelessness, a lack of smoke alarms, yoga for good, the impact of economic development funds, and understanding the impact of early childhood funding and pre-school offerings. AISP focuses on standards creation and training, and has developed a framework for integrated data systems and a toolkit for centering racial equity. Each project starts at a different place in the data lifecycle and the framing is important; AISP has released an assessment matrix for potential sites.
Eat your vegetables
While many data scientists view analytics and model building as the fun stuff, the value that AISP supports is what Nelson calls “the vegetables”: thinking through what cleaned data actually means, including deduplication, standard definitions, and reconciliation. Nelson also described this as “the drudgery that is essential for data analytics,” helping their sites “eat their kale.” AISP looks for data scientists who are interested in administrative data reuse for evidence-based policy: people who truly understand public sector data and who combine a strong background in data science with human service sector training.
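As a rough illustration of what this kind of data cleaning can involve, the following minimal sketch applies one shared definition of a name, groups duplicate records, and reports which people appear in more than one source system. It is not AISP’s method; the `ClientRecord` fields, the agency names, and the matching rule (normalized name plus birth date) are all hypothetical assumptions chosen for brevity, and real record linkage is considerably more careful.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientRecord:
    source: str      # hypothetical agency / source system name
    name: str
    birth_date: str  # assumed ISO format, YYYY-MM-DD


def normalize_name(name: str) -> str:
    """Standard definition (assumed): lowercase, no punctuation, single spaces."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def deduplicate(records):
    """Group records that share a normalized name and birth date."""
    groups = {}
    for record in records:
        key = (normalize_name(record.name), record.birth_date)
        groups.setdefault(key, []).append(record)
    return groups


# Made-up example records from three hypothetical source systems
records = [
    ClientRecord("housing", "Maria  Lopez", "1990-04-02"),
    ClientRecord("health", "maria lopez.", "1990-04-02"),
    ClientRecord("schools", "Sam Lee", "2011-09-15"),
]

groups = deduplicate(records)

# Reconciliation report: people who appear in more than one source system
linked = {
    key: sorted(r.source for r in recs)
    for key, recs in groups.items()
    if len(recs) > 1
}
```

Even in a toy example like this, the matching rule is a decision about definitions rather than a purely technical step.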
Ethical and secure data use is at the heart of AISP. Nelson points out that universities are looked to for this kind of work, being viewed as a kind of “Data Switzerland” carrying a high potential for trust. AISP’s main lane is restricted data, which come with legal agreements. Metadata are essential, and explicitly include distinctions that are rarely documented: who is the owner (many think they are but aren’t), who is the steward (the content expert who works directly with agencies), and who is the custodian (trained in security, storage, compliance)?
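One way to make the owner, steward, and custodian distinction explicit is to record the three roles as separate metadata fields and flag any that are undocumented. The sketch below is only an illustration under that assumption; `DatasetRoles` and `missing_roles` are hypothetical names, not part of any AISP tool or standard.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetRoles:
    """Hypothetical metadata record for the three roles described above."""
    owner: Optional[str] = None      # who holds authority over the data
    steward: Optional[str] = None    # content expert who works with agencies
    custodian: Optional[str] = None  # handles security, storage, compliance


def missing_roles(roles: DatasetRoles) -> List[str]:
    """Return the names of roles that have not been documented."""
    return [f.name for f in fields(roles) if not getattr(roles, f.name)]


# Example: the custodian has not been recorded for this (made-up) dataset
roles = DatasetRoles(owner="State agency", steward="Program analyst")
undocumented = missing_roles(roles)
```

A check like this could run before a dataset is accepted into a project, so that a missing role is caught early.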
The panelists, Dharma Dailey, Karthikeyan Umapathy, David Uminsky, Gizem Korkmaz, Cass Dorius, and Rebecca Shearer, responded to several questions from moderator Amy Hawn Nelson and from the audience:
Q: HOW DO YOU ENSURE ALIGNMENT WITH STAKEHOLDER NEEDS?
Your programs often work in collaboration with government and non-profit organizations. What opportunities do students have to engage with stakeholders?
- Stakeholders define the research question and often the data sources
- Lots of engagement between students and stakeholders; students hear problems first-hand
- Very motivating for students as they can see the impact of their work
- Students incorporate insights from stakeholders and make sure the work aligns with what stakeholders want
- Often requires teaching yourself content in order to get the stakeholders what they need
Q: HOW DO YOU ENSURE THE TIME AND EFFORT OF GATHERING DATA, METADATA, OWNERSHIP/LEGAL AND ACCESS ISSUES ARE MANAGED?
- Keep data use cases close to home, keep them small, make sure all procedures are in place
- No one wants to share data with you unless they trust you. People need to know who is running the project, make personal connections so that people trust you with their data. “Data is shared at the speed of trust”
- Ensuring security and privacy is not a technical process; it’s a social process. You need technical people to support you, but not to talk so much in meetings
- Every data owner has to approve the use of their data and having Data Switzerland helps
Q: WHAT ARE YOU DOING TO CULTIVATE DIVERSE STUDENTS/PARTNERS? WHAT CHALLENGES ARE YOU FACING IN TERMS OF CULTIVATING DIVERSITY?
- We work to cultivate diverse partners and students. Many of us do not identify as data scientists ourselves.
- Recruitment and retention of diverse students are helped by focusing on students' desire to make an impact in their own communities.
Q: HOW DO YOU SELECT PROJECTS OR DATASETS TO WORK ON?
What characteristics must a project have to be considered for your program? And relatedly what are project exclusion criteria?
- Some programs work with the same agencies year to year
- It’s important to build a network of scalable projects
- Develop sustainable and local projects, not just expensive summer programs
- Often need local projects in which stakeholders can work directly with students
- Look for projects where data are likely to be available
- Make sure projects are aligned with what the students want to do
QUICK ROUND ROBIN: LESSONS LEARNED AND RECOMMENDATIONS FOR OTHER UNIVERSITIES INTERESTED IN STARTING THEIR OWN PROGRAMS?
- Ask everyone on the project team about their vacation plans
- Do not underestimate how long it takes to get through your university's legal review
- Do not assume that new staff you are working with trust you
- Ask who the actual owner of the data is. Data collectors are not the owners; the funding agency is
- Plan meetings with stakeholders early, before you start, and be flexible
Want to get involved or learn more?
Contact the session speakers or the session co-Chairs: Doug Hague, Sarah Stone
Find the recording HERE
Additional resources brought up by audience members in the Zoom chat:
- AISP Intro to Data Sharing
- Miami Dade IDEAS Consortium
- GO-FAIR principles
- FAIR and CARE summaries
- CARE Principles for Indigenous Data Governance
- UVA Biocomplexity Institute’s Data Science for the Public Good Young Scholars
- Iowa Integrated Data for Decision Making
- University of Chicago Data Science Institute
- Florida Data Science for Social Good
- The Leadership Alliance
- ASA JEDI Outreach Group