California Alliance for Data Science Education

23 FEBRUARY 2022


(read more about Catherine Cramer)

On December 15, 2021, ADSA streamed a new installment in our virtual sessions, titled Creating Inter-Institutional Data Science Pathways: Streamlining Access to Data Science Education in California, a 90-minute deep dive into the California Alliance for Data Science Education (CADSE), moderated by Eric Van Dusen from UC Berkeley. CADSE is a state-wide network of community colleges, state universities and the University of California system, representing a total of 2.5 million students spread over more than 100 campuses. The network was formed to address issues that emerge when schools need to offer data science at various levels and provide pathways for students to be successful in their pursuit of data science education. Common challenges include coordination among programs, alignment of visions, articulation (i.e., the process of comparing the content of courses that are transferred between postsecondary institutions), infrastructure, curriculum, faculty training, sharing of resources, and the logistics of helping students move across the community college, CSU and UC systems. At a high level, CADSE aims to gain a deeper understanding of what data science education is (or should be), what is currently being built, and how to plan for the future.
Check out the full video on our YouTube Channel.


CADSE held a large meeting in February 2021, with 149 attendees from 27 institutions gathering to survey what is happening in data science across the state. Registration for a follow-up meeting on February 25, 2022, is open! Engaging with school chancellors, CADSE acts as a connector, ensuring schools are kept abreast of new developments in data science education. It also supports the development of a class on the fundamentals of data science, built on open-source materials from UC Berkeley’s Data 8 course, that will provide a common template for introductory classes on other campuses. Another key effort is supporting students who are transferring from one campus or system to another by training transfer mentors.

Van Dusen then introduced the speakers, who together represent the three systems within the CADSE network. Aaron Frankel, Chair of the Data Science program at UCSD, described what is happening in data science education at the UC level, pointing out that since data science is not yet a well-defined academic field, there is no one-size-fits-all approach. At UCSD, data science is a stand-alone program, with 15 full-time faculty and 800 students in the data science major track. There are three threads in the UCSD major: stats/modeling, computation/coding, and domain/application. He described creating an academic unit as being “very labor intensive,” requiring decisions on things like course articulation and what a student’s time to degree might look like. He posed a question many are asking: what is the future of data science in an academic setting? Is it a bespoke hand-crafted major that might feel like two majors squished into one, or does it emerge within an existing academic department, or does it consist of data science courses popping up in a variety of departments?

While there are many pathways into data science, as Frankel pointed out, “the more pathways there are the broader the access is.” Data science attracts a diverse community from across the university, which he finds unique among STEM fields, though he also acknowledged achievement gaps for women and under-represented minority (URM) students. Frankel’s presentation highlighted additional challenges: staffing practice-based courses, dealing with the extreme interdisciplinarity of data science and combining multiple fields, long prerequisite chains, and course articulation being difficult because of specialization. “We need to coordinate so we at least have similar themes among courses so we can know how they transfer,” said Frankel.

Judith Canner, a professor from Cal State Monterey Bay, spoke next. She described Cal State as the largest 4-year public university system in the US, with a student population that is over one third Latinx and that has a large number of first-generation college attendees. Cal State does not offer PhD programs but does offer master’s degrees. Her school’s effort to create a data science program started with what she described as a process of “disjointed pathways across many different majors” that then “stumbled into negotiation about who owns data science” and lots of discussion about computer science offerings and bridging courses. After two years of negotiation, the school now offers two minors: one that combines statistics and programming to move students into a machine learning pathway, and one that is more general data science.

Canner also described working on moving students into the next level, which includes an NIH-funded collaboration with UC Santa Cruz offering paid summer research with mentors to help Cal State students figure out their grad school path. These students are showcased to UC faculty and are then recruited into labs. Canner also described training for post-docs to teach them how to mentor and teach at Cal State schools. She discussed efforts to bring community colleges into the pipeline by building what are known as “2+2 agreements”: if a student completes an associate’s degree, they are guaranteed the opportunity to graduate from a 4-year school, which is helpful in building out local connections between community colleges and Cal State schools. She mentioned the need to create an introductory data science course, one that is similar to Data 8 but that articulates through interdisciplinary collaborations across departments, meaning the course content can be matched across departments and schools. To that end, her school recently completed a survey on CSU computational needs in an effort to understand how to improve the sharing of resources. Finally, she mentioned the challenge of finding and hiring data scientists who want to teach, and said that training faculty already on site is often a more productive route.

The last speaker was Solomon Russell from El Camino Community College. There are 116 community colleges in California, serving 1.8 million students. Several of these colleges teach UC Berkeley’s Data 8 course. Russell described the use of 2i2c (a non-profit that designs JupyterHub distributions to run on cloud infrastructure) to provide infrastructure to log into Jupyter assignments, and creating course materials on Canvas. His school has several collaborations with other community colleges, Cal State schools and UC schools, and participates in a community college data science conference held at UC Berkeley, which he described as a “force multiplier.”

Russell described the biggest challenge as navigating articulation and making it work for students. One such challenge is taking the math requirements for data science and figuring out how to make them fit into what the college already offers its students. El Camino offers pathways for certificates and associate degrees, and is working on implementing 2+2 agreements as described above, connecting community colleges and Cal State schools and ensuring that credits are transferable.

He also described equity gaps, resulting in lower completion rates for Black and Hispanic students. One approach to this problem is making data science courses available to students without a requirement to go through the computer science sequence first. He also mentioned the need to find data science faculty and – similar to Judith Canner – recommends “growing expertise within.”


During the Q+A that followed the panel, there was discussion on how to give students a tangible sense of what it’s like to DO data science; panelists mentioned exposure to extracting observations from tables and inference through variation as examples. There is considerable support for the Data 8 course, as there aren’t a lot of constraints to taking it, e.g. students do not have to take computer science and only need some algebra to be successful. There was discussion about who “owns” data science in an academic setting, in terms of which department, school or faculty, and about a lack of support for creating courses that aren’t required, which is a resource issue.

Finally, the panel discussed how math fits into data science courses, and the future of math majors. Schools are rethinking what math looks like at various levels, though making institutional change at such a large scale is always slow-going. Among other things, this is a teacher training issue: getting existing faculty to move into this area and then into teaching with notebooks. Teachers might have a master’s degree in mathematics but not in the statistics necessary for data science, prompting a need to train mathematicians in how to teach data science courses. There is a need for both content and pedagogy. In addition, more advanced math is a known bottleneck for diversity in data science. And there’s a lot of conversation about math and data science at the high school level involving state standards and school boards. Bottom line: give teachers a pathway in but make it about the students.