Reflection on Common Challenges in Academic Data Science
August 1, 2023
One thing that struck me about the summit was the common challenges that programs often face in encouraging participation from scholars, especially those who are based in a specific domain. I am a PhD candidate in linguistics with a secondary affiliation with a data science institute, and many of the challenges discussed during the summit resonated with my experience as a graduate student. This made me wonder how challenges at the very early career stage might influence future faculty’s involvement in the academic data science community.
Defining Data Science
At the summit, it was difficult to pin down a definition for a data scientist, or for data science as a field. Despite this, there was a general consensus that data science is interdisciplinary and overlaps with work that happens in many domains. However, many scholars who may be doing data-intensive work do not identify as data scientists, which limits the community that participates in academic data science. One question that came up for me at this summit is how we might identify and reach these groups.
I’m based in a social science department, and it took several years of involvement to get comfortable in the data science community. I also have many colleagues who do equally data-intensive work but have not yet entered the community. Data-intensive social scientists are often already siloed in their field as graduate students and often have much less formal training in computational methods, which can make it even harder to participate in the community. Expanding the connections between data science and domain specialists, especially graduate students, is something that I was inspired to take back to my home institution.
Faculty working across disciplines, either through research relationships or as joint hires between departments, face unique challenges around how their work and progress are assessed. In particular, we discussed the example of a faculty member with a primary appointment in one field and a secondary appointment in data science. When that faculty member is assessed for tenure, they must meet the requirements for tenure in the primary department, which often doesn’t have the same priorities as data science. Interdisciplinary research does not always translate easily to tenure requirements, which means that this work is disincentivized and that it can be difficult for faculty to engage with their secondary appointment in data science during the tenure process.
In graduate school, I have a primary affiliation with linguistics and a secondary affiliation with a data science institute. My research tends to be more interdisciplinary, and while I am supported by my advisor and department, I have experienced the additional challenges of communicating and sharing interdisciplinary research in both my domain and data science communities.
A secondary effect of the interdisciplinary issue discussed above is the increased workload associated with multiple affiliations. An example given at the summit was that scholars are expected to work at 100% in their home department, and if they carry an additional affiliation with a data science institute, this can end up being a 120% commitment. This, on top of the already demanding workload in academia, can present an additional challenge to participating in the community. An important part of this discussion was that finding ways for data science work to replace rather than supplement existing commitments would go a long way towards alleviating this burden.
Especially in my first few years of grad school, I felt that my domain and data science affiliations were developing in parallel rather than in concert, which resulted in a heavier workload than I had originally envisioned for myself. In the later years of my program, I was able to bring these two things into harmony, especially as data science became more central to my professional goals. However, that early involvement was essential for building a foundation for the work that I’m doing now both in my research and professionally, and so I am grateful that I got involved at an early stage. I was inspired by these discussions to bring back what I’ve learned to help graduate students prioritize their time so they can experience the benefits of interdisciplinary research while mitigating extra workload.
Attending this summit as a graduate student helped me understand the broader structures that filter down to shape my individual journey through graduate school. My main takeaway was that scholars from many backgrounds can and should be involved in academic data science. Reflecting on my own experience during this conference, I realized that although there are some barriers to doing so, being involved at the earliest career stage possible benefits both the data science community and the scholars themselves. In the coming months, I hope to use the lessons that I’ve learned to help connect scholars in my field to the academic data science community, with a focus on those who are very early in their program.