Non-ableist Data Science
28 MARCH 2022
FROM THE DESK OF CATHERINE CRAMER
(read more about Catherine Cramer)
On January 26, as part of ADSA’s 2021-2022 Virtual Annual Meeting, Anat Caspi, Director of the Taskar Center for Accessible Technologies at the University of Washington, gave a presentation on Non-Ableist Data Science. The Taskar Center focuses on applying research in data science and machine learning toward improving quality of life for individuals with disabilities, addressing issues resulting from day-to-day friction caused by barriers within society. Her talk focused on how data science can be made more inclusive of both practitioners and consumers of all abilities, and was based on the outcomes from a participatory workshop that ran in 2020. Link to video recording HERE.
Many researchers are working on biases in data science and artificial intelligence. The Center’s focus is on how disability complicates dominant approaches, how to address the challenges those approaches impose on the disability community, and the ways technology mediates disabled people’s experience in the world. Pointing out that “technology is often used to alleviate harms, but non-ableism in data science has its own nuances,” Caspi described data science biases of particular concern for people with disabilities: issues relating to privacy, consent, and the high stakes of being misclassified. She added that excluding considerations of disability, and the scholarship of disability, means efforts to address data science biases are not comprehensive in accounting for bias.
In the interest of helping others adopt a non-ableist lens, the Taskar Center has adopted a bias-discovery toolkit called “AI Blindspot cards,” developed by a group at MIT’s Assembly Program. Caspi noted that even this toolkit, which its creators built to help researchers identify key concepts that may bias their work, uses an ability-based term like ‘blindspot’ to describe and deprecate researchers’ inattention to bias in AI. The key concepts are nonetheless useful for exploring the facets of bias in data science, and Caspi added several concepts of her own to round out the workshop discussion with a non-ableist lens. As she noted, data science is becoming increasingly important, and how we analyze data and what we publish as our conclusions impacts lives: “When the pipeline ignores variations among humans it can bias the decision-making process.”
Caspi pointed out that, given the high degree of heterogeneity within the disability community, even well-intentioned participatory design does not necessarily address the context in which decisions are made or technology is deployed, thereby depriving individuals of power. Participatory design, she said, can sometimes be used as a “smoke screen” that denies people the power to shape or reshape data science. Disabled individuals may not actually be involved, or may lack the power to shape results or the resulting technologies. In other words, they might be in the pipeline but not actually included.
Caspi addressed the conundrum faced by for-profit companies, which might be interested in developing technologies that serve relatively small disability communities even though, realistically, this may not be a profitable endeavor. She did, however, call out Apple’s Tim Cook, who, when presenting iOS accessibility features, was pressed by an investor to show a profitable business model and reportedly said, “If you are just investing in us for business outcomes you shouldn’t be investing with us.”
Caspi also offered her view of process bias: the process of data science itself produces knowledge and technologies that impact us daily. These products form a pipeline of analytic tools designed to deploy systems and make inferences, and each step can be interrogated for ableist bias. Algorithms are susceptible to making erroneous decisions based on models that are simple and reductive. She defined ableist bias as systemic bias that privileges non-disabled individuals: a set of social and political structures that can marginalize people, subject them to differentiation, and shape discourse in pernicious ways. There are multitudes of disabilities, each carrying its own significantly different lived experience, and all of them, irrespective of classification, have been marginalized culturally, historically, or politically.
What kinds of data science research practices could produce more desirable futures for people with disabilities, and how can we change the course of outcomes through systemic interventions beyond focusing on the tools themselves? Caspi suggests turning theoretical discourse into practical critical exercises through methodic reflection on the ways values, priorities, and biases shape the data science process. The AI Blindspot Cards gave workshop participants concrete, visually appealing prompts; their purpose is to break the data science pipeline into planning, building, and deploying stages in order to enter into a discovery process at each one.
Discussing applications of data science as they relate to ability, Caspi pointed out that AI marketing often invokes disabled people, presenting AI as a form of assistive technology designed to help people negotiate barriers, yet these processes frequently treat the lived experience of people with disabilities as an afterthought rather than including it. One example is a reinvented system for American Sign Language (ASL), largely produced without deaf people and without an understanding of the contextual breadth of ASL.
As one of the key concepts for reflection on the data science process, Caspi brought up Scope and Context, prompting questions about the limits of data science approaches, such as: is the right data being collected? Often the scope of a data science project is framed as cost savings, yet when institutions invest in a project they are not just investing in a business case; they are looking for an outcome that benefits people or a community. Her example was the approach the Boston Public Schools took to changing school start times, which required identifying optimal start times for everyone, including the school system itself in the form of cost savings. MIT constructed an algorithm to assess all of the options for trimming bus routes (there were a novemtrigintillion of them, a 1 followed by 120 zeroes), yet the effort still had blindspots with respect to who it was serving, missing parental needs, shift workers, and the disability community. Caspi asks: how are you sure that you have representative data? Was your AI system trained on the same communities that will be affected by it? And how are they classified in your system? (For example, people in wheelchairs have different needs from pedestrians.) Data collection and labeling lead to many misunderstandings of terms, methodologies, and classifications, revealing a lack of consistency.
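Caspi’s question about representative data can be made concrete with a simple audit. The sketch below compares each group’s share of a training set against its share of the affected population; the function, the mobility categories, and the numbers are hypothetical illustrations of the idea, not anything from the talk.

```python
from collections import Counter

def representation_gap(train_labels, population_labels):
    """For each group, return (share in training data) minus
    (share in the affected population); a large negative value
    flags under-representation."""
    train = Counter(train_labels)
    pop = Counter(population_labels)
    n_train, n_pop = sum(train.values()), sum(pop.values())
    groups = set(train) | set(pop)
    return {g: train[g] / n_train - pop[g] / n_pop for g in groups}

# Hypothetical mobility categories for a routing model:
train = ["walking"] * 95 + ["wheelchair"] * 5
population = ["walking"] * 85 + ["wheelchair"] * 15
gaps = representation_gap(train, population)
# gaps["wheelchair"] is about -0.10: wheelchair users appear far
# less often in training than in the population the model serves
```

A check like this only catches groups the labels already name; it cannot surface the classification problems Caspi raises, such as whether “wheelchair” and “pedestrian” were the right categories in the first place.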
Another question to consider is whether the system is open to abuse by “bad actors.” The example here is a system designed to help people with disabilities in job hiring, which could also be used to discriminate against them. There is a risk of discrimination by proxy: an algorithm can have negative effects on a specific community because, while some attributes are hidden, others act as proxy characteristics. At issue is how people self-identify and disclose, and how they are thereby at risk of being labeled and defined by others, a risk that stems from a cultural understanding of ability in which disability is often seen as a deviation. Explainability is key: understanding how a model identifies certain features is essential to mitigating proxy bias. Generalization errors lead to data not being used as intended because of cultural biases.
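One simple way to probe for discrimination by proxy is to measure how strongly a seemingly neutral feature tracks a protected attribute. The sketch below uses invented hiring data in which “months of employment gap” correlates with disability status; the feature, the numbers, and the threshold are hypothetical, offered only to illustrate the proxy idea.

```python
import statistics

def proxy_correlation(feature, protected):
    """Pearson correlation between a model feature and a protected
    attribute; a strong correlation flags a potential proxy."""
    mx, my = statistics.mean(feature), statistics.mean(protected)
    cov = sum((x - mx) * (y - my) for x, y in zip(feature, protected))
    sx = sum((x - mx) ** 2 for x in feature) ** 0.5
    sy = sum((y - my) ** 2 for y in protected) ** 0.5
    return cov / (sx * sy)

# Hypothetical applicants: employment-gap months vs. disability status.
gap_months = [0, 1, 0, 2, 14, 12, 18, 16]
has_disability = [0, 0, 0, 0, 1, 1, 1, 1]
r = proxy_correlation(gap_months, has_disability)
# r is close to 1, flagging "employment gap" as a likely proxy:
# a model that penalizes gaps penalizes disability by proxy
```

Correlation is only a screen, not proof of harm; as Caspi notes, explainability work is still needed to see how the model actually uses such a feature.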
Caspi finished with the right to contest: being given the opportunity to contest decisions made by AI gives agency to people who may not have been part of the work. The ultimate goal is building inclusive data sets, with presentation, visualization, and dissemination legible and visible to everyone. She observed that data analytic practices are laden with assumptions, for instance that humankind can be properly represented by unimodal, Gaussian-distributed attributes. Such assumptions leave practitioners limited ways to adequately represent diversity in populations. Instead, she recommends probing the use of Gaussian models, addressing populations with disabilities as distinct populations, and making sure that collected data sets are not just accessible but also guarded from discriminatory practices.
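The hazard of a unimodal Gaussian summary can be shown with a toy example. In the hypothetical data below, task times come from two very different user subgroups, and the fitted mean and standard deviation describe no actual user, which is the failure Caspi warns about; the scenario and numbers are invented for illustration.

```python
import statistics

# Hypothetical task-completion times (seconds) from two subgroups:
# users without disabilities clustered near 10s, screen-reader
# users clustered near 40s.
times = [9, 10, 11, 10, 9, 11, 38, 40, 42, 39, 41, 40]

mu = statistics.mean(times)      # 25.0 — between the two clusters
sigma = statistics.stdev(times)  # large, because the data is bimodal

# Nobody in the sample is anywhere near the "average user":
near_mean = [t for t in times if abs(t - mu) <= 0.2 * sigma]
# near_mean is empty — a single Gaussian erases both subgroups
```

Modeling the two subgroups separately (for example, as a mixture of two distributions) preserves the diversity that the single Gaussian collapses away, in line with Caspi’s recommendation to treat populations with disabilities as distinct populations.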