None of Us are Average
Emerging AI Technologies
for Inclusion of Underserved Populations
On December 1, 2021, ADSA presented the online session Emerging AI Technologies for Inclusion of Underserved Populations (VIDEO LINK) as part of the 2021–2022 ADSA Virtual Meeting. The panelists – all medical researchers who are members of the University of Texas AI Consortium called Matrix – discussed how AI technologies can help with the inclusion of underserved populations, addressing the persistent lack of representative data samples and how that gap can degrade deep learning models. The panel discussed the curation of more representative data and the movement toward democratizing AI, drawing attention to several case studies.
Chair Dhireesha Kudithipudi (UT San Antonio) set up the discussion by pointing out that San Antonio is a “majority-minority” city, and that over 65% of UT San Antonio students are minority and/or first-generation Hispanic students. But these demographics are not yet reflected in the medical data used by UT researchers. While AI has helped make great strides in health and health care, with the potential to positively affect large groups of people, much work remains to be done on the inclusion of underserved populations. She referred to this as a “research area close to our hearts”: AI has great potential to benefit society, yet it also has “blind spots” when working in the real world, creating a need to curate new data sets that can help design inclusive AI that works for all, not just a few.
The panelists then introduced themselves and their work.
Amina Qutub (UT San Antonio) works in biomedical engineering and uses AI to characterize biological communication at all scales. In her experience, academic researchers often design surveys that reflect their own ethnicity and age, then administer them to surrounding communities who may not share those characteristics. She stressed the importance of identifying and acknowledging data bias and the need to incorporate more equitable data sets. One potential work-around is what she referred to as “being one’s own control”: a longitudinal study that does not compare race against race but instead uses each person’s own neurosignature, broadening the data set to identify individuals across many different behavioral metrics. Such a cohort would balance not race or gender but health metrics, comparing people to themselves over time, across scales from the cellular up to the behavioral level. She also stressed the importance of making sure each line of code can be explained and interpreted, resulting in “explainable AI” and ensuring that broad audiences can understand messaging about data acquisition and access.
Mohamad Habes (UT San Antonio) works in radiology and epidemiology, using AI to capture and assess neurological changes in the course of his work on Alzheimer’s and Parkinson’s diseases. He uses neuroimaging data gathered through MRI and positron emission tomography (PET) scans, which produce large amounts of data that must be archived for use with analytic tools in the search for biologically meaningful information. Capturing age-related lesions typically requires looking at thousands of scans, so automated tools are needed to quantify the data. He is looking to train models on data from across all populations. However, data from minority populations have not yet been integrated, even though their risk factors differ from those experienced by white and more affluent populations. Through multiple projects focusing on the Hispanic population in San Antonio, his group is just starting to learn about differences found in data from non-white groups. One research goal is to evaluate the contribution of white matter lesions to dementia, for which he is gathering MRI scans and blood samples from 200 local Hispanic community members. He will also use data gathered from that community to look at differences in gait prompted by dementia.
Paula Kay Shireman (UT Health, San Antonio) is a vascular surgeon whose research is in microbiology. Using predictive modeling and machine learning, she focuses on frail patients of low social and economic status. Dr. Shireman pointed out that medical records were designed as billing systems and have not advanced much beyond that, creating real challenges for the use of AI. She described algorithms that are mostly trained on patient data from California, Massachusetts, and New York, and algorithms based on claims data that lack the granularity of health data, saying, “Algorithms fail to perform in the real world.” Her goal is for algorithms to be user-friendly and trusted, interpretable and actionable, and fully integrated into clinical workflows.
The panel discussed how to ensure that data are representative across populations and how to counter bias against recruiting minorities. They explored how to build trust, suggesting that students and social media can help recruit local community members, as can visits to local community centers such as churches. Participant bias can arise when volunteers for academic research come from fairly narrow socio-economic groups (see for example this paper). Other barriers to inclusive recruitment are financial: imaging is expensive, substantial travel is often required, and bilingual support staff are often needed. An additional issue is the cultural burden of some diseases and the need to raise awareness of the seriousness of diseases such as Alzheimer’s. Another discussion point was how to represent people who do not fall within discrete demographic groups, such as those who identify across racial or gender categories. Gender identity data are only recently being collected as part of standard medical practice.
The researchers talked about their goal of training AI models with data that are as balanced as possible, and of building algorithms that are more sensitive to context and can learn from very small samples to support learning on rare diseases. In other words, they want algorithms that understand that each of us is an individual. Systematic processes for data collection could be improved to build inclusive data sets, allowing for free-form data and representing how people want to define themselves.
Collaborations and technical challenges were also highlighted in the discussion. In neuroimaging, noise is a systemic issue; addressing it requires collaboration between clinicians and data scientists, along with a platform for ongoing dialogue. Interoperability is a more general challenge, with many barriers to sharing health data across systems. While telemedicine has helped reach more patients, its use also reveals disparities in the availability of hardware, software, and connectivity. Yet the technical challenges are easier to solve than the ethical or legal ones: standards for equitable AI are needed, alongside FAIR (findable, accessible, interoperable, reusable) AI models that are not biased toward any specific group. The panel stressed the importance of using methodologies that ensure security, so that a broad audience trusts what you are doing with their data.
These researchers are looking for new ways to design studies, but in the meantime they know that they still have to work with imbalanced data. They closed by acknowledging the tension between representing minoritized groups and protecting them from prejudice, an expression of the great responsibility that comes with the great power of AI.