Data Science Coast To Coast
The DS C2C seminar series, hosted jointly by seven academic data science institutes, provides a unique opportunity to foster a broad-reaching data science community.
In the first half of 2021, we will host five seminars, each featuring one faculty member and one postdoctoral fellow from two universities. Each speaker will give a 20-minute talk about ongoing projects and motivating issues, followed by 20 minutes of discussion with the audience. These seminars will be the launching point for follow-on research discussion meetings, which we hope will lead to fruitful collaborative research.
We strongly encourage you to sign up ahead of time, and indicate your area of research and your interest in follow-up discussions.
All talks will be at 3 pm EST/noon PST.
~ 16 June 2021 ~ Ocean Dynamics
Miguel Jimenez-Urias (Postdoctoral Fellow, Earth and Planetary Sciences, Johns Hopkins University), Oceanic stirring and Mixing of Passive Scalars: A Novel Closure
Laure Zanna (Professor of Mathematics & Atmosphere/Ocean Science, New York University), Blending Machine Learning and Physics to Improve Climate Models
Dr. Miguel Jimenez-Urias
Scale-Dependent Shear Dispersion: Stirring and Mixing of Passive Tracers in the Ocean
Tracers that help regulate biogeochemical cycles in the ocean and atmosphere have complex spatial distributions due to the combined effect of stirring by the multi-scale shearing motions that are ubiquitous and persistent in the ocean, and the small-scale diffusive mixing resulting in spatially inhomogeneous, enhanced mixing rates. Computer models need to parameterize the effect of shear dispersion because of limits on computing power and numerical stability when running climate-scale ocean simulations. Such parameterizations, however, typically ignore scale dependency, a simplification not strictly applicable to the ocean. In this talk, we present new results describing the scale dependency of shear dispersion by idealized oceanic flows that can lead to a better understanding and representation of tracer distribution in the oceans.
Dr. Rosemary Gillespie and Dr. Shelly Trigg
Data Integration Across Space and Time to Infer Biodiversity Dynamics
The world’s ecosystems are under serious threat due to ongoing stressors of the Anthropocene, notably habitat destruction, climate change, loss of biodiversity, disease, and the spread of invasive species. Biodiversity in particular is suffering catastrophic decline, and tracking and understanding the factors affecting change is a major challenge that we are currently not meeting. Unless we develop new approaches, it will take centuries to document biodiversity and identify attributes that render ecological communities robust and resilient to change, and by then it will likely be too late. Here, we examine the insights we can gain into biodiversity dynamics by first assessing spatial patterns of diversity, abundance, and food webs, and then determining the response of the organisms within these communities to the changing environments that surround them. We have piloted an environmental DNA approach to generate estimates of abundance and interactions of macroorganisms in terrestrial systems across different spatial scales. By applying various theoretical and modeling approaches to the vast amounts of genetic data, we can encapsulate the “status” of a biological community in terms of its integrity and potential resilience to change. Moreover, by analyzing these data through slices in time (months, years, decades, or longer), we can assess how the community might accommodate, adapt, or collapse in response to change. These changes include habitat transformation, climate modification, fire, or disease. The critical data challenge is to integrate data that characterize the biological community and genomic data that reveal the response of any given taxon to change with past, present, and modeled climate change data. We highlight the role of historic collections from museums, and the physical record they provide of past environments.
Diversity in Animal Response to Environmental Change
How will ecosystems tolerate the climate and ocean change occurring now and predicted for the future? To begin addressing this question, we can subject different animals to different anthropogenic pressures and evaluate their responses. We can more sensitively and comprehensively assess responses by performing molecular surveys using omics technologies (e.g., genomics, proteomics, metabolomics), which allow us to more clearly see the cellular processes that underlie environmental tolerance and intolerance. These data can also help us compare across species, since all species have these general molecules (DNA, proteins, metabolites) in common. I'm going to present data from different studies on marine invertebrates exposed to different environmental conditions, and describe how I used multiple data science approaches to distill large omics datasets into dominant biological pathways associated with environmental tolerance and intolerance. After summarizing responses across species and conditions, I will propose future directions and data science applications for the wealth of environmental omics data being generated.
Data equity and open science
Dr. Ciera Martinez and Dr. HV Jagadish
Open science in the wild: principles to build reproducible and collaborative data analysis workflows
The academic research system is not built to incentivize open science practices, but transparency and reproducible methodology allow researchers to critically assess and build upon results to fuel scientific discovery, and support a more collaborative and equitable research community. Open science and data practices are often presented as ideals, but rarely do we train for how to handle the intricacies that emerge from every unique research project life cycle. In this talk I will present the ERP (Explore, Refine, and Produce) workflow, a three-phase data analysis workflow that guides researchers to create reproducible and responsible data analysis workflows. Each phase is centered on how to make decisions based on the audience to which the research is communicated, the research products created, and the career aspirations of the researchers involved. We hope this work helps create a community of practice for how we design and train for reproducible data-intensive research and helps demystify data analysis for both students new to research and current researchers who are new to data-intensive work.
Data Equity: A Core Requirement for Responsible Data Science
Until recently, we regularly heard statements like “Let the data speak for themselves”. Today, we instead hear worries about fairness of data-driven systems and AI. Nevertheless, a focus on a specific formulation of fairness in one data science step is far too narrow to be the whole story. We need to address inequitable representation in the data record, inequities due to the data scientist’s world view being reflected in the model, inequities in the resulting outcomes, and inequities in access to fruits of the analysis. In this talk, I will lay out a research agenda in this direction, and invite you to join me.
Dr. Arya Farahi and Dr. Kate Starbird
Quantifying and Mitigating Sources of Bias in a Decision-Support System
Applications of AI decision-support systems are increasingly shaping the fabric of our society. These systems can exhibit and exacerbate undesired biases that might hurt under-represented populations. Therefore, it is critical to evaluate these systems not only through the lens of predictive power and error rate but also through the lens of trustworthiness and fairness. In this talk, I will focus on two specific sources of bias in a decision-support system and propose mitigation strategies. In the first part, I will discuss biases that originate from historical decisions and are reflected in data. I propose a metric for quantifying disparity in data and illustrate how we can alleviate these historical biases by applying a simple modification to a decision-making system. In the second part, I will shed light on biases that originate from predictive models. Predictive models are a central part of any decision-making system. End users act on the information provided by these models. Biased or untrustworthy information can mislead the end user or lead the public to mistrust the system. I will present our mitigation method, KiTE. KiTE is a hypothesis-testing framework with provable guarantees that enables practitioners to (i) test whether a model provides trustworthy information with respect to each sub-group of a population and (ii) estimate and correct for prediction bias at the individual and group levels.
Revealing the “Big Lie”: Collaborative Data Science for Rapid Response to Online Disinformation
In this talk, I’ll present preliminary research results from ongoing efforts to understand the spread of disinformation about the 2020 Election. First, I’ll describe the mission, structure, and everyday work practices of the Election Integrity Partnership — a multi-stakeholder collaboration that addressed mis- and disinformation about the 2020 U.S. election in (near) real-time through rapid response data science. Next, I’ll take you through some of our analyses to show how the “Big Lie” — the sustained effort to sow doubt in the results of the 2020 election — took shape on social media platforms throughout the latter half of 2020. I’ll highlight the participatory nature of this disinformation campaign and reveal some of the “super spreader” accounts that helped produce and sustain it. Finally, I’ll note how some of the social media platforms have evolved their strategies to address this kind of disinformation and wrap up by talking about what might come next, both in terms of platform policies and future collaborations for rapid response to disinformation.
Robotics and human-computer interaction
Dr. Lydia Kavraki & Dr. Angela Radulescu
Robotics in the Era of Data Science
Advances in mechanisms, control theory and algorithms are delivering robots that explore the deep seas and distant planets, robots that work tirelessly in fulfillment centers, and robots that increasingly interact with people. This talk will touch upon recent developments in robotics with emphasis on our own work in motion planning. It will then discuss the tremendous impact that the integration of research in robotics, AI, and data science will have in our lives and society as a whole.
Bio: Lydia E. Kavraki is the Noah Harding Professor of Computer Science and the Director of the Ken Kennedy Institute at Rice University. Her interests span Robotics, AI, and Biomedicine. In 2020 she received the ACM-AAAI Allen Newell Award and the IEEE Robotics Pioneer Award. She is a member of the National Academy of Medicine, the Academy of Athens, and Academia Europaea. Information about her work can be found at http://www.kavrakilab.org
Towards naturalistic representation learning in health and disease
Humans learn more from their experiences than just how to behave in different situations. They also learn to organize experiences into internal representations that facilitate flexible behavior, in domains ranging from simple decision-making to goal-directed action in naturalistic, richly structured environments. In the first part of the talk, I will show that such representation learning relies on selective attention to constrain the dimensionality of environments that humans learn from; and that attention is in turn guided by inference over what features of the environment are relevant for the task at hand. In the second part of the talk, I will present ongoing work leveraging virtual reality (VR) in combination with eye-tracking to study representation learning in naturalistic settings. I will conclude with a discussion of how predictive modeling of behavior in VR may yield insights into cognitive factors that affect mental health.
Bio: Angela Radulescu is a Moore-Sloan Faculty Fellow at the New York University Center for Data Science. Angela earned her Ph.D. in Psychology and Neuroscience from Princeton University, where she did research in Yael Niv's group on computational mechanisms of selective attention during reinforcement learning. She is broadly interested in how humans learn from interaction with the environment, and in how learning can lead to changes in mental health.
Dr. Jeanne Holm
Using Data to Improve Equity
In the midst of a pandemic and economic stress, governments have to make real-time decisions on maximizing safety and minimizing economic and personal impact. How can we use data, behavioral science, and our shared need for safety to create a more connected ecosystem where government, residents, and businesses share information in more intertwined ways? Getting access to that information, equitably, is a challenge throughout the world. In the City of Los Angeles, we use data-driven decisions to pave the way to connect all 4,000,000 residents with the information and services they need to thrive. Learn how Los Angeles is using data science and leading-edge technology to connect all of our communities, residents, and businesses.
Dr. Alex Szalay
From Sky Surveys to Cancer: Spatial Data Everywhere
The talk describes a 25-year journey leading from the Sloan Digital Sky Survey to a wide range of projects in data science. There are many common threads: the need for extreme interactivity, the need for flexible data aggregation, and the commonality of spatial data. The size of data sets has grown almost a millionfold, but user expectations of almost instant results have not changed. The talk will describe the gradual evolution of the SciServer, and how new metaphors for interacting with hundreds of terabytes of turbulence simulations emerged. We will discuss how machine learning and AI tools are transforming science, from simulations to how large experiments are designed and executed. We will also emphasize that many of these new developments still rely on having unique high-value data sets at our fingertips, and how the long-term survival of these is entering a critical, endangered phase.
Dr. Talitha Washington
‘Why We Can’t Wait’: Using Social Justice to Transform Data Science
Located in the "Cradle of the Civil Rights Movement", the Atlanta University Center (AUC) Data Science Initiative is keenly focused on advancing social justice through data science. The AUC is a consortium of four historically Black colleges and universities (HBCUs) in Atlanta, Georgia: Clark Atlanta University, Morehouse College, Morehouse School of Medicine, and Spelman College. The inaugural director of the AUC Data Science Initiative, Dr. Talitha Washington, hopes to move data science towards ethics and fairness for Black America because “whatever affects one directly, affects all indirectly.”
For questions/comments, please contact email@example.com