Open Science & Open Scholarship
The Academic Data Science Alliance (ADSA) aims to nurture and support a community of researchers and educators who take responsibility for a just, equitable future where data science approaches are thoughtfully applied in all domains for the benefit of all. ADSA creates opportunities for collaboration and facilitates exchanges among individuals, organizations, and domains to advance the field of data science. These collaborations strengthen our community and communities around us. We promote the inclusion of people and research that inform data science in the context of societal perspectives and impacts on marginalized communities.
Consistent with our Mission, Vision, and Values, ADSA believes that data, insight, and knowledge should be accessible and co-created by and for everyone, and that barriers to access and collaboration are a detriment to the field. ADSA recognizes that knowledge should be shared widely and that working openly cultivates a trustworthy and transparent environment at all levels of engagement with the community. ADSA advocates for integrity of data science research and quality of education, communicating needs with the larger community of stakeholders and guiding policy. We know that ethics is essential to the development of data science as a field. We value and promote the incorporation of ethical decision-making in the development and application of data science tools and methods.
ADSA Guidance for Open Science
Recognizing that societies and associations play a crucial role in fostering discipline-specific norms and good practices, ADSA is pleased to share the following guidance on a range of open science-related practices.
The promotion of open science practices is integral to fostering innovation and collaboration within data science and across disciplines. To this end, ADSA strongly encourages researchers in our community to undertake the open science activities enumerated below. Ideally, all research outputs will be shared under licenses that encourage reuse with attribution (e.g., Creative Commons Attribution/CC BY), with the understanding that institutional and IP considerations may impact the conditions and timing under which outputs are disseminated. These practices, individually and collectively, foster replication, reproducibility, and transparency.
Share your research articles immediately upon publication, either by self-archiving copies of papers in a trusted open repository or by publishing through open access journals.
Preprints allow researchers to claim priority of discovery, receive community input, and demonstrate evidence of progress for funders and others. Deposit your submitted manuscripts, and subsequent versions, on a publicly accessible preprint server. Options include generalist repositories (e.g., Figshare, Open Science Framework, Zenodo), discipline-specific preprint servers (e.g., arXiv), and institutional repositories.
The independent confirmation of results and conclusions is critical for understanding scientific soundness and informing future research activities. Openly shared data can shed light on negative results and attempted research directions, with the potential to improve the efficiency of the research process as well as lead to novel analyses and conclusions. Share any factual data needed for independent verification of research results via a trusted open repository, whether generalist (e.g., Figshare, Open Science Framework, Zenodo), data-focused (e.g., Dataverse, Dryad), discipline-specific, or institutional. Ideally, data will be shared no later than publication.
Code & Software Sharing
Research data, especially in certain specialized fields, is often collected and stored in proprietary file formats that require software with costly licenses to open and analyze. If data are shared under such conditions, without the accompanying code, algorithms, and software that allow others to open and analyze the files, the result is a financial barrier to access that will delay or prohibit reuse of the data, especially for underserved populations, citizen scientists, and early career researchers with fewer means. It might also present technical barriers that could limit the machine readability and reusability of data by assistive devices. Make any original code and software that are required to view data or to replicate analyses underpinning research available and accessible. Ideally, code and software will be shared no later than publication. Code and software of interest to the community but not linked to a specific publication should also be shared in a similar manner when possible. Code and software can be made available in a general repository, a discipline-specific repository, an institutional repository, and/or a repository specifically designed for code and software (e.g., Bitbucket, GitHub, GitLab).
Unreported flexibility in data analysis decreases scientific credibility and invalidates common tools of statistical inference. By submitting a detailed study protocol and statistical analysis plan to a registry prior to conducting the work (i.e., preregistering with an analysis plan), the scientist makes a clearer distinction between planned hypothesis tests (i.e., confirmatory tests) and unplanned discovery research (i.e., screening or exploratory research). Preregistration is particularly important for studies that make an inferential claim from a sampled group or population, as well as studies that are reporting hypotheses. Submit a detailed study protocol and statistical analysis plan to a public registry prior to conducting the work. Report transparently on any deviations from preregistered plans.
Understanding the starting point for work, including its assumptions, along with the final study and analysis can guide other researchers toward additional research avenues to explore. Protocols provide the context to interpret and understand how research results are derived. They can convey exactly what was done and the decisions and compromises that were made en route to a scientific discovery. Publicly share detailed descriptions of the methods, equipment, and reagents used in your experiment prior to or by the date of publication. Protocols can be made available in a general repository, a discipline-specific repository, an institutional repository, and/or a repository specifically designed for protocols (e.g., protocols.io).
Tangible Materials Sharing
Similar to code and data, broader access to research materials can accelerate research, promote the independent confirmation of results, and allow comparisons across research projects or products. The cross-cutting nature of data science means that a broad variety of materials (such as cell lines and geophysical specimens) may have value to researchers within and beyond the data science community. In some instances, materials such as cell lines embody a type of “machine” that, through cellular function and gene expression, can be used to make desirable products, such as a particularly valuable protein. Where practical, deposit unique tangible materials (e.g., cell lines, plasmids/clones, antibodies, transgenic organisms) in public, widely used repositories, such as Addgene for plasmids/DNA reagents/viruses and Jackson Labs for model systems lines.
Educational Resources Sharing
Open educational resources have been shown to increase student learning while breaking down barriers of affordability and accessibility. Share your educational and pedagogical resources (e.g., class notes, lesson plans, presentations, book chapters, textbooks, etc.) via trusted open repositories (e.g., OER Commons, OpenStax).
Rewards & Incentives
To help foster an open science culture at scale, it is necessary to properly incentivize open science activities. To that end, ADSA encourages data science schools and departments worldwide to formally embed open science considerations in hiring, review, tenure, and promotion processes. A necessary first step is to highlight its importance in the language of job advertisements, hiring criteria, internal review materials, and external letter requests. A second step is to adapt and adopt language developed by the National Academies of Sciences, Engineering, and Medicine (NASEM) Roundtable on Aligning Incentives for Open Science that can be embedded in annual reporting, job postings, and applications to (a) signal the department’s commitment to open science; and (b) begin a dialog with current and prospective members about their open science activities. Note that the NASEM-developed tool also includes a rubric to help departments evaluate the absolute and relative merits of received responses. By embedding clear signaling at key points of leverage, departments can greatly accelerate the adoption of practices that are to the ultimate benefit of the entire data science community. For an example of how to embed open science within promotion and tenure policy, see the University of Virginia School of Data Science.
Training & Support
Recognizing that the transition to an open science ecosystem is not frictionless, ADSA commits to serving as a conduit for disseminating training resources, good practices, lessons learned, and other materials as may be relevant to the data science community. This commitment includes, but is not limited to, participating in the Alliance for Open Scholarship, a coalition of professional societies collaborating to identify, articulate, and socialize appropriate open scholarship norms within their disciplines.
Additionally, ADSA recognizes that the costs associated with open science, including the time and effort required to properly prepare research outputs for maximal access and reuse, are nontrivial. We commit to working with the emerging cross-sector coalition of governmental agencies, private philanthropies, colleges and universities, professional societies, and others to ensure the transition to open science is equitable and transparent.
This guidance was jointly crafted with the Alliance for Open Scholarship.