A Taxonomy for Data Science Masters Degree Programs

PROJECT OVERVIEW

With the rapid emergence of data science degree programs, it has become challenging for students, faculty, staff, and administrators to meaningfully compare program offerings. To begin addressing this challenge, ADSA established the Standardization and Transparency for Data Science Degrees (STIDS) Working Group. The initial charge for this Working Group was to create a careful taxonomy for data science competencies in Masters degree programs.

Here we present the STIDS Working Group Data Science Taxonomy for Masters Degrees. Using this taxonomy as a framework, the working group created a survey for those implementing Masters degree programs in data science (or similarly named degrees) to collect information about the skills and competencies covered across various degree programs. The ultimate goal is to make this information transparent and openly available for prospective students, administrators, and future employers of graduates.

Our initial pilot survey results can be found on our STIDS page. The pilot survey collected information from 21 unique Data Science Masters degree programs. If you would like to contribute information about your institution's Masters degree program, you can access the survey form below. We plan to continue to collect data from institutions and offer new ways for potential students, program administrators, and employers to browse and search the data.

CONTRIBUTORS

Nairanjana Dasgupta (Washington State University)
H. V. Jagadish (University of Michigan)
Alexandra Johnson (Washington State University)
Purush Papatla (University of Wisconsin, Milwaukee)
Sungjune Park (UNC Charlotte)
Micaela Parker (ADSA)
Abani Patra (Tufts University)
Steve Van Tuyl (ADSA)
Brian Wright (University of Virginia)

Feedback Form

Have feedback on the Taxonomy? Let us know!

Add your program

Do you want to include your MS degree in our data collection? Click through to fill out the survey.

Questions?

If you have any questions, please email us!

Data Science Taxonomy

Below is the taxonomy developed by the STIDS working group for Masters degree programs in data science. These categories and topic areas represent common, and sometimes critical, competencies for data science at the Masters degree level. Many, but not all, of these topics are required for earning a Masters degree in data science.

Foundations of Analytics

Statistics

  • Random variables and probability
  • Data Collection Design, e.g. sampling (random, convenience, stratified), experimental replication, confounding, blocking
  • Inference, e.g. estimation (point and confidence interval), bias, precision, hypothesis testing (errors, false positives)
  • Modeling (Stochastic), e.g. random errors, dimension reduction, diagnostics, feature selection
  • Multivariate Analysis, e.g. principal components, clustering, discriminant analysis
  • Statistical Learning and Bayesian Methods, e.g. bootstrapping and bagging, regularization

Mathematics

  • Arithmetic and geometry
  • Set theory and basic logic
  • Matrices and basic linear algebra
  • Networks and graph theory
  • Optimization
  • Calculus
  • Induction (and principles of recursion)
  • Information Theory

Data Analytics

  • Exploratory analysis
  • Artificial Intelligence
  • Data Mining
  • Machine Learning

Data Modeling

  • Model development and deployment
  • Model risks and mitigation strategies
  • Model analysis and validation
  • Data visualization


Systems and Implementation

Computing and Computer Fundamentals

  • Data Structures
  • Algorithms, e.g. Big O notation, analysis, proof of correctness
  • Simulations

Data Engineering

  • Database design
  • Data preparation and cleaning
  • Records retention and curation
  • Big data systems
  • Data security and privacy
  • Infrastructure, e.g. cloud computing, HPC

Software Development and Maintenance

  • Programming, e.g. R, Python, C, JavaScript/HTML, SQL
  • Collaboration and version control, e.g. Git/GitHub


Data Science Project Design

Users and Impacted Groups

  • Implications of analysis and results
  • Defining the user and UX design
  • Story-telling with data
  • Human-centered design

Research Methods

  • Defining data-driven questions
  • Computational logic 
  • Data-driven decision making
  • Data/research lifecycle
  • Analysis and presentation of decisions

Data 

  • Data acquisition
  • Data governance
  • Data provenance and citation

Open Science by Design

  • Reproducibility, replicability, repeatability
  • Containers
  • Interactive computing

Visualization

  • Grammar of graphics
  • Static and dynamic visualization design


Data Science in Practice: Professional Practice and Responsible Data Science

Responsible Practices with Data and Ethics

  • Legal considerations
  • Data privacy
  • Data security
  • Data governance
  • Research integrity
  • Analysis for security
  • FAIR and CARE principles
  • Understanding and uncovering bias
  • Interpretability and Explainability
  • Human impacts of design
  • Responsible data collection
  • Understanding impacted communities

Effective Collaboration in Teams

  • Working with stakeholders
  • Working with domain experts
  • Project management
  • Infrastructure costs and benefits
  • Product management
  • DevOps 

Communication

  • Technical writing skills
  • Communication (oral) and presentation skills
  • Documentation

 

REFERENCES USED IN THE DEVELOPMENT OF THIS TAXONOMY

Computing Competencies for Undergraduate Data Science Curricula, ACM Data Science Task Force

National Academies of Sciences, Engineering, and Medicine 2018. Data Science for Undergraduates: Opportunities and Options. Washington, DC: The National Academies Press. https://doi.org/10.17226/25104.

EDISON Data Science Framework: Part 1. Data Science Competence Framework (CF-DS) Release 3

Learning-outcomes-for-Masters-level-Data-Science-Programs(2021-03-17) (unpublished survey responses collected by this working group)

Practice of Data Science, Defining a Field, a School and a Curriculum, Brian Wright, University of Virginia

Fayyad, U., & Hamutcu, H. (2020). Toward Foundations for Data Science and Analytics: A Knowledge Framework for Professional Standards. Harvard Data Science Review, 2(2). https://doi.org/10.1162/99608f92.1a99e67a