A Taxonomy for Data Science Masters Degree Programs

Masters in Data Science Taxonomy

Developed by the Standardization and Transparency in Data Science Masters Degree Programs working group, this taxonomy describes common, and sometimes critical, competencies for data science at the Masters degree level. Many, but not all, of these topics are required for earning a Masters degree in data science.

Foundations of Analytics

Statistics

  • Random variables and probability
  • Data Collection Design, e.g. sampling (random, convenience, stratified), experimental replication, confounding, blocking
  • Inference, e.g. estimation (point and confidence interval), bias, precision, hypothesis testing (errors, false positives)
  • Modeling (Stochastic), e.g. random errors, dimension reduction, diagnostics, feature selection
  • Multivariate Analysis, e.g. principal components, clustering, discriminant analysis
  • Statistical Learning and Bayesian Methods, e.g. bootstrapping and bagging, regularization

Mathematics

  • Arithmetic and geometry
  • Set theory and basic logic
  • Matrices and basic linear algebra
  • Networks and graph theory
  • Optimization
  • Calculus
  • Induction (and principles of recursion)
  • Information Theory

Data Analytics

  • Exploratory analysis
  • Artificial Intelligence
  • Data Mining
  • Machine Learning

Data Modeling

  • Model development and deployment
  • Model risks and mitigation strategies
  • Model analysis and validation
  • Data visualization


Systems and Implementation

Computing and Computer Fundamentals

  • Data Structures
  • Algorithms, e.g. Big-O notation, analysis, proofs of correctness
  • Simulations

Data Engineering

  • Database design
  • Data preparation and cleaning
  • Records retention and curation
  • Big data systems
  • Data security and privacy
  • Infrastructure, e.g. cloud computing, HPC

Software Development and Maintenance

  • Programming, e.g. R, Python, C, JavaScript/HTML, SQL
  • Collaboration and version control, e.g. Git/GitHub


Data Science Project Design

Users and Impacted Groups

  • Implications of analysis and results
  • Defining the user and UX design
  • Story-telling with data
  • Human-centered design

Research Methods

  • Defining data-driven questions
  • Computational logic 
  • Data-driven decision making
  • Data/research lifecycle
  • Analysis and presentation of decisions

Data 

  • Data acquisition
  • Data governance
  • Data provenance and citation

Open Science by Design

  • Reproducibility, replicability, repeatability
  • Containers
  • Interactive computing

Visualization

  • Grammar of graphics
  • Static and dynamic visualization design


Data Science in Practice: Professional Practice and Responsible Data Science

Responsible Practices with Data and Ethics

  • Legal considerations
  • Data privacy
  • Data security
  • Data governance
  • Research integrity
  • Analysis for security
  • FAIR and CARE principles
  • Understanding and uncovering bias
  • Interpretability and Explainability
  • Human impacts of design
  • Responsible data collection
  • Understanding impacted communities

Effective Collaboration in Teams

  • Working with stakeholders
  • Working with domain experts
  • Project management
  • Infrastructure costs and benefits
  • Product management
  • DevOps 

Communication

  • Technical writing skills
  • Oral communication and presentation skills
  • Documentation

 

REFERENCES USED IN THE DEVELOPMENT OF THIS TAXONOMY

Computing Competencies for Undergraduate Data Science Curricula, ACM Data Science Task Force

National Academies of Sciences, Engineering, and Medicine 2018. Data Science for Undergraduates: Opportunities and Options. Washington, DC: The National Academies Press. https://doi.org/10.17226/25104.

EDISON Data Science Framework: Part 1. Data Science Competence Framework (CF-DS) Release 3

Learning-outcomes-for-Masters-level-Data-Science-Programs(2021-03-17) (unpublished survey responses collected by this working group)

Practice of Data Science, Defining a Field, a School and a Curriculum, Brian Wright, University of Virginia

Fayyad, U., & Hamutcu, H. (2020). Toward Foundations for Data Science and Analytics: A Knowledge Framework for Professional Standards. Harvard Data Science Review, 2(2). https://doi.org/10.1162/99608f92.1a99e67a