Study Guide Azure Data Scientist Associate DP-100

Satish Gunjal
Jun 5, 2021

Prerequisites

A candidate for this certification should have knowledge of and experience in data science, and in using Azure Machine Learning and Azure Databricks. This exam was updated on May 20, 2021. Azure Databricks is a new addition to the updated study guide.

Why do it?

This certificate gives you an opportunity to demonstrate your knowledge of managing Azure resources for machine learning, running experiments and training models, deploying and operationalizing machine learning solutions, and implementing responsible machine learning. DP-100 is targeted at data scientists.

Exam, Languages & Price

To get this certificate you need to pass ‘Exam DP-100’, which is available in English, Japanese, Chinese (Simplified), and Korean. The price depends on the country in which the exam is proctored. In the USA it costs $165 USD and in India ₹11945 INR.

You may get anywhere between 40 and 60 questions. The questions are presented differently on every attempt. The DP-100 exam lasts 210 minutes: 180 minutes to answer the questions and 30 minutes to read the instructions. To pass the exam, you need to score 700 points out of 1000. I scored 903 points (link to my certification badge).
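To turn those numbers into a pacing plan, here is a small sketch that works out the average time available per question. It assumes the full 180 answering minutes are spread evenly over the questions, which is a simplification; the function name is just an illustrative choice.

```python
ANSWER_MINUTES = 180  # answering time quoted above (210 total minus 30 for instructions)


def minutes_per_question(num_questions: int) -> float:
    """Average minutes available per question, assuming an even spread."""
    return ANSWER_MINUTES / num_questions


for n in (40, 50, 60):
    print(f"{n} questions -> {minutes_per_question(n):.1f} minutes each")
```

So even in the worst case of 60 questions you have about three minutes per question on average.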

Skills Measured

Please note that the skills measured are intended to illustrate how Microsoft assesses each skill. This list is not definitive or exhaustive. In most cases, exams do NOT cover preview features, and some features will only be added to an exam when they are GA (General Availability). The content of this exam was updated on May 20, 2021.

The above details are based on the updates available at the time of writing this article. For the latest skills outline, please refer to this link.

Learning Path

There are two ways to prepare for this exam. You can either teach yourself using free online resources or take the instructor-led path. In this article, I will list all the required resources from Microsoft Learn to clear this exam. Remember, the objective should be to gain the necessary knowledge instead of just clearing certifications. If you google it, you will find tons of material with questions and answers for this exam. But that won’t help you gain the necessary knowledge. In machine learning terminology, use all the learning material as ‘training data’ and use online question dumps as your ‘test data’. Remember, if you use ‘test data’ during training, you may get a good score but will definitely fail in real-life scenarios!
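For readers new to the terminology, the ‘training data’ versus ‘test data’ idea can be sketched as a simple holdout split. This is a minimal plain-Python illustration, not code from the learning path; the function name, the 20% test fraction, and the question labels are all assumptions made for the example.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical pool of ten practice questions.
questions = [f"q{i}" for i in range(10)]
train, test = train_test_split(questions)

print("study with:", train)
print("hold back for the final check:", test)
```

The key property is that the two lists never overlap: anything you studied from cannot tell you how well you will do on unseen questions.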

Below are the learning resources for each of the sections mentioned in the skills measured table. At the end of each learning resource there is a knowledge check section to test your understanding of that module.

Create machine learning models

  • It covers core principles of machine learning and how to use common tools and frameworks to train, evaluate, and use machine learning models.
  • This section requires about 6 hours to complete. It contains 5 modules.
  • The only prerequisite is knowledge of basic mathematical concepts and some Python programming experience. In case you are not familiar with Python, you can refer to this Microsoft Learn path to get started with Python.

Module 1: Explore and analyze data with Python

  • This section requires about 1 hour to complete. It contains 4 units.
  • In this module, you will learn: common data exploration and analysis tasks; how to use Python packages like NumPy, Pandas, and Matplotlib to analyze data.
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers

Module 2: Train and evaluate regression models

  • This section requires about 1 hour to complete. It contains 4 units.
  • In this module, you’ll learn: when to use regression models; how to train and evaluate regression models using the Scikit-Learn framework.
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers
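As a warm-up for this module, here is a minimal sketch of what training and evaluating a regression model means: fit a straight line by ordinary least squares and score it with RMSE. The module itself uses Scikit-Learn; this version is plain Python so it runs anywhere, and the function names and toy data are assumptions made for illustration.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(ys, preds):
    """Root mean squared error between labels and predictions."""
    return sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))


# Toy data, roughly y = 2x + 1 with a little noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
preds = [slope * x + intercept for x in xs]
error = rmse(ys, preds)
print(f"slope={slope:.2f} intercept={intercept:.2f} rmse={error:.3f}")
```

In a real evaluation you would compute the error on held-out data rather than on the points used for fitting.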

Module 3: Train and evaluate classification models

  • This section requires about 1 hour to complete. It contains 4 units.
  • In this module, you’ll learn: when to use classification; how to train and evaluate a classification model using the Scikit-Learn framework.
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers
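Evaluating a classifier usually comes down to a handful of metrics derived from the confusion matrix. The sketch below computes accuracy, precision, and recall for binary labels in plain Python; it is an illustration under assumed toy labels and does not reproduce the Scikit-Learn calls the module teaches.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall computed from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Note that precision and recall are undefined when a model predicts no positives or the data has no positives; this sketch does not guard against those cases.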

Module 4: Train and evaluate clustering models

  • This section requires about 44 minutes to complete. It contains 4 units.
  • In this module, you’ll learn: when to use clustering; how to train and evaluate a clustering model using the scikit-learn framework.
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers
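To get a feel for clustering before the module, here is a rough sketch of k-means on one-dimensional data. K-means is used here as a representative clustering technique; the starting centroids, the fixed iteration count, and the toy values are assumptions, and the module itself works with scikit-learn rather than hand-written code.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Very small k-means for 1-D data with caller-supplied starting centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = kmeans_1d(values, centroids=[0.0, 10.0])
print(centroids, clusters)
```

There are no labels involved at any point, which is what separates clustering from the regression and classification modules above.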

Module 5: Train and evaluate deep learning models

  • This section requires about 2 hours 14 minutes to complete. It contains 9 units.
  • Note that this module requires an understanding of classical machine learning techniques. Please complete all the above modules first, and also try to solve a few regression and classification problems before starting with deep learning.
  • In this module, you’ll learn: basic principles of deep learning; how to train a deep neural network (DNN) using PyTorch or TensorFlow; how to train a convolutional neural network (CNN) using PyTorch or TensorFlow; how to use transfer learning to train a convolutional neural network (CNN) with PyTorch or TensorFlow.
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers

Create no-code predictive models with Azure Machine Learning

  • It covers how to use Azure Machine Learning to create and publish models without writing code.
  • This section requires about 3 hours 27 minutes to complete. It contains 4 modules.
  • The only prerequisite is the ability to navigate the Azure portal.

Module 1: Use automated machine learning in Azure Machine Learning

  • This section will require about 45 minutes to complete. It contains 9 units.
  • In this module, you will learn how to use the automated machine learning user interface in Azure Machine Learning
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers

Module 2: Create a Regression Model with Azure Machine Learning designer

  • This section will require about 55 minutes to complete. It contains 10 units.
  • In this module, you will learn to train and publish a regression model with Azure Machine Learning designer.
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers

Module 3: Create a classification model with Azure Machine Learning designer

  • This section will require about 1 hour to complete. It contains 10 units.
  • In this module, you will learn to train and publish a classification model with Azure Machine Learning designer
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers

Module 4: Create a clustering model with Azure Machine Learning designer

  • This section will require about 47 minutes to complete. It contains 10 units.
  • In this module, you will learn to train and publish a clustering model with Azure Machine Learning designer
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers

Build and operate machine learning solutions with Azure Machine Learning

  • It covers how to use the Azure Machine Learning Python SDK to create and manage enterprise-ready ML solutions
  • This section requires about 10 hours 11 minutes to complete. It contains 14 modules.
  • This learning path assumes that you have experience of training machine learning models with Python and open-source frameworks like Scikit-Learn, PyTorch, and TensorFlow.

Module 1: Introduction to the Azure Machine Learning SDK

  • This section will require about 1 hour to complete. It contains 8 units.
  • In this module, you will learn how to: provision an Azure Machine Learning workspace; use tools and interfaces to work with Azure Machine Learning; run code-based experiments in an Azure Machine Learning workspace.
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers

Module 2: Train a machine learning model with Azure Machine Learning

  • This section will require about 40 minutes to complete. It contains 7 units.
  • In this module, you will learn how to: use a ScriptRunConfig to run a model training script as an Azure Machine Learning experiment; create reusable, parameterized training scripts; register trained models.
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers

Module 3: Work with Data in Azure Machine Learning

  • This section will require about 45 minutes to complete. It contains 8 units.
  • In this module, you will learn how to: create and use data stores in an Azure Machine Learning workspace; create and use datasets in an Azure Machine Learning workspace.
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers

Module 4: Work with Compute in Azure Machine Learning

  • This section will require about 45 minutes to complete. It contains 8 units.
  • In this module, you will learn how to: work with environments; work with compute targets.
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers

Module 5: Orchestrate machine learning with pipelines

  • This section will require about 55 minutes to complete. It contains 10 units.
  • In this module, you will learn how to: create pipeline steps; pass data between steps; publish and run a pipeline; schedule a pipeline.
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers

Module 6: Deploy real-time machine learning services with Azure Machine Learning

  • This section will require about 40 minutes to complete. It contains 7 units.
  • In this module, you will learn how to: deploy a model as a real-time inferencing service; consume a real-time inferencing service; troubleshoot service deployment.
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers

Module 7: Deploy batch inference pipelines with Azure Machine Learning

  • This section will require about 44 minutes to complete. It contains 6 units.
  • In this module, you will learn how to create, publish, and use batch inference pipelines with Azure Machine Learning.
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers

Module 8: Tune hyperparameters with Azure Machine Learning

  • This section will require about 46 minutes to complete. It contains 8 units.
  • In this module, you will learn how to use Azure Machine Learning hyperparameter tuning experiments to optimize model performance.
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers
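The idea behind hyperparameter tuning can be sketched as an exhaustive grid search: try every combination in a search space and keep the one with the best score. This plain-Python sketch does not use the Azure Machine Learning SDK; `toy_score` is a hypothetical stand-in for "train a model and return its validation metric", and the search space values are made up.

```python
from itertools import product


def grid_search(score_fn, search_space):
    """Evaluate every combination in search_space and return the best one."""
    names = list(search_space)
    best_params, best_score = None, float("-inf")
    for values in product(*(search_space[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical objective that peaks at learning_rate=0.1, batch_size=32."""
    return (
        -((params["learning_rate"] - 0.1) ** 2)
        - ((params["batch_size"] - 32) ** 2) / 1000
    )


space = {"learning_rate": [0.01, 0.1, 1.0], "batch_size": [16, 32, 64]}
best_params, best_score = grid_search(toy_score, space)
print(best_params, best_score)
```

Grid search is only one sampling strategy; its cost grows with the product of the list lengths, which is why random sampling and early termination matter once each trial is a full training run.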

Module 9: Automate machine learning model selection with Azure Machine Learning

  • This section will require about 25 minutes to complete. It contains 7 units.
  • In this module, you will learn how to: use Azure Machine Learning’s automated machine learning capabilities to determine the best performing algorithm for your data; use automated machine learning to preprocess data for training; run an automated machine learning experiment.
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers

Module 10: Explore differential privacy

  • This section will require about 38 minutes to complete. It contains 6 units.
  • In this module, you will learn how to: articulate the problem of data privacy; describe how differential privacy works; configure parameters for differential privacy; perform differentially private data analysis.
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers
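To make "how differential privacy works" a little more concrete, here is a sketch of the Laplace mechanism applied to a mean. It is an illustration in plain Python, not the library used in the module. It assumes values are clipped to known bounds and that neighbouring datasets differ by replacing one record, so the sensitivity of the mean is (upper - lower) / n. All names and numbers are illustrative.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Mean of clipped values plus Laplace noise scaled to sensitivity / epsilon."""
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    # Assumption: replacing one record moves the mean by at most this much.
    sensitivity = (upper - lower) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)
ages = [rng.randint(18, 90) for _ in range(1000)]  # synthetic data
true_mean = sum(ages) / len(ages)
noisy_mean = private_mean(ages, lower=0, upper=100, epsilon=1.0, rng=rng)
print(f"true mean {true_mean:.2f}, reported mean {noisy_mean:.2f}")
```

A smaller epsilon gives a larger noise scale, which means stronger privacy and a less accurate reported value.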

Module 11: Explain machine learning models with Azure Machine Learning

  • This section will require about 47 minutes to complete. It contains 8 units.
  • In this module, you will learn how to explain models by calculating and interpreting feature importance.
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers

Module 12: Detect and mitigate unfairness in models with Azure Machine Learning

  • This section will require about 45 minutes to complete. It contains 7 units.
  • In this module, you will learn: how to evaluate machine learning models for fairness; how to mitigate predictive disparity in a machine learning model.
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers
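One common way to evaluate a model for fairness is to compare how often it predicts the positive outcome for each group. The sketch below computes per-group selection rates and the gap between them (often called demographic parity difference). It is a plain-Python illustration with made-up predictions and group labels, not the tooling the module uses.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Fraction of positive (1) predictions for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(predictions, groups):
    """Gap between the highest and lowest group selection rates."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical binary predictions and the sensitive group of each row.
preds = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(preds, groups)
gap = demographic_parity_difference(preds, groups)
print(rates, gap)
```

A gap of 0 means every group is selected at the same rate; mitigation techniques trade some overall accuracy to shrink this gap.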

Module 13: Monitor models with Azure Machine Learning

  • This section will require about 39 minutes to complete. It contains 6 units.
  • In this module, you will learn how to use Azure Application Insights to monitor a deployed Azure Machine Learning model.
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers

Module 14: Monitor data drift with Azure Machine Learning

  • This section will require about 42 minutes to complete. It contains 6 units.
  • In this module, you will learn how to monitor data drift in Azure Machine Learning.
  • Below are a few of the sample questions with multiple choice options. For answers please refer to Answers

Perform data science with Azure Databricks

  • It covers how to harness the power of Apache Spark and powerful clusters running on the Azure Databricks platform to run data science workloads in the cloud.
  • This section requires about 8 hours 25 minutes to complete. It contains 12 modules.
  • Note: when I took the exam, this was not part of the skills measured.

Module 1: Describe Azure Databricks

  • This section will require about 53 minutes to complete. It contains 7 units.
  • In this module, you will: understand the Azure Databricks platform; create your own Azure Databricks workspace; create a notebook inside your home folder in Databricks; understand the fundamentals of Apache Spark notebooks; create, or attach to, a Spark cluster; identify the types of tasks well-suited to the unified analytics engine Apache Spark.
  • Below are a few of the sample questions with multiple choice options.

Module 2: Spark architecture fundamentals

  • This section will require about 33 minutes to complete. It contains 5 units.
  • In this module, you will: understand the architecture of an Azure Databricks Spark cluster; understand the architecture of a Spark job.
  • Below are a few of the sample questions with multiple choice options.

Module 3: Read and write data in Azure Databricks

  • This section will require about 1 hour to complete. It contains 9 units.
  • In this module, you will: use Azure Databricks to read multiple file types, both with and without a schema; combine inputs from files and data stores, such as Azure SQL Database; transform and store that data for advanced analytics.
  • Below are a few of the sample questions with multiple choice options.

Module 4: Work with DataFrames in Azure Databricks

  • This section will require about 46 minutes to complete. It contains 7 units.
  • In this module, you will: use the count() method to count rows in a DataFrame; use the display() function to display a DataFrame in the notebook; cache a DataFrame for quicker operations if the data is needed a second time; use the limit function to display a small set of rows from a larger DataFrame; use select() to select a subset of columns from a DataFrame; use distinct() and dropDuplicates to remove duplicate data; use drop() to remove columns from a DataFrame.
  • Below are a few of the sample questions with multiple choice options.

Module 5: Work with user-defined functions

  • This section will require about 33 minutes to complete. It contains 5 units.
  • In this module, you will learn how to: write user-defined functions; perform ETL operations using user-defined functions.
  • Below are a few of the sample questions with multiple choice options.

Module 6: Build and query a Delta Lake

  • This section will require about 43 minutes to complete. It contains 7 units.
  • In this module, you will: learn about the key features and use cases of Delta Lake; use Delta Lake to create, append, and upsert tables; perform optimizations in Delta Lake; compare different versions of a Delta table using Time Machine.
  • Below are a few of the sample questions with multiple choice options.

Module 7: Perform machine learning with Azure Databricks

  • This section will require about 59 minutes to complete. It contains 9 units.
  • In this module, you will learn how to: perform machine learning; train a model and create predictions; perform exploratory data analysis; describe machine learning workflows; build and evaluate machine learning models.
  • Below are a few of the sample questions with multiple choice options.

Module 8: Train a machine learning model

  • This section will require about 53 minutes to complete. It contains 7 units.
  • In this module, you will learn how to: perform featurization of the dataset; finish featurization of the dataset; understand regression modeling; build and interpret a regression model.
  • Below are a few of the sample questions with multiple choice options.

Module 9: Work with MLflow in Azure Databricks

  • This section will require about 33 minutes to complete. It contains 5 units.
  • In this module, you will learn how to: use MLflow to track experiments, log metrics, and compare runs; work with MLflow to track experiment metrics, parameters, artifacts, and models.
  • Below are a few of the sample questions with multiple choice options.

Module 10: Perform model selection with hyperparameter tuning

  • This section will require about 33 minutes to complete. It contains 5 units.
  • In this module, you will learn how to: describe model selection and hyperparameter tuning; select the optimal model by tuning hyperparameters.
  • Below are a few of the sample questions with multiple choice options.

Module 11: Deep learning with Horovod for distributed training

  • This section will require about 36 minutes to complete. It contains 6 units.
  • In this module, you will learn how to: use Horovod to train a deep learning model; use Petastorm to read datasets in Apache Parquet format with Horovod for distributed model training; work with Horovod and Petastorm for training a deep learning model.
  • Below are a few of the sample questions with multiple choice options.

Module 12: Work with Azure Machine Learning to deploy serving models

  • This section will require about 23 minutes to complete. It contains 4 units.
  • In this module, you will learn how to use Azure Machine Learning to deploy Serving Models
  • Below are a few of the sample questions with multiple choice options.

Practice Exams

It’s always a good idea to do a few practice tests before going for the final exam. I took the Examtopics practice test. I also prepared handwritten notes based on the above learning path, which I referred to before the final exam.
