
Azure Machine Learning


I'm continuing with #TheMVPchallenge and working through some MS Learn modules. This week I'm working through the learning path Create no-code predictive models with Azure Machine Learning.

I'll update this post throughout the week as I progress through the modules.

Data scientists spend a lot of time working with regression models and manipulating large datasets to predict future values. To get the most out of these hands-on labs, it's helpful to have a basic understanding of data science and machine learning.

A few terms you might want to get familiar with: 

  • Machine Learning
  • Label
  • Feature
  • Train
  • Regression
You may have some downtime while waiting for the magic to happen in Azure, so use it to review the introduction to each module and study up on the vocab above.
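To make "label" and "feature" concrete, here's a minimal Python sketch. The bike-rental column names and values are hypothetical, made up for illustration: the label is the one column you want to predict, and every other column is a feature the model learns from.

```python
# A toy row from a bike-rental style dataset (hypothetical columns and values).
row = {"season": 1, "workingday": 1, "temp": 0.34, "humidity": 0.81, "rentals": 985}

LABEL = "rentals"  # the column we want the model to predict


def split_features_label(record, label):
    """Separate the feature columns (model inputs) from the label (model target)."""
    features = {k: v for k, v in record.items() if k != label}
    return features, record[label]


features, label_value = split_features_label(row, LABEL)
print(sorted(features))
print(label_value)
```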

Tips for Learning

  • Set aside enough time (1-2 hours per module)
  • Save time for clean-up

These modules are hands-on. You get to work directly in Azure to train and test an Automated Machine Learning model. Set aside at least 1-2 hours to work through each module, more if you're like me and like to explore and try other things. Plan to complete each module in one sitting so that you can clean up your Azure resource group when you're done and avoid paying for resources you're no longer using.

Datasets

This unit uses a few web sample datasets that are pretty cool, including one of penguins in Antarctica, a continent I'd love to visit someday. 

Use automated machine learning in Azure Machine Learning

Why reinvent the wheel? In this module you get to explore how Azure Machine Learning can automatically provide you with insights into your data. No need to pick and choose data, spend time cleaning or studying it, or worry about which algorithms to use. Let Azure figure it all out for you!

If you've used the AI visuals in Power BI, you may love this. I love exploring datasets with the Power BI Key Influencers visual, but in Power BI I need to select the fields and columns that I think are relevant. In Azure Machine Learning, we can see which columns have the greatest influence over a value. All you need to provide is a dataset and the single column whose values you'd like to analyze.

As you might expect, working day and average temperature have a strong influence on bicycle rentals, whereas humidity and season have a much weaker influence.

[Screenshot: Azure Machine Learning studio global importance graph]
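Azure's global importance chart comes from model explanations rather than a simple correlation, but a plain Pearson correlation is a quick way to build intuition for the same idea. This sketch uses made-up numbers (not the lab's dataset) where rentals rise with temperature and humidity is mostly noise.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy data: rentals climb with temperature; humidity bounces around.
temp = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
humidity = [0.6, 0.8, 0.5, 0.7, 0.6, 0.8]
rentals = [400, 650, 800, 1000, 1200, 1350]

for name, column in [("temp", temp), ("humidity", humidity)]:
    print(name, round(pearson(column, rentals), 2))
```

A value near 1 means the two columns move together; a value near 0 means there's little linear relationship.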


Create a Regression Model with Azure Machine Learning designer

Step 1: Clean and Transform Data

Tips for cleaning data for regression models

  • Normalize all numeric attributes (speed, volume, temperature) 
  • Use text for categorical attributes that don't make sense to normalize (number of doors: two, four; not 2, 4)
  • Remove rows with missing (null) values
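Here's a rough Python sketch of the three tips above, assuming a list of dictionaries with hypothetical `num-of-doors` and `horsepower` columns. It uses min-max scaling, which is only one of several ways to normalize numbers; the designer does this with drag-and-drop modules instead.

```python
def min_max_normalize(values):
    """Rescale numbers to the 0..1 range (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


DOOR_WORDS = {2: "two", 4: "four"}


def clean(rows, numeric_cols):
    # 1. Drop rows with any missing (None) value; copy the rest so the input isn't mutated.
    rows = [dict(r) for r in rows if all(v is not None for v in r.values())]
    if not rows:
        return rows
    # 2. Store door counts as text so they're treated as categories, not magnitudes.
    for r in rows:
        r["num-of-doors"] = DOOR_WORDS.get(r["num-of-doors"], str(r["num-of-doors"]))
    # 3. Normalize each numeric column to 0..1.
    for col in numeric_cols:
        scaled = min_max_normalize([r[col] for r in rows])
        for r, s in zip(rows, scaled):
            r[col] = s
    return rows


# Hypothetical cars; the third row has a missing horsepower value.
cars = [
    {"num-of-doors": 2, "horsepower": 100},
    {"num-of-doors": 4, "horsepower": 150},
    {"num-of-doors": 4, "horsepower": None},
    {"num-of-doors": 2, "horsepower": 125},
]
cleaned = clean(cars, ["horsepower"])
print(cleaned)
```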

Overview

Going through this module gave me a much greater appreciation of Power BI and all that it does to help us clean and transform our data. It's all about what you know, and Power Query feels very intuitive to me. Azure Machine Learning studio is newer territory for me, but much of the Power Query functionality appears to be there. The trick is understanding the terminology.

For example, here are a few Azure Machine Learning data transformations and what they are called in Power Query: 

  • Azure Machine Learning | Power Query (Power BI)
  • Add Columns | Home > Merge Queries
  • Add Rows | Home > Append Queries
  • Apply Math Operation | Transform > Standard
  • Clean Missing Data | Home > Remove Rows 

Step 2: Create Training Pipeline 

What is a training pipeline? 

To evaluate the effectiveness of our model, we need to split the actual data into two groups: Group A, which we'll use to 'train' and create the model, and Group B, which we'll use to score and evaluate the model.

Step 2A: Split actual data into two groups

  • Use roughly a 70/30 random split for training/scoring
  • Use a date-based split to evaluate forecasting
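A minimal sketch of both split styles in plain Python. The 0.7 fraction and the fixed seed (so the split is repeatable) are my own assumptions, and the date split assumes each row has a hypothetical `date` field. Random splits suit general training/scoring, while a date split keeps the model from "seeing the future" when you evaluate a forecast.

```python
import random
from datetime import date


def random_split(rows, train_fraction=0.7, seed=123):
    """Shuffle a copy of the rows and cut it into training (Group A) and scoring (Group B)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def date_split(rows, cutoff, date_key="date"):
    """Train on everything before the cutoff date; score on everything from it onward."""
    train = [r for r in rows if r[date_key] < cutoff]
    score = [r for r in rows if r[date_key] >= cutoff]
    return train, score


group_a, group_b = random_split(list(range(100)))
daily = [{"date": date(2023, 1, d), "rentals": 100 + d} for d in range(1, 11)]
history, future = date_split(daily, cutoff=date(2023, 1, 8))
print(len(group_a), len(group_b), len(history), len(future))
```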

Step 2B: Train Group A

Choose from the 19 algorithms available in the Azure Machine Learning designer to train your model.

Step 2C: Score 

Bring back Group B and 'score' it with your model. This step calculates the expected results for the Group B rows using the model trained on Group A. At the end of this process, you'll have two columns for your label: the actual value and the scored (expected) value.
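Linear Regression is one of the algorithms on offer. As a rough illustration of train-then-score (not the designer's actual implementation), here's a one-feature ordinary least squares fit trained on a hypothetical Group A and used to score a hypothetical Group B, printing the actual and scored values side by side.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (a single feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def score(model, xs):
    """Apply the trained line to new feature values."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


# Hypothetical (feature, label) pairs for the two groups.
group_a = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8)]  # used to train
group_b = [(5, 11.1), (6, 12.9)]                    # used to score

model = fit_line([x for x, _ in group_a], [y for _, y in group_a])
scored = score(model, [x for x, _ in group_b])
for (x, actual), predicted in zip(group_b, scored):
    print(f"x={x} actual={actual} scored={predicted:.2f}")
```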

Overview

The drag and drop user interface has already become intuitive and I'm finding it really easy to use. What a great visual to see what's actually happening in the model - much more transparent than Automated Machine Learning, and I love that you have more control over the actual design of the model.

Step 3: Evaluate Training Pipeline

The Evaluate Model component in Azure Machine Learning designer reports a few key metrics that indicate how accurate your trained model is:

  • Mean Absolute Error (MAE)
  • Root Mean Squared Error (RMSE)
  • Relative Squared Error (RSE)
  • Relative Absolute Error (RAE)
  • Coefficient of Determination (R2)
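If you're curious what those numbers mean, here's a sketch using the standard textbook definitions. I'm assuming Evaluate Model follows the same formulas. The relative metrics compare your model's errors against simply predicting the mean every time, and R2 is one minus the relative squared error (closer to 1 is better).

```python
import math


def regression_metrics(actual, predicted):
    """MAE, RMSE, RAE, RSE and R2 using the standard definitions."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    abs_err = sum(abs(e) for e in errors)
    sq_err = sum(e * e for e in errors)
    # Baseline: how far the actual values sit from their own mean.
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE": abs_err / abs_dev,
        "RSE": sq_err / sq_dev,
        "R2": 1 - sq_err / sq_dev,
    }


# Hypothetical actual vs scored values.
metrics = regression_metrics([3, 5, 7, 9], [2.5, 5.5, 7, 10])
print(metrics)
```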

Again, the drag and drop interface makes it easy to evaluate the model and get these results automatically!

Take the time to explore! How does the Decision Forest Regression compare to the Linear Regression evaluation?

Step 4: Create Inference Pipeline

Now that you've built and trained the model, you're ready to use it to predict label values for new data.

Select the best model from your evaluation step above, and create an inference pipeline in Azure Machine Learning designer. You will need to edit the inputs so that the new data does NOT contain the label values, then run the pipeline. 

Step 5: Deploy

This is all pretty cool, but let's be honest: the people who need this information are not going to log in to Azure Machine Learning designer to run the inference pipeline we just created. So the final step is deploying it as a web service that other people and applications can call. You'll notice that Web Service Input and Web Service Output were part of the inference pipeline we created, so now we get to test them in a web service environment.

Again, explore the data. What happens if you change the alfa-romero from 'two' to 'four' doors? How does the predicted price change?
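If you'd rather test the endpoint from code than from the studio, here's a hedged Python sketch that builds the request for both the two-door and four-door alfa-romero. The URL, key, column names, and the `Inputs`/`WebServiceInput0` payload shape are assumptions modelled on the consume code the studio generates; copy the real values from your endpoint's Consume tab. The function only builds the request, so nothing is sent until you call `urlopen`.

```python
import json
import urllib.request


def build_scoring_request(endpoint_url, api_key, rows):
    """Build (but don't send) a POST request to a real-time scoring endpoint."""
    # Payload shape is an assumption based on designer-generated consume code.
    body = {"Inputs": {"WebServiceInput0": rows}, "GlobalParameters": {}}
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


# Hypothetical endpoint, key and columns.
URL, KEY = "https://example.invalid/score", "<your-key>"
car = {"make": "alfa-romero", "num-of-doors": "two", "body-style": "convertible"}

two_door = build_scoring_request(URL, KEY, [car])
four_door = build_scoring_request(URL, KEY, [{**car, "num-of-doors": "four"}])

# To actually call the service and compare predicted prices:
# print(urllib.request.urlopen(two_door).read())
# print(urllib.request.urlopen(four_door).read())
```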

Create a classification model with Azure Machine Learning designer

By now you should be getting pretty familiar with the Azure Machine Learning designer interface, so you don't need to set aside quite as much time for this module. That being said, take advantage of the time to explore and try new things!

This module follows the same steps as the regression model, but for a categorical label rather than a numeric one. In this case, we're classifying each patient as 'Yes' or 'No' (1 or 0): is the patient diabetic?

Evaluate

Confusion Matrix

Because we're working with classification, the evaluation results include a 'Confusion Matrix' (counts of true and false positives and negatives), along with these key metrics:

  • Accuracy
  • Precision
  • Recall
  • F1 Score
  • AUC
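Here's a small Python sketch showing how the confusion-matrix counts turn into those metrics, using the standard definitions and 0/1 labels (1 = diabetic). For AUC I've used the rank interpretation (the chance that a randomly chosen positive case gets a higher score than a randomly chosen negative one) rather than drawing the ROC curve; that's my choice for brevity, and the data is hypothetical.

```python
def classification_metrics(actual, predicted):
    """Confusion-matrix counts and derived metrics for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn,
            "Accuracy": (tp + tn) / len(pairs),
            "Precision": precision, "Recall": recall, "F1": f1}


def auc(actual, probabilities):
    """Share of (positive, negative) pairs where the positive scores higher; ties count half."""
    pos = [p for a, p in zip(actual, probabilities) if a == 1]
    neg = [p for a, p in zip(actual, probabilities) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predictions.
result = classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(result)
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.35, 0.8]))
```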

Threshold

The default threshold is 0.5, but if you want to err on the side of caution and predict that more people have diabetes, you might adjust this to something lower, like 0.3. A threshold of 0.3 will give you more false positives, but fewer false negatives. What threshold gives the greatest accuracy?
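To see the trade-off for yourself, here's a sketch that applies thresholds of 0.3 and 0.5 to a handful of hypothetical predicted probabilities and counts false positives and false negatives. It then sweeps thresholds from 0.05 to 0.95 to find the one with the highest accuracy on this toy data; on your real model the answer will differ.

```python
def predict(probabilities, threshold):
    """Label a case positive (1) when its predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def errors_at(actual, probabilities, threshold):
    """Return (false positives, false negatives, accuracy) at a given threshold."""
    preds = predict(probabilities, threshold)
    fp = sum(1 for a, p in zip(actual, preds) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, preds) if a == 1 and p == 0)
    accuracy = sum(1 for a, p in zip(actual, preds) if a == p) / len(actual)
    return fp, fn, accuracy


# Hypothetical model outputs (probability of diabetes) and true labels.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
probs = [0.92, 0.45, 0.35, 0.40, 0.20, 0.55, 0.10, 0.65]

for t in (0.3, 0.5):
    fp, fn, acc = errors_at(actual, probs, t)
    print(f"threshold={t}: false positives={fp}, false negatives={fn}, accuracy={acc:.3f}")

# Sweep thresholds 0.05, 0.10, ..., 0.95 and keep the most accurate one.
candidates = [round(0.05 * i, 2) for i in range(1, 20)]
best = max(candidates, key=lambda t: errors_at(actual, probs, t)[2])
print("best threshold on this toy data:", best)
```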

Create a clustering model with Azure Machine Learning designer

Again, the steps here are similar to those in the Regression model, except instead of 'scoring' the model, we need to 'assign data to clusters'. I LOVE hands-on learning. I'm already getting to the point where I can predict the steps before reading the instructions.

*DISCLAIMER: This often results in me breaking something, doing the wrong step, or losing my place, but it's the way I learn. Figure out the style that works for you and let me know how you go. 

Clustering is one of my favorite model types - it's such a cool process happening in the background. Take the time while your model is running to read up on the maths behind clustering.
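The designer's K-Means Clustering component is based on k-means, so here's a bare-bones version of the maths in Python (Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing moves. This sketch starts from randomly chosen data points; the designer's component may initialize differently, and the points are hypothetical.

```python
import math
import random


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on a list of numeric tuples; returns (cluster ids, centroids)."""
    centroids = random.Random(seed).sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return assign(points, centroids), centroids


# Two hypothetical, well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```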

Try testing this with your own sample data. It's the best way to show that you've learned something and reinforce the skills. An easy one to try is snow vs rain - give Azure Machine Learning designer a dataset of weather (use your favorite weather app) that includes temperature, wind chill, humidity and precipitation amount, but remove the precipitation type data (snow/rain) and see how well the clustering model works. 
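Once Assign Data to Clusters has labelled each row, one simple way to score the snow vs rain experiment is cluster purity: for each cluster, count how many rows share its most common precipitation type, then divide the total by the number of rows. This sketch assumes you've exported the cluster IDs and kept the held-back snow/rain column alongside them; the values shown are made up.

```python
from collections import Counter


def cluster_purity(cluster_ids, true_labels):
    """Share of rows whose label matches the majority label of their cluster."""
    by_cluster = {}
    for cid, label in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cid, []).append(label)
    majority_hits = sum(Counter(labels).most_common(1)[0][1] for labels in by_cluster.values())
    return majority_hits / len(true_labels)


# Hypothetical results: cluster IDs from the model, precipitation type held back from training.
clusters = [0, 0, 0, 1, 1, 1, 1]
precip = ["snow", "snow", "rain", "rain", "rain", "rain", "snow"]
print(round(cluster_purity(clusters, precip), 3))
```

A purity of 1.0 would mean every cluster lines up perfectly with one precipitation type.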
