
Computer Vision in Microsoft Azure


#TheMVPChallenge continues as I journey through MS Learn modules in Azure. This week I'm working through the topic MS Learn: Computer Vision in Microsoft Azure.

To complete the hands-on exercises in this unit, you'll need to choose an Azure resource: 

  • Custom Vision: A dedicated resource for the Custom Vision service, which can be a training resource, a prediction resource, or both.
  • Cognitive Services: A general cognitive services resource that includes Custom Vision along with many other cognitive services. You can use this type of resource for training, prediction, or both.

Tips for Hands-On Labs

This lab uses a virtual machine run through LabOnDemand. I love this as an instructor, because it enables me to see students' screens and troubleshoot when doing virtual or blended courses. However, as an end user who has never used it before, it can be a bit overwhelming. 

  • You can use the same launch instance to run multiple modules (until you reach the 1-hour time limit)
  • Use the icons at top left to: 
    • Expand the screen to full screen (Computer icon)
    • Paste clipboard text (Lightning bolt icon)
Screenshot of LabOnDemand

Analyze images with the Computer Vision service

This module is bringing back memories of science class and classifying plants and animals into Kingdom, ... , Family, Genus, Species. Apparently, Azure Computer Vision classifies images into an 86-category taxonomy. 

Features

Computer Vision enables you to analyze images and return many 'features'. In this exercise we get to look at: 

  • Description
  • Tags
  • Adult
  • Objects
  • Faces

The image below is the result of one of the Computer Vision predictions that you will see during this unit's hands-on learning. Can you identify the results that match each of the features in the list above? 

Screenshot of Computer Vision result
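If you'd like to try the same analysis outside the lab, here's a rough sketch of how those five features can be requested through the Computer Vision Python SDK. The endpoint, key, and image URL below are placeholders you'd swap for your own:

```python
# pip install azure-cognitiveservices-vision-computervision
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials

# Placeholders - use the key and endpoint from your own Azure resource
endpoint = "https://<your-resource>.cognitiveservices.azure.com/"
key = "<your-key>"

client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(key))

# Request the same five features explored in this unit
analysis = client.analyze_image(
    "https://example.com/store-camera.jpg",  # hypothetical image URL
    visual_features=[
        VisualFeatureTypes.description,
        VisualFeatureTypes.tags,
        VisualFeatureTypes.adult,
        VisualFeatureTypes.objects,
        VisualFeatureTypes.faces,
    ],
)

if analysis.description.captions:
    print("Description:", analysis.description.captions[0].text)
print("Tags:", [tag.name for tag in analysis.tags])
print("Adult content?", analysis.adult.is_adult_content)
print("Objects:", [obj.object_property for obj in analysis.objects])
print("Faces found:", len(analysis.faces))
```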

Classify images with the Custom Vision service

This module gets into the specifics of how we might train and use the Custom Vision service to develop our own models for everyday use - without needing to be experts in data science or machine learning! Pretty neat. 

What will you use Custom Vision for? 

  • Classifying products
  • Identifying key structures (power lines, bridges, skyscrapers)
  • Other?

One key takeaway that stood out to me was the importance of providing the model with enough of the right data to train it properly: 

"One of the key considerations when using images for classification, is to ensure that you have sufficient images of the objects in question and those images should be of the object from many different angles."

Again, classification is one of my favorites, and this module was lots of fun and very hands-on. I even managed to classify my images in Spanish - well, at least 'naranja'. 

Screenshot of Custom Vision classification

Detect objects in images with the Custom Vision service

Training vs Predicting

Custom Vision and Cognitive Services resources in Azure both let you train and predict, but to do both with Custom Vision resources you need TWO of them: one for training and one for prediction. There are good reasons to keep these separate, but if you want a simple solution with a single key and endpoint for BOTH training and prediction, use a Cognitive Services resource, which supports training and prediction in the same resource.
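
To make that concrete, here's a minimal sketch of what the two-resource setup looks like in the Python SDK. All keys and endpoints below are placeholders:

```python
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials

# Dedicated Custom Vision resources: one key/endpoint pair for each role
trainer = CustomVisionTrainingClient(
    "https://<training-resource>.cognitiveservices.azure.com/",
    ApiKeyCredentials(in_headers={"Training-key": "<training-key>"}))

predictor = CustomVisionPredictionClient(
    "https://<prediction-resource>.cognitiveservices.azure.com/",
    ApiKeyCredentials(in_headers={"Prediction-key": "<prediction-key>"}))

# With a single Cognitive Services resource, the same key and endpoint
# would be passed to both clients instead.
```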

Tagging

This was a laborious process - you must manually tag each image before you can train the model. 

I tried to train the model without doing the (very manual) tagging process on all the images but got an error: 

Screenshot of error: Your project can't be trained just yet. Make sure you have at least 15 images for every tag.

It would be great if we could somehow use the classification model to help with the tagging process for object detection. 

Ultimately, it's still faster than having to create your own object detection model from scratch.
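
For what it's worth, the tagging you do by hand in the portal corresponds to attaching Region objects (with normalized 0-1 coordinates) to each uploaded image in the SDK. A rough sketch, reusing the hypothetical trainer and project from the earlier snippet, with made-up coordinates and file names:

```python
from azure.cognitiveservices.vision.customvision.training.models import (
    ImageFileCreateBatch, ImageFileCreateEntry, Region)

# Hypothetical tag created earlier with trainer.create_tag(project.id, "apple")
# Coordinates are fractions of image width/height: left, top, width, height
apple_region = Region(tag_id=apple_tag.id, left=0.15, top=0.20, width=0.30, height=0.35)

with open("fruit-01.jpg", "rb") as image:  # hypothetical file
    entry = ImageFileCreateEntry(
        name="fruit-01.jpg", contents=image.read(), regions=[apple_region])

trainer.create_images_from_files(project.id, ImageFileCreateBatch(images=[entry]))
```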

Human Error and Accuracy

I made at least 2 mistakes when tagging the fruits, and didn't bother to correct them (on purpose - I wanted to see how this would impact my results). My model was still able to accurately detect the apple and orange, but it also detected a phantom banana. Does yours do the same?

Screenshot of orange and apple detection results
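
One way to tame phantom detections like my banana is to ignore low-probability predictions when calling the published model. A sketch, assuming the predictor client and project from the earlier snippets, plus a published iteration named "fruit-detector" (a made-up name):

```python
with open("test-fruit.jpg", "rb") as image:  # hypothetical test image
    results = predictor.detect_image(project.id, "fruit-detector", image.read())

for prediction in results.predictions:
    if prediction.probability > 0.5:  # drop low-confidence 'phantom' objects
        box = prediction.bounding_box
        print(f"{prediction.tag_name}: {prediction.probability:.0%} "
              f"at ({box.left:.2f}, {box.top:.2f})")
```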

How would you use this in the real world? 

  • Evaluating the safety of a building by looking for fire extinguishers or other emergency equipment.
  • Creating software for self-driving cars or vehicles with lane assist capabilities.
  • Medical imaging such as an MRI or x-rays that can detect known objects for medical diagnosis.

Detect and analyze faces with the Face service

Most of us have used facial recognition software before, such as: 

  • Tagging photos of friends and family (on social media or cloud photo storage)
  • Device security (to unlock phone or computer)

This technology is quite advanced - the lab exercise for this module is pretty easy, and I didn't feel like I had much influence or input in what was happening. We're merely utilizing tried and tested models to detect faces, pick out common features, and identify people. 

What impressed me was that by providing a single profile view of a shopper from the left-hand side, the Azure Face service was able to detect that the profile view from the right-hand side was the same shopper. Facial recognition is a really cool science that fascinates me.
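
That left-profile/right-profile match is a face verification call under the hood. A minimal sketch with the Face Python SDK - the endpoint, key, and image URLs are placeholders, and it assumes a face is found in each image:

```python
# pip install azure-cognitiveservices-vision-face
from azure.cognitiveservices.vision.face import FaceClient
from msrest.authentication import CognitiveServicesCredentials

face_client = FaceClient(
    "https://<your-resource>.cognitiveservices.azure.com/",
    CognitiveServicesCredentials("<your-key>"))

# Detect a face in each profile shot (hypothetical URLs)
left = face_client.face.detect_with_url(url="https://example.com/shopper-left.jpg")
right = face_client.face.detect_with_url(url="https://example.com/shopper-right.jpg")

# Ask the service whether the two detected faces belong to the same person
result = face_client.face.verify_face_to_face(left[0].face_id, right[0].face_id)
print("Same shopper?", result.is_identical, f"(confidence {result.confidence:.0%})")
```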

As a side note, did you know that there's such a thing as 'face blindness', or prosopagnosia? I'm interested to see how the research around this disorder develops and whether we can use the same technology from the Face service to help people with prosopagnosia. 

Read text with the Computer Vision service

OCR, or Optical Character Recognition, is a term you may have heard before. This module gives some background on where OCR started and shows off the Azure capability. However, you don't need Azure to take advantage of OCR. One of my favorite apps that is FANTASTIC at OCR is Microsoft OneNote.

The key takeaway from this module is the difference between the Read API and the OCR API. 
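
To make that difference concrete: the OCR API is a single synchronous call suited to small amounts of text, while the Read API is asynchronous - you submit the image, then poll for the result. A rough sketch, reusing the hypothetical Computer Vision client from earlier (the image URL is a placeholder):

```python
import time
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes

image_url = "https://example.com/scanned-letter.jpg"  # hypothetical image

# OCR API: one synchronous call, best for small amounts of printed text
ocr = client.recognize_printed_text(url=image_url)
for region in ocr.regions:
    for line in region.lines:
        print(" ".join(word.text for word in line.words))

# Read API: submit the job, then poll for the result - handles bigger documents
operation = client.read(url=image_url, raw=True)
operation_id = operation.headers["Operation-Location"].split("/")[-1]
result = client.get_read_result(operation_id)
while result.status in (OperationStatusCodes.running, OperationStatusCodes.not_started):
    time.sleep(1)
    result = client.get_read_result(operation_id)
for page in result.analyze_result.read_results:
    for line in page.lines:
        print(line.text)
```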

Bonus Activity: Try OCR in Microsoft OneNote

  1. Open Microsoft OneNote (download OneNote for free)
  2. With this blog post visible, press [Windows Key] + [Shift] + [S] on your keyboard.
  3. Drag a box around a portion of text in this blog - try not to cut off the left and right.
  4. Return to OneNote and paste the screenshot.
  5. Right-click on the image, then choose Copy Text from Picture.
  6. Click somewhere else in the OneNote page and press Ctrl + V to paste the text.

Analyze receipts with the Form Recognizer service

This exercise is fairly basic, but I wonder if we could use this to speed up the processing of expense claims? It says it works for US-based receipts on taxes paid - could it still identify GST on NZ-based receipts? 
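
If you wanted to experiment with that expense-claim idea, the prebuilt receipt model exposes a Tax field you could inspect against an NZ receipt. A sketch with the azure-ai-formrecognizer SDK - endpoint, key, and file are placeholders, and I haven't verified how it behaves on GST receipts:

```python
# pip install azure-ai-formrecognizer
from azure.ai.formrecognizer import FormRecognizerClient
from azure.core.credentials import AzureKeyCredential

form_client = FormRecognizerClient(
    "https://<your-resource>.cognitiveservices.azure.com/",
    AzureKeyCredential("<your-key>"))

with open("receipt.jpg", "rb") as receipt_file:  # hypothetical receipt scan
    poller = form_client.begin_recognize_receipts(receipt_file)

for receipt in poller.result():
    for field_name in ("MerchantName", "TransactionDate", "Tax", "Total"):
        field = receipt.fields.get(field_name)
        if field:
            print(f"{field_name}: {field.value} (confidence {field.confidence:.0%})")
```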
