Machine Learning 101

By now you’re probably well aware that Big Data and Artificial Intelligence are major disruptors in almost every vertical. Understanding the landscape can be challenging, particularly for business customers who want to innovate but aren’t sure where to start. In today’s post, I hope to leave you, dear readers, with a basic understanding of both how machine learning works and how it might benefit your organization.

First, let’s discuss where machine learning lives in the Big Data world. Forgive the Wikipedia quote, but it’s a good summary: “Machine Learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to ‘learn’ (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.” (1) In other words, machine learning is a way to derive complex models and algorithms from data, which can then be applied to a set of data to perform a specific task.

Figure 1: Original image by Dahl Winters, 2015

Machine learning is a subset of artificial intelligence, and is itself an umbrella term for a myriad of approaches (examples you may have heard of include artificial neural networks, genetic algorithms and decision trees).

At a high level, machine learning searches data for patterns and uses them to build a model that can accurately predict an outcome. Given sufficient data for the complexity of the task, mathematical and statistical optimization produces a model that captures relationships in the data which a human might not have recognized immediately, given the volume or complexity of that data.
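To make “optimization produces a model” a little more concrete, here is a minimal sketch in Python. It is not tied to any particular product or library: the numbers are made up for illustration, and the technique shown (fitting a straight line by ordinary least squares) is just one of the simplest possible examples of deriving a model from data.

```python
# Made-up toy data for illustration: x is an input, y is the outcome to predict.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]


def fit_line(xs, ys):
    """Return the (slope, intercept) that minimize the squared prediction error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form ordinary least squares solution for a single input variable.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)


def predict(x):
    """Apply the learned model to an input it has not seen before."""
    return slope * x + intercept


print(f"learned model: y = {slope:.2f} * x + {intercept:.2f}")
print(f"prediction for x = 6: {predict(6.0):.2f}")
```

Nobody told the program that y grows by about 2 for every unit of x; it recovered a slope close to 2 from the data alone. Real-world models have far more inputs and far more complicated shapes, but the principle is the same.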

Hypothetical Example

Consider an organization that provides elderly care. We can assume it has sufficient data about the daily lives, interactions, purchase history, etc. of the seniors under its supervision. Now, let’s assume that this organization wants to minimize the impact of the flu virus on its patients. Without machine learning, we know the flu symptoms… but the problem is that by the time symptoms show, it’s already too late! We want to be able to look for patterns in the behavior, environment and demographics of a person to identify when they are at high risk of catching the flu, and alter our approach as a result (e.g., targeted literature to inform them of their risk). We can take a volume of data and feed it into a machine learning algorithm to look for emergent patterns that a human may not have recognized initially. Perhaps we discover that elders who live in the same city as their children are at increased risk (grandchildren are germ factories), or that those who regularly play bingo on Thursday nights are, strangely, at lower risk! These patterns are the result of machine learning. We will revisit this example to understand how we got there.

There are two main approaches to training a machine learning algorithm: supervised learning and unsupervised learning.

In supervised learning, the data provided to train the algorithm contains both the inputs and the outputs (the known outcomes) for that data. In this way, the algorithm receives “feedback” by being able to compare its predictions with the actual results. Supervised learning accounts for the majority of machine learning in practice. It’s ideal for most applications, but organizations don’t always have the luxury of a dataset that includes the expected results.
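Here is a minimal sketch of that feedback loop in Python, using the flu scenario. Everything in it is an assumption made for illustration: the records are invented, the two yes/no inputs (lives_near_children, plays_bingo) are hypothetical, and the technique (logistic regression trained by gradient descent) is simply one common supervised method, not necessarily what a real project would choose.

```python
import math

# Invented records for illustration: (lives_near_children, plays_bingo) -> caught_flu
records = (
    [((1, 0), 1)] * 3 + [((1, 0), 0)] * 1
    + [((1, 1), 1)] * 2 + [((1, 1), 0)] * 2
    + [((0, 0), 1)] * 2 + [((0, 0), 0)] * 2
    + [((0, 1), 1)] * 1 + [((0, 1), 0)] * 3
)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(weights, bias, features):
    """Estimated probability (0 to 1) that a person with these features catches the flu."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(records, learning_rate=0.5, epochs=5000):
    """Fit a logistic regression model with batch gradient descent."""
    weights = [0.0, 0.0]
    bias = 0.0
    n = len(records)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for features, caught_flu in records:
            # The "feedback": how far the prediction is from the known outcome.
            error = predict_risk(weights, bias, features) - caught_flu
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        # Nudge the model in the direction that reduces the average error.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


weights, bias = train(records)
risk_near_children = predict_risk(weights, bias, (1, 0))
risk_bingo_only = predict_risk(weights, bias, (0, 1))
print(f"lives near children, no bingo: {risk_near_children:.2f}")
print(f"plays bingo, children far away: {risk_bingo_only:.2f}")
```

With this invented data, the trained model assigns a higher risk to someone who lives near their children than to a regular bingo player whose children live elsewhere. The important part is the error term: without the known outcomes in the data, there would be nothing to compare the predictions against.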

Unsupervised learning, by contrast, trains on data that contains only the inputs, with no known outcomes, and usually produces a very different kind of result. Rather than answering a specific question, unsupervised learning is frequently only capable of offering insights into the data. It may identify patterns, associations and clusters in the data, but it is unlikely to be able to perform a specific task.
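Clustering is one common form of unsupervised learning, and a small sketch in Python shows how it differs. The data below is invented, the two measurements (outings per week, family visits per month) are hypothetical, and the algorithm is a bare-bones k-means with starting points picked by hand; real implementations usually choose starting points more carefully.

```python
# Invented, unlabeled data for illustration: (outings_per_week, family_visits_per_month).
# Note there is no "caught flu" column here; the algorithm only sees inputs.
points = [(1, 2), (2, 1), (1, 1), (8, 9), (9, 8), (9, 9)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def mean_point(members):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(coords) / len(members) for coords in zip(*members))


def kmeans(points, initial_centroids, max_iterations=100):
    """Group points into clusters; returns (labels, centroids)."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iterations):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so we are done
        labels = new_labels
        for i in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = mean_point(members)
    return labels, centroids


# Hand-picked starting centroids (an assumption of this sketch).
labels, centroids = kmeans(points, initial_centroids=[points[0], points[-1]])
print("cluster labels:", labels)
print("cluster centers:", centroids)
```

The algorithm separates the less active people from the more active ones, but it has no idea what those groups mean, or whether they have anything to do with the flu. That interpretation is left to a human, which is why unsupervised learning tends to offer insights rather than answers.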

Revisiting our Example

If our organization has years of data on its patients, as well as whether or not each of them caught the flu in a specific season, we have a good baseline for performing supervised learning. If we could not tie recorded outcomes to the question we want answered, the data would be unfit for supervised learning, and we might instead use unsupervised learning to search for interesting patterns in the data, which may in turn lead to new hypotheses.

Machine Learning: Competitive Advantage

Your data is a valuable corporate asset, and applying machine learning is one way of extracting business value from that virtual goldmine. Many data-driven organizations are using these capabilities to drive market insight, create competitive products and better serve their customers.

(1) Wikipedia contributors. (2018, July 11). Machine learning. In Wikipedia, The Free Encyclopedia. Retrieved 13:09, July 13, 2018, from https://en.wikipedia.org/w/index.php?title=Machine_learning&oldid=849817385