The future is unpredictable. Nobody knows what’s coming. Yet we learn our lessons from history and set expectations for our future all the time. We recognize and categorize situations. We get suspicious when something is not behaving according to our expectations.
Computers don’t learn history like we do. They do not assess situations like we do, and they won’t make decisions based on how they feel. Yet they can tell us quite a lot about what will happen by learning from what has happened. With computers we are able to collect lots of data: historical data that can help us learn about the future, and in which we can find patterns and behaviours.
Let’s use an example here: energy prices. We know those prices are increasing and that they can fluctuate widely. Imagine we have a business that can store energy (e.g. hydroelectric energy storage). We’d love to know the upcoming electricity prices to maximize profit. We can use machine learning here to create a model that predicts the upcoming electricity prices.
The first step in creating this model is to gather the historical data, the training data. This includes historical electricity prices (the target, what we want to estimate) as well as the data that impacts these electricity prices, like the weather forecast at the time (the features). Next, we’ll investigate the impact of each feature on the electricity price. With the relevant features we can train a model that estimates the electricity price based on those features. We can now use this trained model to estimate future electricity prices.
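The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production approach: it assumes a single feature (outdoor temperature), uses invented price figures, and fits a straight line with ordinary least squares. The names `history`, `fit_line` and `predict` are made up for this sketch.

```python
# Hypothetical training data: (temperature in °C, electricity price in EUR/MWh).
# The numbers are invented for illustration only.
history = [(-5.0, 120.0), (0.0, 105.0), (5.0, 90.0), (10.0, 75.0), (15.0, 60.0)]


def fit_line(samples):
    """Train: find the slope and intercept that best fit the samples (least squares)."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    variance = sum((x - mean_x) ** 2 for x, _ in samples)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, feature):
    """Estimate the target (price) for a new feature value (temperature)."""
    slope, intercept = model
    return slope * feature + intercept


model = fit_line(history)        # training on historical data
estimate = predict(model, 20.0)  # estimate for a forecast temperature of 20 °C
print(f"Estimated price: {estimate:.1f} EUR/MWh")
```

A real price model would likely use many features and a more flexible algorithm, but the shape stays the same: gather history, train, then predict.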
When we try to predict a numeric value, like in the case above, we talk about regression. Another kind of supervised machine learning is classification. Handwriting recognition is an example of classification: you train a model with known inputs, but here the output is one of a fixed set of possible values (for example, the characters of the alphabet).
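To make the difference with regression concrete, here is a small classification sketch. It assumes tiny 3x3 black-and-white bitmaps as stand-ins for handwritten digits and uses a nearest-neighbour rule, which is just one of many possible classifiers. The data and the names `training`, `distance` and `classify` are hypothetical.

```python
# Hypothetical labelled training data: 3x3 bitmaps flattened to 9 pixels (1 = ink).
training = [
    ((1, 1, 1,
      1, 0, 1,
      1, 1, 1), "0"),
    ((0, 1, 0,
      0, 1, 0,
      0, 1, 0), "1"),
    ((1, 1, 0,
      0, 1, 0,
      0, 1, 0), "1"),
]


def distance(a, b):
    """Count the pixels in which two bitmaps differ."""
    return sum(pa != pb for pa, pb in zip(a, b))


def classify(pixels):
    """Return the label of the most similar training bitmap (1-nearest neighbour)."""
    nearest = min(training, key=lambda sample: distance(sample[0], pixels))
    return nearest[1]


# A "0" with one pixel missing: the output is still one of the fixed labels.
noisy_zero = (1, 1, 1,
              1, 0, 1,
              1, 1, 0)
label = classify(noisy_zero)
print(label)
```

Whatever input is given, the result is always one of the labels seen during training, which is what separates classification from regression.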
Both examples above are supervised machine learning: the algorithm is trained with historical data for which we know the target. But there is more: unsupervised machine learning, where the training data has no known target. Here we can see two main categories: clustering and anomaly detection.
With anomaly detection we learn the general patterns in our incoming data. When data deviates too much from these patterns, we call it an anomaly. An example of anomaly detection is spotting faulty machines in a production environment.
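A very simple way to express "deviates too much" is a distance from the average, measured in standard deviations. The sketch below assumes invented temperature readings from a machine and a threshold of three standard deviations; both the data and the threshold are illustrative choices, not a recommendation.

```python
from statistics import mean, stdev

# Hypothetical sensor readings (°C) collected while the machine ran as usual.
baseline = [70.1, 69.8, 70.4, 70.0, 69.9, 70.2, 69.7, 70.3]

# The "general pattern" here is simply the average and spread of the baseline.
mu = mean(baseline)
sigma = stdev(baseline)


def is_anomaly(reading, threshold=3.0):
    """Flag a reading that lies more than `threshold` standard deviations from the mean."""
    return abs(reading - mu) > threshold * sigma


incoming = [70.0, 70.5, 75.2, 69.9]
anomalies = [reading for reading in incoming if is_anomaly(reading)]
print(anomalies)
```

Note that no reading was ever labelled as "faulty" beforehand: the rule is derived from the data itself, which is why this counts as unsupervised learning.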
Clustering is the other unsupervised machine learning category. Here we group data automatically based on similar properties. Recommendation engines are known to use clustering algorithms: a clustering algorithm groups users of a service with similar preferences, so that users within the same group can get similar recommendations.
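The grouping of users can be sketched with k-means, one common clustering algorithm. This is a minimal version under stated assumptions: the users and their viewing hours are invented, there are exactly two groups, and the starting centres are picked by hand instead of randomly so the result is repeatable.

```python
# Hypothetical users: (hours of action films, hours of documentaries) per month.
users = {
    "ann": (9.0, 1.0),
    "bob": (8.0, 2.0),
    "cem": (1.0, 9.0),
    "dee": (2.0, 8.0),
    "eli": (9.5, 0.5),
}


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Average position of a group of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def nearest_center(point, centers):
    """Index of the centre closest to the given point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def k_means(points, centers, rounds=10):
    """Alternate between assigning points to centres and moving the centres."""
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for point in points:
            groups[nearest_center(point, centers)].append(point)
        # Keep the old centre if a group ended up empty.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return centers


points = list(users.values())
# Assumption: fixed starting centres (two of the users) to keep the sketch deterministic.
centers = k_means(points, centers=[points[0], points[2]])
clusters = {name: nearest_center(p, centers) for name, p in users.items()}
print(clusters)
```

Users that land in the same cluster could then be shown what others in that cluster liked, which is the recommendation idea described above.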