Machine Learning

How Hyperparameter Tuning in Machine Learning Works

Sep 5, 2022

Until now, we have discussed multiple optimization algorithms and techniques that can influence the performance of your model and MLOps pipeline, which can save both resources and hours for your team and investors. Still, in all that clutter, you may have been looking for the most fundamental of them all!

Performance Metrics, Cross-Validation, and Anomaly Detection all tie into one broader technique that is the most extensive weapon in your arsenal: Hyperparameter Tuning. Number of Epochs, Number of K-Folds, Hours of Training, Learning Rate, and much more: any setting you choose before training starts can be classified as a hyperparameter in a Machine Learning pipeline. These settings are optimized through an iterative cycle that relies on Performance Metrics and Cross-Validation.

Let us indulge in an exploration of techniques and methods that can help you save incredible amounts of training time and achieve efficacy as never before!

What is hyperparameter tuning in machine learning?

Let us start from the beginning and revisit what I mean by “Hyperparameter Tuning”! Hyperparameters are parameters of a machine learning algorithm that are not learned from the training data (or not optimally learned from the training data) and must be set in advance by an operator. Hyperparameter tuning is the process of finding the hyperparameter values for a given machine learning algorithm that give the best results for the least amount of resources.
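To make the distinction concrete, here is a minimal sketch in plain Python. The toy data, the constant names, and the chosen values are all assumptions made for illustration. The learning rate and the number of epochs are fixed by the operator before training, while the weight w is the parameter the algorithm learns from the data.

```python
# Hyperparameters: chosen by the operator before training starts.
LEARNING_RATE = 0.05   # assumed value, for illustration only
NUM_EPOCHS = 200       # assumed value, for illustration only

# Toy training data following y = 3x (hypothetical).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]


def train(xs, ys, learning_rate, num_epochs):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0  # model parameter: learned from the data, not set by hand
    n = len(xs)
    for _ in range(num_epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


w = train(xs, ys, LEARNING_RATE, NUM_EPOCHS)
print(f"learned parameter w = {w:.3f}")
```

Changing LEARNING_RATE or NUM_EPOCHS changes how training behaves, but neither value is ever updated by the training loop itself.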

Unlike standard parameters, which are learned during training, hyperparameters describe the model architecture and the training procedure. Choosing them well requires an acute knowledge of how a model works at its core. This knowledge comes from experience, which is something a veteran Machine Learning Engineer should ideally have. Or should they? Read on to see how you can tackle these parameters head-on!

Hyperparameter tuning consists of finding a set of values that works well for a model and generalizes to new data rather than only to the training set. The tuning’s main objective is to maximize the model’s performance by minimizing the loss defined for that particular task. It’s worth noting that these parameters exist to find the optimal setting for the model but do not fundamentally change how the model works. Let us now delve deeper into why hyperparameter tuning is important and what kind of performance can be achieved with this technique.

Why is hyperparameter tuning critical?

Hyperparameter Tuning is an essential part of taming a wild machine learning model and is necessary to ensure that the pipeline isn’t producing sub-optimal results that keep a model from working at its full potential. By the full potential of a model, we mean its ability to drive the chosen loss function as low as possible without over-fitting the dataset.

Over-fitting: The model is "overfitted" when it memorizes the noise and fits too closely to the training set, making it less effective at generalization to new data.
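As a hedged illustration of that definition, the sketch below uses a 1-nearest-neighbour predictor, which memorizes its training set by construction. The data points are invented for the example: the underlying pattern is y = 2x, and the training labels carry noise of plus or minus 1. Comparing the error on the training set with the error on held-out points shows the gap that over-fitting produces.

```python
def one_nn_predict(train_x, train_y, x):
    """Predict by copying the label of the single closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def mse(train_x, train_y, xs, ys):
    """Mean squared error of the 1-NN predictor on the points (xs, ys)."""
    errors = [(one_nn_predict(train_x, train_y, x) - y) ** 2
              for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# Underlying pattern is y = 2x; the training labels carry +/-1 noise.
train_x = [0.0, 2.0, 4.0, 6.0, 8.0]
train_y = [1.0, 3.0, 9.0, 11.0, 17.0]

# Noise-free held-out points drawn from the same pattern.
val_x = [1.5, 3.5, 5.5, 7.5]
val_y = [3.0, 7.0, 11.0, 15.0]

train_mse = mse(train_x, train_y, train_x, train_y)
val_mse = mse(train_x, train_y, val_x, val_y)
print(f"training MSE = {train_mse}, validation MSE = {val_mse}")
```

The training error is zero because every training point is its own nearest neighbour, noise included, while the held-out error is not. That gap between the two numbers is what tuning tries to keep small.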

Machine Learning models are fundamentally empirical, meaning they are not derived from logic alone but are verified by observation and metrics. This is the ideal playground for hyperparameters to shine, because wherever results are measured there is room to improve. Some advantages of implementing hyperparameter tuning techniques on a model are:

1. Arriving at the desired loss value sooner.

2. Capturing the learnable patterns in the dataset without falling short due to a poorly chosen learning rate.

3. Saving a bunch of dollars spent training a model.

Much like humans and the different cogs in your machine learning pipeline, each hyperparameter is unique and requires special care when being optimized. In addition, these hyperparameters vary from model to model and require different approaches when being tweaked for the best performance. Let us look at some types of hyperparameters you can encounter while training.

Examples of hyperparameters in machine learning

Chances are, when you are leading an MLOps team whose work is largely carried out by machines and computers, you may be in a pickle when deciding how to hack this model training into working better. If a model picked up a new task the way a human does, the only go-to for improving it would be increasing the training time, right? Well, sadly, ML models are not that simple.

Modern neural networks come with many hyperparameters, each with room for improvement; let us look at some.

1. Hidden Layers: The backbone and ribs of a neural network, hidden layers hold the neurons that make up the mind of your machine learning model. These neurons act as feature extractors and are responsible for identifying the underlying relationships between the input and output. So more layers mean more accuracy, right? It turns out no! The number of hidden layers in an architecture should be a trade-off between the complexity and generality of the model, while keeping it fast to train.

2. Neurons in a layer: Loosely modeled on biological neurons, artificial neurons sit at the very core of your machine intelligence. Typically, a neuron computes the weighted sum of its inputs, and this sum is passed through a nonlinear function, often called the activation function. However, much like hidden layers, more is not always better. Too many neurons working together in a single layer may cause overfitting, because the layer gains enough capacity to memorize the training data, so the number of neurons also needs to be selected with the trade-off in mind.

3. Learning Rate: This rate refers to the size of the correction applied to the model parameters at each update. It also determines how close you will get to a local minimum of the loss, which measures the gap between the model's predictions and the actual values. A small learning rate lets you settle closer to that minimum, but it comes with the downside of the parameters updating slowly and increasing your training time, while a rate that is too large can overshoot the minimum altogether.

4. Number of Epochs: An epoch in machine learning means one complete pass of the training set through the algorithm. When tweaking hyperparameters, this value is typically kept as low as possible to allow for the completion of more training "experiments" in the same amount of time.

5. Momentum: By preventing abrupt changes in parameter values, momentum helps us avoid getting stuck in local minima. It encourages parameters to keep shifting in the same direction, which helps avoid parameters zigzagging after each iteration. Start with a low momentum value and adjust upward as necessary.
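The effect of the learning rate and momentum from the list above can be sketched on a toy problem. The example below is an illustration under assumptions, not a recipe: it minimizes f(x) = x**2 (whose minimum is at x = 0) using one common formulation of gradient descent with momentum, and the function name, starting point, and step count are all made up for the demonstration.

```python
def gradient_descent(learning_rate, momentum=0.0, steps=50, start=5.0):
    """Minimize f(x) = x**2 and return the value of x after `steps` updates."""
    x = start
    velocity = 0.0
    for _ in range(steps):
        grad = 2 * x  # derivative of x**2
        # Momentum keeps a fraction of the previous update direction.
        velocity = momentum * velocity - learning_rate * grad
        x += velocity
    return x


# Too small: barely moves. Reasonable: converges. Too large: diverges.
for lr in (0.001, 0.1, 1.1):
    print(f"lr={lr}: x after 50 steps = {gradient_descent(lr):.4f}")

# The same tiny learning rate makes more progress with momentum added.
with_momentum = gradient_descent(0.001, momentum=0.9)
print(f"lr=0.001, momentum=0.9: x after 50 steps = {with_momentum:.4f}")
```

With the smallest rate, x has hardly moved from its starting point after 50 steps; with the largest, each update overshoots the minimum and the value grows instead of shrinking. Momentum lets the small rate build up speed because successive gradients point the same way.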

We have only scratched the surface of the hyperparameters used across the industry, but the ones discussed here should give you an idea of the overarching parameters to look over when training a model.
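To ground the first two items in the list above, here is a minimal sketch of a forward pass through one hidden layer. The sigmoid activation, the layer size, and the hand-picked weights are assumptions for illustration; in a real network the weights and biases would be learned, while the number of layers and the number of neurons per layer are the hyperparameters you choose.

```python
import math


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weight_rows, biases):
    """A layer is a list of neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hyperparameters (assumed): one hidden layer containing two neurons.
# The weights and biases are hand-picked here; training would learn them.
hidden_weights = [[0.5, -0.5], [1.0, 1.0]]
hidden_biases = [0.0, -1.0]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]

x = [1.0, 1.0]
hidden = layer(x, hidden_weights, hidden_biases)
output = layer(hidden, output_weights, output_biases)
print(f"hidden activations = {hidden}, output = {output}")
```

Adding a neuron means adding a row to a weight list, and adding a hidden layer means adding another call to layer; both increase the number of values the model has to learn.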

Methods used for Hyperparameter Tuning

Manually scouring the unlimited permutations and combinations of hyperparameters can be tiring, especially for a “not so substantial” (but still significant) improvement to your model. So far in this article, we have discussed the importance of hyperparameters and examples of such parameters in your model.

Several techniques have been established through trial and error that you can implement to get the best set that suits your needs and requirements. Let us look at some strategies:

1. Grid Search: A brute-force method, grid search is the most basic algorithm for hyperparameter tuning. Essentially, we divide the domain of the hyperparameters into a discrete grid. Then, using cross-validation, we try every possible combination of grid values. The optimal combination of hyperparameter values is the grid point that maximizes the average cross-validation score.

Grid search is an exhaustive algorithm that searches all possible combinations to find the best point in the grid. The main disadvantage is that it is slow. Checking every combination in the space takes a lot of time, which isn't always available.

2. Random Search: Random search is like grid search, but instead of testing all of the points in the grid, it only tests a random subset of them. The smaller this subset, the faster but less accurate the optimization. The bigger this subset, the more precise the optimization, but the more it looks like a grid search.

Random search is useful when you have a grid of values for several hyperparameters. We can get a pretty good set of values for the hyperparameters by taking a subset of randomly chosen points. It probably won't be the best point, but it can still give us a good set of values to work with.
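Both strategies can be sketched side by side in plain Python. Everything below is an assumption made for illustration: the toy model (y = w * x fitted by gradient descent), the data, the grid values, the fold layout, and the helper names. The score here is a loss, so the best grid point is the one with the lowest average cross-validation error rather than the highest score.

```python
import random
from itertools import product


def train(xs, ys, learning_rate, num_epochs):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0
    for _ in range(num_epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    return w


def cross_val_mse(xs, ys, k, learning_rate, num_epochs):
    """Average validation MSE over k folds (fold i holds out every k-th point)."""
    fold_losses = []
    for fold in range(k):
        train_x = [x for i, x in enumerate(xs) if i % k != fold]
        train_y = [y for i, y in enumerate(ys) if i % k != fold]
        val_x = [x for i, x in enumerate(xs) if i % k == fold]
        val_y = [y for i, y in enumerate(ys) if i % k == fold]
        w = train(train_x, train_y, learning_rate, num_epochs)
        squared = [(w * x - y) ** 2 for x, y in zip(val_x, val_y)]
        fold_losses.append(sum(squared) / len(squared))
    return sum(fold_losses) / k


# Toy data following y = 3x (hypothetical).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [3 * x for x in xs]

# The discrete grid of hyperparameter values: 5 x 4 = 20 combinations.
grid = {
    "learning_rate": [0.001, 0.003, 0.01, 0.03, 0.1],
    "num_epochs": [5, 10, 20, 40],
}
all_points = list(product(grid["learning_rate"], grid["num_epochs"]))


def search(points):
    """Score every (learning_rate, num_epochs) point; return (loss, point)."""
    return min((cross_val_mse(xs, ys, 3, lr, ep), (lr, ep)) for lr, ep in points)


# Grid search: evaluate every combination.
grid_loss, grid_point = search(all_points)

# Random search: evaluate only a random subset of the same grid.
rng = random.Random(0)  # fixed seed so the run is repeatable
sampled = rng.sample(all_points, 6)
random_loss, random_point = search(sampled)

print(f"grid search:   {len(all_points)} trials, best={grid_point}")
print(f"random search: {len(sampled)} trials, best={random_point}")
```

Random search runs fewer than a third of the trials here, and its result can never beat the grid search result on the same grid, which is the speed-for-accuracy trade described above.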

Now that we have looked at some of the techniques, we hope these help you refine the model to the best combination. Sadly, there are no concrete methods of getting the exact ideal values, but with your knowledge of the model, the grid and random searches can be tweaked to get efficient results.

Conclusion

In this blog, we went through a bunch of common hyperparameters and how tweaking them can save you both time and money. These hyperparameters differ with each model that you encounter, but some overarching techniques can work for every model you and your team are trying to tackle.

Let us know the checklist you follow when you are tweaking your model to reach its potential.

Written By

Aryan Kargwal

Data Evangelist