- Machine Learning With Python
- Home
- Basics
- Python Ecosystem
- Methods for Machine Learning
- Data Loading for ML Projects
- Understanding Data with Statistics
- Understanding Data with Visualization
- Preparing Data
- Data Feature Selection
- ML Algorithms - Classification
- Introduction
- Logistic Regression
- Support Vector Machine (SVM)
- Decision Tree
- Naïve Bayes
- Random Forest
- ML Algorithms - Regression
- Random Forest
- Linear Regression
- ML Algorithms - Clustering
- Overview
- K-means Algorithm
- Mean Shift Algorithm
- Hierarchical Clustering
- ML Algorithms - KNN Algorithm
- Finding Nearest Neighbors
- Performance Metrics
- Automatic Workflows
- Improving Performance of ML Models
- Improving Performance of ML Model (Contd…)
- ML With Python - Resources
- Machine Learning With Python - Quick Guide
- Machine Learning with Python - Resources
- Machine Learning With Python - Discussion
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Improving Performance of ML Model (Contd…)
Performance Improvement with Algorithm Tuning
As we know that ML models are parameterized in such a way that their behavior can be adjusted for a specific problem. Algorithm tuning means finding the best combination of these parameters so that the performance of ML model can be improved. This process sometimes called hyperparameter optimization and the parameters of algorithm itself are called hyperparameters and coefficients found by ML algorithm are called parameters.
Here, we are going to discuss about some methods for algorithm parameter tuning provided by Python Scikit-learn.
Grid Search Parameter Tuning
It is a parameter tuning approach. The key point of working of this method is that it builds and evaluate the model methodically for every possible combination of algorithm parameter specified in a grid. Hence, we can say that this algorithm is having search nature.
Example
In the following Python recipe, we are going to perform grid search by using GridSearchCV class of sklearn for evaluating various alpha values for the Ridge Regression algorithm on Pima Indians diabetes dataset.
First, import the required packages as follows −
import numpy from pandas import read_csv from sklearn.linear_model import Ridge from sklearn.model_selection import GridSearchCV
Now, we need to load the Pima diabetes dataset as did in previous examples −
path = r"C:\pima-indians-diabetes.csv" headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class'] data = read_csv(path, names=headernames) array = data.values X = array[:,0:8] Y = array[:,8]
Next, evaluate the various alpha values as follows −
alphas = numpy.array([1,0.1,0.01,0.001,0.0001,0]) param_grid = dict(alpha=alphas)
Now, we need to apply grid search on our model −
model = Ridge() grid = GridSearchCV(estimator=model, param_grid=param_grid) grid.fit(X, Y)
Print the result with following script line −
print(grid.best_score_) print(grid.best_estimator_.alpha)
Output
0.2796175593129722 1.0
The above output gives us the optimal score and the set of parameters in the grid that achieved that score. The alpha value in this case is 1.0.
Random Search Parameter Tuning
It is a parameter tuning approach. The key point of working of this method is that it samples the algorithm parameters from a random distribution for a fixed number of iterations.
Example
In the following Python recipe, we are going to perform random search by using RandomizedSearchCV class of sklearn for evaluating different alpha values between 0 and 1 for the Ridge Regression algorithm on Pima Indians diabetes dataset.
First, import the required packages as follows −
import numpy from pandas import read_csv from scipy.stats import uniform from sklearn.linear_model import Ridge from sklearn.model_selection import RandomizedSearchCV
Now, we need to load the Pima diabetes dataset as did in previous examples −
path = r"C:\pima-indians-diabetes.csv" headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class'] data = read_csv(path, names=headernames) array = data.values X = array[:,0:8] Y = array[:,8]
Next, evaluate the various alpha values on Ridge regression algorithm as follows −
param_grid = {'alpha': uniform()} model = Ridge() random_search = RandomizedSearchCV(estimator=model, param_distributions=param_grid, n_iter=50, random_state=7) random_search.fit(X, Y)
Print the result with following script line −
print(random_search.best_score_) print(random_search.best_estimator_.alpha)
Output
0.27961712703051084 0.9779895119966027
The above output gives us the optimal score just similar to the grid search.