Cheatsheet: ML parameters

Python
ML

Refer to the Jupyter notebook for rendered code.

Author: Chi Zhang

Published: March 3, 2025

Classical models

| Model | Key parameters | Selection methods | Syntax |
|---|---|---|---|
| Linear regression | alpha for Ridge, Lasso | CV, AIC, BIC | `RidgeCV(alphas=[0.1, 1, 10])` |
| Logistic regression | C | CV, AIC, BIC | |
| Naive Bayes | alpha for Multinomial NB | GridSearchCV | `GridSearchCV(MultinomialNB(), param_grid={'alpha': [0.1, 0.5, 1]})` |
| KNN | n_neighbors, metric (Euclidean, Manhattan) | CV | `GridSearchCV(KNeighborsClassifier(), param_grid={'n_neighbors': range(1, 50)})` |
| SVM | kernel, C, gamma | GridSearchCV, RandomizedSearchCV | |
| Decision Tree | max_depth, min_samples_split, min_samples_leaf | GridSearchCV, RandomizedSearchCV | `GridSearchCV(DecisionTreeClassifier(), param_grid={'max_depth': [3, 5, 10]})` |
| Random forest | n_estimators, max_depth, min_samples_leaf | GridSearchCV, OOB score, feature importance | `RandomForestClassifier(n_estimators=100, oob_score=True)` |
| Gradient boosting | learning_rate, max_depth, n_estimators | RandomizedSearchCV, Bayesian optimization | `GridSearchCV(XGBClassifier(), param_grid={'max_depth': [3, 5, 7], 'learning_rate': [0.01, 0.1]})` |
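
A minimal sketch of tuning one of these models end to end with scikit-learn. The estimator (SVM), the parameter ranges, and the CV settings are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Grid over the SVM's key parameters: kernel, C, gamma.
param_grid = {
    'kernel': ['rbf', 'linear'],
    'C': [0.1, 1, 10],
    'gamma': ['scale', 0.01, 0.1],
}

search = GridSearchCV(SVC(), param_grid=param_grid, cv=5, scoring='accuracy')
search.fit(X, y)

print(search.best_params_)   # best combination found by cross-validation
print(search.best_score_)    # mean CV accuracy of that combination
```

RandomizedSearchCV has the same fit / best_params_ interface but takes param_distributions and n_iter instead of an exhaustive grid, which scales better when the search space is large.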

Time series

| Model | Key parameters | Selection methods | Syntax |
|---|---|---|---|
| ARIMA | (p, d, q): AR, differencing, MA | ACF/PACF plots, AIC, BIC | |
| Exponential smoothing | seasonal, trend | AIC, BIC | |

To be added
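
Until the syntax column is filled in, here is a minimal statsmodels sketch; the toy series, the candidate orders, and using AIC alone to choose among them are all illustrative assumptions.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Toy series: a random walk, so d=1 is a sensible choice (illustrative data only).
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))

# Compare a few (p, d, q) orders by AIC; ACF/PACF plots would normally
# narrow this candidate list down first.
candidates = [(1, 1, 0), (0, 1, 1), (1, 1, 1), (2, 1, 2)]
results = {order: ARIMA(y, order=order).fit().aic for order in candidates}

best_order = min(results, key=results.get)
print(results)
print("lowest AIC:", best_order)
```

For exponential smoothing, statsmodels' ExponentialSmoothing takes the trend and seasonal arguments listed above, and its fitted result also exposes aic.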

Deep learning

MLP is available in scikit-learn; the other models come from Keras / TensorFlow.

| Model | Key parameters | Selection methods | Syntax |
|---|---|---|---|
| MLP | hidden_layer_sizes, activation, learning_rate, batch_size | GridSearchCV | `GridSearchCV(MLPClassifier(), param_grid={'hidden_layer_sizes': [(50,), (100,)]})` |
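
On the Keras / TensorFlow side, a minimal sketch of where the same kinds of parameters (layer sizes, activation, learning rate, batch size) appear; the architecture, input shape, and values are illustrative assumptions only.

```python
import tensorflow as tf

# Binary classifier on 20 input features (shape chosen for illustration).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation='relu'),    # hidden layer size + activation
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # sigmoid output for binary labels
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # learning rate
    loss='binary_crossentropy',
    metrics=['accuracy'],
)

# batch_size and epochs are set at fit time, e.g.:
# model.fit(X_train, y_train, batch_size=32, epochs=10, validation_split=0.2)
```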

Loss functions

Regression models

  • MSE: \(\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2\)
  • MSE + L1 (Lasso): \(\mathrm{MSE} + \lambda \sum_j |\beta_j|\)
  • MSE + L2 (Ridge): \(\mathrm{MSE} + \lambda \sum_j \beta_j^2\)
  • MSE + L1 and L2 (elastic net): weighted sum of the L1 and L2 penalties (see the sketch below)
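
A small NumPy sketch of these objectives; the coefficient vector, predictions, and lambda values are made up purely for illustration.

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error."""
    return np.mean((y - y_hat) ** 2)

def lasso_loss(y, y_hat, beta, lam):
    """MSE + L1 penalty on the coefficients (Lasso objective)."""
    return mse(y, y_hat) + lam * np.sum(np.abs(beta))

def ridge_loss(y, y_hat, beta, lam):
    """MSE + L2 penalty on the coefficients (Ridge objective)."""
    return mse(y, y_hat) + lam * np.sum(beta ** 2)

def elastic_net_loss(y, y_hat, beta, lam1, lam2):
    """Weighted sum of the L1 and L2 penalties on top of MSE."""
    return mse(y, y_hat) + lam1 * np.sum(np.abs(beta)) + lam2 * np.sum(beta ** 2)

# Illustrative numbers only.
y = np.array([3.0, 1.5, 2.0])
y_hat = np.array([2.5, 1.0, 2.5])
beta = np.array([0.8, -0.3])
print(mse(y, y_hat), ridge_loss(y, y_hat, beta, lam=0.1))
```

Note that scikit-learn's Lasso and ElasticNet use a slightly different scaling of these terms (a 1/(2n) factor on the squared error and an l1_ratio mixing parameter), but the structure of the objective is the same.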

Classification

| Model | Loss function | Formula |
|---|---|---|
| LR | Log loss (binary cross entropy) | \(-\frac{1}{n}\sum_i [y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i)]\) |
| SVM | Hinge loss | \(\sum_i \max(0, 1 - y_i \hat{y}_i)\) |
| Decision Tree, RF | Gini impurity | \(1 - \sum_i p_i^2\) |
| | Entropy | \(-\sum_i p_i \log p_i\) |
| KNN | 0-1 loss (misclassification rate) | \(L = \frac{1}{n}\sum_i I(y_i \neq \hat{y}_i)\) |
| NB | Log loss | |
| XGB | Log loss (cross entropy) | \(-\sum_i y_i \log \hat{y}_i\) |
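
Several of these losses are available directly in scikit-learn. Below is a minimal sketch with made-up labels and scores, assuming the usual conventions: log loss takes predicted probabilities, while hinge loss takes margin scores such as the output of an SVM's decision_function.

```python
import numpy as np
from sklearn.metrics import log_loss, hinge_loss

y_true = np.array([0, 0, 1, 1])            # ground-truth class labels
p_hat = np.array([0.1, 0.4, 0.35, 0.8])    # predicted P(y = 1)

# Binary cross entropy, as used by logistic regression / NB / XGBoost.
print("log loss:", log_loss(y_true, p_hat))

# Hinge loss expects margin scores (e.g. SVC.decision_function output).
scores = np.array([-2.2, -0.3, 0.1, 1.5])
print("hinge loss:", hinge_loss(y_true, scores))

# 0-1 loss (misclassification rate), as in the KNN row.
y_pred = (p_hat >= 0.5).astype(int)
print("0-1 loss:", np.mean(y_pred != y_true))
```

Gini impurity and entropy are node impurity criteria used when growing a tree rather than per-sample losses, which is why they have no direct counterpart in sklearn.metrics.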

Activation functions

| Activation function | Formula | Range | Pros | Cons | Use cases |
|---|---|---|---|---|---|
| Sigmoid | \(\frac{1}{1+e^{-x}}\) | (0, 1) | | | Output layer for binary classification |
| Softmax | \(\frac{e^{x_i}}{\sum_j e^{x_j}}\) | (0, 1) | | | Output layer for multi-class classification |
| Tanh | \(\frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}\) | (-1, 1) | | | Hidden layer |
| ReLU | \(\max(0, x)\) | \([0, \infty)\) | No vanishing gradient issue, efficient | Dead neurons | Hidden layer |
| Leaky ReLU | \(x\) if \(x > 0\), else \(\alpha x\) | \((-\infty, \infty)\) | Alternative to ReLU for dead neurons | | |
| ELU | \(x\) if \(x > 0\), else \(\alpha(e^{x} - 1)\) | \((-\alpha, \infty)\) | | | |
| Swish | \(x \cdot \sigma(x)\) | | | | Very deep networks |
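
To make the formulas concrete, a small NumPy sketch of these activations; the \(\alpha\) values used are common defaults and are only illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))   # shift for numerical stability
    return e / e.sum()

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def swish(x):
    return x * sigmoid(x)

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
for f in (sigmoid, np.tanh, relu, leaky_relu, elu, swish):
    print(f.__name__, f(x).round(3))
print("softmax", softmax(x).round(3))
```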