Cheatsheet: ML parameters

Python
ML

Refer to the Jupyter notebook for rendered code.

Author: Chi Zhang

Published: March 3, 2025

Classical models

| Model | Key parameters | Selection methods | Syntax |
|---|---|---|---|
| Linear regression | alpha for Ridge, Lasso | CV, AIC, BIC | `RidgeCV(alphas=[0.1, 1, 10])` |
| Logistic regression | C | CV, AIC, BIC | |
| Naive Bayes | alpha for Multinomial NB | GridSearchCV | `GridSearchCV(MultinomialNB(), param_grid={'alpha': [0.1, 0.5, 1]})` |
| KNN | n_neighbors, metric (Euclidean, Manhattan) | CV | `GridSearchCV(KNeighborsClassifier(), param_grid={'n_neighbors': range(1, 50)})` |
| SVM | kernel, C, gamma | GridSearchCV, RandomizedSearchCV | |
| Decision Tree | max_depth, min_samples_split, min_samples_leaf | GridSearchCV, RandomizedSearchCV | `GridSearchCV(DecisionTreeClassifier(), param_grid={'max_depth': [3, 5, 10]})` |
| Random forest | n_estimators, max_depth, min_samples_leaf | GridSearchCV, OOB score, feature importance | `RandomForestClassifier(n_estimators=100, oob_score=True)` |
| Gradient boosting | learning_rate, max_depth, n_estimators | RandomizedSearchCV, Bayesian optimization | `GridSearchCV(XGBClassifier(), param_grid={'max_depth': [3, 5, 7], 'learning_rate': [0.01, 0.1]})` |
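
A minimal sketch of tuning one of these models end to end with scikit-learn. The estimator (SVM), the parameter ranges, and the CV settings are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Grid over the SVM's key parameters: kernel, C, gamma.
param_grid = {
    'kernel': ['rbf', 'linear'],
    'C': [0.1, 1, 10],
    'gamma': ['scale', 0.01, 0.1],
}

search = GridSearchCV(SVC(), param_grid=param_grid, cv=5, scoring='accuracy')
search.fit(X, y)

print(search.best_params_)   # best combination found by cross-validation
print(search.best_score_)    # mean CV accuracy of that combination
```

RandomizedSearchCV has the same fit / best_params_ interface but takes param_distributions and n_iter instead of an exhaustive grid, which scales better when the search space is large.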

Time series

| Model | Key parameters | Selection methods | Syntax |
|---|---|---|---|
| ARIMA | (p, d, q): AR, differencing, MA | ACF/PACF plots, AIC, BIC | |
| Exponential smoothing | seasonal, trend | AIC, BIC | |

To be added
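
Until the syntax column is filled in, here is a minimal statsmodels sketch; the toy series, the candidate orders, and using AIC alone to choose among them are all illustrative assumptions.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Toy series: a random walk, so d=1 is a sensible choice (illustrative data only).
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))

# Compare a few (p, d, q) orders by AIC; ACF/PACF plots would normally
# narrow this candidate list down first.
candidates = [(1, 1, 0), (0, 1, 1), (1, 1, 1), (2, 1, 2)]
results = {order: ARIMA(y, order=order).fit().aic for order in candidates}

best_order = min(results, key=results.get)
print(results)
print("lowest AIC:", best_order)
```

For exponential smoothing, statsmodels' ExponentialSmoothing takes the trend and seasonal arguments listed above, and its fitted result also exposes aic.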

Deep learning

MLP is available in scikit-learn; the other models come from Keras / TensorFlow.

| Model | Key parameters | Selection methods | Syntax |
|---|---|---|---|
| MLP | hidden_layer_sizes, activation, learning_rate, batch_size | GridSearchCV | `GridSearchCV(MLPClassifier(), param_grid={'hidden_layer_sizes': [(50,), (100,)]})` |
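
On the Keras / TensorFlow side, a minimal sketch of where the same kinds of parameters (layer sizes, activation, learning rate, batch size) appear; the architecture, input shape, and values are illustrative assumptions only.

```python
import tensorflow as tf

# Binary classifier on 20 input features (shape chosen for illustration).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation='relu'),    # hidden layer size + activation
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # sigmoid output for binary labels
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # learning rate
    loss='binary_crossentropy',
    metrics=['accuracy'],
)

# batch_size and epochs are set at fit time, e.g.:
# model.fit(X_train, y_train, batch_size=32, epochs=10, validation_split=0.2)
```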

Loss functions

Regression models

  • MSE: \(\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2\)
  • MSE + L1 (Lasso): \(\mathrm{MSE} + \lambda \sum_j |\beta_j|\)
  • MSE + L2 (Ridge): \(\mathrm{MSE} + \lambda \sum_j \beta_j^2\)
  • MSE + L1 and L2 (elastic net): weighted sum of the L1 and L2 penalties (see the sketch below)
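
A small NumPy sketch of these objectives; the coefficient vector, predictions, and lambda values are made up purely for illustration.

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error."""
    return np.mean((y - y_hat) ** 2)

def lasso_loss(y, y_hat, beta, lam):
    """MSE + L1 penalty on the coefficients (Lasso objective)."""
    return mse(y, y_hat) + lam * np.sum(np.abs(beta))

def ridge_loss(y, y_hat, beta, lam):
    """MSE + L2 penalty on the coefficients (Ridge objective)."""
    return mse(y, y_hat) + lam * np.sum(beta ** 2)

def elastic_net_loss(y, y_hat, beta, lam1, lam2):
    """Weighted sum of the L1 and L2 penalties on top of MSE."""
    return mse(y, y_hat) + lam1 * np.sum(np.abs(beta)) + lam2 * np.sum(beta ** 2)

# Illustrative numbers only.
y = np.array([3.0, 1.5, 2.0])
y_hat = np.array([2.5, 1.0, 2.5])
beta = np.array([0.8, -0.3])
print(mse(y, y_hat), ridge_loss(y, y_hat, beta, lam=0.1))
```

Note that scikit-learn's Lasso and ElasticNet use a slightly different scaling of these terms (a 1/(2n) factor on the squared error and an l1_ratio mixing parameter), but the structure of the objective is the same.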

Classification

| Model | Loss function | Formula |
|---|---|---|
| LR | Log loss (binary cross entropy) | \(-\frac{1}{n}\sum_i [y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i)]\) |
| SVM | Hinge loss | \(\sum_i \max(0, 1 - y_i \hat{y}_i)\) |
| Decision Tree, RF | Gini impurity | \(1 - \sum_i p_i^2\) |
| | Entropy | \(-\sum_i p_i \log p_i\) |
| KNN | 0-1 loss (misclassification rate) | \(L = \frac{1}{n}\sum_i I(y_i \neq \hat{y}_i)\) |
| NB | Log loss | |
| XGB | Log loss (cross entropy) | \(-\sum_i y_i \log \hat{y}_i\) |
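
Several of these losses are available directly in scikit-learn. Below is a minimal sketch with made-up labels and scores, assuming the usual conventions: log loss takes predicted probabilities, while hinge loss takes margin scores such as the output of an SVM's decision_function.

```python
import numpy as np
from sklearn.metrics import log_loss, hinge_loss

y_true = np.array([0, 0, 1, 1])            # ground-truth class labels
p_hat = np.array([0.1, 0.4, 0.35, 0.8])    # predicted P(y = 1)

# Binary cross entropy, as used by logistic regression / NB / XGBoost.
print("log loss:", log_loss(y_true, p_hat))

# Hinge loss expects margin scores (e.g. SVC.decision_function output).
scores = np.array([-2.2, -0.3, 0.1, 1.5])
print("hinge loss:", hinge_loss(y_true, scores))

# 0-1 loss (misclassification rate), as in the KNN row.
y_pred = (p_hat >= 0.5).astype(int)
print("0-1 loss:", np.mean(y_pred != y_true))
```

Gini impurity and entropy are node impurity criteria used when growing a tree rather than per-sample losses, which is why they have no direct counterpart in sklearn.metrics.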

Activation functions

| Activation function | Formula | Range | Pros | Cons | Use cases |
|---|---|---|---|---|---|
| Sigmoid | \(\frac{1}{1+e^{-x}}\) | (0, 1) | | | Output layer for binary classification |
| Softmax | \(\frac{e^{x_i}}{\sum_j e^{x_j}}\) | (0, 1) | | | Output layer for multi-class classification |
| Tanh | \(\frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}\) | (-1, 1) | | | Hidden layer |
| ReLU | \(\max(0, x)\) | \([0, \infty)\) | No vanishing gradient issue, efficient | Dead neurons | Hidden layer |
| Leaky ReLU | \(x\) if \(x > 0\), else \(\alpha x\) | \((-\infty, \infty)\) | Alternative to ReLU for dead neurons | | |
| ELU | \(x\) if \(x > 0\), else \(\alpha(e^{x} - 1)\) | \((-\alpha, \infty)\) | | | |
| Swish | \(x \cdot \sigma(x)\) | | | | Very deep networks |
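
To make the formulas concrete, a small NumPy sketch of these activations; the \(\alpha\) values used are common defaults and are only illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))   # shift for numerical stability
    return e / e.sum()

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def swish(x):
    return x * sigmoid(x)

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
for f in (sigmoid, np.tanh, relu, leaky_relu, elu, swish):
    print(f.__name__, f(x).round(3))
print("softmax", softmax(x).round(3))
```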