# Machine Learning Algorithm Cheat Sheet

A cheat sheet for various machine learning algorithms! Searching for this every time was a hassle, so I collected material that was already available online.

## Dummies Material

Algorithm | Best at | Pros | Cons |
---|---|---|---|
Random Forest | Almost any machine learning problem; bioinformatics | Can work in parallel; seldom overfits; automatically handles missing values; no need to transform any variable; no need to tweak parameters; can be used by almost anyone with excellent results | Difficult to interpret; weaker on regression when estimating values at the extremities of the distribution of response values; biased in multiclass problems toward more frequent classes |
Gradient Boosting | Almost any machine learning problem; search engines (solving the problem of learning to rank) | Can approximate most nonlinear functions; best-in-class predictor; automatically handles missing values; no need to transform any variable | Can overfit if run for too many iterations; sensitive to noisy data and outliers; doesn’t work well without parameter tuning |
Linear regression | Baseline predictions; econometric predictions; modelling marketing responses | Simple to understand and explain; seldom overfits; regularization is effective (L1 for feature selection, L2 for shrinking coefficients); fast to train; easy to train on big data thanks to its stochastic version | You have to work hard to make it fit nonlinear functions; can suffer from outliers |
Support Vector Machines | Character recognition; image recognition; text classification | Automatic nonlinear feature creation; can approximate complex nonlinear functions; relies on only a portion of the examples (the support vectors) | Difficult to interpret when applying nonlinear kernels; scales poorly with many examples (beyond roughly 10,000 examples, training starts taking too long) |
K-nearest Neighbors | Computer vision; multilabel tagging; recommender systems; spell-checking problems | Fast, lazy training; can naturally handle extreme multiclass problems (like tagging text) | Slow and cumbersome in the prediction phase; can fail to predict correctly due to the curse of dimensionality |
AdaBoost | Face detection | Automatically handles missing values; no need to transform any variable; doesn’t overfit easily; few parameters to tweak; can leverage many different weak learners | Sensitive to noisy data and outliers; never the best-in-class predictor |
Naive Bayes | Face recognition; sentiment analysis; spam detection; text classification | Easy and fast to implement; doesn’t require much memory and can be used for online learning; easy to understand; takes prior knowledge into account | Strong and unrealistic feature independence assumptions; fails at estimating rare occurrences; suffers from irrelevant features |
Neural Networks | Image recognition; language recognition and translation; speech recognition; vision recognition | Can approximate any nonlinear function; robust to outliers | Very difficult to set up; difficult to tune because of the many parameters, and you also have to decide the architecture of the network; difficult to interpret; easy to overfit |
Logistic regression | Ordering results by probability; modelling marketing responses | Simple to understand and explain; seldom overfits; regularization is effective (L1 for feature selection, L2 for shrinking coefficients); the best algorithm for predicting probabilities of an event; fast to train; easy to train on big data thanks to its stochastic version | You have to work hard to make it fit nonlinear functions; can suffer from outliers |
SVD | Recommender systems | Can restructure data in a meaningful way | Difficult to understand why data has been restructured in a certain way |
PCA | Removing collinearity; reducing the dimensions of the dataset | Can reduce data dimensionality | Implies strong linear assumptions (components are weighted sums of features) |
K-means | Segmentation | Fast at finding clusters; can detect outliers in multiple dimensions | Suffers from multicollinearity; clusters are spherical, so it can’t detect groups of other shapes; unstable solutions that depend on initialization |
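
The K-nearest Neighbors row lists "fast, lazy training" as a pro and a slow prediction phase as a con. The sketch below is one way to see why, written in plain Python with no external libraries. The `KNNClassifier` class, its method names, and the toy data are hypothetical illustrations, not part of the original material, and a real project would more likely use a library implementation.

```python
from collections import Counter
import math


class KNNClassifier:
    """Minimal k-nearest-neighbors classifier (illustrative sketch only)."""

    def __init__(self, k=3):
        self.k = k
        self.points = []
        self.labels = []

    def fit(self, points, labels):
        # "Lazy" training: nothing is learned here, the data is only stored.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict_one(self, query):
        # Prediction carries the cost: one distance per stored example.
        distances = [
            (math.dist(query, point), label)
            for point, label in zip(self.points, self.labels)
        ]
        distances.sort(key=lambda pair: pair[0])
        nearest_labels = [label for _, label in distances[: self.k]]
        # Majority vote among the k nearest neighbors.
        return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two small, well-separated 2-D clusters.
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
                (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
train_labels = ["a", "a", "a", "b", "b", "b"]

model = KNNClassifier(k=3).fit(train_points, train_labels)
print(model.predict_one((0.3, 0.3)))  # query near the first cluster
print(model.predict_one((4.9, 5.0)))  # query near the second cluster
```

Because `fit` only copies the data, training time is negligible, while every call to `predict_one` scans the whole training set. That matches the trade-off described in the table.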