Decision Tree

Categorical target variable: Classification

Continuous target variable: Regression

Entropy: how mixed the data are (the more evenly the classes are mixed, the higher the entropy)
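
The note above gives no formula, so here is the usual definition as a reference. This assumes Shannon entropy with base-2 logarithms; the symbols S, K and p_k are my own notation, not from the original notes.

```latex
H(S) = -\sum_{k=1}^{K} p_k \log_2 p_k
% S   : the set of samples at a node
% K   : number of classes
% p_k : proportion of samples in S that belong to class k
% Two classes mixed 50/50 give H = 1 bit; a pure node gives H = 0.
```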

Information Gain: Entropy_before - Entropy_after (where Entropy_after is the average entropy of the child nodes, weighted by their size)
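
A small sketch of the subtraction above may help. The function names `entropy` and `information_gain` and the toy labels are made up for illustration, and the entropy is assumed to be base-2 Shannon entropy.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy before the split minus the size-weighted entropy after it."""
    n = len(parent)
    after = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - after


# Hypothetical toy labels: two classes, perfectly mixed before the split.
parent = ["A", "A", "B", "B"]
perfect = [["A", "A"], ["B", "B"]]   # each child is pure
useless = [["A", "B"], ["A", "B"]]   # each child is as mixed as the parent

gain_perfect = information_gain(parent, perfect)
gain_useless = information_gain(parent, useless)
print(gain_perfect, gain_useless)
```

A split that separates the classes completely recovers all of the parent's entropy as gain, while a split that leaves the children just as mixed gains nothing.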

Basis: High Entropy --> Low Entropy (at each node, the split with the highest Information Gain is chosen, so the most informative splits sit closest to the root)
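
To make the "pick the highest Information Gain" idea concrete, here is a rough sketch for a single numeric feature. It tries the midpoints between neighbouring values as thresholds, which is one common approach but an assumption on my part; `best_threshold` and the toy measurements are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) with the highest information gain."""
    base = entropy(labels)
    n = len(labels)
    best = (None, 0.0)
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between neighbouring distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        after = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        gain = base - after
        if gain > best[1]:
            best = (t, gain)
    return best


# Hypothetical toy data, loosely shaped like one iris feature.
petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 5.1]
species = ["setosa", "setosa", "setosa", "other", "other", "other"]

threshold, gain = best_threshold(petal_length, species)
print(threshold, gain)
```

A full tree would repeat this search over every feature, split on the winner, and then recurse into each child until the nodes are pure or a stopping rule applies.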

Given input data, the tree estimates the probability of each category of Y and assigns the input to the class with the highest probability.
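
As a sketch of that last step: once a sample lands in a leaf, the class probabilities can be taken as the class proportions of the training samples in that leaf. The helper `leaf_prediction` and the labels `drug_a` / `drug_b` are hypothetical.

```python
from collections import Counter


def leaf_prediction(leaf_labels):
    """Return (predicted class, class probabilities) for one leaf."""
    counts = Counter(leaf_labels)
    total = sum(counts.values())
    # Probability of each class = its share of the training samples in the leaf.
    probs = {cls: c / total for cls, c in counts.items()}
    predicted = max(probs, key=probs.get)
    return predicted, probs


# Hypothetical training labels that ended up in the same leaf.
leaf = ["drug_a", "drug_a", "drug_a", "drug_b"]

predicted, probs = leaf_prediction(leaf)
print(predicted, probs)
```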

It has been a while since I met a method this simple... happy days.

For practice, I used the iris dataset from sklearn, and for a second exercise I downloaded a Kaggle dataset on choosing the appropriate drug for patients with underlying conditions.
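
The iris exercise could look roughly like the sketch below. It is not the exact code I ran: the 70/30 split, `random_state=0`, and `criterion="entropy"` are assumptions chosen to match the notes above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the iris features and class labels.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the data for testing (assumed split, stratified by class).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Use entropy (information gain) as the split criterion.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"test accuracy: {accuracy:.3f}")
```

sklearn's default criterion is Gini impurity, so `criterion="entropy"` has to be set explicitly to get the entropy-based splits described here.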

Perhaps because the problem is so simple, the Drug model reached 100% accuracy on the test set...

