[LG Aimers] Module 5.4 (Supervised Learning: Classification, Regression): Linear Classification



Jae. 2023. 7. 16. 06:29


1. Classification

Supervised learning: uses labeled data
Classification applies when the output is discrete
Classification is performed by computing a score relative to a hyperplane

Linear Model
- Built as a linear combination of the input features and the model parameters (learnable parameters)
- Does not need to be linear in the input feature $x$
- A model is a linear model as long as it is linear in the model parameter $w$
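
The point that a model only has to be linear in $w$ can be illustrated with a small sketch. The feature map `phi` below (a scalar mapped to `[1, x, x^2]`) and the weight values are assumptions made up for this illustration; they do not come from the lecture.

```python
def phi(x):
    """Hypothetical feature map: a scalar x becomes [1, x, x^2]."""
    return [1.0, x, x * x]


def score(w, x):
    """Linear combination of the parameters w and the features phi(x)."""
    return sum(w_i * f_i for w_i, f_i in zip(w, phi(x)))


# Illustrative weights: score = 1 - 2x + x^2, which is quadratic in x.
w = [1.0, -2.0, 1.0]
print(score(w, 3.0))

# Doubling w doubles the score, which is what "linear in w" means here.
w_doubled = [2.0 * w_i for w_i in w]
print(score(w_doubled, 3.0))
```

The score is a curve as a function of x, yet it remains a plain weighted sum from the point of view of w, so it still counts as a linear model.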

Input feature: d-dimensional vector
Hyperplane: decision boundary

The goal is to separate positive samples from negative samples using a linear combination

Advantages of linear models
- Simple
- Interpretable
- Generally stable performance across a variety of settings

Multiclass Classification
- Classification is performed with multiple hyperplanes in the input space
- The input feature vectors live in a d-dimensional space
- The goal is to learn a hypothesis h that approximates the target function f

2. Linear Classification Framework

Choose the predictor: $h(x) = sign(w^{T}x)$
Choose a loss function for fitting the model parameters
- Zero-one loss
- Hinge loss
- Cross-entropy loss
Choose an optimization algorithm

Sign Function
- Greater than 0: +1
- Less than 0: -1
- Equal to 0: 0

Example hyperplane parameters: $[w_1, w_2] = [-1, 1]$
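
As a minimal sketch of the predictor $h(x) = sign(w^{T}x)$ with the parameters $[-1, 1]$ given above, the snippet below classifies a few 2-D points. The sample points and the helper names `sign` and `predict` are assumptions chosen for illustration.

```python
def sign(z):
    """Sign function as described above: +1, -1, or 0."""
    if z > 0:
        return 1
    if z < 0:
        return -1
    return 0


def predict(w, x):
    """Predictor h(x) = sign(w^T x)."""
    return sign(sum(w_i * x_i for w_i, x_i in zip(w, x)))


w = [-1.0, 1.0]  # hyperplane parameters from the text

# Hypothetical sample points on either side of the line x2 = x1.
for x in ([1.0, 3.0], [3.0, 1.0], [2.0, 2.0]):
    print(x, predict(w, x))
```

Points above the line $x_2 = x_1$ get +1, points below it get -1, and points exactly on the hyperplane get 0.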


The goal is to learn the parameter $w$

Zero-one Loss
- Built on an indicator function that evaluates the logical statement inside it and outputs 1 if the statement is true and 0 if it is false
- Notation: $1\left [ logic \right ]$

Hyperplane: the boundary at which the sign of the inner product with the vector [w1, w2] switches between + and -

3. Judging Errors in a Classification Model

Score $w^{T}\phi(x) = w \cdot \phi(x)$: indicates how confident the model is in its decision
- Classification is performed by computing the score relative to the hyperplane
- Built as a linear combination of the input features and the model parameters
- Positive score (+): the model predicts a positive sample
- Negative score (-): the model predicts a negative sample

Margin $(w \cdot \phi(x))y$: the score multiplied by the label y; indicates how correct the model is
- y = 1 (positive sample)
- y = -1 (negative sample)
- A negative margin means the model's prediction was wrong
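
The score and margin defined above can be sketched in a few lines. The weight vector and feature vector are hypothetical values, and `phi_x` stands for an already computed $\phi(x)$.

```python
def score(w, phi_x):
    """Score w . phi(x): sign gives the prediction, size gives the confidence."""
    return sum(w_i * f_i for w_i, f_i in zip(w, phi_x))


def margin(w, phi_x, y):
    """Margin (w . phi(x)) * y with y in {+1, -1}: positive when the prediction is right."""
    return score(w, phi_x) * y


w = [-1.0, 1.0]      # hypothetical parameters
phi_x = [1.0, 3.0]   # hypothetical feature vector, score = 2

print(score(w, phi_x))
print(margin(w, phi_x, +1))  # true label positive: margin > 0, correct
print(margin(w, phi_x, -1))  # true label negative: margin < 0, wrong
```

The same score yields a positive margin for one label and a negative margin for the other, which is why the margin, not the score alone, tells whether the prediction is correct.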

Zero-one loss
- $Loss_{0-1}(x,y,w) = 1\left [ (w \cdot \phi(x))y \leq 0 \right ]$
- The zero-one loss is piecewise constant in $w$, so its partial derivatives are 0 almost everywhere and gradient-based training cannot make progress
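
To see why the zero-one loss gives no training signal, the sketch below computes the loss (taken here as 1 whenever the margin is not positive, which is an assumption about the convention) and estimates one partial derivative by a finite difference. The parameter values and the step size `eps` are made up for illustration.

```python
def margin(w, phi_x, y):
    """Margin (w . phi(x)) * y."""
    return sum(w_i * f_i for w_i, f_i in zip(w, phi_x)) * y


def zero_one_loss(w, phi_x, y):
    """1 if the margin is not positive (a mistake), else 0."""
    return 1 if margin(w, phi_x, y) <= 0 else 0


w = [-1.0, 1.0]      # hypothetical parameters
phi_x = [1.0, 3.0]   # hypothetical feature vector
y = -1               # true label, so this sample is misclassified

# Nudge the first weight slightly and measure how much the loss changes.
eps = 1e-3
w_nudged = [w[0] + eps, w[1]]
finite_diff = (zero_one_loss(w_nudged, phi_x, y) - zero_one_loss(w, phi_x, y)) / eps

print(zero_one_loss(w, phi_x, y), finite_diff)
```

The sample is misclassified, yet a small change in the weight leaves the loss unchanged, so a gradient step would not know which direction to move.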

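Hinge Loss
- $Loss_{hinge}(x,y,w) = \max\left [1 - (w \cdot \phi(x))y,0 \right]$
- Gradient: obtained by differentiating with respect to the model parameter $w$, which gives $-\phi(x)y$ when the margin is below 1 and 0 otherwise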
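
A minimal sketch of the hinge loss and its gradient with respect to $w$ follows. It uses the standard subgradient (minus $\phi(x)y$ when the margin is below 1, zero otherwise); the function names and sample values are assumptions for illustration.

```python
def margin(w, phi_x, y):
    """Margin (w . phi(x)) * y."""
    return sum(w_i * f_i for w_i, f_i in zip(w, phi_x)) * y


def hinge_loss(w, phi_x, y):
    """max(1 - margin, 0): zero only when the margin is at least 1."""
    return max(1.0 - margin(w, phi_x, y), 0.0)


def hinge_grad(w, phi_x, y):
    """Subgradient of the hinge loss with respect to w."""
    if margin(w, phi_x, y) < 1.0:
        return [-f_i * y for f_i in phi_x]
    return [0.0] * len(w)


w = [-1.0, 1.0]  # hypothetical parameters

# Margin 2: confidently correct, so no loss and no gradient.
print(hinge_loss(w, [1.0, 3.0], +1), hinge_grad(w, [1.0, 3.0], +1))

# Margin 0.5: correct but inside the margin, so the loss still pushes w.
print(hinge_loss(w, [1.0, 1.5], +1), hinge_grad(w, [1.0, 1.5], +1))
```

Unlike the zero-one loss, the hinge loss has a nonzero gradient whenever the margin is below 1, so gradient descent has a direction to follow.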

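Cross-entropy Loss
- Measures the dissimilarity between two different PMFs p and q
- (An arbitrary real value must first be mapped to a probability in [0,1], for example with a sigmoid function)
- Dissimilarity between p and q: $p \log\frac{1}{q} + (1-p) \log\frac{1}{1-q}$
  - The more similar p and q are: the lower the loss
  - The more different p and q are: the higher the loss
- The most widely used loss function for training classification models
- Origin of the function: it comes from the KL divergence, which measures how close or how different two PMFs are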
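
The binary cross-entropy formula above, and its link to the KL divergence, can be checked numerically. This is only a sketch: the probability values are arbitrary, and the identity used (cross-entropy = entropy of p + KL divergence from p to q) is the standard one rather than something stated in the lecture.

```python
import math


def cross_entropy(p, q):
    """p * log(1/q) + (1 - p) * log(1/(1 - q)), for q strictly between 0 and 1."""
    return p * math.log(1.0 / q) + (1.0 - p) * math.log(1.0 / (1.0 - q))


def entropy(p):
    """Entropy of a Bernoulli(p) distribution, for p strictly between 0 and 1."""
    return cross_entropy(p, p)


def kl_divergence(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


# With a true label p = 1, a confident correct estimate costs little ...
print(cross_entropy(1.0, 0.9))
# ... and a confident wrong estimate costs a lot.
print(cross_entropy(1.0, 0.1))

# Cross-entropy splits into entropy plus KL divergence (arbitrary p, q).
p, q = 0.7, 0.4
print(cross_entropy(p, q), entropy(p) + kl_divergence(p, q))
```

Since the entropy of p does not depend on the model, minimizing the cross-entropy over q is the same as minimizing the KL divergence.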

The score is a real number, whereas cross-entropy compares probability values (between 0 and 1)
So the score has to be mapped to a probability: use the sigmoid function

Logistic Model
- Passing $w^{T}\phi(x)$ through the sigmoid function maps the real-valued score to a value between 0 and 1
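
The mapping from score to probability can be sketched as below. The two-branch form of `sigmoid` is an implementation choice made here to avoid overflow for large negative scores, and the weights and feature vectors are hypothetical.

```python
import math


def sigmoid(z):
    """Map a real-valued score to (0, 1), written to avoid overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, phi_x):
    """Logistic model: sigmoid applied to the score w . phi(x)."""
    return sigmoid(sum(w_i * f_i for w_i, f_i in zip(w, phi_x)))


w = [-1.0, 1.0]  # hypothetical parameters

print(predict_proba(w, [1.0, 3.0]))  # score  2: probability above 0.5
print(predict_proba(w, [2.0, 2.0]))  # score  0: probability exactly 0.5
print(predict_proba(w, [3.0, 1.0]))  # score -2: probability below 0.5
```

A score of 0, which lies on the hyperplane, maps to probability 0.5, and larger absolute scores move the probability toward 0 or 1.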

As training proceeds, the estimated value moves gradually toward the real value: the cross-entropy loss decreases

How a linear classifier uses gradient descent (GD): weight update
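
One way to sketch the weight update is gradient descent on the cross-entropy loss of a logistic model. Several things here are assumptions not given in the lecture: the labels are encoded as 0/1 so they can play the role of p, the toy dataset, the learning rate, and the number of epochs. The gradient used, $(q - p)\phi(x)$ with $q$ the sigmoid of the score, is the standard one for this loss.

```python
import math


def sigmoid(z):
    """Map a real-valued score to (0, 1), written to avoid overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, phi_x):
    return sigmoid(sum(w_i * f_i for w_i, f_i in zip(w, phi_x)))


def mean_loss(w, data):
    """Average binary cross-entropy over the dataset (labels p are 0 or 1)."""
    total = 0.0
    for phi_x, p in data:
        q = predict_proba(w, phi_x)
        total += -(p * math.log(q) + (1 - p) * math.log(1.0 - q))
    return total / len(data)


def train(data, lr=0.5, epochs=200):
    """Batch gradient descent: w <- w - lr * average of (q - p) * phi(x)."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for phi_x, p in data:
            q = predict_proba(w, phi_x)
            for j in range(len(w)):
                grad[j] += (q - p) * phi_x[j] / len(data)
        w = [w_j - lr * g_j for w_j, g_j in zip(w, grad)]
    return w


# Hypothetical toy data: label 1 above the line x2 = x1, label 0 below it.
data = [([1.0, 3.0], 1), ([2.0, 4.0], 1), ([3.0, 1.0], 0), ([4.0, 2.0], 0)]

w_trained = train(data)
print(mean_loss([0.0, 0.0], data), mean_loss(w_trained, data))
print(w_trained)
```

The loss after training is lower than at the starting point, and the learned weights have the same signs as the $[-1, 1]$ example, which matches the idea that the estimates move toward the true labels as the weights are updated.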

4. Multiclass classification: image recognition

Learn hyperplanes that properly separate the input features
Extend binary classification to multiclass classification

One vs All: lets a multiclass classification problem be solved with binary classifiers
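
A minimal sketch of the one-vs-all idea: keep one weight vector per class, score the input against each, and pick the class with the highest score. The class names and weight values are made up for illustration and are not trained.

```python
def score(w, phi_x):
    """Score w . phi(x) for one binary (class vs. rest) classifier."""
    return sum(w_i * f_i for w_i, f_i in zip(w, phi_x))


def one_vs_all_predict(weights, phi_x):
    """Return the label whose classifier gives the highest score."""
    scores = {label: score(w, phi_x) for label, w in weights.items()}
    return max(scores, key=scores.get)


# Hypothetical weight vectors, one hyperplane per class.
weights = {
    "cat": [1.0, 0.0],
    "dog": [0.0, 1.0],
    "bird": [-1.0, -1.0],
}

print(one_vs_all_predict(weights, [2.0, 1.0]))
print(one_vs_all_predict(weights, [-1.0, -2.0]))
```

Each weight vector is an ordinary binary linear classifier that separates its own class from all the others, so the multiclass problem reduces to several binary ones.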

Once the scores are obtained, they can be mapped to probabilities with the sigmoid function

With one-hot encoding, training proceeds by reducing the distance between two different tables: the predicted probabilities and the one-hot label
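
The comparison between predicted probabilities and a one-hot label can be sketched as follows. Following the text, each class score is passed through a sigmoid and compared with its one-hot entry using binary cross-entropy; summing the per-class terms into one number, and the score values themselves, are assumptions made for this illustration.

```python
import math


def one_hot(index, num_classes):
    """One-hot vector: 1.0 at the true class index, 0.0 elsewhere."""
    return [1.0 if k == index else 0.0 for k in range(num_classes)]


def sigmoid(z):
    """Map a real-valued score to (0, 1), written to avoid overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def one_vs_all_loss(scores, target_index):
    """Sum of per-class binary cross-entropies against the one-hot label."""
    target = one_hot(target_index, len(scores))
    total = 0.0
    for s, p in zip(scores, target):
        q = sigmoid(s)
        total += -(p * math.log(q) + (1.0 - p) * math.log(1.0 - q))
    return total


scores = [2.0, -1.0, -3.0]  # hypothetical per-class scores

print(one_hot(0, 3))
print(one_vs_all_loss(scores, 0))  # true class has the highest score: low loss
print(one_vs_all_loss(scores, 2))  # true class has the lowest score: high loss
```

The loss is small when the probabilities line up with the one-hot label and large when they do not, so minimizing it pulls the two tables together.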

Advantages of linear classification
- Easy to implement and easy to test: a good model to try first
- Improved interpretability

Linear classification model: positive and negative samples are separated linearly

