1. MNIST
MNIST
- Handwritten digit data, digits 0 through 9
- Training data: 55,000 images (the full MNIST training split has 60,000; 5,000 are often held out for validation)
- Test data: 10,000 images
- Every image is already preprocessed
- Each image is 28 x 28
- The digits are center-aligned
- The digits are rescaled to a similar size across images
2. MNIST Classification Model (Sigmoid)
Activation Function: Sigmoid
Loss Function: MSE
Using the sigmoid function, the 784 input features are mapped to output-layer values interpreted as the probability that the handwritten digit belongs to each class.
Problems with Sigmoid Outputs & MSE Loss
Sigmoid outputs
- $Prediction \in (0,1)$
- $Target \in \{0,1\}$
- For any class, the difference between the target and the predicted output is at most 1
- Both the loss and the gradient magnitude are bounded
- Max loss < 1
- Max gradient magnitude < 2
- Because the resulting gradients can be small, learning can become slow
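The gradient issue above can be seen in a minimal sketch (PyTorch, with hypothetical logit values): a saturated sigmoid unit combined with MSE produces an almost-zero gradient even when its prediction is badly wrong.

import torch

# Hypothetical logits for one sample: the first unit is confidently wrong.
z = torch.tensor([-6.0, 0.5, 6.0], requires_grad=True)
target = torch.tensor([1.0, 0.0, 0.0])  # one-hot target

pred = torch.sigmoid(z)               # predictions in (0, 1)
loss = ((pred - target) ** 2).mean()  # MSE, bounded above by 1
loss.backward()

# d(loss)/dz_k = 2/N * (pred_k - target_k) * pred_k * (1 - pred_k)
# |pred - target| < 1 and pred * (1 - pred) <= 0.25, so a saturated unit
# receives a near-zero gradient despite a large error -> slow learning.
print(pred)    # first entry is ~0.0025 even though its target is 1
print(z.grad)  # gradient for that entry is close to zero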
In a multi-class classification model (when an input must be assigned to exactly one of the possible classes),
the desired output is a vector that forms a probability distribution: every entry lies in [0, 1] and the entries sum to 1.
3. MNIST Classification Model (Softmax Layer / Classifier)
Activation Function: Softmax Activation
Loss Function: Cross-Entropy
In multi-class classification,
the activation function that produces an output vector whose entries sum to 1
and each lie in [0, 1] (a probability distribution) is called the softmax layer.
- The softmax function transforms the model's outputs so that they can be interpreted as probabilities
- For classification, a linear model is combined with the softmax function to make predictions
- Input nodes: 5
- Output nodes: 3
- Weight matrix
- Each row corresponds to one output node
- Shape: (# output nodes) x (# input nodes) (rows x columns)
- The linear combination of the input features and the model parameters is passed through the softmax activation
- The inputs to the softmax layer can be any real numbers; softmax maps them to a vector that forms a probability distribution summing to 1
- Cross-Entropy Loss is applied to the softmax output vector as the loss function
- Cross-Entropy Loss
- Designed to push the probability assigned to the ground-truth class as close to 1 as possible
- $\hat{p}_{c}$: the predicted probability vector
- $y_{c}$: ground-truth vector
- 1 for the true class, 0 for every other class
- Given as a one-hot vector
- $- \sum_{c=1}^{C} y_{c}\log(\hat{p}_{c}) = -\log(\hat{p}_{y_{i}})$
- $\sum_{c=1}^{C}$: the loss is summed over all classes
- Every term with $y_{c} = 0$ vanishes; only the term with $y_{c} = 1$ survives
- $\hat{p}_{y_{i}}$: the probability predicted for the true class of the i-th training item
- $-\log(\hat{p}_{y_{i}})$: the loss grows as this probability approaches 0
- The softmax loss does not directly look at the probabilities assigned to the other classes
- They are already reflected in $\hat{p}_{y_{i}}$ through the exponentials and the normalization (relative ratios)
- If the probability of the true class increases, the probabilities of the other classes necessarily decrease: they are interdependent
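A minimal sketch (PyTorch) using the 5-input / 3-output toy dimensions above: the linear combination is turned into a probability distribution by softmax, and cross-entropy keeps only the true-class term $-\log(\hat{p}_{y_{i}})$.

import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Weight matrix: (# output nodes) x (# input nodes) = 3 x 5
linear = torch.nn.Linear(in_features=5, out_features=3, bias=True)

x = torch.randn(1, 5)             # one sample with 5 input features
logits = linear(x)                # real-valued scores, one per class
probs = F.softmax(logits, dim=1)  # entries in [0, 1], summing to 1

y = torch.tensor([2])  # ground-truth class index (the one-hot y_c in the formula)

# Cross-entropy keeps only the term of the true class: -log(p_hat[y])
manual_ce = -torch.log(probs[0, y])
builtin_ce = F.cross_entropy(logits, y)     # applies log-softmax + NLL internally
print(probs.sum().item())                   # ~1.0
print(manual_ce.item(), builtin_ce.item())  # the two values agree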
4. Logistic Regression
Logistic Regression w/ Binary Cross-Entropy
- Sigmoid is a special case of the softmax layer
- Commonly used for binary classification (when there are only two classes)
- There is only a single output node: the weight matrix has only one row
A virtual class is created and its logit is fixed at the default value 0: all of its weights = 0
- The virtual class corresponds to one row of the weight matrix in which every entry = 0
- Positive class: the real class
- Negative class: the virtual class
Only the weights of the positive class are optimized through training
- $\hat{p}_{i}$: the predicted probability of the positive class
- $y_{i}$: ground-truth label (1 for the positive class, 0 for the negative class)
- Binary cross-entropy: $-\left[ y_{i}\log(\hat{p}_{i}) + (1-y_{i})\log(1-\hat{p}_{i}) \right]$
- Positive class: $y_{i}=1$, so only the first term survives & it is identical to the softmax loss
- Negative class: $y_{i}=0$, so only the second term survives & it matches the softmax loss for the negative class
- The label can equivalently be written as a one-hot vector over the two classes
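A minimal sketch (PyTorch, with hypothetical weights) of the construction above: a two-class softmax whose virtual negative class has all-zero weights collapses to the sigmoid of the positive-class logit, so binary cross-entropy is exactly this special case of the softmax loss.

import torch
import torch.nn.functional as F

torch.manual_seed(0)

x = torch.randn(4, 5)    # 4 samples, 5 input features
w_pos = torch.randn(5)   # weights of the (real) positive class
b_pos = torch.randn(1)

logit_pos = x @ w_pos + b_pos            # score of the positive class
logit_neg = torch.zeros_like(logit_pos)  # virtual negative class: all weights = 0

# Two-class softmax with the zero-logit virtual class equals the sigmoid
probs_softmax = F.softmax(torch.stack([logit_neg, logit_pos], dim=1), dim=1)[:, 1]
probs_sigmoid = torch.sigmoid(logit_pos)
print(torch.allclose(probs_softmax, probs_sigmoid))  # True

y = torch.tensor([1.0, 0.0, 1.0, 1.0])  # binary labels
# Binary cross-entropy: -[y * log(p) + (1 - y) * log(1 - p)]
bce = F.binary_cross_entropy(probs_sigmoid, y)
print(bce.item())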
5. Practice - MNIST
import torch
import torchvision
import torch.nn.functional as F
from torchvision import transforms
from torch.utils.data import DataLoader
- transforms.ToTensor(): converts each input image into a (C, H, W) tensor; the DataLoader batches these into shape (N, C, H, W)
- N: number of images (batch size)
- C: number of channels
- H, W: height, width
- The MNIST dataset consists of grayscale images; the training split holds 60,000 images of size 28 x 28, and ToTensor() turns each one into a (1, 28, 28) tensor
- Normalize(mean, standard deviation)
- Normalizes the image with the given mean and standard deviation
- For color images: $mean, \; std \in \mathbb{R}^{1 \times 3}$ (see the sketch after this list)
- Set up the image set: all images & labels
- Set up the data loader: the loader is used to load all of the data in random batches of the given batch size
- DataLoader parameters
- shuffle: loads the input data in random order
- drop_last: drops the last (incomplete) batch (True for training, False for testing)
- pin_memory: places batches in page-locked (pinned) host memory so transfers to the GPU are more efficient
- Generally set to True when a GPU is used
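For the color-image case mentioned above, mean and std each carry one value per channel. A short sketch (the 0.5 values are placeholders for illustration, not statistics of any particular dataset); the grayscale form actually used for MNIST appears in the setup code below.

from torchvision import transforms

# Grayscale (single-channel): one mean / one std, as for MNIST below
gray_trans = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
)

# Color (3-channel): mean, std each have 3 entries, one per channel (placeholder values)
color_trans = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
)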
# device
device = "cuda" if torch.cuda.is_available() else "cpu"
# device = 'cpu'
# Reproducibility
torch.manual_seed(123)
if device == "cuda":
    torch.cuda.manual_seed_all(123)
trans = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
)
# Setup Image Set
X_train = torchvision.datasets.MNIST(
    "./data", train=True, transform=trans, download=True
)
X_test = torchvision.datasets.MNIST(
    "./data", train=False, transform=trans, download=True
)
# Setup data Loader
train_loader = DataLoader(
    X_train, batch_size=64, shuffle=True, drop_last=True, pin_memory=True
)
test_loader = DataLoader(
    X_test, batch_size=64, shuffle=False, drop_last=False, pin_memory=True
)
- Define the model used for training
## Model
layer = torch.nn.Sequential(
    torch.nn.Flatten(),  # flatten each image into a one-dimensional vector
    torch.nn.Linear(in_features=784, out_features=256, bias=True),
    torch.nn.ReLU(),
    torch.nn.Linear(in_features=256, out_features=256, bias=True),
    torch.nn.ReLU(),
    torch.nn.Linear(in_features=256, out_features=10, bias=True),
).to(device)
print(layer)
Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=256, bias=True)
  (2): ReLU()
  (3): Linear(in_features=256, out_features=256, bias=True)
  (4): ReLU()
  (5): Linear(in_features=256, out_features=10, bias=True)
)
- Epoch: in one epoch, every training image is iterated over once
# Optimizer
optimizer = torch.optim.Adam(layer.parameters(), lr=0.001)
# Training
for epoch in range(15):  # total 15 epochs
    for idx, (images, labels) in enumerate(train_loader):
        # Move the data to the device and cast the types
        images, labels = images.float().to(device), labels.long().to(device)
        # Forward pass through the model
        hypothesis = layer(images)
        # Calculate Cross-Entropy Loss
        cost = F.cross_entropy(input=hypothesis, target=labels)
        # Gradient initialization
        optimizer.zero_grad()
        # Calculate Gradient
        cost.backward()
        # Update Parameters
        optimizer.step()
        # Calculate accuracy
        prob = hypothesis.softmax(dim=1)  # dim=1: softmax over the class dimension
        pred = prob.argmax(dim=1)
        acc = pred.eq(labels).float().mean()
        if (idx + 1) % 128 == 0:
            print(
                f"Train-Iteration: {idx+1}, Loss: {cost.item()}, Accuracy: {acc.item()}"
            )
Train-Iteration: 128, Loss: 0.2432430237531662, Accuracy: 0.90625
Train-Iteration: 256, Loss: 0.09525416046380997, Accuracy: 0.96875
Train-Iteration: 384, Loss: 0.21631625294685364, Accuracy: 0.9375
Train-Iteration: 512, Loss: 0.05328263342380524, Accuracy: 1.0
Train-Iteration: 640, Loss: 0.156303271651268, Accuracy: 0.921875
Train-Iteration: 768, Loss: 0.04886922985315323, Accuracy: 1.0
Train-Iteration: 896, Loss: 0.05945151671767235, Accuracy: 0.984375
Train-Iteration: 128, Loss: 0.08172701299190521, Accuracy: 0.953125
Train-Iteration: 256, Loss: 0.15105785429477692, Accuracy: 0.9375
Train-Iteration: 384, Loss: 0.18889756500720978, Accuracy: 0.9375
Train-Iteration: 512, Loss: 0.03930482640862465, Accuracy: 1.0
Train-Iteration: 640, Loss: 0.06349022686481476, Accuracy: 0.984375
Train-Iteration: 768, Loss: 0.004894572775810957, Accuracy: 1.0
Train-Iteration: 896, Loss: 0.04462869465351105, Accuracy: 0.984375
Train-Iteration: 128, Loss: 0.0492296926677227, Accuracy: 0.953125
Train-Iteration: 256, Loss: 0.07583090662956238, Accuracy: 0.96875
Train-Iteration: 384, Loss: 0.04224878549575806, Accuracy: 0.984375
Train-Iteration: 512, Loss: 0.07279356569051743, Accuracy: 0.96875
Train-Iteration: 640, Loss: 0.030228042975068092, Accuracy: 0.984375
Train-Iteration: 768, Loss: 0.11051540821790695, Accuracy: 0.9375
Train-Iteration: 896, Loss: 0.08741894364356995, Accuracy: 0.96875
Train-Iteration: 128, Loss: 0.05106734111905098, Accuracy: 0.984375
Train-Iteration: 256, Loss: 0.0331578329205513, Accuracy: 1.0
Train-Iteration: 384, Loss: 0.008251325227320194, Accuracy: 1.0
Train-Iteration: 512, Loss: 0.06776051223278046, Accuracy: 0.984375
Train-Iteration: 640, Loss: 0.031793735921382904, Accuracy: 0.984375
Train-Iteration: 768, Loss: 0.08528926223516464, Accuracy: 0.984375
Train-Iteration: 896, Loss: 0.11919710040092468, Accuracy: 0.96875
Train-Iteration: 128, Loss: 0.024348294362425804, Accuracy: 0.984375
Train-Iteration: 256, Loss: 0.0319991260766983, Accuracy: 0.984375
Train-Iteration: 384, Loss: 0.02634349837899208, Accuracy: 0.984375
Train-Iteration: 512, Loss: 0.016720261424779892, Accuracy: 1.0
Train-Iteration: 640, Loss: 0.03306460753083229, Accuracy: 0.984375
Train-Iteration: 768, Loss: 0.027682529762387276, Accuracy: 0.984375
Train-Iteration: 896, Loss: 0.003526151878759265, Accuracy: 1.0
Train-Iteration: 128, Loss: 0.008680121041834354, Accuracy: 1.0
Train-Iteration: 256, Loss: 0.021030602976679802, Accuracy: 0.984375
Train-Iteration: 384, Loss: 0.00915486179292202, Accuracy: 1.0
Train-Iteration: 512, Loss: 0.018900636583566666, Accuracy: 1.0
Train-Iteration: 640, Loss: 0.005776845384389162, Accuracy: 1.0
Train-Iteration: 768, Loss: 0.03194323927164078, Accuracy: 0.984375
Train-Iteration: 896, Loss: 0.07372798025608063, Accuracy: 0.96875
Train-Iteration: 128, Loss: 0.002314481418579817, Accuracy: 1.0
Train-Iteration: 256, Loss: 0.0018324761185795069, Accuracy: 1.0
Train-Iteration: 384, Loss: 0.007435821462422609, Accuracy: 1.0
Train-Iteration: 512, Loss: 0.022387122735381126, Accuracy: 0.984375
Train-Iteration: 640, Loss: 0.0278813187032938, Accuracy: 0.984375
Train-Iteration: 768, Loss: 0.02152453176677227, Accuracy: 0.984375
Train-Iteration: 896, Loss: 0.1327311396598816, Accuracy: 0.984375
Train-Iteration: 128, Loss: 0.0011456963839009404, Accuracy: 1.0
Train-Iteration: 256, Loss: 0.03393447771668434, Accuracy: 0.984375
Train-Iteration: 384, Loss: 0.021737735718488693, Accuracy: 1.0
Train-Iteration: 512, Loss: 0.0008623672183603048, Accuracy: 1.0
Train-Iteration: 640, Loss: 0.06686681509017944, Accuracy: 0.96875
Train-Iteration: 768, Loss: 0.007728107739239931, Accuracy: 1.0
Train-Iteration: 896, Loss: 0.021840423345565796, Accuracy: 0.984375
Train-Iteration: 128, Loss: 0.01483811903744936, Accuracy: 0.984375
Train-Iteration: 256, Loss: 0.00077625218546018, Accuracy: 1.0
Train-Iteration: 384, Loss: 0.01038976851850748, Accuracy: 1.0
Train-Iteration: 512, Loss: 0.00223970552906394, Accuracy: 1.0
Train-Iteration: 640, Loss: 0.005060031544417143, Accuracy: 1.0
Train-Iteration: 768, Loss: 0.0011778954649344087, Accuracy: 1.0
Train-Iteration: 896, Loss: 0.003516174852848053, Accuracy: 1.0
Train-Iteration: 128, Loss: 0.0017609259812161326, Accuracy: 1.0
Train-Iteration: 256, Loss: 0.029003959149122238, Accuracy: 0.984375
Train-Iteration: 384, Loss: 0.09368067234754562, Accuracy: 0.96875
Train-Iteration: 512, Loss: 0.018030766397714615, Accuracy: 1.0
Train-Iteration: 640, Loss: 0.0006788636092096567, Accuracy: 1.0
Train-Iteration: 768, Loss: 0.036278121173381805, Accuracy: 0.984375
Train-Iteration: 896, Loss: 0.025945255532860756, Accuracy: 0.984375
Train-Iteration: 128, Loss: 0.0005539588164538145, Accuracy: 1.0
Train-Iteration: 256, Loss: 0.008682118728756905, Accuracy: 1.0
Train-Iteration: 384, Loss: 0.009658947587013245, Accuracy: 1.0
Train-Iteration: 512, Loss: 0.0013244193978607655, Accuracy: 1.0
Train-Iteration: 640, Loss: 0.002140435390174389, Accuracy: 1.0
Train-Iteration: 768, Loss: 0.001486493507400155, Accuracy: 1.0
Train-Iteration: 896, Loss: 0.0798792764544487, Accuracy: 0.96875
Train-Iteration: 128, Loss: 0.07495461404323578, Accuracy: 0.984375
Train-Iteration: 256, Loss: 0.013753404840826988, Accuracy: 0.984375
Train-Iteration: 384, Loss: 0.011747729033231735, Accuracy: 1.0
Train-Iteration: 512, Loss: 0.02869034744799137, Accuracy: 0.984375
Train-Iteration: 640, Loss: 0.020220700651407242, Accuracy: 0.984375
Train-Iteration: 768, Loss: 0.0019553883466869593, Accuracy: 1.0
Train-Iteration: 896, Loss: 0.017024043947458267, Accuracy: 1.0
Train-Iteration: 128, Loss: 0.0014674750855192542, Accuracy: 1.0
Train-Iteration: 256, Loss: 0.0016279949340969324, Accuracy: 1.0
Train-Iteration: 384, Loss: 0.01275317370891571, Accuracy: 1.0
Train-Iteration: 512, Loss: 0.0001198670724988915, Accuracy: 1.0
Train-Iteration: 640, Loss: 0.0069814156740903854, Accuracy: 1.0
Train-Iteration: 768, Loss: 0.012434744276106358, Accuracy: 0.984375
Train-Iteration: 896, Loss: 0.022211426869034767, Accuracy: 0.984375
Train-Iteration: 128, Loss: 0.0005716342129744589, Accuracy: 1.0
Train-Iteration: 256, Loss: 0.0022972715087234974, Accuracy: 1.0
Train-Iteration: 384, Loss: 0.0018064221367239952, Accuracy: 1.0
Train-Iteration: 512, Loss: 0.000177455905941315, Accuracy: 1.0
Train-Iteration: 640, Loss: 0.00019640527898445725, Accuracy: 1.0
Train-Iteration: 768, Loss: 0.0232881810516119, Accuracy: 0.984375
Train-Iteration: 896, Loss: 0.02891821600496769, Accuracy: 0.984375
Train-Iteration: 128, Loss: 0.019340503960847855, Accuracy: 0.984375
Train-Iteration: 256, Loss: 0.0020200666040182114, Accuracy: 1.0
Train-Iteration: 384, Loss: 0.00010180178651353344, Accuracy: 1.0
Train-Iteration: 512, Loss: 0.000519769499078393, Accuracy: 1.0
Train-Iteration: 640, Loss: 0.05370203033089638, Accuracy: 0.984375
Train-Iteration: 768, Loss: 0.0014313850551843643, Accuracy: 1.0
Train-Iteration: 896, Loss: 4.0087190427584574e-05, Accuracy: 1.0
# Evaluation
with torch.no_grad():
    acc = 0
    for idx, (images, labels) in enumerate(test_loader):
        images, labels = images.float().to(device), labels.long().to(device)
        # Forward pass through the model
        hypothesis = layer(images)
        # Calculate Cross-Entropy Loss
        cost = F.cross_entropy(input=hypothesis, target=labels)
        # Calculate Accuracy
        prob = hypothesis.softmax(dim=1)
        pred = prob.argmax(dim=1)
        acc += pred.eq(labels).float().mean()
    print(f"Test-Accuracy: {acc / (len(test_loader))}")
Test-Accuracy: 0.9786027073860168