4 Techniques to Improve Model Training

치즈의 AI 녹이기

By 개발자 치즈, 2021. 11. 25. 18:33

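1. Using torch.cuda.amp

For some operations, switching between float32 and float16 (in either direction) so that each operation runs in a suitable data type can speed up training somewhat.

autocast (torch.cuda.amp.autocast, one part of torch.cuda.amp) should wrap only the forward pass of the network (forward + loss). The backward pass and the optimizer step run outside it.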

https://runebook.dev/ko/docs/pytorch/amp
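
A minimal sketch of a mixed-precision training step, assuming a toy nn.Linear model, random data, and an SGD optimizer (none of which come from the post). It pairs autocast with torch.cuda.amp.GradScaler, which scales the loss so that small float16 gradients do not underflow. The `enabled` flag makes the same code fall back to plain float32 when no GPU is present. Recent PyTorch releases also expose these tools under torch.amp.

```python
import torch
import torch.nn as nn

use_amp = torch.cuda.is_available()          # without a GPU, run in plain float32
device = torch.device("cuda" if use_amp else "cpu")

model = nn.Linear(16, 4).to(device)          # toy model (assumption)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def amp_train_step(inputs, targets):
    optimizer.zero_grad()
    # Only the forward pass and the loss computation run under autocast.
    with torch.cuda.amp.autocast(enabled=use_amp):
        outputs = model(inputs)
        loss = criterion(outputs, targets)
    # backward() and the optimizer step stay outside the autocast block.
    scaler.scale(loss).backward()   # scale the loss before backpropagation
    scaler.step(optimizer)          # unscales gradients; skips the step on inf/NaN
    scaler.update()                 # adjust the scale factor for the next step
    return loss.item()


torch.manual_seed(0)
x = torch.randn(8, 16, device=device)            # random inputs (assumption)
y = torch.randint(0, 4, (8,), device=device)     # random class labels (assumption)
losses = [amp_train_step(x, y) for _ in range(5)]
```

With `enabled=False`, both autocast and GradScaler become no-ops, so the step behaves like an ordinary float32 update.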

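2. Gradient Accumulation

Gradients are accumulated over a fixed number of batches and then applied in a single update, so that a small batch size in a memory-limited environment approximates the effect of a large batch size. A larger effective batch size reduces the noise in the gradient estimate, which allows a more reliable gradient descent step.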

https://medium.com/huggingface/training-larger-batches-practical-tips-on-1-gpu-multi-gpu-distributed-setups-ec88c3e51255
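
The accumulation loop can be sketched as follows. The toy model, the random micro-batches, and the name `accumulation_steps` are assumptions for illustration. Each loss is divided by `accumulation_steps` so that the summed gradient matches the mean over the larger effective batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 1)                     # toy model (assumption)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
criterion = nn.MSELoss()

accumulation_steps = 4                       # hypothetical: 4 micro-batches per update
# 8 micro-batches of size 2, i.e. an effective batch size of 8 per update
batches = [(torch.randn(2, 16), torch.randn(2, 1)) for _ in range(8)]

num_updates = 0
optimizer.zero_grad()
for step, (inputs, targets) in enumerate(batches, start=1):
    loss = criterion(model(inputs), targets)
    # backward() adds to the existing .grad buffers, so gradients accumulate
    # across micro-batches until zero_grad() is called.
    (loss / accumulation_steps).backward()
    if step % accumulation_steps == 0:
        optimizer.step()                     # one update for the whole group
        optimizer.zero_grad()                # start accumulating the next group
        num_updates += 1
```

Here 8 micro-batches produce only 2 optimizer updates, each based on the gradients of 4 micro-batches.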

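3. Gradient Clipping

It prevents exploding gradients and so helps stabilize training.

When the L2 norm of the gradient exceeds a threshold, the gradient is divided by its L2 norm and multiplied by the threshold, so its norm is capped at the threshold while its direction is unchanged.

For the clipped steps, the effect is comparable to using a smaller learning rate. Steps whose gradient norm is below the threshold are left untouched.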
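
A small sketch of norm-based clipping with torch.nn.utils.clip_grad_norm_, which returns the total gradient norm measured before clipping. The toy model, the deliberately large targets, and the threshold `max_norm = 1.0` are assumptions chosen to make the gradient exceed the threshold.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 1)                     # toy model (assumption)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(8, 16)
targets = 100.0 * torch.randn(8, 1)          # large targets to provoke large gradients

max_norm = 1.0                               # hypothetical threshold

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()

# Clip between backward() and step(): if the total L2 norm of all gradients
# exceeds max_norm, every gradient is rescaled by roughly max_norm / total_norm.
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))

optimizer.step()
```

When AMP is also in use, the gradients should be unscaled (for example with `scaler.unscale_(optimizer)`) before clipping, so that the threshold applies to the true gradient values.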

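4. Stochastic Weight Averaging (SWA)

It is a way of ensembling the same model at different points in training, each with different weights. The weights themselves are averaged, not the predictions.

Besides the trained parameters w, a separate weight_swa is stored, and at a fixed interval the current weights are averaged into weight_swa.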

https://hoya012.github.io/blog/Image-Classification-with-Stochastic-Weight-Averaging-Training-PyTorch-Tutorial/
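
The weight_swa update described above can be sketched by hand as a running mean. The schedule (`swa_start`, `swa_freq`), the toy model, and the random data are assumptions, and this is an illustration of the idea, not the reference implementation.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 1)                     # toy model (assumption)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

swa_start, swa_freq = 10, 5                  # hypothetical schedule
weight_swa = None                            # averaged weights, kept apart from w
n_averaged = 0

for step in range(1, 31):
    inputs, targets = torch.randn(8, 16), torch.randn(8, 1)
    optimizer.zero_grad()
    nn.functional.mse_loss(model(inputs), targets).backward()
    optimizer.step()

    if step >= swa_start and step % swa_freq == 0:
        w = {k: v.detach().clone() for k, v in model.state_dict().items()}
        if weight_swa is None:
            weight_swa = w                   # first snapshot starts the average
        else:
            for k in weight_swa:
                # running mean: w_swa <- (w_swa * n + w) / (n + 1)
                weight_swa[k] = (weight_swa[k] * n_averaged + w[k]) / (n_averaged + 1)
        n_averaged += 1

# Load the averaged weights into a copy of the model for evaluation.
swa_model = copy.deepcopy(model)
swa_model.load_state_dict(weight_swa)
```

With this schedule, snapshots are taken at steps 10, 15, 20, 25, and 30, so five sets of weights are averaged.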

Using SWA in torch:

https://runebook.dev/ko/docs/pytorch/optim
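
A minimal sketch using torch.optim.swa_utils (AveragedModel, SWALR, update_bn). The toy network, the random data, and the values of `swa_start`, `epochs`, and `swa_lr` are assumptions. Since the averaged weights never saw a forward pass during training, the BatchNorm statistics are recomputed at the end with update_bn.

```python
import torch
import torch.nn as nn
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

torch.manual_seed(0)
# Toy network with a BatchNorm layer (assumption)
model = nn.Sequential(nn.Linear(16, 8), nn.BatchNorm1d(8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# A plain list of (inputs, targets) batches stands in for a DataLoader
loader = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(4)]

swa_model = AveragedModel(model)                 # holds the running weight average
swa_scheduler = SWALR(optimizer, swa_lr=0.05)    # learning rate used in the SWA phase
swa_start, epochs = 3, 6                         # hypothetical schedule

for epoch in range(epochs):
    for inputs, targets in loader:
        optimizer.zero_grad()
        nn.functional.mse_loss(model(inputs), targets).backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)       # fold current weights into the average
        swa_scheduler.step()

# Recompute BatchNorm running statistics for the averaged weights.
update_bn(loader, swa_model)
```

After training, `swa_model` is the one to evaluate; `model` still holds the last non-averaged weights.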

SWA implementation on GitHub:

https://github.com/timgaripov/swa/blob/master/train.py

SWA reference link:

https://towardsdatascience.com/stochastic-weight-averaging-a-new-way-to-get-state-of-the-art-results-in-deep-learning-c639ccf36a
