FracGrad: A Discretized Riemann-Liouville Fractional Integral Approach to Gradient Accumulation for Deep Learning

Abstract

Gradient accumulation enables training large-scale deep learning models under GPU memory constraints by aggregating gradients across multiple microbatches before parameter updates. Standard gradient accumulation treats all microbatches uniformly through simple averaging, implicitly assuming that all stochastic gradient estimates are equally reliable. This assumption becomes problematic in non-convex optimization where gradient variance across microbatches is high, causing some gradient estimates to be noisy and less representative of the true descent direction. This paper proposes FracGrad, a simple weighting scheme for gradient accumulation that biases toward recent microbatches via a power-law schedule derived from a discretized Riemann-Liouville integral. Unlike uniform summation, FracGrad reweights the $i$-th microbatch gradient by $w_i(\alpha) = \frac{(N-i+1)^{\alpha} - (N-i)^{\alpha}}{\sum_{j=1}^{N} \left[(N-j+1)^{\alpha} - (N-j)^{\alpha}\right]}$, controlled by $\alpha \in (0, 1]$. When $\alpha = 1$, standard accumulation is recovered. In experiments on mini-ImageNet with ResNet-18 using up to $N = 32$ accumulation steps, the best FracGrad variant with $\alpha = 0.1$ improves test accuracy from 16.99% to 31.35% at $N = 16$. Paired t-tests yield $p \approx 2 \times 10^{-6}$.
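The weighting rule in the abstract can be illustrated with a short NumPy sketch. This is not the paper's implementation: the function names `fracgrad_weights` and `accumulate` are hypothetical, and the sketch assumes that microbatch index $i = 1$ is the oldest and $i = N$ the most recent. Because the denominator telescopes to $N^{\alpha}$, the most recent microbatch receives weight $1/N^{\alpha}$, and $\alpha = 1$ gives the uniform weights $1/N$.

```python
import numpy as np


def fracgrad_weights(n_steps: int, alpha: float) -> np.ndarray:
    """Return weights w_1..w_N from the abstract's formula (hypothetical helper).

    Position 0 of the result corresponds to i = 1 (assumed oldest microbatch),
    position N-1 to i = N (assumed most recent microbatch).
    """
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    i = np.arange(1, n_steps + 1, dtype=float)
    # Numerator of w_i: (N - i + 1)^alpha - (N - i)^alpha
    raw = (n_steps - i + 1.0) ** alpha - (n_steps - i) ** alpha
    # Denominator: the sum over all j, which telescopes to N^alpha
    return raw / raw.sum()


def accumulate(grads, alpha: float) -> np.ndarray:
    """Weighted sum of per-microbatch gradients (hypothetical helper)."""
    weights = fracgrad_weights(len(grads), alpha)
    return sum(w * g for w, g in zip(weights, grads))


if __name__ == "__main__":
    w = fracgrad_weights(16, 0.1)
    print("oldest weight:", w[0], "most recent weight:", w[-1])
    print("uniform when alpha = 1:", fracgrad_weights(4, 1.0))
```

In an actual training loop, one plausible use would be to scale each microbatch loss by its weight before the backward pass, so that the accumulated gradient equals the weighted sum above. That usage is an assumption and is not described in the abstract.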

Keywords

gradient accumulation; fractional calculus; memory-efficient training; stochastic optimization; power-law weighting; deep learning; non-convex optimization
Title
FracGrad: A Discretized Riemann-Liouville Fractional Integral Approach to Gradient Accumulation for Deep Learning
Author
Lee, Minhyeok
DOI
10.3390/fractalfract9110733
Publication date
2025-11
Type
Article
Journal
Fractal and Fractional, vol. 9, no. 11
