warmup
CosineAnnealingWithLinearWarmUp(optimizer, batch_size, total_epochs, init_lr=(0.01), lr_scale=256.0, linear_warmup_epochs=10, lr_reduce_factor=0.001, len_loader=None, scheduler_interval='epoch')
Bases: LearningRateScheduler
Cosine learning rate scheduler with linear warmup.
Parameters:
-
optimizer
(
Optimizer
) –optimizer whose learning rate is scheduled. If you are using this scheduler, then you have to set the learning rate of the optimizer to 0
-
batch_size
(
int
) –global batch size of the data loader. For more information please take a look at https://pytorch-lightning.readthedocs.io/en/latest/advanced/multi_gpu.html?highlight=batch%20size#batch-size
-
total_epochs
(
int
) –the total number of epochs
-
init_lr
(
tuple[float, ...]
, default:(0.01)
) –The initial learning rate, one for every
param_group
. Note that the learning rate is linearly scaled by
batch_size / lr_scale
, as specified in https://arxiv.org/abs/1706.02677. Defaults to 0.01.
-
lr_scale
(
float
, default:256.0
) –the learning rate scale, i.e. the batch size at which init_lr applies unchanged. Note that the learning rate is linearly scaled by
batch_size / lr_scale
, as specified in https://arxiv.org/abs/1706.02677. Defaults to 256.
-
linear_warmup_epochs
(
int
, default:10
) –how many epochs for the initial linear learning rate scaling. Defaults to 10.
-
lr_reduce_factor
(
float
, default:0.001
) –factor multiplied by the scaled learning rate (init_lr * batch_size / lr_scale) to set the minimum learning rate, so that the learning rate does not reach 0 at the end of training.
-
len_loader
(
int | None
, default:None
) –number of batches in a given dataloader. Remember that
len_loader
must be divided by the total number of GPUs used during training. If the
len_loader
parameter is specified, the learning rate is updated in steps (number of batches) rather than in epochs. Defaults to None.
-
scheduler_interval
(
str
, default:'epoch'
) –'step' or 'epoch'. If 'step', the scheduler expects len_loader to be not None. Defaults to
'epoch'
.
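The linear scaling described for init_lr, lr_scale and lr_reduce_factor can be illustrated with a small standalone sketch. It does not call quadra; the helper name `scale_learning_rates` is hypothetical, and the arithmetic simply follows the formulas quoted in the parameter descriptions above.

```python
def scale_learning_rates(init_lr, batch_size, lr_scale=256.0, lr_reduce_factor=0.001):
    """Hypothetical helper: return (peak_lrs, min_lrs), one entry per param group."""
    # Linear scaling rule: init_lr * batch_size / lr_scale
    peak_lrs = [lr * batch_size / lr_scale for lr in init_lr]
    # Minimum learning rate: scaled learning rate times lr_reduce_factor
    min_lrs = [lr * lr_reduce_factor for lr in peak_lrs]
    return peak_lrs, min_lrs


# Illustrative values: one param group, global batch size of 512
peak_lrs, min_lrs = scale_learning_rates(init_lr=(0.01,), batch_size=512)
print(peak_lrs, min_lrs)
```

With these illustrative numbers the batch size is twice lr_scale, so the base rate of 0.01 is doubled before the reduction factor is applied.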
Source code in quadra/schedulers/warmup.py
step()
Update the learning rate for the current step.
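What counts as a "step" depends on scheduler_interval. The sketch below shows one plausible way the schedule lengths could be converted from epochs to batches when the interval is 'step'. It is an assumption based on the parameter descriptions, not the library's code, and `schedule_lengths` is a hypothetical name.

```python
def schedule_lengths(total_epochs, linear_warmup_epochs,
                     scheduler_interval="epoch", len_loader=None):
    """Hypothetical helper: return (total_units, warmup_units) for the schedule."""
    if scheduler_interval == "step":
        if len_loader is None:
            raise ValueError("len_loader is required when scheduler_interval='step'")
        # Assumed conversion: one epoch equals len_loader optimizer steps
        return total_epochs * len_loader, linear_warmup_epochs * len_loader
    return total_epochs, linear_warmup_epochs


# Illustrative numbers: 1000 batches per epoch split across 4 GPUs
len_loader = 1000 // 4
total_units, warmup_units = schedule_lengths(100, 10, "step", len_loader)
print(total_units, warmup_units)
```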
Source code in quadra/schedulers/warmup.py
cosine_annealing_with_warmup(init_lrs, step, total_steps, warmup_steps, lr_reduce_factor=0.001)
Cosine learning rate scheduler with linear warmup helper function.
Parameters:
-
init_lrs
(
list[float]
) –The initial learning rate, one for every
param_group
.
-
step
(
int
) –the current step
-
total_steps
(
int
) –the total steps
-
warmup_steps
(
int
) –total linear warmup steps
-
lr_reduce_factor
(
float
, default:0.001
) –reduction factor for the initial learning rate. This is used to set the minimum learning rate as
init_lr[i] * lr_reduce_factor
. Defaults to 0.001.
Returns:
Source code in quadra/schedulers/warmup.py
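For orientation, here is a minimal sketch of a cosine schedule with linear warmup, written independently of the quadra source. It uses the common formulation (linear ramp from 0, then cosine decay down to `init_lr[i] * lr_reduce_factor`); the function name `cosine_with_warmup_sketch` is hypothetical, and the library's actual code may differ in details such as boundary handling.

```python
import math


def cosine_with_warmup_sketch(init_lrs, step, total_steps, warmup_steps,
                              lr_reduce_factor=0.001):
    """Hypothetical sketch: learning rates for `step`, one per param group."""
    if step < warmup_steps:
        # Linear warmup from 0 up to the initial learning rate
        return [lr * step / warmup_steps for lr in init_lrs]
    # Fraction of the post-warmup phase completed, clamped to [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine weight goes from 1 (start of decay) to 0 (end of training)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # Interpolate between the initial rate and the minimum rate
    return [lr * (lr_reduce_factor + (1.0 - lr_reduce_factor) * cosine)
            for lr in init_lrs]


# Illustrative schedule: 100 steps in total, 10 of them warmup
schedule = [cosine_with_warmup_sketch([0.02], s, 100, 10)[0] for s in range(101)]
print(schedule[0], schedule[10], schedule[100])
```

In this sketch the rate starts at 0, peaks at the initial value when warmup ends, and finishes at the initial value times lr_reduce_factor rather than at 0.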