
warmup

CosineAnnealingWithLinearWarmUp(optimizer, batch_size, total_epochs, init_lr=(0.01,), lr_scale=256.0, linear_warmup_epochs=10, lr_reduce_factor=0.001, len_loader=None, scheduler_interval='epoch')

Bases: LearningRateScheduler

Cosine learning rate scheduler with linear warmup.

Parameters:

  • optimizer (Optimizer) –

    The optimizer whose learning rate is scheduled. If you are using this scheduler, you have to set the learning rate of the optimizer to 0.

  • batch_size (int) –

    global batch size of the data loader. For more information please take a look at https://pytorch-lightning.readthedocs.io/en/latest/advanced/multi_gpu.html?highlight=batch%20size#batch-size

  • total_epochs (int) –

    the total number of epochs

  • init_lr (tuple[float, ...], default: (0.01,) ) –

    The initial learning rate, one for every param_group. Note that the learning rate is linearly scaled by batch_size / lr_scale, as specified by https://arxiv.org/abs/1706.02677. Defaults to (0.01,).

  • lr_scale (float, default: 256.0 ) –

    The denominator of the linear scaling factor, i.e. the batch size at which init_lr is used unscaled. The learning rate is multiplied by batch_size / lr_scale, as specified by https://arxiv.org/abs/1706.02677. Defaults to 256.

  • linear_warmup_epochs (int, default: 10 ) –

    how many epochs for the initial linear learning rate scaling. Defaults to 10.

  • lr_reduce_factor (float, default: 0.001 ) –

    Factor multiplied by the scaled learning rate (init_lr * batch_size / lr_scale) to obtain the final learning rate, so that the learning rate does not reach 0 at the end of training. Defaults to 0.001.

  • len_loader (int | None, default: None ) –

    Number of batches in a given dataloader. Remember that len_loader must be divided by the total number of GPUs used during training. It is only used when scheduler_interval is 'step', in which case the learning rate is updated in steps (number of batches) rather than in epochs. Defaults to None.

  • scheduler_interval (str, default: 'epoch' ) –

    'step' or 'epoch' (case-insensitive). If 'step', len_loader must be a positive integer, not None. Defaults to 'epoch'.
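The way these parameters interact can be sketched in a few lines. The two helpers below are hypothetical (`scale_init_lr` and `schedule_lengths` are illustration names, not part of quadra); they only mirror the arithmetic the constructor performs: scaling `init_lr` by `batch_size / lr_scale`, and converting epoch counts into step counts when `scheduler_interval` is 'step'.

```python
def scale_init_lr(init_lr, batch_size, lr_scale=256.0):
    """Apply the linear scaling rule to every param_group learning rate."""
    factor = batch_size / lr_scale
    return tuple(lr * factor for lr in init_lr)


def schedule_lengths(total_epochs, linear_warmup_epochs, len_loader=None, scheduler_interval="epoch"):
    """Return (total, warmup) in the unit the scheduler counts in."""
    if scheduler_interval.lower() == "step":
        if len_loader is None or len_loader <= 0:
            raise ValueError("len_loader must be a positive int when scheduler_interval='step'")
        # Epoch counts become step counts: one epoch is len_loader updates.
        return total_epochs * len_loader, linear_warmup_epochs * len_loader
    return total_epochs, linear_warmup_epochs


# Batch size 512 with lr_scale 256 doubles each initial learning rate.
print(scale_init_lr((0.01, 0.001), batch_size=512))
# 100 epochs of 50 batches each, with 10 warmup epochs, counted in steps.
print(schedule_lengths(100, 10, len_loader=50, scheduler_interval="step"))
```

With `batch_size` equal to `lr_scale` the factor is 1, so `init_lr` is used as given.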

Source code in quadra/schedulers/warmup.py
def __init__(
    self,
    optimizer: torch.optim.Optimizer,
    batch_size: int,
    total_epochs: int,
    init_lr: tuple[float, ...] = (0.01,),
    lr_scale: float = 256.0,
    linear_warmup_epochs: int = 10,
    lr_reduce_factor: float = 0.001,
    len_loader: int | None = None,
    scheduler_interval: str = "epoch",
) -> None:
    super().__init__(optimizer, init_lr)
    assert batch_size > 0
    assert total_epochs > 0
    assert lr_scale != 0
    assert linear_warmup_epochs >= 0
    assert lr_reduce_factor > 0
    assert scheduler_interval.lower() in ["step", "epoch"]
    self.batch_size = batch_size
    self.total_epochs = total_epochs
    self.linear_warmup_epochs = linear_warmup_epochs
    self.base_lr_scale = self.batch_size / lr_scale
    self.updates_counter = 1
    self.init_lr = tuple(lr * self.base_lr_scale for lr in init_lr)
    self.lr = self.init_lr
    self.lr_reduce_factor = lr_reduce_factor

    self.scheduler_interval = scheduler_interval.lower()
    if self.scheduler_interval == "step":
        assert len_loader is not None and len_loader > 0
        self.total_epochs = len_loader * self.total_epochs
        self.linear_warmup_epochs = len_loader * self.linear_warmup_epochs

step()

Update the learning rate for the current step.

Source code in quadra/schedulers/warmup.py
def step(self):
    """Update the learning rate for the current step."""
    self.lr = cosine_annealing_with_warmup(
        self.init_lr,
        self.updates_counter,
        self.total_epochs,
        self.linear_warmup_epochs,
        self.lr_reduce_factor,
    )
    self.set_lr(self.lr)
    self.updates_counter += 1
    return self.lr

cosine_annealing_with_warmup(init_lrs, step, total_steps, warmup_steps, lr_reduce_factor=0.001)

Cosine learning rate scheduler with linear warmup helper function.

Parameters:

  • init_lrs (list[float]) –

    The initial learning rate, one for every param_group.

  • step (int) –

    the current step

  • total_steps (int) –

    the total steps

  • warmup_steps (int) –

    total linear warmup steps

  • lr_reduce_factor (float, default: 0.001 ) –

    Reduce factor for the initial learning rate. This is used to set the minimum learning rate as init_lrs[i] * lr_reduce_factor. Defaults to 0.001.

Returns:

  • list[float]

    Annealed learning rate for this step
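As a worked example of the schedule for a single learning rate, the sketch below re-implements the formula in a standalone function and evaluates it at a few steps. `cosine_warmup_lr` is a hypothetical name used only for illustration, not a quadra function, and the numbers (an initial rate of 0.1, 100 total steps, 10 warmup steps) are arbitrary.

```python
import math


def cosine_warmup_lr(init_lr, step, total_steps, warmup_steps, lr_reduce_factor=0.001):
    """Single-lr sketch: linear warmup followed by cosine annealing."""
    if step < warmup_steps:
        # Linear ramp from 0 towards init_lr.
        return init_lr * step / warmup_steps
    # Fraction of the cosine phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    q = 0.5 * (1 + math.cos(math.pi * progress))
    end_lr = init_lr * lr_reduce_factor
    # Blend between init_lr (q = 1) and end_lr (q = 0).
    return init_lr * q + end_lr * (1 - q)


for step in (1, 5, 10, 55, 100):
    lr = cosine_warmup_lr(0.1, step, total_steps=100, warmup_steps=10)
    print(f"step {step:3d}: lr = {lr:.6f}")
```

With these values the rate rises linearly to 0.1 at step 10, is about 0.05005 halfway through the cosine phase (step 55), and ends at 0.1 * 0.001 = 0.0001 at step 100.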

Source code in quadra/schedulers/warmup.py
import math


def cosine_annealing_with_warmup(
    init_lrs: list[float],
    step: int,
    total_steps: int,
    warmup_steps: int,
    lr_reduce_factor: float = 0.001,
) -> list[float]:
    """Cosine learning rate scheduler with linear warmup helper function.

    Args:
        init_lrs: The initial learning rate, one for every `param_group`.
        step: the current step
        total_steps: the total steps
        warmup_steps: total linear warmup steps
        lr_reduce_factor: reduce factor for the initial learning
            rate. This is used to set the minimum learning rate as
            `init_lrs[i] * lr_reduce_factor`.
            Defaults to 0.001.

    Returns:
        Annealed learning rate for this `step`
    """
    # Progress through the cosine phase is the same for every param_group.
    # Compute it once: mutating `step` and `total_steps` inside the loop
    # would subtract `warmup_steps` again for each additional param_group.
    annealing_step = step - warmup_steps
    annealing_total = total_steps - warmup_steps
    lrs = []
    for init_lr in init_lrs:
        if step < warmup_steps:
            # Linear warmup from 0 towards init_lr.
            lr = init_lr * step / warmup_steps
        else:
            # Cosine decay from init_lr down to init_lr * lr_reduce_factor.
            q = 0.5 * (1 + math.cos(math.pi * annealing_step / annealing_total))
            end_lr = init_lr * lr_reduce_factor
            lr = init_lr * q + end_lr * (1 - q)
        lrs.append(lr)
    return lrs