PyTorch classification example¶
This page shows how to run a classification experiment with PyTorch and PyTorch Lightning to fine-tune a model, or train one from scratch, on a custom dataset.
This example will demonstrate how to create a custom experiment starting from default settings.
Training¶
Dataset¶
Let's start with the dataset we are going to use. Since we are using the classification datamodule, images must be arranged in a folder structure where each class has its own subfolder.
Suppose we have a dataset with the following structure:
dataset/
├── class_1
│ ├── abc.xyz
│ └── ...
├── class_2
│ ├── abc.xyz
│ └── ...
├── class_3
│ ├── abc.xyz
│ └── ...
├── test.txt # optional
├── train.txt # optional
└── val.txt # optional
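A quick way to sanity-check this layout is to count the images found in each class folder (plain Python; the dataset path is a placeholder):

from pathlib import Path

dataset = Path("dataset")
# iterate over the class subfolders and count the files inside each one
for class_dir in sorted(p for p in dataset.iterdir() if p.is_dir()):
    n_images = sum(1 for f in class_dir.iterdir() if f.is_file())
    print(f"{class_dir.name}: {n_images} images")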
The standard datamodule configuration for classification is found under configs/datamodule/base/classification.yaml.
_target_: quadra.datamodules.classification.ClassificationDataModule
data_path: ???
exclude_filter: [".ipynb_checkpoints"]
include_filter:
seed: ${core.seed}
num_workers: 8
batch_size: 16
test_size: 0.2
val_size: 0.2
train_transform: ${transforms.train_transform}
test_transform: ${transforms.test_transform}
val_transform: ${transforms.val_transform}
train_split_file:
test_split_file:
val_split_file:
label_map:
class_to_idx:
name:
dataset:
  _target_: hydra.utils.get_method
  path: quadra.datasets.classification.ClassificationDataset
Below is a short description of each tweakable parameter in the base datamodule config:
data_path
: Path to the root folder. "???" denotes a mandatory parameter.
exclude_filter
: If an image path contains one of the strings in this list, it will be ignored.
include_filter
: If an image path does not contain one of the strings in this list, it will be ignored.
seed
: Seed for experiment reproducibility (if training runs on a GPU, complete reproducibility cannot be guaranteed).
num_workers
: Number of workers used by the dataloaders (shared across train/val/test).
batch_size
: Batch size for the dataloaders (shared across train/val/test).
test_size
: If no test_split_file is provided, test_size * len(training_set) samples are put in the test set.
val_size
: If no val_split_file is provided, val_size * len(remaining_training_set) samples are put in the validation set.
label_map
: Maps classes to other classes, e.g. to group several sub-classes into one macro-class.
class_to_idx
: Maps classes to indices when a specific mapping must be respected (otherwise an ordered class_to_idx is built).
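For example, a label_map grouping two hypothetical scratch sub-classes into a single macro-class, together with an explicit class_to_idx, could look like this (the class names are purely illustrative, and this is a sketch of the sub-class-to-macro-class mapping described above):

label_map:
  scratch_small: scratch
  scratch_large: scratch
  dent: dent
class_to_idx:
  scratch: 0
  dent: 1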
If you want to choose which files go into the training/val/test sets, provide the split files' paths in the respective fields of the datamodule config (train_split_file, val_split_file, test_split_file). You can leave val_split_file empty, in which case the training set will be automatically split into train/val.
train.txt, val.txt, and test.txt must all follow the same format:
images/abc.xyz,class_1,class_2
images/abc_2.xyz,class_3
The first column is the path to the image, while the remaining columns are the labels associated with that image, separated by commas.
Experiment¶
Suppose we want to run an experiment on the given dataset: we can define a config starting from the base one (found under configs/experiment/base/classification/classification.yaml).
# @package _global_
defaults:
  - override /backbone: resnet18
  - override /datamodule: base/classification
  - override /loss: cross_entropy
  - override /model: classification
  - override /optimizer: adam
  - override /task: classification
  - override /scheduler: rop
  - override /transforms: default_resize

datamodule:
  num_workers: 8
  batch_size: 32
  data_path: ???

print_config: true

model:
  module:
    lr_scheduler_interval: "epoch"

task:
  lr_multiplier: 0.0
  gradcam: true
  run_test: True
  report: True
  output:
    example: True

core:
  tag: "run"
  upload_artifacts: true
  name: classification_base_${trainer.max_epochs}

logger:
  mlflow:
    experiment_name: classification_base
    run_name: ${core.name}

backbone:
  model:
    pretrained: True
    freeze: False
    freeze_parameters_name:

trainer:
  precision: 32
  max_epochs: 200
  check_val_every_n_epoch: 1
  log_every_n_steps: 1
  devices: [0]

scheduler:
  patience: 20
  factor: 0.9
  verbose: False
  threshold: 0.01

callbacks:
  early_stopping:
    _target_: pytorch_lightning.callbacks.EarlyStopping
    monitor: val_loss_epoch
    min_delta: 0.01
    mode: min
    patience: 35
    verbose: false
    stopping_threshold: 0
  model_checkpoint:
    monitor: val_loss_epoch
The base experiment trains a resnet18 (by default pretrained on ImageNet) for 200 epochs, using Adam as the optimizer and reducing the learning rate on plateaus.
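In plain PyTorch terms, the optimizer and "rop" scheduler configured above roughly correspond to the following (a sketch for intuition only; quadra builds these objects from the config, and the stand-in model below is not the real backbone):

import torch

model = torch.nn.Linear(512, 3)  # stand-in for the real backbone + classifier
optimizer = torch.optim.Adam(model.parameters())
# ReduceLROnPlateau: multiply the learning rate by `factor` after `patience`
# epochs without an improvement larger than `threshold` in the monitored metric
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.9, patience=20, threshold=0.01
)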
We can define a custom experiment starting from the base one and override only the parameters we want to change. Suppose we create a yaml configuration under configs/experiment/custom_experiment/torch_classification.yaml with the following content:
# @package _global_
defaults:
  - base/classification/classification
  - override /backbone: vit16_tiny
  - _self_

export:
  types: [onnx, torchscript]

datamodule:
  num_workers: 12
  batch_size: 32
  data_path: path/to/experiment/dataset
  class_to_idx:
    class_1: 0
    class_2: 1
    class_3: 2

task:
  gradcam: True # Enable gradcam computation during evaluation
  run_test: True # Perform test evaluation at the end of training
  report: True
  output:
    example: True # Generate an example of concordant and discordant predictions for each class

model:
  module:
    lr_scheduler_interval: "epoch"

backbone:
  model:
    pretrained: True
    freeze: False
    freeze_parameters_name:
      - conv1
      - bn1
      - layer1
      - layer2

core:
  tag: "run"
  name: "train_core_name"

logger:
  mlflow:
    experiment_name: name_of_the_experiment
    run_name: ${core.name}
Warning
Remember to set the mandatory parameter "data_path".
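Parameters can also be overridden directly on the command line using Hydra's override syntax, which is convenient for mandatory fields like data_path (an illustrative example; the path is a placeholder):
quadra experiment=custom_experiment/torch_classification datamodule.data_path=/path/to/dataset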
Run¶
Assuming that you have created a virtual environment and installed the quadra library, you can run the experiment with the following command:
quadra experiment=custom_experiment/torch_classification
This should produce the following output files:
checkpoints config_tree.txt deployment_model test
config_resolved.yaml data main.log
Where checkpoints contains the PyTorch Lightning checkpoints of the model, data contains the joblib dump of the datamodule with its parameters and dataset split, deployment_model contains the model in exported format (in this case onnx and torchscript; by default only torchscript is exported), and test contains the test artifacts.
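As a quick sanity check, the TorchScript export can be loaded directly with PyTorch (a sketch: the file name inside deployment_model and the input size are assumptions, so inspect your actual output folder first):

import torch

# hypothetical file name: check the deployment_model folder for the real one
model = torch.jit.load("deployment_model/model.pt")
model.eval()
with torch.no_grad():
    out = model(torch.rand(1, 3, 224, 224))  # dummy batch; adjust to your input size
print(out)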
Evaluation¶
Experiment¶
The same datamodule specified before can be used for inference. There are different ways to define the test set. The simplest is to set test_size=1.0 (remember the .0) and data_path=path/to/another_root_folder, where "another_root_folder" has the same structure as the root folder described at the start of this document but contains only the images you want to use for testing. Another possibility is to pass a test_split_file to the datamodule config:
test_split_file: path/to/test_split_file.txt
Where test_split_file is a simple .txt file structured in this way:
class_1/image1.png
class_1/image2.png
...
class_2/image1.png
class_2/image2.png
...
Where each line contains the relative path to the image from the data_path folder.
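If you prefer to generate the split file programmatically, here is a minimal sketch using only the standard library (the paths and extension list are illustrative):

from pathlib import Path

data_path = Path("path/to/another_root_folder")
extensions = {".png", ".jpg", ".jpeg", ".bmp"}  # adjust to your data

with open("test_split_file.txt", "w") as f:
    for image in sorted(data_path.rglob("*")):
        if image.suffix.lower() in extensions:
            # write the path relative to data_path, e.g. class_1/image1.png
            f.write(f"{image.relative_to(data_path).as_posix()}\n")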
The default experiment configuration can be found at configs/experiment/base/classification/classification_evaluation.yaml:
# @package _global_
defaults:
  - override /datamodule: base/classification
  - override /transforms: default_resize

datamodule:
  num_workers: 6
  batch_size: 32

core:
  tag: "run"
  upload_artifacts: true
  name: classification_evaluation_base

logger:
  mlflow:
    experiment_name: name_of_the_experiment
    run_name: ${core.name}

task:
  _target_: quadra.tasks.ClassificationEvaluation
  gradcam: true
  output:
    example: true
  model_path: ???
Since we don't have to set all the training-related parameters, the evaluation experiment .yaml file is much simpler. Suppose it is saved under configs/experiment/custom_experiment/torch_classification_evaluation.yaml:
# @package _global_
defaults:
  - base/classification/classification_evaluation
  - _self_

datamodule:
  num_workers: 6
  batch_size: 32
  data_path: path/to/test/dataset
  test_size: 1.0
  class_to_idx:
    class_1: 0
    class_2: 1
    class_3: 2

core:
  tag: "run"
  upload_artifacts: true
  name: eval_core_name

task:
  output:
    example: true
  model_path: path/to/model.pth
Notice that we must provide the path to a deployment model file that will be used to run inference. In this case class_to_idx is mandatory (it cannot be inferred from the test set). Be careful to set the same class_to_idx that was used to train the model.
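If you are unsure which class_to_idx was used at training time, you can usually recover it from the datamodule dump saved in the data folder of the training run (a sketch: the dump file name and the class_to_idx attribute are assumptions based on the output layout described above):

import joblib

# hypothetical path: inspect the `data` folder of your training run
datamodule = joblib.load("path/to/train_run/data/datamodule.pkl")
print(datamodule.class_to_idx)  # assumed attribute mirroring the config field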
Run¶
Just as before, assuming that you have created a virtual environment and installed the quadra library, you can run the experiment with the following command:
quadra experiment=custom_experiment/torch_classification_evaluation
This will compute the metrics on the test set and, since example is set to true, generate an example of concordant and discordant predictions for each class.