patch

PatchSklearnClassificationTrainDataset(data_path, samples, targets, class_to_idx=None, resize=None, transform=None, rgb=True, channel=3, balance_classes=False)

Bases: Dataset

Dataset used for patch sampling. It expects samples to be paths to h5 files containing all the information required to sample patches from images.

Parameters:

  • data_path (str) –

    Base path to the dataset

  • samples (list[str]) –

    Paths to h5 files

  • targets (list[str | int]) –

    Labels associated with each sample

  • class_to_idx (dict | None, default: None ) –

    Mapping between class and corresponding index

  • resize (int | None, default: None ) –

    Target size for an aspect-ratio-preserving resize of the patch, applied before the transformations

  • transform (Callable | None, default: None ) –

    Optional function applied to the image

  • rgb (bool, default: True ) –

    If False, the image will be converted to grayscale

  • channel (int, default: 3 ) –

    1 or 3. If rgb is True, channel is forced to 3.

  • balance_classes (bool, default: False ) –

    If True, the dataset is balanced by duplicating samples of the minority classes until every class matches the largest class count
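
The oversampling performed by balance_classes can be sketched as a standalone function (the data below is illustrative; the .h5 paths and labels are placeholders, not real dataset files):

```python
import random
import numpy as np


def balance_by_oversampling(samples, targets, seed=0):
    """Duplicate minority-class samples (with replacement) until every
    class reaches the count of the largest class."""
    random.seed(seed)
    samples_array = np.array(samples)
    targets_array = np.array(targets)
    samples_to_use, targets_to_use = [], []

    cls, counts = np.unique(targets_array, return_counts=True)
    max_count = np.max(counts)
    for cl, count in zip(cls, counts):
        idx_to_pick = list(np.where(targets_array == cl)[0])
        if count < max_count:
            # Randomly re-pick indices of this class to fill the gap
            idx_to_pick += random.choices(idx_to_pick, k=max_count - count)
        samples_to_use.extend(samples_array[idx_to_pick])
        targets_to_use.extend(targets_array[idx_to_pick])
    return samples_to_use, targets_to_use


samples = ["a.h5", "b.h5", "c.h5"]
targets = ["good", "good", "bad"]
balanced_samples, balanced_targets = balance_by_oversampling(samples, targets)
```

After balancing, the single "bad" sample is duplicated so both classes contribute two entries.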

Source code in quadra/datasets/patch.py
def __init__(
    self,
    data_path: str,
    samples: list[str],
    targets: list[str | int],
    class_to_idx: dict | None = None,
    resize: int | None = None,
    transform: Callable | None = None,
    rgb: bool = True,
    channel: int = 3,
    balance_classes: bool = False,
):
    super().__init__()

    # Keep-Aspect-Ratio resize
    self.resize = resize
    self.data_path = data_path

    if balance_classes:
        samples_array = np.array(samples)
        targets_array = np.array(targets)
        samples_to_use: list[str] = []
        targets_to_use: list[str | int] = []

        cls, counts = np.unique(targets_array, return_counts=True)
        max_count = np.max(counts)
        for cl, count in zip(cls, counts):
            idx_to_pick = list(np.where(targets_array == cl)[0])

            if count < max_count:
                idx_to_pick += random.choices(idx_to_pick, k=max_count - count)

            samples_to_use.extend(samples_array[idx_to_pick])
            targets_to_use.extend(targets_array[idx_to_pick])
    else:
        samples_to_use = samples
        targets_to_use = targets

    # Data
    self.x = np.array(samples_to_use)
    self.y = np.array(targets_to_use)

    if class_to_idx is None:
        unique_targets = np.unique(targets_to_use)
        class_to_idx = {c: i for i, c in enumerate(unique_targets)}

    self.class_to_idx = class_to_idx
    self.idx_to_class = {v: k for k, v in class_to_idx.items()}

    self.samples = [
        (path, self.class_to_idx[self.y[i]] if self.y[i] is not None else None) for i, path in enumerate(self.x)
    ]

    self.rgb = rgb
    self.channel = 3 if rgb else channel

    self.transform = transform
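
When class_to_idx is not provided, the constructor derives it from the unique target values, which np.unique returns in sorted order. A minimal sketch with made-up labels:

```python
import numpy as np

targets = ["defect", "good", "defect", "good"]

# Default mapping: unique classes (sorted by np.unique), enumerated in order
unique_targets = np.unique(targets)
class_to_idx = {c: i for i, c in enumerate(unique_targets)}

# Inverse mapping, as built in __init__
idx_to_class = {v: k for k, v in class_to_idx.items()}
```

Because the mapping depends on the sorted class names, passing an explicit class_to_idx is the safer choice when the index assignment must stay stable across datasets.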