base
BaseDataModule(data_path, name='base_datamodule', num_workers=16, batch_size=32, seed=42, load_aug_images=False, aug_name=None, n_aug_to_take=None, replace_str_from=None, replace_str_to=None, train_transform=None, val_transform=None, test_transform=None, enable_hashing=True, hash_size=64, hash_type='content')
Bases: LightningDataModule
Base class for all data modules.
Parameters:
- data_path (str) – Path to the data main folder.
- name (str, default: 'base_datamodule') – The name for the data module.
- num_workers (int, default: 16) – Number of workers for dataloaders.
- batch_size (int, default: 32) – Batch size.
- seed (int, default: 42) – Random generator seed.
- train_transform (Optional[Compose], default: None) – Transformations for train dataset.
- val_transform (Optional[Compose], default: None) – Transformations for validation dataset.
- test_transform (Optional[Compose], default: None) – Transformations for test dataset.
- enable_hashing (bool, default: True) – Whether to enable hashing of images.
- hash_size (Literal[32, 64, 128], default: 64) – Size of the hash. Must be one of [32, 64, 128].
- hash_type (Literal['content', 'size'], default: 'content') – Type of hash to use. With 'content', the hash is computed on the file content; with 'size', it is computed on the file size, which is faster but less safe.
Source code in quadra/datamodules/base.py
test_data: pd.DataFrame (property)
Get test data.
test_dataset_available: bool (property)
Checks if the test dataset is available.
train_data: pd.DataFrame (property)
Get train data.
train_dataset_available: bool (property)
Checks if the train dataset is available.
val_data: pd.DataFrame (property)
Get validation data.
val_dataset_available: bool (property)
Checks if the validation dataset is available.
__getstate__()
This method is called when pickling the object. It's useful to remove attributes that shouldn't be pickled.
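As a minimal sketch of this pattern (not the quadra implementation), the class below drops one unpicklable attribute from its state. The class name `PicklableModule` and the `_lock` attribute are hypothetical, chosen only for illustration.

```python
import pickle
import threading


class PicklableModule:
    """Hypothetical stand-in for a data module holding an unpicklable attribute."""

    def __init__(self, data_path):
        self.data_path = data_path
        # Thread locks cannot be pickled, so this attribute must be dropped.
        self._lock = threading.Lock()

    def __getstate__(self):
        # Copy the dict so the live object keeps all of its attributes.
        state = self.__dict__.copy()
        state.pop("_lock", None)
        return state


module = PicklableModule("data/")
state = module.__getstate__()   # what pickle would serialize
payload = pickle.dumps(state)   # the filtered state serializes cleanly
```

The live object still has `_lock` after the call; only the serialized state omits it.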
Source code in quadra/datamodules/base.py
hash_data()
Computes the hash of the files inside the datasets.
Source code in quadra/datamodules/base.py
load_augmented_samples(samples, targets, replace_str_from=None, replace_str_to=None, shuffle=False)
Loads augmented samples.
Source code in quadra/datamodules/base.py
prepare_data()
Prepares the data. Subclasses should override this method.
Source code in quadra/datamodules/base.py
restore_checkpoint()
Loads the data from disk. Utility function that should be called from setup.
Source code in quadra/datamodules/base.py
save_checkpoint()
Saves the datamodule to disk. Utility function that is called from prepare_data. The datamodule must be saved to disk because attributes cannot be assigned to it in prepare_data when working with multiple GPUs.
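The save and restore pair can be illustrated with a small sketch. It assumes the state is pickled to a single file; the class `CheckpointedModule`, the `checkpoint_path` attribute, and the file name are hypothetical and do not reflect how quadra stores its data.

```python
import os
import pickle
import tempfile


class CheckpointedModule:
    """Hypothetical stand-in; not the quadra implementation."""

    def __init__(self, checkpoint_path):
        self.checkpoint_path = checkpoint_path
        self.samples = []

    def prepare_data(self):
        # Typically runs in a single process: compute, then persist to disk.
        self.samples = ["a.png", "b.png"]
        self.save_checkpoint()

    def setup(self):
        # Runs in every process: read back what prepare_data persisted.
        self.restore_checkpoint()

    def save_checkpoint(self):
        with open(self.checkpoint_path, "wb") as f:
            pickle.dump({"samples": self.samples}, f)

    def restore_checkpoint(self):
        with open(self.checkpoint_path, "rb") as f:
            self.__dict__.update(pickle.load(f))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "datamodule.pkl")
    CheckpointedModule(path).prepare_data()  # e.g. the main process
    worker = CheckpointedModule(path)        # e.g. a separate GPU process
    worker.setup()
    restored_samples = worker.samples
```

The second instance never ran `prepare_data`, yet it sees the same samples because the state travels through the file rather than through instance attributes.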
Source code in quadra/datamodules/base.py
DecorateParentMethod
Bases: type
Metaclass to decorate methods of subclasses.
__new__(name, bases, dct)
Create new decorator for parent class methods.
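To show how a metaclass can decorate methods that subclasses define, here is a minimal sketch. The metaclass `DecorateSetup`, the decorator `announce`, and the choice of `setup` as the wrapped method are assumptions for illustration; the actual methods and decorators used by DecorateParentMethod are not stated here.

```python
import functools


def announce(func):
    """Hypothetical decorator: records each call before delegating."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        self.calls.append(func.__name__)
        return func(self, *args, **kwargs)

    return wrapper


class DecorateSetup(type):
    """Sketch of a metaclass that wraps `setup` wherever a class defines it."""

    def __new__(mcs, name, bases, dct):
        # dct holds the attributes declared in the class body being created.
        if "setup" in dct:
            dct["setup"] = announce(dct["setup"])
        return super().__new__(mcs, name, bases, dct)


class Base(metaclass=DecorateSetup):
    def __init__(self):
        self.calls = []


class Child(Base):
    def setup(self):  # wrapped automatically, no decorator needed here
        return "child setup"


child = Child()
result = child.setup()
```

Because the metaclass is inherited, subclass authors get the wrapping without having to remember to apply the decorator themselves.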
Source code in quadra/datamodules/base.py
compute_file_content_hash(path, hash_size=64)
Get hash of a file based on its content.
Parameters:
- path (str) – Path to the file.
- hash_size (Literal[32, 64, 128], default: 64) – Size of the hash. Must be one of [32, 64, 128].
Returns:
- str – The hash of the file.
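A content-based file hash can be sketched with the standard library alone. This sketch assumes `hash_size` means the digest width in bits and uses BLAKE2b as a stand-in algorithm; the function name `content_hash_sketch` is hypothetical, and the algorithm quadra actually uses may differ.

```python
import hashlib
import os
import tempfile


def content_hash_sketch(path, hash_size=64):
    """Hash a file's bytes; hash_size is assumed to be the digest width in bits."""
    if hash_size not in (32, 64, 128):
        raise ValueError("hash_size must be one of [32, 64, 128]")
    digest = hashlib.blake2b(digest_size=hash_size // 8)
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large files are never loaded whole.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.bin")
    path_b = os.path.join(tmp, "b.bin")
    with open(path_a, "wb") as f:
        f.write(b"aaaa")
    with open(path_b, "wb") as f:
        f.write(b"bbbb")  # same size as a.bin, different content
    hash_a = content_hash_sketch(path_a)
    hash_b = content_hash_sketch(path_b)
```

Two files of equal size but different content produce different hashes, at the cost of reading every byte.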
Source code in quadra/datamodules/base.py
compute_file_size_hash(path, hash_size=64)
Get hash of a file based on its size.
Parameters:
- path (str) – Path to the file.
- hash_size (Literal[32, 64, 128], default: 64) – Size of the hash. Must be one of [32, 64, 128].
Returns:
- str – The hash of the file.
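A size-based hash avoids reading the file at all, so files of equal size collide. The sketch below makes the same assumptions as a content-hash stand-in would: `hash_size` is taken as the digest width in bits, BLAKE2b is a substitute algorithm, and `size_hash_sketch` is a hypothetical name, not the quadra function.

```python
import hashlib
import os
import tempfile


def size_hash_sketch(path, hash_size=64):
    """Hash only the file size in bytes; the file content is never read."""
    if hash_size not in (32, 64, 128):
        raise ValueError("hash_size must be one of [32, 64, 128]")
    size = os.path.getsize(path)
    return hashlib.blake2b(str(size).encode(), digest_size=hash_size // 8).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    paths = {}
    for name, payload in [("a", b"aaaa"), ("b", b"bbbb"), ("c", b"ccccc")]:
        paths[name] = os.path.join(tmp, name + ".bin")
        with open(paths[name], "wb") as f:
            f.write(payload)
    hash_a = size_hash_sketch(paths["a"])
    hash_b = size_hash_sketch(paths["b"])  # same size as a, different content
    hash_c = size_hash_sketch(paths["c"])  # different size
```

Here `a.bin` and `b.bin` get the same hash despite different content, which illustrates the trade-off between speed and safety.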
Source code in quadra/datamodules/base.py
istarmap(self, func, iterable, chunksize=1)
Starmap-version of imap.
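In the standard library, `imap` yields results lazily and in order but passes a single argument to the function, while `starmap` unpacks argument tuples but returns a list only once everything has finished. The sketch below combines the two behaviours using public APIs only. The helper names `istarmap_sketch` and `_apply_star` are hypothetical, and this is not necessarily how the function documented here is implemented. A `ThreadPool` keeps the example runnable anywhere; the same call should work on a process pool as long as the function is picklable.

```python
from functools import partial
from multiprocessing.pool import ThreadPool


def _apply_star(func, args):
    # Unpack one argument tuple, as starmap would.
    return func(*args)


def istarmap_sketch(pool, func, iterable, chunksize=1):
    """Lazily yield func(*args) for each args tuple, in input order."""
    return pool.imap(partial(_apply_star, func), iterable, chunksize)


with ThreadPool(2) as pool:
    # Results must be consumed while the pool is still open.
    results = list(istarmap_sketch(pool, pow, [(2, 3), (3, 2), (10, 2)]))
```

Lazy iteration is what makes this shape convenient for progress reporting over long jobs, since each result is available as soon as it is ready.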
Source code in quadra/datamodules/base.py
load_data_from_disk_dec(func)
Load data from disk if it exists.
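A decorator of this kind might look like the following sketch, which restores saved state before running the wrapped method and skips the restore when nothing has been saved. The names `load_from_disk_first`, `ToyModule`, and `checkpoint_path`, as well as the JSON file format, are assumptions for illustration rather than the quadra implementation.

```python
import functools
import json
import os
import tempfile


def load_from_disk_first(func):
    """Hypothetical decorator: restore saved state, if any, before calling func."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if os.path.exists(self.checkpoint_path):
            self.restore_checkpoint()
        return func(self, *args, **kwargs)

    return wrapper


class ToyModule:
    def __init__(self, checkpoint_path):
        self.checkpoint_path = checkpoint_path
        self.samples = []

    def restore_checkpoint(self):
        with open(self.checkpoint_path) as f:
            self.samples = json.load(f)

    @load_from_disk_first
    def setup(self):
        return len(self.samples)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "state.json")
    before = ToyModule(path).setup()  # no file yet, nothing is restored
    with open(path, "w") as f:
        json.dump(["a.png", "b.png"], f)
    after = ToyModule(path).setup()   # file exists, state is restored first
```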
Source code in quadra/datamodules/base.py