Utilities for Trainer

This page lists all the utility functions used by Trainer.
Most of these are only useful if you are studying the code of the Trainer in the library.

Utilities
class transformers.EvalPrediction(predictions: Union[numpy.ndarray, Tuple[numpy.ndarray]], label_ids: numpy.ndarray)

Evaluation output (always contains labels), to be used to compute metrics.

Parameters
- predictions (np.ndarray) – Predictions of the model.
- label_ids (np.ndarray) – Targets to be matched.
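As an illustration, here is a minimal sketch of a metrics function that consumes an EvalPrediction, in the style of the compute_metrics callable accepted by Trainer. The function name and the accuracy metric are examples, not part of the library, and the sketch assumes a classification model whose predictions are logits:

```python
import numpy as np

from transformers import EvalPrediction


def compute_accuracy(eval_pred: EvalPrediction) -> dict:
    # For a classification model, predictions are logits of shape
    # (num_samples, num_labels); take the argmax to get class ids.
    preds = np.argmax(eval_pred.predictions, axis=-1)
    # Compare against the targets stored in label_ids.
    return {"accuracy": float((preds == eval_pred.label_ids).mean())}
```

A function with this signature can be passed as compute_metrics when instantiating Trainer.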
Callbacks internals

Distributed Evaluation
class transformers.trainer_pt_utils.DistributedTensorGatherer(world_size, num_samples, make_multiple_of=None, padding_index=-100)

A class responsible for properly gathering tensors (or nested lists/tuples of tensors) on the CPU by chunks.
If our dataset has 16 samples with a batch size of 2 on 3 processes, and we gather and then transfer to CPU at every step, our sampler will generate the following indices:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0, 1]

to get a length that is a multiple of 3 (so that each process gets the same dataset length). Then processes 0, 1 and 2 will be responsible for making predictions on the following samples:
P0: [0, 1, 2, 3, 4, 5]
P1: [6, 7, 8, 9, 10, 11]
P2: [12, 13, 14, 15, 0, 1]
The first batch treated on each process will be:

P0: [0, 1]
P1: [6, 7]
P2: [12, 13]
So if we gather at the end of the first batch, we will get a tensor (or nested list/tuple of tensors) corresponding to the following indices:

[0, 1, 6, 7, 12, 13]

If we directly concatenate our results without taking any precautions, the user will get the predictions for the indices in this order at the end of the prediction loop:

[0, 1, 6, 7, 12, 13, 2, 3, 8, 9, 14, 15, 4, 5, 10, 11, 0, 1]

For some reason, that's not going to float their boat. This class is there to solve that problem.
Parameters
- world_size (int) – The number of processes used in the distributed training.
- num_samples (int) – The number of samples in our dataset.
- make_multiple_of (int, optional) – If passed, the class assumes the datasets passed to each process are made to be a multiple of this argument (by adding samples).
- padding_index (int, optional, defaults to -100) – The padding index to use if the arrays don't all have the same sequence length.
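To make the mechanics concrete, here is a minimal local simulation of the 16-sample example above. It assumes the per-step gather has already produced a single numpy array containing every process's batch (in real distributed code that array would come from an all-gather of the model outputs); add_arrays and finalize are the class's actual methods:

```python
import numpy as np

from transformers.trainer_pt_utils import DistributedTensorGatherer

world_size, num_samples, batch_size = 3, 16, 2
# The indices each process predicts on, as in the example above
# (the sampler wraps around so every process gets 6 samples).
per_process = [
    [0, 1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10, 11],
    [12, 13, 14, 15, 0, 1],
]

gatherer = DistributedTensorGatherer(world_size, num_samples)
for step in range(0, 6, batch_size):
    # What a real gather would yield at this step: every process's batch,
    # concatenated along the first dimension.
    gathered = np.concatenate(
        [np.array(indices[step : step + batch_size]) for indices in per_process]
    )
    gatherer.add_arrays(gathered)  # each chunk is stored at that process's offset

result = gatherer.finalize()
print(result)  # [0 1 2 ... 15]: order restored, wrapped-around extras truncated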
Trainer Argument Parser
class transformers.HfArgumentParser(dataclass_types: Union[DataClassType, Iterable[DataClassType]], **kwargs)

This subclass of argparse.ArgumentParser uses type hints on dataclasses to generate arguments.
The class is designed to play well with the native argparse. In particular, you can add more (non-dataclass backed) arguments to the parser after initialization and you’ll get the output back after parsing as an additional namespace.
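As a sketch of that workflow (the ScriptArguments dataclass and its fields are made up for illustration), a dataclass is turned into command-line flags, and arguments added the plain argparse way come back in a separate namespace:

```python
from dataclasses import dataclass, field

from transformers import HfArgumentParser


@dataclass
class ScriptArguments:
    # Hypothetical script options; each field becomes a command-line flag.
    model_name: str = field(default="bert-base-uncased", metadata={"help": "Model identifier."})
    num_epochs: int = field(default=3, metadata={"help": "Number of training epochs."})


parser = HfArgumentParser(ScriptArguments)
# Non-dataclass-backed arguments can still be added after initialization.
parser.add_argument("--verbose", action="store_true")

# Parsing returns one instance per dataclass, plus a namespace holding the
# extra arguments added after initialization.
script_args, extra = parser.parse_args_into_dataclasses(args=["--model_name", "gpt2", "--verbose"])
print(script_args.model_name)  # "gpt2"
print(extra.verbose)           # True
```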