Utility classes and functions for running and designing experiments.


evenly_distribute_count_into_bins(count: int, nbins: int) -> List[int]


Distribute a count into a number of bins.


  • count: A positive integer to be distributed, should be >= nbins.
  • nbins: The number of bins.


A list of positive integers which sum to count. These values will be as close to equal as possible (may differ by at most 1).
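
The behavior described above can be sketched as follows. This is a minimal illustration rather than the library's own implementation: the name `distribute_evenly` is hypothetical, and placing the larger bins first is an assumption.

```python
from typing import List


def distribute_evenly(count: int, nbins: int) -> List[int]:
    """Split `count` into `nbins` positive integers that differ by at most 1."""
    assert count >= nbins > 0, "count must be >= nbins and nbins must be positive"
    base, extra = divmod(count, nbins)
    # The first `extra` bins receive one additional item (an ordering assumption).
    return [base + 1 if i < extra else base for i in range(nbins)]


print(distribute_evenly(10, 3))  # [4, 3, 3]
```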


recursive_update(original: Union[Dict,], update: Union[Dict,])


Recursively updates the original dictionary with entries from the update dictionary.


  • original : Original dictionary to be updated.
  • update : Dictionary with additional or replacement entries.


Updated original dictionary.
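
A recursive update of this kind might look like the sketch below. It assumes the merge happens in place and that nested mappings are merged key by key rather than replaced wholesale; the name `recursive_update_sketch` is hypothetical and the library's actual behavior may differ in edge cases.

```python
from collections.abc import Mapping
from typing import Any, Dict


def recursive_update_sketch(original: Dict[str, Any], update: Mapping) -> Dict[str, Any]:
    """Merge `update` into `original` in place, descending into nested mappings."""
    for key, value in update.items():
        if isinstance(value, Mapping) and isinstance(original.get(key), dict):
            # Both sides hold a mapping: merge them rather than replacing.
            recursive_update_sketch(original[key], value)
        else:
            original[key] = value
    return original


config = {"optimizer": {"lr": 1e-3, "eps": 1e-5}, "seed": 0}
recursive_update_sketch(config, {"optimizer": {"lr": 3e-4}, "gamma": 0.99})
print(config)
```

Note that the nested `eps` entry survives the update because only `lr` is replaced inside `optimizer`.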


class Builder(tuple,  Generic[ToBuildType])


Used to instantiate a given class with (default) parameters.

Helper class that stores a class, default parameters for that class, and keyword arguments that (possibly) overwrite the defaults. When an object of the Builder class is called, it generates an object of type class_type with parameters specified by the attributes default and kwargs (and possibly additional, overwriting, keyword arguments).


  • class_type: The class to be instantiated when calling the object.
  • kwargs: Keyword arguments used to instantiate an object of type class_type.
  • default: Default parameters used when instantiating the class.


 | __new__(cls, class_type: ToBuildType, kwargs: Optional[Dict[str, Any]] = None, default: Optional[Dict[str, Any]] = None)


Create a new Builder.

For parameter descriptions see the class documentation. Note that kwargs and default can be None, in which case they are set to empty dictionaries.


 | __call__(**kwargs) -> ToBuildType


Build and return a new object of type class_type.


  • kwargs : additional keyword arguments to use when instantiating the object. These overwrite all arguments already in the self.kwargs and self.default attributes.


An object of type self.class_type with parameters taken from self.default, self.kwargs, and any keyword arguments additionally passed to __call__.
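
The precedence between default, kwargs, and call-time keyword arguments can be illustrated with a small stand-in. `BuilderSketch` and `Point` are hypothetical names used only here; the sketch does not subclass tuple as the real Builder does and only mirrors the override order described above.

```python
from typing import Any, Dict, Optional


class BuilderSketch:
    """Hypothetical stand-in that mirrors the precedence rules described above."""

    def __init__(self, class_type: type, kwargs: Optional[Dict[str, Any]] = None,
                 default: Optional[Dict[str, Any]] = None) -> None:
        self.class_type = class_type
        self.kwargs = {} if kwargs is None else kwargs
        self.default = {} if default is None else default

    def __call__(self, **kwargs: Any) -> Any:
        # Later dictionaries win: default < self.kwargs < call-time kwargs.
        merged = {**self.default, **self.kwargs, **kwargs}
        return self.class_type(**merged)


class Point:  # Hypothetical class used only for this illustration.
    def __init__(self, x: int = 0, y: int = 0, z: int = 0) -> None:
        self.x, self.y, self.z = x, y, z


builder = BuilderSketch(Point, kwargs={"y": 2}, default={"x": 1, "y": 1})
p = builder(z=3)
print(p.x, p.y, p.z)  # 1 2 3
```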


class ScalarMeanTracker(object)


Track a collection of scalar key -> mean pairs.


 | add_scalars(scalars: Dict[str, Union[float, int]], n: Union[int, Dict[str, int]] = 1) -> None


Add additional scalars to track.


  • scalars : A dictionary of scalar key -> value pairs.


 | pop_and_reset() -> Dict[str, float]


Return tracked means and reset.

On resetting all previously tracked values are discarded.


A dictionary of scalar key -> current mean pairs corresponding to those values added with add_scalars.
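
A tracker with this interface could be sketched as below. The meaning of `n` is not documented above, so treating it as the number of samples each value averages over (given once for all keys or per key) is an assumption, and `MeanTrackerSketch` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Union


class MeanTrackerSketch:
    """Hypothetical stand-in: keeps a running weighted sum and count per key."""

    def __init__(self) -> None:
        self._sums: Dict[str, float] = defaultdict(float)
        self._counts: Dict[str, int] = defaultdict(int)

    def add_scalars(self, scalars: Dict[str, Union[float, int]],
                    n: Union[int, Dict[str, int]] = 1) -> None:
        for key, value in scalars.items():
            # Assumption: `n` is the number of samples each value averages over.
            weight = n[key] if isinstance(n, dict) else n
            self._sums[key] += weight * value
            self._counts[key] += weight

    def pop_and_reset(self) -> Dict[str, float]:
        means = {k: self._sums[k] / c for k, c in self._counts.items() if c > 0}
        # Discard everything tracked so far.
        self._sums.clear()
        self._counts.clear()
        return means


tracker = MeanTrackerSketch()
tracker.add_scalars({"reward": 1.0, "loss": 0.5})
tracker.add_scalars({"reward": 3.0})
means = tracker.pop_and_reset()
print(means)  # {'reward': 2.0, 'loss': 0.5}
```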


class LoggingPackage(object)


Data package used for logging.


class LinearDecay(object)


Linearly decay between two values over some number of steps.

Obtain the value corresponding to the i-th step by calling an instance of this class with the value i.


  • steps : The number of steps over which to decay.
  • startp : The starting value.
  • endp : The ending value.


 | __init__(steps: int, startp: float = 1.0, endp: float = 0.0) -> None



See class documentation for parameter definitions.


 | __call__(epoch: int) -> float


Get the decayed value for epoch number of steps.


  • epoch : The number of steps.


Decayed value for epoch number of steps.
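
As a rough illustration, linear interpolation between startp and endp might be computed as in the sketch below. Clamping step counts outside [0, steps] to the endpoints is an assumption, and `LinearDecaySketch` is a hypothetical stand-in rather than the class itself.

```python
class LinearDecaySketch:
    def __init__(self, steps: int, startp: float = 1.0, endp: float = 0.0) -> None:
        self.steps = steps
        self.startp = startp
        self.endp = endp

    def __call__(self, epoch: int) -> float:
        # Assumption: steps outside [0, steps] are clamped to the endpoints.
        epoch = max(min(epoch, self.steps), 0)
        return self.startp + (self.endp - self.startp) * (epoch / float(self.steps))


decay = LinearDecaySketch(steps=10, startp=1.0, endp=0.0)
print([decay(i) for i in (0, 5, 10, 20)])  # [1.0, 0.5, 0.0, 0.0]
```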


class MultiLinearDecay(object)


Container for multiple stages of LinearDecay.

Obtain the value corresponding to the i-th step by calling an instance of this class with the value i.


  • stages: List of LinearDecay objects to be sequentially applied for the number of steps in each stage.


 | __init__(stages: Sequence[LinearDecay]) -> None



See class documentation for parameter definitions.


 | __call__(epoch: int) -> float


Get the decayed value for epoch number of steps.


  • epoch : The number of steps.


Decayed value for epoch number of steps.
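
Sequentially applied stages could work along the lines of the sketch below. Each stage is represented by a hypothetical `(steps, startp, endp)` tuple standing in for a LinearDecay object, and holding the final ending value once all stages are exhausted is an assumption.

```python
from typing import Sequence, Tuple

# (steps, startp, endp); a hypothetical stand-in for a LinearDecay object.
Stage = Tuple[int, float, float]


def multi_linear_decay(stages: Sequence[Stage], epoch: int) -> float:
    epoch = max(epoch, 0)
    for steps, startp, endp in stages:
        if epoch <= steps:
            return startp + (endp - startp) * (epoch / float(steps))
        # Move on to the next stage, counting steps from its start.
        epoch -= steps
    # Assumption: past the final stage, the last ending value is held.
    return stages[-1][2]


stages = [(10, 1.0, 0.5), (10, 0.5, 0.0)]
print([multi_linear_decay(stages, i) for i in (0, 10, 15, 20, 30)])
# [1.0, 0.5, 0.25, 0.0, 0.0]
```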


set_deterministic_cudnn() -> None


Makes cudnn deterministic.

This may slow down computations.


set_seed(seed: Optional[int] = None) -> None


Set seeds for multiple (cpu) sources of randomness.

Sets seeds for (cpu) pytorch, base random, and numpy.


  • seed : The seed to set. If set to None, keep using the current seed.
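
Seeding the three CPU sources named above might be done roughly as follows. The function name `set_seed_sketch` is hypothetical, and the optional imports are a convenience of this sketch so that it also runs where NumPy or PyTorch is not installed.

```python
import random
from typing import Optional


def set_seed_sketch(seed: Optional[int] = None) -> None:
    """Seed the CPU random number generators that are available."""
    if seed is None:
        return  # Keep using the current seeds.
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional in this sketch.
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch is optional in this sketch.


set_seed_sketch(12345)
first = random.random()
set_seed_sketch(12345)
second = random.random()
print(first == second)  # True
```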


class EarlyStoppingCriterion(abc.ABC)


Abstract base class for classes that determine whether training should stop early in a particular pipeline stage.


 | @abc.abstractmethod
 | __call__(stage_steps: int, total_steps: int, training_metrics: ScalarMeanTracker) -> bool


Returns True if training should be stopped early.


  • stage_steps: Total number of steps taken in the current pipeline stage.
  • total_steps: Total number of steps taken during training so far (includes steps taken in prior pipeline stages).
  • training_metrics: Metrics accumulated over some fixed number of training steps (see the metric_accumulate_interval attribute in the TrainingPipeline class).
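
A concrete criterion implements `__call__` with the three arguments listed above. The sketch below uses hypothetical stand-in classes so that it runs on its own; `StopAfterStageSteps` and its `step_budget` parameter are invented for illustration and ignore the metrics entirely.

```python
import abc
from typing import Any


class EarlyStoppingCriterionSketch(abc.ABC):
    """Hypothetical stand-in for the abstract base class described above."""

    @abc.abstractmethod
    def __call__(self, stage_steps: int, total_steps: int, training_metrics: Any) -> bool:
        raise NotImplementedError


class StopAfterStageSteps(EarlyStoppingCriterionSketch):
    """Hypothetical criterion: stop once the stage has used its step budget."""

    def __init__(self, step_budget: int) -> None:
        self.step_budget = step_budget

    def __call__(self, stage_steps: int, total_steps: int, training_metrics: Any) -> bool:
        # `training_metrics` is ignored here; a real criterion might inspect it.
        return stage_steps >= self.step_budget


criterion = StopAfterStageSteps(step_budget=1000)
print(criterion(500, 5000, None), criterion(1000, 5500, None))  # False True
```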


class NeverEarlyStoppingCriterion(EarlyStoppingCriterion)


Implementation of EarlyStoppingCriterion which never stops early.


class OffPolicyPipelineComponent(NamedTuple)


An off-policy component for a PipelineStage.


  • data_iterator_builder: A function to instantiate a Data Iterator (with a next(self) method)
  • loss_names: list of unique names assigned to off-policy losses
  • updates: number of off-policy updates between on-policy rollout collections
  • loss_weights: A list of floating point numbers describing the relative weights applied to the losses referenced by loss_names. Should be the same length as loss_names. If this is None, all weights will be assumed to be one.
  • data_iterator_kwargs_generator: Optional generator of keyword arguments for data_iterator_builder (useful for distributed training). It takes a cur_worker int value, a rollouts_per_worker list giving the number of samplers per training worker, and an optional random seed shared by all workers, which can be None.
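
A data_iterator_kwargs_generator with the three inputs described above might look like this sketch. The function name and the returned keys (`first_sampler`, `num_samplers`, `seed`) are hypothetical; in practice the keys must match whatever the data_iterator_builder accepts.

```python
from typing import Any, Dict, Optional, Sequence


def shard_kwargs_generator(cur_worker: int, rollouts_per_worker: Sequence[int],
                           seed: Optional[int]) -> Dict[str, Any]:
    # Hypothetical keys: the real data_iterator_builder decides what it accepts.
    first_sampler = sum(rollouts_per_worker[:cur_worker])
    return {
        "first_sampler": first_sampler,
        "num_samplers": rollouts_per_worker[cur_worker],
        "seed": seed,
    }


print(shard_kwargs_generator(1, [4, 4, 2], seed=7))
# {'first_sampler': 4, 'num_samplers': 4, 'seed': 7}
```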


class TrainingSettings(object)


Class defining parameters used for training (within a stage or the entire pipeline).


  • num_mini_batch: The number of mini-batches to break a rollout into.
  • update_repeats: The number of times we will cycle through the mini-batches corresponding to a single rollout doing gradient updates.
  • max_grad_norm: The maximum "inf" norm of any gradient step (gradients are clipped to not exceed this).
  • num_steps: Total number of steps a single agent takes in a rollout.
  • gamma: Discount factor applied to rewards (should be in [0, 1]).
  • use_gae: Whether or not to use generalized advantage estimation (GAE).
  • gae_lambda: The additional parameter used in GAE.
  • advance_scene_rollout_period: Optional number of rollouts before enforcing an advance scene in all samplers.
  • save_interval: The frequency with which to save (in total agent steps taken). If None then no checkpoints will be saved. Otherwise, in addition to the checkpoints being saved every save_interval steps, a checkpoint will always be saved at the end of each pipeline stage. If save_interval <= 0 then checkpoints will only be saved at the end of each pipeline stage.
  • metric_accumulate_interval: The frequency with which training/validation metrics are accumulated (in total agent steps). Metrics accumulated in an interval are logged (if should_log is True) and used by the stage's early stopping criterion (if any).
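
To make the roles of gamma and gae_lambda concrete, the standard GAE recursion over a single rollout can be sketched as below. This is the textbook formulation, not necessarily the code the library runs: the function name and arguments are hypothetical, and the sketch assumes the rollout contains no episode terminations.

```python
from typing import List, Sequence


def gae_advantages(rewards: Sequence[float], values: Sequence[float],
                   bootstrap_value: float, gamma: float, gae_lambda: float) -> List[float]:
    """Standard GAE recursion over one rollout with no episode boundaries."""
    advantages = [0.0] * len(rewards)
    next_value, running = bootstrap_value, 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error at time t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * gae_lambda * running
        advantages[t] = running
        next_value = values[t]
    return advantages


print(gae_advantages([1.0, 1.0], [0.0, 0.0], 0.0, gamma=1.0, gae_lambda=1.0))  # [2.0, 1.0]
```

With gae_lambda set to 0 the advantages reduce to the one-step TD errors, while values closer to 1 weight longer-horizon returns more heavily.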


class PipelineStage(TrainingSettings)


A single stage in a training pipeline, possibly including overrides to the global TrainingSettings in TrainingPipeline.


  • loss_name: A collection of unique names assigned to losses. These will reference the Loss objects in a TrainingPipeline instance.
  • max_stage_steps: Either the total number of steps agents should take in this stage or a Callable object (e.g. a function)
  • loss_weights: A list of floating point numbers describing the relative weights applied to the losses referenced by loss_name. Should be the same length as loss_name. If this is None, all weights will be assumed to be one.
  • teacher_forcing: If applicable, defines the probability an agent will take the expert action (as opposed to its own sampled action) at a given time point.
  • early_stopping_criterion: An EarlyStoppingCriterion object which determines if training in this stage should be stopped early. If None then no early stopping occurs. If early_stopping_criterion is not None then we do not guarantee reproducibility when restarting a model from a checkpoint (as the EarlyStoppingCriterion object may store internal state which is not saved in the checkpoint). Currently AllenAct supports early stopping criteria only when not using distributed training.
  • num_mini_batch: See docs for TrainingSettings.
  • update_repeats: See docs for TrainingSettings.
  • max_grad_norm: See docs for TrainingSettings.
  • num_steps: See docs for TrainingSettings.
  • gamma: See docs for TrainingSettings.
  • use_gae: See docs for TrainingSettings.
  • gae_lambda: See docs for TrainingSettings.
  • advance_scene_rollout_period: See docs for TrainingSettings.
  • save_interval: See docs for TrainingSettings.
  • metric_accumulate_interval: See docs for TrainingSettings.
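
One way to picture stage-level overrides is a lookup that falls back to the pipeline-wide value. The sketch below assumes that a stage value of None means "not overridden"; the function `resolve_setting` and the use of plain dictionaries are hypothetical and do not reflect how the classes store their attributes.

```python
from typing import Any, Dict, Optional


def resolve_setting(name: str, stage_settings: Dict[str, Optional[Any]],
                    pipeline_settings: Dict[str, Any]) -> Any:
    # Assumption: a stage value of None means "not overridden".
    stage_value = stage_settings.get(name)
    return pipeline_settings[name] if stage_value is None else stage_value


pipeline_settings = {"gamma": 0.99, "num_steps": 128}
stage_settings = {"gamma": None, "num_steps": 32}
print(resolve_setting("gamma", stage_settings, pipeline_settings))      # 0.99
print(resolve_setting("num_steps", stage_settings, pipeline_settings))  # 32
```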


class TrainingPipeline(TrainingSettings)


Class defining the stages (and global training settings) in a training pipeline.

The training pipeline can be used as an iterator to go through the pipeline stages in, for instance, a loop.


  • named_losses: Dictionary mapping the name of a loss to either an instantiation of that loss or a Builder that, when called, will return that loss.
  • pipeline_stages: A list of PipelineStages. Each of these defines how the agent will be trained; the stages are executed sequentially.
  • optimizer_builder: Builder object to instantiate the optimizer to use during training.
  • num_mini_batch: See docs for TrainingSettings.
  • update_repeats: See docs for TrainingSettings.
  • max_grad_norm: See docs for TrainingSettings.
  • num_steps: See docs for TrainingSettings.
  • gamma: See docs for TrainingSettings.
  • use_gae: See docs for TrainingSettings.
  • gae_lambda: See docs for TrainingSettings.
  • advance_scene_rollout_period: See docs for TrainingSettings.
  • save_interval: See docs for TrainingSettings.
  • metric_accumulate_interval: See docs for TrainingSettings.
  • should_log: True if metrics accumulated during training should be logged to the console as well as to a tensorboard file.
  • lr_scheduler_builder: Optional builder object to instantiate the learning rate scheduler used throughout the pipeline.


 | __init__(named_losses: Dict[str, Union[Loss, Builder[Loss]]], pipeline_stages: List[PipelineStage], optimizer_builder: Builder[optim.Optimizer], num_mini_batch: int, update_repeats: Optional[int], max_grad_norm: float, num_steps: int, gamma: float, use_gae: bool, gae_lambda: float, advance_scene_rollout_period: Optional[int], save_interval: Optional[int], metric_accumulate_interval: int, should_log: bool = True, lr_scheduler_builder: Optional[Builder[optim.lr_scheduler._LRScheduler]] = None)



See class docstring for parameter definitions.