xdrl.trainer_hooks.logging#

Classes#

LoggingCollectionMetricsHook

Log collection metrics in the collection/ namespace.

LoggingTrainingMetricsHook

Log reduced optimization metrics under the train/ namespace.

LoggingCountersHook

Track frame and iteration counters in the counters/ namespace.

LoggingProgressMetricsHook

Emit a compact progress-bar view for reward and frame counters.

LoggingEvaluationMetricsHook

Run periodic evaluation and log metrics under eval/<subgroup>/.

LoggingEvaluationHookSet

Compose deterministic and non-deterministic evaluation hooks.

LoggingHookSet

Compose the default xdrl logging hooks.

WandbFinishHook

Optional shutdown hook for config-driven Weights & Biases cleanup.

WandbFlushHook

Flush pending W&B scalar rows emitted through TorchRL's scalar logger.

Functions#

_as_float(value)

_min_mean_max(prefix, value)

_summarize_metric(prefix, value, *[, reduce_stats])

_collector_mask(batch)

_scalarize_last_dim(value, weights)

_prepare_episode_reward(value, done, *, weights)

Module Contents#

xdrl.trainer_hooks.logging._as_float(value)[source]#
Parameters:

value (torch.Tensor)

Return type:

float

xdrl.trainer_hooks.logging._min_mean_max(prefix, value)[source]#
Parameters:
  • prefix (str)

  • value (torch.Tensor)

Return type:

dict[str, float]

xdrl.trainer_hooks.logging._summarize_metric(prefix, value, *, reduce_stats=None)[source]#
Parameters:
  • prefix (str)

  • value (torch.Tensor)

  • reduce_stats (bool | None)

Return type:

dict[str, float]

xdrl.trainer_hooks.logging._collector_mask(batch)[source]#
Parameters:

batch (tensordict.TensorDictBase)

Return type:

torch.Tensor | None

xdrl.trainer_hooks.logging._scalarize_last_dim(value, weights)[source]#
Parameters:
  • value (torch.Tensor)

  • weights (tuple[float, Ellipsis] | None)

Return type:

torch.Tensor

xdrl.trainer_hooks.logging._prepare_episode_reward(value, done, *, weights)[source]#
Parameters:
  • value (torch.Tensor)

  • done (torch.Tensor)

  • weights (tuple[float, Ellipsis] | None)

Return type:

torch.Tensor

class xdrl.trainer_hooks.logging.LoggingCollectionMetricsHook(group='agents', reward_key=None, done_key=('next', 'done'), episode_reward_key=None, episode_reward_weights=None, reduce_stats=None)[source]#

Bases: torchrl.trainers.trainers.TrainerHookBase

Log collection metrics in the collection/ namespace.

The hook reads reward, done, and optional episode-reward tensors from a collected TensorDict. Vector-valued rewards can be scalarized with explicit weights, which is useful for MO-Gymnasium and other multi-objective runs.

Parameters:
  • group (str)

  • reward_key (tuple[str, Ellipsis] | None)

  • done_key (tuple[str, Ellipsis])

  • episode_reward_key (tuple[str, Ellipsis] | None)

  • episode_reward_weights (collections.abc.Sequence[float] | None)

  • reduce_stats (bool | None)

group = 'agents'[source]#
reward_key = None[source]#
done_key = ('next', 'done')[source]#
episode_reward_key = None[source]#
episode_reward_weights[source]#
reduce_stats = None[source]#
__call__(batch)[source]#
Parameters:

batch (tensordict.TensorDictBase)

Return type:

dict[str, float]

register(trainer, name='logging_collection_metrics')[source]#

Registers the hook in the trainer at a default location.

Parameters:
  • trainer (Trainer) – the trainer where the hook must be registered.

  • name (str) – the name of the hook.

Return type:

None

Note

To register the hook at another location than the default, use register_op().

state_dict()[source]#
Return type:

dict[str, Any]

load_state_dict(state_dict)[source]#
Parameters:

state_dict (dict[str, Any])

Return type:

None

class xdrl.trainer_hooks.logging.LoggingTrainingMetricsHook(group='agents')[source]#

Bases: torchrl.trainers.trainers.TrainerHookBase

Log reduced optimization metrics under the train/ namespace.

TorchRL loss modules may emit TensorDict metrics with vector values. This hook reduces tensors to scalars and mirrors them under train/<group>/ so logger dashboards and progress outputs use stable names.

Parameters:

group (str)

group = 'agents'[source]#
__call__(_optim_steps, average_losses)[source]#
Parameters:
  • _optim_steps (int)

  • average_losses (tensordict.TensorDictBase | None)

Return type:

dict[str, float]

register(trainer, name='logging_training_metrics')[source]#

Registers the hook in the trainer at a default location.

Parameters:
  • trainer (Trainer) – the trainer where the hook must be registered.

  • name (str) – the name of the hook.

Return type:

None

Note

To register the hook at another location than the default, use register_op().

state_dict()[source]#
Return type:

dict[str, Any]

load_state_dict(state_dict)[source]#
Parameters:

state_dict (dict[str, Any])

Return type:

None

class xdrl.trainer_hooks.logging.LoggingCountersHook(frame_skip=1)[source]#

Bases: torchrl.trainers.trainers.TrainerHookBase

Track frame and iteration counters in the counters/ namespace.

Parameters:

frame_skip (int)

frame_skip = 1[source]#
total_frames = 0[source]#
iteration = 0[source]#
_current_frames(batch)[source]#
Parameters:

batch (tensordict.TensorDictBase)

Return type:

int

__call__(batch)[source]#
Parameters:

batch (tensordict.TensorDictBase)

Return type:

dict[str, int]

register(trainer, name='logging_counters')[source]#

Registers the hook in the trainer at a default location.

Parameters:
  • trainer (Trainer) – the trainer where the hook must be registered.

  • name (str) – the name of the hook.

Return type:

None

Note

To register the hook at another location than the default, use register_op().

state_dict()[source]#
Return type:

dict[str, Any]

load_state_dict(state_dict)[source]#
Parameters:

state_dict (dict[str, Any])

Return type:

None

class xdrl.trainer_hooks.logging.LoggingProgressMetricsHook(*, group, counters_hook, reward_key=None)[source]#

Bases: torchrl.trainers.trainers.TrainerHookBase

Emit a compact progress-bar view for reward and frame counters.

Parameters:
group[source]#
counters_hook[source]#
reward_key = None[source]#
__call__(batch)[source]#
Parameters:

batch (tensordict.TensorDictBase)

Return type:

dict[str, float | bool]

register(trainer, name='logging_progress_metrics')[source]#

Registers the hook in the trainer at a default location.

Parameters:
  • trainer (Trainer) – the trainer where the hook must be registered.

  • name (str) – the name of the hook.

Return type:

None

Note

To register the hook at another location than the default, use register_op().

state_dict()[source]#
Return type:

dict[str, Any]

load_state_dict(state_dict)[source]#
Parameters:

state_dict (dict[str, Any])

Return type:

None

class xdrl.trainer_hooks.logging.LoggingEvaluationMetricsHook(*, policy, environment, group, metric_subgroup, interval_frames, max_steps, deterministic, render, video_fps, render_kwargs=None, reward_key=None, reduce_stats=None, logger=None)[source]#

Bases: torchrl.trainers.trainers.TrainerHookBase

Run periodic evaluation and log metrics under eval/<subgroup>/.

Parameters:
  • policy (torch.nn.Module) – Policy module used during rollout.

  • environment – TorchRL environment with a rollout method.

  • group (str) – Agent/group namespace used for reward keys.

  • metric_subgroup (str) – Evaluation label such as "deterministic".

  • interval_frames (int) – Collected-frame interval between evaluations.

  • max_steps (int) – Maximum rollout length.

  • deterministic (bool) – Whether to force deterministic exploration.

  • render (bool) – Whether to capture rendered frames and log a video.

  • video_fps (int) – Video frame rate passed to the logger.

  • render_kwargs (dict[str, Any] | None) – Optional keyword arguments passed to environment render.

  • reward_key (tuple[str, Ellipsis] | None) – TensorDict key for rollout rewards.

  • reduce_stats (bool | None) – Whether vector metrics are reduced to min/mean/max.

  • logger (Any | None) – Optional logger used before the hook is registered on a trainer.

policy[source]#
environment[source]#
group[source]#
reward_key = None[source]#
reduce_stats = None[source]#
metric_subgroup[source]#
interval_frames[source]#
max_steps[source]#
deterministic[source]#
render[source]#
render_kwargs[source]#
video_fps[source]#
logger = None[source]#
trainer: torchrl.trainers.trainers.Trainer | None = None[source]#
static _to_frame_array(frame)[source]#
Parameters:

frame (Any)

Return type:

numpy.ndarray

_extract_render_frame(output)[source]#
Parameters:

output (Any)

Return type:

numpy.ndarray

_renderable_candidates()[source]#
Return type:

list[Any]

_render_frame()[source]#
Return type:

numpy.ndarray

_evaluate_once(step)[source]#
Parameters:

step (int)

Return type:

dict[str, float]

_log_direct(metrics, step)[source]#
Parameters:
  • metrics (dict[str, float])

  • step (int)

Return type:

None

run(*, step)[source]#
Parameters:

step (int)

Return type:

dict[str, float]

__call__(_batch)[source]#
Parameters:

_batch (tensordict.TensorDictBase)

Return type:

dict[str, float]

register(trainer, name='logging_evaluation_metrics')[source]#

Registers the hook in the trainer at a default location.

Parameters:
  • trainer (Trainer) – the trainer where the hook must be registered.

  • name (str) – the name of the hook.

Return type:

None

Note

To register the hook at another location than the default, use register_op().

close()[source]#
Return type:

None

state_dict()[source]#
Return type:

dict[str, Any]

load_state_dict(state_dict)[source]#
Parameters:

state_dict (dict[str, Any])

Return type:

None

class xdrl.trainer_hooks.logging.LoggingEvaluationHookSet(*, policy, environment, group, interval_frames, max_steps, deterministic, non_deterministic, render, video_fps, render_kwargs=None, reward_key=None, reduce_stats=None, logger=None)[source]#

Compose deterministic and non-deterministic evaluation hooks.

This wrapper is useful when the same environment and policy should be evaluated with both exploration settings on the same schedule.

Parameters:
  • policy (torch.nn.Module)

  • group (str)

  • interval_frames (int)

  • max_steps (int)

  • deterministic (bool)

  • non_deterministic (bool)

  • render (bool)

  • video_fps (int)

  • render_kwargs (dict[str, Any] | None)

  • reward_key (tuple[str, Ellipsis] | None)

  • reduce_stats (bool | None)

  • logger (Any | None)

hooks: list[LoggingEvaluationMetricsHook] = [][source]#
register(trainer, name='logging_evaluation_metrics')[source]#
Parameters:
  • trainer (torchrl.trainers.trainers.Trainer)

  • name (str)

Return type:

None

run(*, step)[source]#
Parameters:

step (int)

Return type:

dict[str, float]

close()[source]#
Return type:

None

class xdrl.trainer_hooks.logging.LoggingHookSet(*, group, frame_skip, reward_key=None, done_key=('next', 'done'), episode_reward_key=None, episode_reward_weights=None, reduce_stats=None, eval_hook_set=None)[source]#

Bases: torchrl.trainers.trainers.TrainerHookBase

Compose the default xdrl logging hooks.

The hook set registers collection metrics, training metrics, counters, progress metrics, timers, and optional evaluation hooks as a single object. This keeps Hydra configs concise while preserving independently testable hook components.

Parameters:
  • group (str)

  • frame_skip (int)

  • reward_key (tuple[str, Ellipsis] | None)

  • done_key (tuple[str, Ellipsis])

  • episode_reward_key (tuple[str, Ellipsis] | None)

  • episode_reward_weights (collections.abc.Sequence[float] | None)

  • reduce_stats (bool | None)

  • eval_hook_set (LoggingEvaluationHookSet | None)

group[source]#
collection_hook[source]#
training_hook[source]#
counters_hook[source]#
progress_hook[source]#
eval_hook_set = None[source]#
_iteration_start: float | None = None[source]#
_previous_iteration_end: float | None = None[source]#
_collection_time = 0.0[source]#
_total_time = 0.0[source]#
_timers_start(batch)[source]#
Parameters:

batch (tensordict.TensorDictBase)

Return type:

tensordict.TensorDictBase

_timers_end(_batch)[source]#
Parameters:

_batch (tensordict.TensorDictBase)

Return type:

dict[str, float]

register(trainer, name='logging_hooks')[source]#

Registers the hook in the trainer at a default location.

Parameters:
  • trainer (Trainer) – the trainer where the hook must be registered.

  • name (str) – the name of the hook.

Return type:

None

Note

To register the hook at another location than the default, use register_op().

run_pre_eval()[source]#
Return type:

dict[str, float]

close()[source]#
Return type:

None

state_dict()[source]#
Return type:

dict[str, Any]

load_state_dict(state_dict)[source]#
Parameters:

state_dict (dict[str, Any])

Return type:

None

class xdrl.trainer_hooks.logging.WandbFinishHook(enabled=True)[source]#

Bases: torchrl.trainers.trainers.TrainerHookBase

Optional shutdown hook for config-driven Weights & Biases cleanup.

The hook intentionally swallows import/runtime errors so offline or disabled W&B runs do not fail trainer shutdown.

Parameters:

enabled (bool)

enabled = True[source]#
__call__()[source]#
Return type:

None

register(trainer, name='wandb_finish')[source]#

Registers the hook in the trainer at a default location.

Parameters:
  • trainer (Trainer) – the trainer where the hook must be registered.

  • name (str) – the name of the hook.

Return type:

None

Note

To register the hook at another location than the default, use register_op().

state_dict()[source]#
Return type:

dict[str, Any]

load_state_dict(state_dict)[source]#
Parameters:

state_dict (dict[str, Any])

Return type:

None

class xdrl.trainer_hooks.logging.WandbFlushHook(enabled=True)[source]#

Bases: torchrl.trainers.trainers.TrainerHookBase

Flush pending W&B scalar rows emitted through TorchRL’s scalar logger.

TorchRL logs scalar metrics one by one, while its W&B logger defaults those calls to commit=False so metrics for the same step can be grouped. This hook commits the pending row after each trainer iteration and before W&B is finished, which makes metrics appear during long-running jobs.

Parameters:

enabled (bool)

enabled = True[source]#
trainer: torchrl.trainers.trainers.Trainer | None = None[source]#
_last_flushed_steps: tuple[tuple[str, int], Ellipsis] = ()[source]#
static _wandb_step_registry(logger)[source]#
Parameters:

logger (Any)

Return type:

tuple[tuple[str, int], Ellipsis]

static _wandb_experiment(logger)[source]#
Parameters:

logger (Any)

Return type:

Any | None

__call__(*_args, **_kwargs)[source]#
Parameters:
  • _args (Any)

  • _kwargs (Any)

Return type:

None

register(trainer, name='wandb_flush')[source]#

Registers the hook in the trainer at a default location.

Parameters:
  • trainer (Trainer) – the trainer where the hook must be registered.

  • name (str) – the name of the hook.

Return type:

None

Note

To register the hook at another location than the default, use register_op().

state_dict()[source]#
Return type:

dict[str, Any]

load_state_dict(state_dict)[source]#
Parameters:

state_dict (dict[str, Any])

Return type:

None