tmrl.actor module

class tmrl.actor.ActorModule(observation_space, action_space)

Bases: ABC

Implement this interface for the RolloutWorker(s) to interact with your policy.

Note

If overridden, the __init__() definition must take at least the following two arguments (args or kwargs): observation_space and action_space. When overriding __init__, don’t forget to call super().__init__ in the subclass.

Parameters:

observation_space (gymnasium.spaces.Space) – observation space (here for your convenience)
action_space (gymnasium.spaces.Space) – action space (here for your convenience)

abstract act(obs, test=False)

Must compute an action from an observation.

Parameters:

obs (object) – the observation
test (bool) – True at test time, False otherwise

Returns:

the computed action

Return type:

numpy.ndarray
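The contract above (a constructor taking observation_space and action_space, plus an act method returning a NumPy array) can be illustrated with the minimal sketch below. So that it runs without tmrl installed, it defines ActorModuleStandIn, a hypothetical stand-in that only mirrors the documented interface; with tmrl you would subclass tmrl.actor.ActorModule instead. The names LinearActorSketch, obs_dim, act_dim and noise are assumptions made for illustration, not part of the library.

```python
from abc import ABC, abstractmethod

import numpy as np


class ActorModuleStandIn(ABC):
    """Hypothetical stand-in mirroring the documented ActorModule interface."""

    def __init__(self, observation_space, action_space):
        self.observation_space = observation_space
        self.action_space = action_space

    @abstractmethod
    def act(self, obs, test=False):
        """Must compute an action from an observation."""


class LinearActorSketch(ActorModuleStandIn):
    def __init__(self, observation_space, action_space, obs_dim=4, act_dim=2, noise=0.1):
        # Forward the two required arguments to the parent constructor.
        super().__init__(observation_space, action_space)
        # obs_dim and act_dim are illustrative; a real actor would read them from the spaces.
        self.weights = np.zeros((act_dim, obs_dim), dtype=np.float32)
        self.noise = noise
        self.rng = np.random.default_rng(0)

    def act(self, obs, test=False):
        mean = np.tanh(self.weights @ np.asarray(obs, dtype=np.float32))
        if test:
            # Test time: return the deterministic action.
            return mean
        # Otherwise: add exploration noise and keep the action in [-1, 1].
        noisy = mean + self.rng.normal(0.0, self.noise, size=mean.shape)
        return np.clip(noisy, -1.0, 1.0)


actor = LinearActorSketch(observation_space=None, action_space=None)
obs = np.ones(4, dtype=np.float32)
test_action = actor.act(obs, test=True)
train_action = actor.act(obs)
```

Here the test flag switches between a deterministic action and a noisy one, which is one common way to use it; the interface itself only states that the flag is True at test time.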

load(path, device)

Load and return an instance of your ActorModule from the hard drive.

This method loads your ActorModule from the binary file saved by your implementation of save.

If not implemented, load defaults to returning the output of pickle.load(…). By default, the device argument is ignored (but you may want to use it in your implementation).

You need to override this method if your ActorModule is not picklable.

Note

You can use this function to load attributes and return self, or you can return a new instance.

Parameters:

path (pathlib.Path) – a filepath to load your ActorModule from
device – device to load relevant attributes to (e.g., “cpu” or “cuda:0”)

Returns:

An instance of your ActorModule

Return type:

ActorModule

save(path)

Save your ActorModule on the hard drive.

If not implemented, save defaults to pickle.dump(obj=self, …).

You need to override this method if your ActorModule is not picklable.

Note

Everything needs to be saved into a single binary file. tmrl reads this file and transfers its content over network.

Parameters:

path (pathlib.Path) – a filepath to save your ActorModule to
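When an actor holds attributes that cannot be pickled, save and load can be overridden as a pair. The sketch below is one possible approach, not the library's own implementation: it writes only the policy weights, JSON-encoded, into a single binary file and restores them into self. JsonActorSketch and its weights and lock attributes are hypothetical; with tmrl installed the class would subclass tmrl.actor.ActorModule and also implement act, which is omitted here for brevity.

```python
import json
import tempfile
import threading
from pathlib import Path


class JsonActorSketch:
    """Hypothetical actor with an attribute that cannot be pickled."""

    def __init__(self, observation_space, action_space):
        self.observation_space = observation_space
        self.action_space = action_space
        self.weights = [0.0, 0.0]
        self.lock = threading.Lock()  # lock objects cannot be pickled

    def save(self, path):
        # Everything needed to rebuild the policy goes into one binary file.
        payload = json.dumps({"weights": self.weights}).encode("utf-8")
        Path(path).write_bytes(payload)

    def load(self, path, device):
        # The device argument is ignored in this sketch.
        state = json.loads(Path(path).read_bytes().decode("utf-8"))
        self.weights = state["weights"]
        # Returning self is allowed; returning a new instance would work too.
        return self


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "actor.bin"
    trained = JsonActorSketch(None, None)
    trained.weights = [0.5, -1.25]
    trained.save(path)
    restored = JsonActorSketch(None, None).load(path, device="cpu")
```

Keeping the non-picklable lock out of the file means the loading side recreates it in its own constructor, so only plain data crosses the network.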

to_device(device)

Set the ActorModule’s relevant attributes to the designated device.

By default, this method is a no-op and returns self.

Parameters:

device – the device where to move relevant attributes (e.g., “cpu” or “cuda:0”)

Returns:

an ActorModule whose relevant attributes are moved to device (can be self)

class tmrl.actor.TorchActorModule(observation_space, action_space, device='cpu')

Bases: ActorModule, Module, ABC

Partial implementation of ActorModule as a torch.nn.Module.

You can implement this instead of ActorModule when using PyTorch. TorchActorModule is a subclass of torch.nn.Module and may implement forward(). Typically, your implementation of act() can call forward() with gradients turned off.

When using TorchActorModule, the act method receives observations collated on device, with an additional dimension corresponding to the batch size.

Note

If overridden, the __init__() definition must take at least the following two arguments (args or kwargs): observation_space and action_space. When overriding __init__, don’t forget to call super().__init__ in the subclass.

Parameters:

observation_space (gymnasium.spaces.Space) – observation space (here for your convenience)
action_space (gymnasium.spaces.Space) – action space (here for your convenience)
device – device where your model should live and where observations for act will be collated
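The pattern described above, where act calls forward with gradients turned off on an observation that already carries a batch dimension, might look like the following sketch. To stay runnable without tmrl, it subclasses torch.nn.Module directly; with tmrl installed you would subclass tmrl.actor.TorchActorModule and pass device to its constructor. MLPActorSketch, obs_dim and act_dim are hypothetical names, and the observation is assumed to be a single tensor.

```python
import torch
from torch import nn


class MLPActorSketch(nn.Module):
    """Stand-in for a TorchActorModule subclass (assumption: tmrl is not installed)."""

    def __init__(self, observation_space, action_space, device="cpu", obs_dim=4, act_dim=2):
        super().__init__()
        self.observation_space = observation_space
        self.action_space = action_space
        self.device = device
        # obs_dim and act_dim are illustrative; a real actor would read them from the spaces.
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 16),
            nn.ReLU(),
            nn.Linear(16, act_dim),
            nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

    def act(self, obs, test=False):
        # obs is assumed to be collated on the device, with a leading batch dimension.
        # This deterministic policy does not use the test flag.
        with torch.no_grad():
            action = self.forward(obs)
        # Drop the batch dimension and return a NumPy array.
        return action.squeeze(0).cpu().numpy()


actor = MLPActorSketch(observation_space=None, action_space=None)
batched_obs = torch.zeros(1, 4)
action = actor.act(batched_obs)
```

A stochastic policy would typically sample from a distribution when test is False and return its mean when test is True.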

load(path, device)

Load and return an instance of your ActorModule from the hard drive.

This method loads your ActorModule from the binary file saved by your implementation of save.

If not implemented, load defaults to returning the output of pickle.load(…). By default, the device argument is ignored (but you may want to use it in your implementation).

You need to override this method if your ActorModule is not picklable.

Note

You can use this function to load attributes and return self, or you can return a new instance.

Parameters:

path (pathlib.Path) – a filepath to load your ActorModule from
device – device to load relevant attributes to (e.g., “cpu” or “cuda:0”)

Returns:

An instance of your ActorModule

Return type:

ActorModule

save(path)

Save your ActorModule on the hard drive.

If not implemented, save defaults to pickle.dump(obj=self, …).

You need to override this method if your ActorModule is not picklable.

Note

Everything needs to be saved into a single binary file. tmrl reads this file and transfers its content over network.

Parameters:

path (pathlib.Path) – a filepath to save your ActorModule to

to(device)

Move and/or cast the parameters and buffers.

This can be called as:

to(device=None, dtype=None, non_blocking=False)

to(dtype, non_blocking=False)

to(tensor, non_blocking=False)

to(memory_format=torch.channels_last)

Its signature is similar to torch.Tensor.to(), but it only accepts floating point or complex dtypes. In addition, this method only casts the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers are moved to device, if that is given, but with their dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Parameters:

device (torch.device) – the desired device of the parameters and buffers in this module
dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module
tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)

Returns:

self

Return type:

Module

Examples:

>>> import torch
>>> from torch import nn
>>> # the weight values shown below are randomly initialized and will differ between runs
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> # the next step requires a second CUDA device ("cuda:1")
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=False).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)

to_device(device)

Set the ActorModule’s relevant attributes to the designated device.

By default, this method is a no-op and returns self.

Parameters:

device – the device where to move relevant attributes (e.g., “cpu” or “cuda:0”)

Returns:

an ActorModule whose relevant attributes are moved to device (can be self)