API Reference

megastep

core

megastep.core.gamma_encode(x)[source]

Converts RGB data to viewable values.

megastep.core.gamma_decode(x)[source]

Converts RGB data to interpolatable values.

class megastep.core.Core(scenery, res=64, fov=130, fps=10)[source]

The core rendering and physics interface.

To create the Core, you pass a Scenery that describes the environment. Once created, the tensors hanging off of the Core give the state of the world, and that state can be advanced with the functions in cuda.

Variables
  • agents – An Agents object describing the agents.

  • scenery – A Scenery object describing the scene.

  • progress – A (n_env, n_agent)-tensor giving how far the agent was able to move in the previous timestep as a fraction of its intended movement, before running into an obstacle. A value less than 1 means the agent collided with something. Useful for detecting collisions.

  • n_envs – Number of environments. Same as the number of geometries passed in.

  • n_agents – Number of agents.

  • res – The horizontal resolution of observations.

  • fov – The field of view in degrees.

  • agent_radius – The radius of a disc containing the agent, in meters.

  • fps – The framerate.

  • random – The seeded numpy.random.RandomState used to initialize the environment. By reusing it for any extra random decisions made while generating the environments, you can guarantee you’ll get the same environments every time.

Parameters
  • scenery (Scenery) – Describes the static parts of the environment.

  • n_agents (int) – The number of agents to put in each environment. Defaults to 1.

  • res (int) – The horizontal resolution of the observations. The resolution must be less than 1024, as that’s the maximum number of CUDA threads in a block. Defaults to 64 pixels.

  • fov – The field of view in degrees. Must be less than 180° due to how frames are rendered. Defaults to 130°.

  • fps (int) – The simulation frame rate/step rate. Defaults to 10.
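
A minimal sketch of putting these pieces together, using cubicasa geometries and the scene module; the geometry count here is arbitrary:

from megastep import core, cubicasa, scene

geometries = cubicasa.sample(64)                  # any list of geometries will do
scenery = scene.scenery(geometries, n_agents=1)   # build the static scenery on the GPU
c = core.Core(scenery, res=64, fov=130, fps=10)   # the defaults documented above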

state(e)[source]

Returns a dotdict tree representing the state of environment e.

A typical state looks like this:

arrdict:
n_envs          1
n_agents        4
res             512
fov             60
agent_radius    0.10606601717798211
fps             10
scenery           arrdict:
                model       Tensor((8, 2, 2), torch.float32)
                lines       Tensor((307, 2, 2), torch.float32)
                lights      Tensor((21, 3), torch.float32)
                textures    <megastepcuda.Ragged2D object at 0x7fba34112eb0>
                baked       <megastepcuda.Ragged1D object at 0x7fba34112670>
agents          arrdict:
                angles       Tensor((4,), torch.float32)
                positions    Tensor((4, 2), torch.float32)
progress        Tensor((4,), torch.float32)

This state tree is usually passed onto a Plotting function.

classmethod plot_state(state, ax=None, zoom=False)[source]
env_full(x)[source]

Returns a (n_envs,)-tensor on the environment’s device full of x.

This isn’t strictly required by the Core, but you find yourself making these vectors so often it’s useful sugar.

agent_full(x)[source]

Returns a (n_envs, n_agents)-tensor on the environment’s device full of x.

This isn’t strictly required by the Core, but you find yourself making these vectors so often it’s useful sugar.
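
As a small illustration, assuming a Core called c (the variable names are just examples):

reward = c.env_full(0.)      # an (n_envs,)-tensor of zeros, one per environment
alive = c.agent_full(True)   # an (n_envs, n_agents)-tensor of Trues, one per agent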

cuda

This module contains all the rendering and physics CUDA kernels, and is intended to operate on the state tensors held by Core.

Internals

This module is dynamically compiled upon import of megastep by _cuda().

The best explanation of how the bridge between CUDA and Python works is the PyTorch C++ extension tutorial.

In short though, this is a PyBind module. You can find the PyBind wrappers in wrappers.cpp, and the actual code they call into in common.h and kernels.cu.

I have very limited experience with distributing binaries, so while I’ve _tried_ to reference the library paths in a platform-independent way, there is a good chance they’ll turn out to be platform-dependent after all. Submit an issue and explain a better way to me!

The libraries listed are - I believe - the minimal possible to allow megastep’s compilation. The default library set for PyTorch extensions is much larger and slower to compile.

megastep.cuda.initialize(agent_radius: float, res: int, fov: float, fps: float) → None[source]

Initializes the CUDA kernels by setting some global constants. The constants are then used by bake(), physics() and render().

Really, the existence of these constants is an indicator the whole CUDA side of things should be wrapped up in a class. But that’d make things a bit messier, and it’s a rare use-case that’ll have them set to different values in the same process.

megastep.cuda.bake(scenery: Scenery) → None[source]

Pre-computes the lighting for the static geometry, updating the Scenery.baked tensor.

For more details on how this works, see the rendering section.

Parameters

scenery (Scenery) – The scenery to compute the lighting for

megastep.cuda.physics(scenery: Scenery, agents: Agents) → Physics[source]

Advances the physics simulation, updating the Agents’ movement tensors based on their velocity and possible collisions. It also returns the progress tensor with how far the agents moved before colliding with something.

For more details on how this works, see the physics section.

Parameters
  • scenery (Scenery) – The scenery to reference when updating the agents

  • agents (Agents) – The agents to update the movement of

Return type

Physics
megastep.cuda.render(scenery: Scenery, agents: Agents) → Render[source]

Returns a rendering of the scenery onto the agents’ cameras.

For more details on how this works, see the rendering section.

Parameters
  • scenery (Scenery) – The scenery to render

  • agents (Agents) – The agents whose cameras the scenery is rendered onto

Return type

Render
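
A hedged sketch of a single simulation step driven directly by these kernels, assuming c is an already-constructed Core and that initialize() and bake() have already been run:

from megastep import cuda

cuda.physics(c.scenery, c.agents)      # advance the agents' positions and angles one step
r = cuda.render(c.scenery, c.agents)   # cast rays from each agent's camera
screen = r.screen                      # (n_envs, n_agents, res, 3) RGB values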

class megastep.cuda.Render[source]

The result of a render() call, showing the scenery from the agents’ points of view.

See the rendering section for a discussion of this class’s place in megastep.

Rendering is done by casting ‘rays’ from the camera, through each pixel and out into the world. When a ray intersects a line from Scenery.lines, that’s called a ‘hit’.

property distances[source]

A (n_envs, n_agents, res)-tensor giving the distance from the camera to the hit in meters.

property dots[source]

A (n_envs, n_agents, res)-tensor giving the dot product between the ray and the line it hit.

property indices[source]

A (n_envs, n_agents, res)-tensor giving the index into Scenery.lines of the hit. Rays which don’t hit anything get a -1.

property locations[source]

A (n_envs, n_agents, res)-tensor giving the location along each line that the hit occurs. A zero means it happened at the first endpoint; a one means it happened at the second.

property screen[source]

A (n_envs, n_agents, res, 3)-tensor giving the views of each agent. Colours are RGB with values between 0 and 1. Infinity is coloured black.

class megastep.cuda.Agents[source]

Holds the state of the agents. Typically accessed through agents.

See the agents section for a discussion of this class’s place in megastep.

property angles[source]

An (n_env, n_agent)-tensor of agents’ angles relative to the positive x axis, given in degrees.

property angvelocity[source]

An (n_env, n_agent)-tensor of agents’ angular velocity, in degrees per second.

property positions[source]

An (n_env, n_agent, 2)-tensor of agents’ positions, in meters.

state()[source]

Extracts the state for the e-th scene, returning it as a dotdict.

property velocity[source]

An (n_env, n_agent, 2)-tensor of agents’ velocity, in meters per second.

class megastep.cuda.Scenery[source]

Holds the state of the scenery. Typically accessed through scenery.

See the scenery section for a discussion of this class’s place in megastep.

property baked[source]

An (n_texels,)-Ragged tensor giving the bake()-d illumination of each texel.

property lights[source]

An (n_lights, 3)-tensor giving the locations of the lights in the first two columns, and their intensities in the third.

property lines[source]

An (n_lines, 2, 2)-Ragged tensor giving the lines in each scenery.

property model[source]

An (n_model_line, 2, 2)-tensor giving the model - the set of lines - that makes up the agent. This will be shifted and rotated according to the Agents’ angles and positions, then rendered into the scenery.

property n_agents[source]

The number of agents in each environment

state()[source]

Extracts the state for the e-th scene, returning it as an arrdict.

property textures[source]

An (n_texels, 3)-Ragged tensor giving the texels in each line.

cubicasa

megastep.cubicasa.sample(n_geometries, split='training', seed=1)[source]

Returns a random sample of cubicasa geometries.

If you pass the same arguments, you’ll get the same sample every time.

There are 4,992 unique geometries, split into a 4,492-geometry training set and a 500-geometry test set.

Caching

The geometries are derived from the Cubicasa5k dataset.

The first time you call this function, it’ll fetch and cache a ~10MB precomputed geometries file. This is far easier to work with than the full 5GB Cubicasa5k dataset. If you want to recompute the geometries from scratch, however, import this module and try calling

svg_data(regenerate=True) 
geometry_data(regenerate=True)

Parameters
  • n_geometries (int) – The number of geometries to return

  • split (str) – Whether to return a sample from the training set, the test set, or all. The split is 90/10 in favour of the training set. Defaults to training.

  • seed – The seed to use when allocating the training and test sets.

Returns

A list of geometries.
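
For example, calling it twice with the same arguments returns the same geometries both times:

from megastep import cubicasa

first = cubicasa.sample(32, split='test', seed=1)
second = cubicasa.sample(32, split='test', seed=1)   # identical to `first`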

geometry

megastep.geometry.cyclic_pairs(xs)[source]

Returns pairs (xs[i], xs[i+1]), wrapping the last pair round to the start.

megastep.geometry.masks(walls, spaces, res=0.2)[source]

Generates a masking array from an array of walls and a list of spaces.

Parameters
  • walls – A (n_walls, 2, 2)-array giving the coordinates of the walls’ endpoints.

  • spaces – A list of spaces, each given as a coordinate array of the space’s vertices.

  • res – The resolution of the masking array.

Returns

A masking array, with indices 1, 2, … for the spaces, 0 for free space, and -1 for walls.

megastep.geometry.centers(indices, shape, res)[source]

Converts mask (i, j) indices to the (x, y) coordinates of the (i, j)-th cell’s center.

Usually the shape and res arguments for this come directly from a geometry dotdict.

Parameters
  • indices – A (…, 2) array of indices into a masking array.

  • shape – A tuple-like giving the height and width of the masking array.

  • res – The resolution of the masking array.

Returns

A (…, 2) array of (x, y) coordinates

megastep.geometry.indices(coords, shape, res)[source]

Converts (x, y) coordinates to the (i, j) indices of the containing cell.

Usually the shape and res arguments for this come directly from a geometry dotdict.

Parameters
  • coords – A (…, 2) array of (x, y) coordinates.

  • shape – A tuple-like giving the height and width of the masking array.

  • res – The resolution of the masking array.

Returns

A (…, 2) array of integer (i, j) indices.
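
A small sketch of how centers() and indices() relate, using made-up shape and resolution values; whether the round trip is exactly inverse depends on the conventions used in this module:

import numpy as np
from megastep import geometry

shape, res = (50, 80), 0.2                 # hypothetical mask height/width and resolution
ij = np.array([[10, 20], [3, 7]])          # two cells in the mask
xy = geometry.centers(ij, shape, res)      # the (x, y) centers of those cells, in meters
back = geometry.indices(xy, shape, res)    # should recover ij if the two are exact inverses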

megastep.geometry.display(g)[source]

Visualize a geometry using matplotlib.

Supports visualizing partial geometries that only have a subset of id/masks/walls/lights.

modules

megastep.modules are chunks of functionality that often turn up in megastep environments.

megastep.modules.to_local_frame(angles, p)[source]

Converts a velocity vector in the global coordinate frame to one in the frame local to the agent.

megastep.modules.to_global_frame(angles, p)[source]

Converts a velocity vector in the local coordinate frame of the agent to one in the global frame.

class megastep.modules.SimpleMovement(core, speed=10, ang_speed=180, n_agents=None)[source]

A simple movement system with no momentum.

There are seven actions in total:
  • do nothing

  • forward/backward a fixed distance

  • strafe left/right a fixed distance

  • turn left/right a fixed angle

Parameters
  • core – The Core used by the environment.

  • speed – The speed of the agent in its linear movements, in meters per second.

  • ang_speed – The speed of the agent in its rotational movements, in degrees per second.

  • n_agents – The number of agents to output actions for. This is usually taken from the core; it can be usefully overridden in multiagent environments.

Variables

space – The action space to present to the controlling network.
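
A hedged sketch of how an environment typically wires this module in; the assumption that the module is applied by calling it with the network’s decision follows the demo environments:

from megastep import modules

movement = modules.SimpleMovement(c)     # c is the environment's Core
action_space = movement.space            # present this to the controlling network
# each step, apply the chosen actions before advancing the physics:
# movement(decision)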

class megastep.modules.MomentumMovement(core, accel=5, ang_accel=180, decay=0.125, n_agents=None)[source]

A simple movement system with momentum.

There are seven actions in total:
  • do nothing

  • accelerate forward/backward

  • accelerate left/right

  • torque left/right

TODO: Make the decay per second rather than per timestep.

Parameters
  • core – The Core used by the environment.

  • accel – The acceleration of the agent in its linear movements, in meters per second squared.

  • ang_accel – The acceleration of the agent in its rotational movements, in degrees per second squared.

  • decay – The multiplicative decay of the agent’s velocity per timestep. 1 means Newtonian motion; 0 means the only velocity is that generated by this timestep’s acceleration.

  • n_agents – The number of agents to output actions for. This is usually taken from the core; it can be usefully overridden in multiagent environments.

Variables
  • space – The action space to present to the controlling network.

  • decay – The value of the decay parameter.

megastep.modules.unpack(d)[source]

Unpacks cuda datastructures into arrdicts with the same attributes.

megastep.modules.render(core)[source]

Calls render(), turns the output into an arrdict, then converts the screen attribute into the kind of (batch, channel, height, width) tensor expected by PyTorch convolution modules.

This is almost always what you want when rendering RGB observations.

megastep.modules.downsample(screen, subsample)[source]

Factors a render()’d screen tensor along its final width dimension, returning something with shape (…, width/subsample, subsample).

Typically you chase this call by aggregating over the trailing dimension in some way; either mean or min or max or [..., 0].
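
A hedged sketch of the usual pipeline, assuming render() above returns the render result with its screen attribute already reshaped to (batch, channel, height, width):

from megastep import modules

r = modules.render(c)                     # c is the environment's Core
chunks = modules.downsample(r.screen, 4)  # (..., width/4, 4)
obs = chunks.mean(-1)                     # aggregate each group of 4 pixels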

class megastep.modules.Depth(core, n_agents=None, subsample=1, max_depth=10)[source]

Generates depth observations.

Parameters
  • core – The Core used by the environment.

  • n_agents – The number of agents to generate observations for. This is usually taken from the core; it can be usefully overridden in multiagent environments.

  • subsample – How many horizontal pixels to average when generating the observations. For example, if the core is rendering at 256 pixels and subsample is 4, then 64-pixel observations will be returned. A higher subsampling rate makes for slower rendering, but smoother observations.

  • max_depth – The maximum depth, corresponding to a zero in the observation. Given in meters.

Variables
  • space – The observation space to present to the controlling network.

  • max_depth – The value of the max_depth parameter.

  • subsample – The value of the subsample parameter.

state(e=0)[source]

The state of the module in sub-env e, which is to say its last observation for e. Useful in plotting.

class megastep.modules.RGB(core, n_agents=None, subsample=1)[source]

Generates RGB observations.

Parameters
  • core – The Core used by the environment.

  • n_agents – The number of agents to generate observations for. This is usually taken from the core; it can be usefully overridden in multiagent environments.

  • subsample – How many horizontal pixels to average when generating the observations. For example, if the core is rendering at 256 pixels and subsample is 4, then 64-pixel observations will be returned. A higher subsampling rate makes for slower rendering, but smoother observations.

Variables
  • space – The observation space to present to the controlling network.

  • subsample – The value of the subsample parameter.

state(e=0)[source]

The state of the module in sub-env e, which is to say its last observation for e. Useful in plotting.

classmethod plot_state(state, axes=None)[source]

Plots the state of this module using imshow. Make sure to numpyify() the state before passing it here. Useful in plotting.

class megastep.modules.IMU(core, speed_scale=10.0, ang_scale=360.0, n_agents=None)[source]

Generates a linear-and-angular-velocity measurement. Kinda like an inertial measurement unit.

Parameters
  • core – The Core used by the environment.

  • n_agents – The number of agents to generate observations for. This is usually taken from the core; it can be usefully overridden in multiagent environments.

  • speed_scale – The scale of speeds to use, with this value corresponding to an observation of 1. Given in meters per second.

  • ang_scale – The scale of angular speeds to use, with this value corresponding to an observation of 1. Given in degrees per second.

Variables
  • space – The observation space to present to the controlling network.

  • speed_scale – The value of the speed_scale parameter.

  • ang_scale – The value of the ang_scale parameter.

megastep.modules.random_empty_positions(geometries, n_agents, n_points)[source]

Returns a tensor of randomly-selected empty points in each geometry.

The returned tensor is a (n_geometries, n_agents, n_points, 2)-float tensor, with the coordinates given in meters.

This is typically used when you want to randomly move an agent to a new place, but finding an empty point at each timestep is too expensive. So instead this is used to generate n_points empty points in advance, and then when you need one you can choose from the pre-generated options.
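
A short sketch of that pattern, assuming geometries is a list of geometries like the one returned by cubicasa.sample():

import random
from megastep import modules

positions = modules.random_empty_positions(geometries, n_agents=1, n_points=100)
e, a = 0, 0                               # environment and agent to respawn (illustrative)
choice = random.randrange(100)            # pick one of the pre-generated points
new_position = positions[e, a, choice]    # an (x, y) location in meters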

class megastep.modules.RandomSpawns(geometries, core, n_spawns=100)[source]

Respawns agents to random empty locations in the geometry.

Parameters
  • geometries – The geometry to use when calculating the spawn locations. Should be a list with one for each environment.

  • core – The Core used by the environment.

  • n_spawns – The number of spawns to choose between for each agent. This is precomputed when the class is created, so that the respawns themselves are fast.

class megastep.modules.RandomLifespans(core, max_lifespan, min_lifespan=None)[source]

Tracks how many steps each agent has been alive for, and indicates when they exceed a randomly-chosen lifespan.

Lifespans are chosen randomly between min_lifespan and max_lifespan and re-rolled after each reset. This is useful when you want otherwise ‘synchronous’ environments to ‘mix’ so that you get a random distribution of behaviour in each batch, rather than one batch full of ‘early life experience’ and another of ‘late life experience’.

Parameters
  • core – The Core used by the environment.

  • max_lifespan (int) – The maximum lifespan.

  • min_lifespan (int) – The minimum lifespan; defaults to half of max_lifespan.

Variables
  • max_lifespan – Value of the max_lifespan parameter.

  • min_lifespan – Value of the min_lifespan parameter.

TODO: Test this now you’ve rewritten it.

state(e)[source]

Returns the state of this module on sub-env e. The state is an arrdict of the agents’ lifespans and max lifespans as (n_agent,)-tensors.

plotting

TODO-DOCS Plotting docs

megastep.plotting.imshow_arrays(arrs, transpose=False)[source]

Args: arrs: {name: A x C x H x W}

megastep.plotting.plot_images(arrs, axes=None, aspect=1, **kwargs)[source]
megastep.plotting.n_agent_texels(scenery)[source]
megastep.plotting.line_arrays(state)[source]
megastep.plotting.plot_lights(ax, state)[source]
megastep.plotting.extent(state, zoom, radius=5)[source]
megastep.plotting.plot_lines(ax, state, zoom=True)[source]
megastep.plotting.adjust_view(ax, state, zoom=True)[source]
megastep.plotting.plot_wedge(ax, pose, distance, fov, radians=False, **kwargs)[source]
megastep.plotting.plot_fov(ax, state, distance=1, field='agents')[source]
megastep.plotting.plot_poses(poses, ax=None, radians=True, color='C9', **kwargs)[source]

Not used directly here, but often useful for code using this module

ragged

class megastep.ragged.RaggedNumpy(vals, widths)[source]

A Ragged backed by numpy arrays.

Parameters
  • vals – a (V, …)-array of backing values.

  • widths – a (W,)-array of widths of each subarray in the ragged. The sum of the widths must equal V.

Variables
  • vals – a (V, …)-array of backing values.

  • widths – a (W,)-array of widths of each subarray in the ragged array.

  • starts – a (W,)-array of indices giving where each subarray starts in vals.

  • ends – an (W,)-array of indices giving where each subarray ends in vals.

  • inverse – an (V,)-array of indices giving the index of the subarray the corresponding element of vals is a part of.

torchify()[source]

Applies arrdict.torchify() to the backing arrays and returns a new cuda.Ragged$ND for them

megastep.ragged.Ragged(vals, widths)[source]

Returns a Ragged array or tensor.

If you pass numpy arrays as arguments, you’ll get back a RaggedNumpy object; if you pass Torch tensors, you’ll get back a cuda.Ragged$ND that’s backed by a C++ implementation and is OK to pass to the core.Core machinery.

Parameters
  • vals – a (V, …)-array/tensor of backing values.

  • widths – a (W,)-array/tensor of widths of each subarray in the ragged. The sum of the widths must equal V.

Variables
  • vals – a (V, …)-array/tensor of backing values.

  • widths – a (W,)-array/tensor of widths of each subarray in the ragged.

  • starts – a (W,)-array/tensor of indices giving where each subarray starts in vals.

  • ends – an (W,)-array/tensor of indices giving where each subarray ends in vals.

  • inverse – an (V,)-array/tensor of indices giving the index of the subarray the corresponding element of vals is a part of.
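
A small construction example; the attribute values follow from the width semantics described above:

import numpy as np
from megastep.ragged import Ragged

vals = np.arange(6)               # V = 6 backing values
widths = np.array([2, 1, 3])      # three subarrays whose widths sum to V
r = Ragged(vals, widths)          # numpy inputs, so this is a RaggedNumpy
r.starts                          # array([0, 2, 3])
r.ends                            # array([2, 3, 6])
r.inverse                         # array([0, 0, 1, 2, 2, 2])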

scene

TODO-DOCS Scene docs

megastep.scene.lengths(lines)[source]
megastep.scene.agent_model()[source]
megastep.scene.agent_colors()[source]
megastep.scene.resolutions(lines)[source]
megastep.scene.wall_pattern(n, l=0.5, random=np.random)[source]
megastep.scene.init_textures(agentlines, agentcolors, walls, random=np.random)[source]
megastep.scene.random_lights(lights, random=np.random)[source]
megastep.scene.scenery(geometries, n_agents=1, device='cuda', random=np.random)[source]
megastep.scene.display(scenery, e=0)[source]

spaces

TODO-DOCS Spaces docs

class megastep.spaces.MultiEmpty[source]
class megastep.spaces.MultiVector(n_agents, dim)[source]
class megastep.spaces.MultiImage(n_agents, C, H, W)[source]
class megastep.spaces.MultiConstant(n_agents)[source]
class megastep.spaces.MultiDiscrete(n_agents, n_actions)[source]

toys

megastep.toys.box(width=5)[source]

A geometry which is just a simple box, with one room and one light inside it.

megastep.toys.column(width=5, column_width=0.1)[source]

A geometry which is just a simple ‘column’ (aka small box), with one room around it

demo

TODO-DOCS Demo docs

class megastep.demo.Agent(env, width=256)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(world, sample=False, value=False, test=False)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

megastep.demo.train()[source]
megastep.demo.demo(run=-1, length=None, test=True, N=None, env=None, agent=None, d=0)[source]

heads

TODO-DOCS Heads docs

class megastep.demo.heads.MultiVectorIntake(space, width)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(obs, **kwargs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class megastep.demo.heads.MultiImageIntake(space, width)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(obs, **kwargs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class megastep.demo.heads.ConcatIntake(space, width)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x, **kwargs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

megastep.demo.heads.intake(space, width)[source]
class megastep.demo.heads.MultiDiscreteOutput(space, width)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x, **kwargs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

sample(logits, test=False)[source]
class megastep.demo.heads.DictOutput(space, width)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x, **kwargs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

sample(l)[source]
class megastep.demo.heads.ValueOutput(width)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x, **kwargs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

megastep.demo.heads.output(space, width)[source]

learning

TODO-DOCS learning docs

megastep.demo.learning.batch_indices(chunk, batch_size)[source]
megastep.demo.learning.gather(arr, indices)[source]
megastep.demo.learning.flatten(arr)[source]
megastep.demo.learning.assert_same_shape(ref, *arrs)[source]
megastep.demo.learning.deltas(value, reward, target, reset, gamma=0.99)[source]
megastep.demo.learning.present_value(dv, finals, reset, alpha)[source]
megastep.demo.learning.generalized_advantages(value, reward, v, reset, gamma, lambd=0.97)[source]
megastep.demo.learning.reward_to_go(reward, value, reset, gamma)[source]
megastep.demo.learning.v_trace(ratios, value, reward, reset, gamma, max_rho=1, max_c=1)[source]
megastep.demo.learning.v_trace_ref(ratios, value, reward, reset, gamma=0.99, max_rho=1, max_c=1)[source]
megastep.demo.learning.test_v_trace()[source]
megastep.demo.learning.test_v_trace_ref()[source]
megastep.demo.learning.test_v_trace_equivalent(R=100, T=10)[source]
megastep.demo.learning.test_reward_to_go()[source]
megastep.demo.learning.test_generalized_advantages()[source]

lstm

TODO-DOCS LSTM docs

class megastep.demo.lstm.Packer(reset)[source]
pack_data(x)[source]
pack_state(h)[source]
pack(x, h, c)[source]
unpack_data(xp)[source]
unpack_state(hp)[source]
unpack(xp, hcp)[source]
class megastep.demo.lstm.LSTM(d_model)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x, reset)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

megastep.demo.lstm.test_packer()[source]

demo.envs

minimal

class megastep.demo.envs.minimal.Minimal(n_envs=1)[source]

A minimal environment, with a box env, depth observations and simple movement. A good foundation for building your own environments.

See the simple environment tutorial for details.

reset()[source]
step(decision)[source]
state(e=0)[source]
classmethod plot_state(state)[source]
display(e=0)[source]
class megastep.demo.envs.minimal.Agent(env, width=32)[source]

A minimal agent to go with the minimal environment.

See the simple environment tutorial for details.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(world)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
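
A hedged sketch of the interaction loop these two classes are built for; the exact reset/step/forward signatures are best checked against the simple environment tutorial:

from megastep.demo.envs import minimal

env = minimal.Minimal(n_envs=8)
agent = minimal.Agent(env).cuda()

world = env.reset()
for _ in range(128):
    decision = agent(world)
    world = env.step(decision)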

deathmatch

TODO-DOCS Deathmatch docs

explorer

TODO-DOCS Explorer docs

rebar

rebar helps with reinforcement. That’s why it’s called rebar! It’s a toolkit that has evolved as I’ve worked on RL projects.

Unlike the megastep module which is stable, documented and feature-complete, rebar is an unstable, undocumented work-in-progress. It’s in the megastep repo because megastep itself uses two of rebar’s most useful components: dotdict and arrdict, while the demo uses a whole lot more.

arrdict

class rebar.arrdict.arrdict(*args, **kwargs)[source]

An arrdict is a dotdict with extra support for array and tensor values.

arrdicts have a lot of unusual but extremely useful behaviours, which are documented in the dotdicts and arrdicts concept section.

rebar.arrdict.torchify(a)[source]

Converts an array or a dict of numpy arrays to CPU tensors.

If you’d like CUDA tensors, follow the tensor-ification with .cuda(); the attribute delegation built into dotdicts will do the rest.

Floats get mapped to 32-bit PyTorch floats; ints get mapped to 32-bit PyTorch ints. This is usually what you want in machine learning work.
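
An illustrative example in the same style as the ones below, assuming the float32/int32 mapping just described:

>>> d = arrdict(a=np.array([1., 2.]), b=np.array([3, 4]))
>>> torchify(d)
arrdict:
a    Tensor((2,), torch.float32)
b    Tensor((2,), torch.int32)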

rebar.arrdict.numpyify(tensors)[source]

Converts an array or a dict of tensors to numpy arrays.

rebar.arrdict.stack(x, *args, **kwargs)[source]

Stacks a sequence of arrays, tensors or dicts thereof.

For example,

>>> d = arrdict(a=1, b=np.array([1, 2]))
>>> stack([d, d, d])
arrdict:
a    ndarray((3,), int64)
b    ndarray((3, 2), int64)

Any *args or **kwargs will be forwarded to the np.stack or torch.stack call.

Python scalars are converted to numpy scalars, so - as in the example above - stacking scalars will get you a 1D array.

rebar.arrdict.cat(x, *args, **kwargs)[source]

Concatenates a sequence of arrays, tensors or dicts thereof.

For example,

>>> d = arrdict(a=1, b=np.array([1, 2]))
>>> cat([d, d, d])
arrdict:
a    ndarray((3,), int64)
b    ndarray((6,), int64)

Any *args or **kwargs will be forwarded to the np.concatenate or torch.cat call.

Python scalars are converted to numpy scalars, so - as in the example above - concatenating scalars will get you a 1D array.

dotdict

class rebar.dotdict.dotdict[source]

dotdicts are dictionaries with additional support for attribute (dot) access of their elements. dotdicts have a lot of unusual but extremely useful behaviours, which are documented in the dotdicts and arrdicts concept section.

copy()[source]

Shallow-copy the dotdict

pipe(f, *args, **kwargs)[source]

Returns f(self, *args, **kwargs).

>>> d = dotdict(a=1, b=2)
>>> d.pipe(list)
['a', 'b']

Useful for method-chaining.

map(f, *args, **kwargs)[source]

Applies f to the values of the dotdict, returning a matching dotdict of the results. *args and **kwargs are passed as extra arguments to each call.

>>> d = dotdict(a=1, b=2)
>>> d.map(int.__add__, 10)
dotdict:
a    11
b    12

Useful for method-chaining. Works equally well on trees of dotdicts.

See mapping() for a functional version of this method.

starmap(f, *args, **kwargs)[source]

Applies f to the values of the dotdicts one key at a time, returning a matching dotdict of the results.

>>> d = dotdict(a=1, b=2)
>>> d.starmap(int.__add__, d)
dotdict:
a    2
b    4

Useful for method-chaining. Works equally well on trees of dotdicts.

See starmapping() for a functional version of this method.

rebar.dotdict.mapping(f)[source]

Wraps f so that when called on a dotdict, f instead gets called on the dotdict’s values and a dotdict of the results is returned. Extra *args and **kwargs passed to the wrapper are passed as extra arguments to f.

>>> d = dotdict(a=1, b=2)
>>> m = mapping(int.__add__)
>>> m(d, 10)
dotdict:
a    11
b    12

Works equally well on trees of dotdicts, where f will be applied to the leaves of the tree.

Can be used as a decorator.

See dotdict.map() for an object-oriented version of this function.

rebar.dotdict.starmapping(f)[source]

Wraps f so that when called on a sequence of dotdicts, f instead gets called on the dotdict’s values and a dotdict of the results is returned.

>>> d = dotdict(a=1, b=2)
>>> m = starmapping(int.__add__)
>>> m(d, d)
dotdict:
a    2
b    4

Works equally well on trees of dotdicts, where f will be applied to the leaves of the trees.

Can be used as a decorator.

See dotdict.starmap() for an object-oriented version of this function.

rebar.dotdict.leaves(t)[source]

Returns the leaves of a tree of dotdicts as a list

recording

class rebar.recording.Encoder(fps=20)[source]

A context manager for encoding frames of video. Usually you’ll want to use ParallelEncoder instead.

Typically used as

with Encoder() as encoder:
    # Call it with each frame in turn.
    for frame in frames:
        encoder(frame)

# Now write it out.
with open('test.mp4', 'wb') as f:
    f.write(encoder.value)

In this example, frame is a (H, W, 1 or 3)-dim numpy array, or a matplotlib figure.

This follows the PyAV cookbook.

class rebar.recording.ParallelEncoder(f, fps=20, N=None)[source]

A context manager for encoding frames of video in parallel. Typically used as

with ParallelEncoder(f) as encoder:
    for x in xs:
        encoder(x)
encoder.notebook()  # to display the video in your notebook
encoder.save(path)  # to save the video

In this example, f is a function that takes some arguments and returns a (H, W, 1 or 3)-dim numpy array, or a matplotlib figure. Whatever you call encoder with will be forwarded to f in a separate process, and the resulting array will be brought back to this process for encoding.

This aligns with the common scenario where generating each frame with matplotlib is much slower than actually getting the arguments needed to do the generation, or doing the encoding itself.

Parameters
  • fps (int) – The framerate. Defaults to 20.

  • N (int, float) – The number of processes to use. Can be an integer or a float indicating the fraction of CPUs to use. Defaults to using 1/2 the CPUs.

result()[source]
notebook()[source]
save(path)[source]