gfn.gym.helpers.box_utils

This module contains utility functions for the Box environment.

Module Contents

Classes

BoxPBEstimator

Estimator for P_B for the Box environment. Uses the QuarterCircle(northeastern=False) distribution.

BoxPBNeuralNet

A deep neural network for the backward policy.

BoxPBUniform

A module used to create a uniform P_B distribution for the Box environment.

BoxPFEstimator

Estimator for P_F for the Box environment. Uses the BoxForwardDist distribution.

BoxPFNeuralNet

A deep neural network for the forward policy.

BoxStateFlowModule

A deep neural network for the state flow function.

DistributionWrapper

QuarterCircle

Represents distributions on quarter circles (or parts thereof), either the northeastern ones or the southwestern ones.

QuarterCircleWithExit

Extends the previous QuarterCircle distribution by considering an extra parameter, called exit_probability.

QuarterDisk

Represents a distribution on the northeastern quarter disk centered at (0, 0) of maximal radius delta.

Functions

split_PF_module_output(output, n_comp_max)

Splits the module output into the expected parameter sets.

Attributes

CLAMP

PI_2

PI_2_INV

class gfn.gym.helpers.box_utils.BoxPBEstimator(env, module, n_components, min_concentration=0.1, max_concentration=2.0)

Bases: gfn.modules.GFNModule

Estimator for P_B for the Box environment. Uses the QuarterCircle(northeastern=False) distribution.

Parameters
  • env (gfn.gym.Box) –

  • module (torch.nn.Module) –

  • n_components (int) –

  • min_concentration (float) –

  • max_concentration (float) –

expected_output_dim()

Expected output dimension of the module.

Return type

int
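As a rough illustration, the backward network documented on this page returns 3 * n_components values per state, which matches the three per-component parameter sets that QuarterCircle accepts (mixture_logits, alpha, beta). A reasonable assumption is that expected_output_dim() returns that product. The helper name below is hypothetical and is not part of gfn.

```python
def pb_expected_output_dim(n_components: int) -> int:
    """Hypothetical helper: three parameter sets per mixture component."""
    if n_components < 1:
        raise ValueError("n_components must be a positive integer")
    # One mixture logit, one alpha and one beta per component.
    return 3 * n_components


print(pb_expected_output_dim(4))  # 12
```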

to_probability_distribution(states, module_output)

Transform the output of the module into a probability distribution.

The kwargs modify a base distribution, for example to encourage exploration.

Not all modules must implement this method, but it is required to define a policy from a module’s outputs. See DiscretePolicyEstimator for an example using a categorical distribution, but note this can be done for all continuous distributions as well.

Parameters
  • states (gfn.states.States) –

  • module_output (torchtyping.TensorType[batch_shape, output_dim, float]) –

Return type

torch.distributions.Distribution
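One plausible use of min_concentration and max_concentration is to squash each raw network output into that interval with a sigmoid before treating it as a Beta concentration. The sketch below is plain Python rather than torch, the function names are hypothetical, and the transformation actually applied by BoxPBEstimator is not shown in this reference.

```python
import math


def stable_sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rescale_concentration(raw: float, min_c: float = 0.1, max_c: float = 2.0) -> float:
    """Map an unbounded value into [min_c, max_c] (assumed scheme, not gfn's API)."""
    return min_c + (max_c - min_c) * stable_sigmoid(raw)


for raw in (-5.0, 0.0, 5.0):
    print(raw, round(rescale_concentration(raw), 4))
```

Bounding the concentrations this way would keep the Beta parameters away from 0 and from very large values, both of which make densities numerically awkward.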

class gfn.gym.helpers.box_utils.BoxPBNeuralNet(hidden_dim, n_hidden_layers, n_components, **kwargs)

Bases: gfn.utils.NeuralNet

A deep neural network for the backward policy.

Parameters
  • hidden_dim (int) –

  • n_hidden_layers (int) –

  • n_components (int) –

n_components

the number of components for each distribution parameter.

forward(preprocessed_states)

Forward method for the neural network.

Parameters

preprocessed_states (torchtyping.TensorType[batch_shape, 2, float]) – a batch of states appropriately preprocessed for ingestion by the MLP.

Return type

torchtyping.TensorType[batch_shape, 3 * n_components]

Returns: out, a set of continuous variables.

class gfn.gym.helpers.box_utils.BoxPBUniform

Bases: torch.nn.Module

A module used to create a uniform P_B distribution for the Box environment.

The module returns (1, 1, 1) for all states. Used with QuarterCircle, it leads to a uniform distribution over parents in the southwestern part of the circle.

input_dim = 2
forward(preprocessed_states)
Parameters

preprocessed_states (torchtyping.TensorType[batch_shape, 2, float]) –

Return type

torchtyping.TensorType[batch_shape, 3]
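A constant output of (1, 1, 1) can yield a uniform distribution because a Beta distribution with alpha = beta = 1 has constant density on (0, 1). This assumes the three values are read as one mixture logit, one alpha, and one beta. The check below is a plain-Python sketch with a hypothetical helper name; it is not part of gfn.

```python
import math


def beta_pdf(x: float, alpha: float, beta: float) -> float:
    """Density of Beta(alpha, beta) at x in (0, 1)."""
    norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return norm * x ** (alpha - 1.0) * (1.0 - x) ** (beta - 1.0)


# With alpha = beta = 1 the density is flat, i.e. uniform over the angle range.
print([beta_pdf(x, 1.0, 1.0) for x in (0.1, 0.5, 0.9)])
```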

class gfn.gym.helpers.box_utils.BoxPFEstimator(env, module, n_components_s0, n_components, min_concentration=0.1, max_concentration=2.0)

Bases: gfn.modules.GFNModule

Estimator for P_F for the Box environment. Uses the BoxForwardDist distribution.

Parameters
  • env (gfn.gym.Box) –

  • module (torch.nn.Module) –

  • n_components_s0 (int) –

  • n_components (int) –

  • min_concentration (float) –

  • max_concentration (float) –

expected_output_dim()

Expected output dimension of the module.

Return type

int

to_probability_distribution(states, module_output)

Transform the output of the module into a probability distribution.

The kwargs modify a base distribution, for example to encourage exploration.

Not all modules must implement this method, but it is required to define a policy from a module’s outputs. See DiscretePolicyEstimator for an example using a categorical distribution, but note this can be done for all continuous distributions as well.

Parameters
  • states (gfn.states.States) –

  • module_output (torchtyping.TensorType[batch_shape, output_dim, float]) –

Return type

torch.distributions.Distribution

class gfn.gym.helpers.box_utils.BoxPFNeuralNet(hidden_dim, n_hidden_layers, n_components_s0, n_components, **kwargs)

Bases: gfn.utils.NeuralNet

A deep neural network for the forward policy.

Parameters
  • hidden_dim (int) –

  • n_hidden_layers (int) –

  • n_components_s0 (int) –

  • n_components (int) –

n_components_s0

the number of components for each s=0 distribution parameter.

n_components

the number of components for each s=t>0 distribution parameter.

PFs0

the parameters for the s=0 distribution.

forward(preprocessed_states)

Forward method for the neural network.

Parameters

preprocessed_states (torchtyping.TensorType[batch_shape, 2, float]) – a batch of states appropriately preprocessed for ingestion by the MLP.

Return type

torchtyping.TensorType[batch_shape, 1 + 5 * n_components]

Returns: out, a set of continuous variables.

class gfn.gym.helpers.box_utils.BoxStateFlowModule(logZ_value, **kwargs)

Bases: gfn.utils.NeuralNet

A deep neural network for the state flow function.

Parameters

logZ_value (torch.Tensor) –

forward(preprocessed_states)

Forward method for the neural network.

Parameters

preprocessed_states (torchtyping.TensorType[batch_shape, input_dim, float]) – a batch of states appropriately preprocessed for ingestion by the MLP.

Return type

torchtyping.TensorType[batch_shape, output_dim, float]

Returns: out, a set of continuous variables.

gfn.gym.helpers.box_utils.CLAMP
class gfn.gym.helpers.box_utils.DistributionWrapper(states, delta, epsilon, mixture_logits, alpha_r, beta_r, alpha_theta, beta_theta, exit_probability, n_components, n_components_s0)

Bases: torch.distributions.Distribution

log_prob(sampled_actions)
sample(sample_shape=())
gfn.gym.helpers.box_utils.PI_2
gfn.gym.helpers.box_utils.PI_2_INV
class gfn.gym.helpers.box_utils.QuarterCircle(delta, northeastern, centers, mixture_logits, alpha, beta)

Bases: torch.distributions.Distribution

Represents distributions on quarter circles (or parts thereof), either the northeastern ones or the southwestern ones, centered at a point in (0, 1)^2. The distributions are mixtures of Beta distributions over the possible angle range.

When a state has norm <= delta and northeastern=False, the distribution is a Dirac at the state (i.e. the only possible parent is s_0).

Adapted from https://github.com/saleml/continuous-gfn/blob/master/sampling.py

This is useful for the Box environment.
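To make the geometry concrete, the following plain-Python sketch draws one point on a quarter circle of radius delta around a center, with the angle taken from a mixture of Beta distributions. It ignores the restriction of the angle range near the box boundary and the Dirac case described above, uses the standard library instead of torch, and the function name is hypothetical.

```python
import math
import random


def sample_quarter_circle(center, delta, northeastern, mixture_logits, alpha, beta, rng=random):
    """Return a point at distance delta from center, to its NE or SW."""
    # Turn logits into unnormalized mixture weights (softmax numerator).
    top = max(mixture_logits)
    weights = [math.exp(logit - top) for logit in mixture_logits]
    k = rng.choices(range(len(weights)), weights=weights)[0]
    # A Beta sample in [0, 1] is stretched to an angle in [0, pi / 2].
    theta = rng.betavariate(alpha[k], beta[k]) * math.pi / 2.0
    sign = 1.0 if northeastern else -1.0
    return (center[0] + sign * delta * math.cos(theta),
            center[1] + sign * delta * math.sin(theta))


rng = random.Random(0)
parent = sample_quarter_circle((0.5, 0.5), 0.25, False, [0.0, 0.0], [1.0, 2.0], [1.0, 2.0], rng)
print(parent)
```

With northeastern=False the sampled point lies to the southwest of the center, which is the direction a backward step takes in this sketch.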

Parameters
  • delta (float) –

  • northeastern (bool) –

  • centers (torchtyping.TensorType[n_states, 2]) –

  • mixture_logits (torchtyping.TensorType[n_states, n_components]) –

  • alpha (torchtyping.TensorType[n_states, n_components]) –

  • beta (torchtyping.TensorType[n_states, n_components]) –

get_min_and_max_angles()
Return type

Tuple[torchtyping.TensorType[n_states], torchtyping.TensorType[n_states]]
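The angle bounds can be understood geometrically: a move of length delta must keep the resulting point inside the unit square, which trims the quarter turn near the edges. The sketch below derives such bounds for a single center in plain Python. It returns radians, whereas the units and exact conventions of get_min_and_max_angles() are not stated in this reference, and the function name is hypothetical.

```python
import math


def angle_range(center, delta, northeastern):
    """Return (min_angle, max_angle) in radians keeping the move inside [0, 1]^2."""
    x, y = center
    # Room available along each axis in the direction of travel.
    room_x = (1.0 - x) if northeastern else x
    room_y = (1.0 - y) if northeastern else y
    # Too little horizontal room forces a steeper minimum angle.
    min_angle = 0.0 if room_x >= delta else math.acos(room_x / delta)
    # Too little vertical room forces a shallower maximum angle.
    max_angle = math.pi / 2.0 if room_y >= delta else math.asin(room_y / delta)
    return min_angle, max_angle


print(angle_range((0.5, 0.5), 0.25, True))  # full quarter turn
print(angle_range((0.9, 0.5), 0.25, True))  # restricted near the right edge
```

If the minimum exceeds the maximum, no move of length delta stays inside the square in that direction; for the southwestern case this corresponds to states of norm below delta.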

log_prob(sampled_actions)
Parameters

sampled_actions (torchtyping.TensorType[batch_size, 2]) –

Return type

torchtyping.TensorType[batch_size]

sample(sample_shape=torch.Size())
Parameters

sample_shape (torch.Size) –

Return type

torchtyping.TensorType[QuarterCircle.sample.sample_shape, 2]

class gfn.gym.helpers.box_utils.QuarterCircleWithExit(delta, centers, exit_probability, mixture_logits, alpha, beta, epsilon=0.0001)

Bases: torch.distributions.Distribution

Extends the QuarterCircle distribution with an extra parameter, exit_probability, of shape (n_states,). When sampling, the exit action [-inf, -inf] is drawn with probability exit_probability. The log_prob function is adjusted accordingly.

Parameters
  • delta (float) –

  • centers (torchtyping.TensorType[n_states, 2]) –

  • exit_probability (torchtyping.TensorType[n_states]) –

  • mixture_logits (torchtyping.TensorType[n_states, n_components]) –

  • alpha (torchtyping.TensorType[n_states, n_components]) –

  • beta (torchtyping.TensorType[n_states, n_components]) –

  • epsilon (float) –

log_prob(sampled_actions)
sample(sample_shape=())
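The exit mechanism described for QuarterCircleWithExit can be sketched as a two-way mixture: the exit action with probability exit_probability, otherwise a sample from the underlying quarter-circle distribution. The plain-Python code below is an illustration under that assumption. The function names are hypothetical, it requires 0 < exit_probability < 1 for log_prob, and it omits the role of epsilon and any forced-exit rules the library may apply.

```python
import math
import random

EXIT_ACTION = (-math.inf, -math.inf)


def sample_with_exit(exit_probability, sample_base, rng=random):
    """Return the exit action with probability exit_probability, else a base sample."""
    if rng.random() < exit_probability:
        return EXIT_ACTION
    return sample_base()


def log_prob_with_exit(action, exit_probability, base_log_prob):
    """Log-probability of an action under the exit / non-exit mixture."""
    if action == EXIT_ACTION:
        return math.log(exit_probability)
    # Non-exit actions pay log(1 - p) on top of the base log-density.
    return math.log1p(-exit_probability) + base_log_prob(action)


rng = random.Random(0)
print(sample_with_exit(1.0, lambda: (0.1, 0.2), rng))  # (-inf, -inf)
```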
class gfn.gym.helpers.box_utils.QuarterDisk(delta, mixture_logits, alpha_r, beta_r, alpha_theta, beta_theta)

Bases: torch.distributions.Distribution

Represents a distribution on the northeastern quarter disk centered at (0, 0) of maximal radius delta. The radius and the angle each follow a mixture of Beta distributions.

Adapted from https://github.com/saleml/continuous-gfn/blob/master/sampling.py

This is useful for the Box environment.

Parameters
  • delta (float) –

  • mixture_logits (torchtyping.TensorType[n_components]) –

  • alpha_r (torchtyping.TensorType[n_components]) –

  • beta_r (torchtyping.TensorType[n_components]) –

  • alpha_theta (torchtyping.TensorType[n_components]) –

  • beta_theta (torchtyping.TensorType[n_components]) –

log_prob(sampled_actions)
Parameters

sampled_actions (torchtyping.TensorType[batch_size, 2]) –

Return type

torchtyping.TensorType[batch_size]

sample(sample_shape=torch.Size())
Parameters

sample_shape (torch.Size) –

Return type

torchtyping.TensorType[QuarterDisk.sample.sample_shape, 2]
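As a minimal sketch of what sampling from a quarter disk involves, the code below draws a radius in [0, delta] and an angle in [0, pi / 2] from Beta mixtures and converts them to Cartesian coordinates. It is plain Python rather than torch, the function name is hypothetical, and drawing the mixture component independently for the radius and the angle is an assumption, not something this reference confirms.

```python
import math
import random


def sample_quarter_disk(delta, mixture_logits, alpha_r, beta_r, alpha_theta, beta_theta, rng=random):
    """Draw one point in the NE quarter disk of radius delta centered at (0, 0)."""
    top = max(mixture_logits)
    weights = [math.exp(logit - top) for logit in mixture_logits]
    components = range(len(weights))
    # Assumption: radius and angle pick their mixture component independently.
    k_r = rng.choices(components, weights=weights)[0]
    k_t = rng.choices(components, weights=weights)[0]
    r = delta * rng.betavariate(alpha_r[k_r], beta_r[k_r])
    theta = (math.pi / 2.0) * rng.betavariate(alpha_theta[k_t], beta_theta[k_t])
    return (r * math.cos(theta), r * math.sin(theta))


rng = random.Random(0)
print(sample_quarter_disk(0.25, [0.0, 1.0], [1.0, 2.0], [1.0, 2.0], [1.0, 3.0], [1.0, 3.0], rng))
```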

gfn.gym.helpers.box_utils.split_PF_module_output(output, n_comp_max)

Splits the module output into the expected parameter sets.

Parameters
  • output (torchtyping.TensorType[batch_shape, output_dim, float]) – the module_output from the P_F model.

  • n_comp_max (int) – the larger of n_components and n_components_s0.

Returns
  • exit_probability – A probability unique to QuarterCircleWithExit.

  • mixture_logits – Parameters shared by QuarterDisk and QuarterCircleWithExit.

  • alpha_r – Parameters shared by QuarterDisk and QuarterCircleWithExit.

  • beta_r – Parameters shared by QuarterDisk and QuarterCircleWithExit.

  • alpha_theta – Parameters unique to QuarterDisk.

  • beta_theta – Parameters unique to QuarterDisk.
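The split can be pictured as slicing a flat vector of length 1 + 5 * n_comp_max into one scalar followed by five equal chunks. The sketch below works on a plain Python list for a single state. The function name is hypothetical, the chunk order simply follows the order in which the return values are listed, and any activation or masking the library applies to the pieces is not shown here.

```python
def split_pf_output(output, n_comp_max):
    """Split a flat vector of length 1 + 5 * n_comp_max into six parts."""
    expected = 1 + 5 * n_comp_max
    if len(output) != expected:
        raise ValueError(f"expected {expected} values, got {len(output)}")
    exit_probability = output[0]
    # Five consecutive chunks of n_comp_max values each follow the first entry.
    chunks = [output[1 + i * n_comp_max: 1 + (i + 1) * n_comp_max] for i in range(5)]
    mixture_logits, alpha_r, beta_r, alpha_theta, beta_theta = chunks
    return exit_probability, mixture_logits, alpha_r, beta_r, alpha_theta, beta_theta


parts = split_pf_output(list(range(11)), n_comp_max=2)
print(parts)  # (0, [1, 2], [3, 4], [5, 6], [7, 8], [9, 10])
```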