Graph Tensor

class molgraph.tensors.graph_tensor.GraphTensor(tensorflow.experimental.BatchableExtensionType)

A custom tensor encoding a graph.

A (molecular) graph, encoded as a GraphTensor instance, can hold a single subgraph (a single molecule) or multiple subgraphs (multiple molecules). Multiple molecules can be encoded in one of two states: as a single disjoint graph (nested tf.Tensor values) or as separate subgraphs (nested tf.RaggedTensor values). Encoding a (molecular) graph as a disjoint graph is recommended, as it is a significantly more efficient representation in terms of both memory and runtime.
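The disjoint-graph idea can be illustrated without TensorFlow. The following is a minimal sketch, not MolGraph's implementation: the helper graph_indicator is a hypothetical name, and the data mirrors the example used throughout this page. It shows how sizes alone is enough to tell which subgraph each node of the disjoint graph belongs to.

```python
# Dependency-free sketch; `graph_indicator` is a hypothetical helper,
# not part of the MolGraph API.
sizes = [2, 3]                       # two subgraphs with 2 and 3 nodes
edge_src = [1, 0, 3, 4, 2, 4, 3, 2]
edge_dst = [0, 1, 2, 2, 3, 3, 4, 4]


def graph_indicator(sizes):
    """Return, for every node of the disjoint graph, the index of its subgraph."""
    return [graph for graph, size in enumerate(sizes) for _ in range(size)]


node_to_graph = graph_indicator(sizes)

# In a disjoint graph, no edge crosses a subgraph boundary.
edges_stay_within_subgraphs = all(
    node_to_graph[src] == node_to_graph[dst]
    for src, dst in zip(edge_src, edge_dst)
)

print(node_to_graph)
print(edges_stay_within_subgraphs)
```

Because all nodes and edges share one flat dimension, the whole batch can be processed as a single graph, which is presumably where the efficiency gain comes from.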

Note: methods that appear to modify the GraphTensor instance do not modify it in place. Instead, they construct and return a new GraphTensor instance. This is necessary for TF to properly track GraphTensor instances. These methods include propagate(), merge(), separate(), update(), remove(), etc.
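The return-a-new-instance pattern can be sketched with a frozen dataclass. This is only an analogy under stated assumptions: ToyGraph and its update method are hypothetical stand-ins and do not reflect how GraphTensor is implemented.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyGraph:
    """Hypothetical immutable stand-in for a graph container."""
    sizes: tuple
    node_feature: tuple

    def update(self, **fields):
        # Build and return a new instance; `self` is left untouched.
        return replace(self, **fields)


original = ToyGraph(sizes=(2, 3), node_feature=(1.0, 1.0, 1.0, 1.0, 0.0))
updated = original.update(node_feature=(0.0,) * 5)

print(original.node_feature)  # unchanged by the call above
print(updated.node_feature)
```

The practical consequence is the same as for GraphTensor: the result has to be reassigned, as in graph_tensor = graph_tensor.update(...), otherwise the change is lost.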

Parameters

sizes (tf.Tensor) – A 1-D or 0-D tf.Tensor specifying the sizes of the subgraphs.
node_feature (tf.Tensor, tf.RaggedTensor) – A 2-D tf.Tensor or 3-D tf.RaggedTensor encoding the features associated with the nodes of the graph.
edge_src (tf.Tensor, tf.RaggedTensor) – A 1-D tf.Tensor or 2-D tf.RaggedTensor encoding the source node indices of the edges of the graph. A value i in edge_src refers to node i (index i of node_feature).
edge_dst (tf.Tensor, tf.RaggedTensor) – A 1-D tf.Tensor or 2-D tf.RaggedTensor encoding the destination (target) node indices of the edges of the graph. A value i in edge_dst refers to node i (index i of node_feature).
edge_feature (tf.Tensor, tf.RaggedTensor, None) – A 2-D tf.Tensor or 3-D tf.RaggedTensor encoding the features associated with the edges of the graph. Index j corresponds to edge j (index j of edge_src and edge_dst). Edge features are optional, but commonly used for molecular graphs.
edge_weight (tf.Tensor, tf.RaggedTensor, None) – A 1-D tf.Tensor or 2-D tf.RaggedTensor encoding the weights associated with the edges of the graph. Index j corresponds to edge j (index j of edge_feature, edge_src and edge_dst). Edge weights are optional, but useful to encode e.g. attention coefficients.
node_position (tf.Tensor, tf.RaggedTensor, None) – A 2-D tf.Tensor or 3-D tf.RaggedTensor encoding the node positions (commonly Laplacian positional encodings) corresponding to the nodes. Index i corresponds to node i (index i of node_feature). Node positions are optional, but useful to better encode molecular graphs that are less than 3D, in which node positions are not otherwise encoded.
**auxiliary (tf.Tensor, tf.RaggedTensor) – Auxiliary graph data to be supplied to the GraphTensor instance. These are user-specified data fields and can be useful to supplement the graph with additional information. If an added data field should be associated with the edges or nodes of the graph, prepend ‘edge’ or ‘node’ to its name respectively. If not, a single underscore (‘_’) needs to be prepended; an underscore indicates that the field is static and should not be manipulated (e.g. with merge(), separate()). A static field should not be used in a tf.data.Dataset instance, as it requires the data fields to be non-static.
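The naming convention for auxiliary fields can be summarized as a small lookup. This is a sketch of the convention as described above, not MolGraph code: classify_field is a hypothetical helper, and raising ValueError for an unprefixed name is an assumption made for illustration only.

```python
def classify_field(name):
    """Classify an auxiliary field by its name prefix (hypothetical helper)."""
    if name.startswith('node'):
        return 'node'      # expected to have one entry per node
    if name.startswith('edge'):
        return 'edge'      # expected to have one entry per edge
    if name.startswith('_'):
        return 'static'    # left untouched by merge(), separate(), etc.
    # Assumption for this sketch: any other name is rejected.
    raise ValueError(
        f"Field {name!r} should start with 'node', 'edge' or '_'.")


for field in ['node_charge', 'edge_length', '_label']:
    print(field, '->', classify_field(field))
```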

Example usage:

>>> import tensorflow as tf
>>> import molgraph
>>> graph_tensor = molgraph.GraphTensor(
...     sizes=[2, 3],
...     node_feature=[
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [0.0, 1.0]
...     ],
...     edge_src=[1, 0, 3, 4, 2, 4, 3, 2],
...     edge_dst=[0, 1, 2, 2, 3, 3, 4, 4],
... )
>>> gnn_model = tf.keras.Sequential([
...     molgraph.layers.GCNConv(32),
...     molgraph.layers.GCNConv(32)
... ])
>>> gnn_model.predict(graph_tensor, verbose=0).shape
TensorShape([2, None, 32])

update(data=None, **data_as_kwargs)

Update data field(s) of the GraphTensor instance.

This method either updates existing data fields or adds new data fields to the GraphTensor instance.

Caution when adding new data fields:

If the name of a data field starts with ‘node’ or ‘edge’, it is assumed that the size of the corresponding values matches that of node_feature or edge_src respectively. In other words, the new data need to encode the same number of nodes or edges respectively.
If new data should not be associated with the nodes or edges of the GraphTensor instance, then the name of the data field should start with an underscore (‘_’). The underscore indicates that the corresponding values are static and should not be tampered with.

Caution when updating the GraphTensor instance with values of a different type, e.g. when updating a GraphTensor instance (encoding nested tf.RaggedTensor values) with tf.Tensor values:

A GraphTensor instance should only be updated with values originating from, or corresponding to, the existing values. Reason: although very rare, tf.Tensor values coming from another graph structure may have the same size (namely, the same node or edge dimension), but different row lengths (namely, differently sized subgraphs). This will result in a silent error, where the GraphTensor instance is updated without error, but with wrongly partitioned values.
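The silent error described above can be reproduced with plain lists. This is a minimal sketch under stated assumptions: partition is a hypothetical helper that mimics splitting flat values by row lengths, and the string labels only serve to make the misalignment visible.

```python
def partition(flat_values, row_lengths):
    """Split flat values into rows of the given lengths (hypothetical helper)."""
    rows, start = [], 0
    for length in row_lengths:
        rows.append(flat_values[start:start + length])
        start += length
    return rows


# Flat values that came from a graph with subgraph sizes [3, 2] ...
foreign_values = ['a0', 'a1', 'a2', 'b0', 'b1']

# ... partition cleanly with their own row lengths,
intended = partition(foreign_values, [3, 2])

# but are split at the wrong place when forced onto sizes [2, 3].
# The total length (5) matches, so no error is raised.
misaligned = partition(foreign_values, [2, 3])

print(intended)
print(misaligned)
```

In the misaligned result, a value belonging to the first foreign subgraph ends up in the second row, which is exactly the kind of mistake that goes unnoticed.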

Example usage:

>>> graph_tensor = molgraph.GraphTensor(
...     sizes=[2, 3],
...     node_feature=[
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [0.0, 1.0]
...     ],
...     edge_src=[1, 0, 3, 4, 2, 4, 3, 2],
...     edge_dst=[0, 1, 2, 2, 3, 3, 4, 4],
... )
>>> random_node_features = tf.random.uniform(
...     graph_tensor.node_feature.shape
... )
>>> random_edge_features = tf.random.uniform(
...     graph_tensor.edge_src.shape.concatenate([1])
... )
>>> graph_tensor = graph_tensor.update({
...     'node_feature': random_node_features,
...     'edge_feature': random_edge_features,
... })

Parameters: data (dict) – Nested data. Specifically, a dictionary of tensors.
Returns: A new updated GraphTensor instance.

remove(fields)

Removes data from the GraphTensor instance.

Example usage:

>>> graph_tensor = molgraph.GraphTensor(
...     sizes=[2, 3],
...     node_feature=[
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [0.0, 1.0]
...     ],
...     edge_feature=[
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...     ],
...     edge_src=[1, 0, 3, 4, 2, 4, 3, 2],
...     edge_dst=[0, 1, 2, 2, 3, 3, 4, 4],
... )
>>> graph_tensor = graph_tensor.remove(['edge_feature'])

Parameters: fields (str, list[str]) – Data fields to be removed from the GraphTensor instance. Currently, edge_dst, edge_src, node_feature and sizes cannot be removed.
Returns: An updated GraphTensor instance.
Return type: GraphTensor

separate(other=None, /)

Converts the GraphTensor into a ragged state.

In other words, this method separates each subgraph of the GraphTensor instance, resulting in a new GraphTensor instance with each subgraph separated by rows:

>>> graph_tensor = molgraph.GraphTensor(
...     sizes=[2, 3],
...     node_feature=[
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [0.0, 1.0]
...     ],
...     edge_src=[1, 0, 3, 4, 2, 4, 3, 2],
...     edge_dst=[0, 1, 2, 2, 3, 3, 4, 4],
... )
>>> graph_tensor = graph_tensor.separate()

This method can optionally be used as a “static method” to separate another GraphTensor instance:

>>> graph_tensor = molgraph.GraphTensor(
...     sizes=[2, 3],
...     node_feature=[
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [0.0, 1.0]
...     ],
...     edge_src=[1, 0, 3, 4, 2, 4, 3, 2],
...     edge_dst=[0, 1, 2, 2, 3, 3, 4, 4],
... )
>>> ds = tf.data.Dataset.from_tensor_slices(graph_tensor)
>>> ds = ds.batch(2).map(molgraph.GraphTensor.separate)

Note: although not a common use case, the separate and merge methods are both implemented in this way to make it convenient to go between states with tf.data.Dataset.

Parameters: other (None, GraphTensor) – A GraphTensor instance passed as a positional-only argument. If None, self will be separated. Defaults to None.
Returns: A GraphTensor instance with its subgraphs separated into rows (nested ragged tensors).
Return type: GraphTensor
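Conceptually, separating a disjoint graph means splitting nodes and edges by subgraph. The following is a dependency-free sketch, not MolGraph's implementation: separate_lists is a hypothetical helper, and it assumes that edge indices become local to each subgraph in the ragged state.

```python
from bisect import bisect_right


def separate_lists(sizes, node_feature, edge_src, edge_dst):
    """Split a disjoint graph into per-subgraph rows (hypothetical helper)."""
    # offsets[g] is the index of the first node of subgraph g.
    offsets = [0]
    for size in sizes:
        offsets.append(offsets[-1] + size)

    node_rows = [
        node_feature[offsets[g]:offsets[g + 1]] for g in range(len(sizes))
    ]
    src_rows = [[] for _ in sizes]
    dst_rows = [[] for _ in sizes]
    for src, dst in zip(edge_src, edge_dst):
        graph = bisect_right(offsets, dst) - 1   # subgraph owning this edge
        # Assumption: indices are shifted to be local to the subgraph.
        src_rows[graph].append(src - offsets[graph])
        dst_rows[graph].append(dst - offsets[graph])
    return node_rows, src_rows, dst_rows


node_rows, src_rows, dst_rows = separate_lists(
    sizes=[2, 3],
    node_feature=[[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    edge_src=[1, 0, 3, 4, 2, 4, 3, 2],
    edge_dst=[0, 1, 2, 2, 3, 3, 4, 4],
)
print(node_rows)
print(src_rows)
print(dst_rows)
```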

merge(other=None, /)

Converts the GraphTensor into a non-ragged state.

In other words, this method merges the row-separated subgraphs into a single disjoint graph (all nodes and edges along the same dimension/row):

>>> graph_tensor = molgraph.GraphTensor(
...     sizes=[2, 3],
...     node_feature=[
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [0.0, 1.0]
...     ],
...     edge_src=[1, 0, 3, 4, 2, 4, 3, 2],
...     edge_dst=[0, 1, 2, 2, 3, 3, 4, 4],
... )
>>> graph_tensor = graph_tensor.separate()
>>> graph_tensor = graph_tensor.merge()

This is the preferred state of a GraphTensor instance as it is an efficient representation.

This method can optionally be used as a “static method” to merge another GraphTensor instance:

>>> graph_tensor = molgraph.GraphTensor(
...     sizes=[2, 3],
...     node_feature=[
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [0.0, 1.0]
...     ],
...     edge_src=[1, 0, 3, 4, 2, 4, 3, 2],
...     edge_dst=[0, 1, 2, 2, 3, 3, 4, 4],
... )
>>> graph_tensor = graph_tensor.separate()
>>> ds = tf.data.Dataset.from_tensor_slices(graph_tensor)
>>> ds = ds.batch(2).map(molgraph.GraphTensor.merge)

Note: although not a common use case, the separate and merge methods are both implemented in this way to make it convenient to go between states with tf.data.Dataset.

Parameters: other (None, GraphTensor) – A GraphTensor instance passed as a positional-only argument. If None, self will be merged. Defaults to None.
Returns: A GraphTensor instance with its subgraphs merged into a single disjoint graph (nested “rectangular” tensors).
Return type: GraphTensor
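Conceptually, merging concatenates the per-subgraph rows and shifts edge indices so that they point into the concatenated node list. The following is a dependency-free sketch, not MolGraph's implementation: merge_lists is a hypothetical helper, and it assumes that edge indices are local to each subgraph in the ragged state.

```python
def merge_lists(node_rows, src_rows, dst_rows):
    """Concatenate per-subgraph rows into one disjoint graph (hypothetical helper)."""
    sizes = [len(nodes) for nodes in node_rows]
    node_feature, edge_src, edge_dst = [], [], []
    offset = 0
    for nodes, srcs, dsts in zip(node_rows, src_rows, dst_rows):
        node_feature.extend(nodes)
        # Shift local indices by the number of nodes seen so far.
        edge_src.extend(src + offset for src in srcs)
        edge_dst.extend(dst + offset for dst in dsts)
        offset += len(nodes)
    return sizes, node_feature, edge_src, edge_dst


sizes, node_feature, edge_src, edge_dst = merge_lists(
    node_rows=[
        [[1.0, 0.0], [1.0, 0.0]],
        [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    ],
    src_rows=[[1, 0], [1, 2, 0, 2, 1, 0]],
    dst_rows=[[0, 1], [0, 0, 1, 1, 2, 2]],
)
print(sizes)
print(edge_src)
print(edge_dst)
```

With these inputs, the sketch recovers the sizes, edge_src and edge_dst values used in the examples on this page.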

propagate(mode='sum', normalize=False, reduction=None, residual=None, **kwargs)

Propagates node features of the GraphTensor instance.

This is a helper method for passing information between nodes; specifically, it aggregates information (features) from source nodes to destination nodes. Roughly, this method uses three gnn_ops in sequence, the first and third being optional:

normalizes edge weights via softmax_edge_weights();
propagates node features via propagate_node_features();
reduces aggregated node features via reduce_features().
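The middle step, aggregating features from source nodes to destination nodes, can be sketched with plain lists. This is an illustration only, not the gnn_ops implementation: propagate_lists is a hypothetical helper, edge weights are ignored, and returning zeros for nodes without incoming edges is an assumption of this sketch.

```python
def propagate_lists(node_feature, edge_src, edge_dst, mode='sum'):
    """Aggregate source-node features at each destination node (hypothetical helper)."""
    reducers = {
        'sum': sum,
        'mean': lambda values: sum(values) / len(values),
        'min': min,
        'max': max,
    }
    reduce = reducers[mode]
    dim = len(node_feature[0])

    # Collect the features of all source nodes pointing at each destination.
    incoming = [[] for _ in node_feature]
    for src, dst in zip(edge_src, edge_dst):
        incoming[dst].append(node_feature[src])

    aggregated = []
    for messages in incoming:
        if not messages:
            aggregated.append([0.0] * dim)   # assumption: isolated nodes get zeros
        else:
            # zip(*messages) iterates feature-wise over the incoming messages.
            aggregated.append([reduce(column) for column in zip(*messages)])
    return aggregated


features = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
src = [1, 0, 3, 4, 2, 4, 3, 2]
dst = [0, 1, 2, 2, 3, 3, 4, 4]

summed = propagate_lists(features, src, dst, mode='sum')
averaged = propagate_lists(features, src, dst, mode='mean')
print(summed)
print(averaged)
```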

Example usage:

>>> graph_tensor = molgraph.GraphTensor(
...     sizes=[2, 3],
...     node_feature=[
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [0.0, 1.0]
...     ],
...     edge_src=[1, 0, 3, 4, 2, 4, 3, 2],
...     edge_dst=[0, 1, 2, 2, 3, 3, 4, 4],
... )
>>> graph_tensor = graph_tensor.propagate()

Parameters

mode (str) – The type of aggregation to be performed, one of ‘sum’, ‘mean’, ‘min’ or ‘max’. If None, ‘sum’ will be used. Defaults to ‘sum’.
normalize (bool) – Whether the edge weights (if available) should be normalized (via softmax) before aggregation. Edge weights are usually the attention scores applied to each incoming (source) node feature.
reduction (None, str) – The type of reduction (“merging”) to be performed if the node features span another dimension (e.g. when using multiple attention heads in GATConv or GTConv). One of ‘concat’, ‘mean’, ‘sum’ or None. Defaults to None.
residual (None, tf.Tensor) – Residual node features to be added to the aggregated node features. Defaults to None.
**kwargs –
Valid (optional) keyword arguments are:
- activation: The activation to be applied to the aggregated node features. Defaults to None.
- exponentiate: Whether to exponentiate edge weights before softmax. Defaults to True.
- clip_values: The clipping range that should be applied to the (potentially exponentiated) edge weights, for stability. Defaults to True.
- output_units: The output dimension (innermost dimension) after reshaping. Only relevant if reduction='concat'. Defaults to None.
- reduce_axis: Axis to be reduced (“merged”). Ignored if None. Defaults to 1, which is the axis of the heads of GATConv, GTConv etc.
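The normalize and reduction options can be illustrated with plain lists. This is a sketch under stated assumptions, not the gnn_ops implementation: both helpers are hypothetical, the softmax is assumed to be taken over the edges sharing a destination node, and clipping is omitted.

```python
import math


def softmax_by_destination(edge_weight, edge_dst):
    """Normalize edge weights so they sum to 1 per destination node (hypothetical)."""
    exponentiated = [math.exp(weight) for weight in edge_weight]
    totals = {}
    for value, dst in zip(exponentiated, edge_dst):
        totals[dst] = totals.get(dst, 0.0) + value
    return [value / totals[dst] for value, dst in zip(exponentiated, edge_dst)]


def reduce_heads(node_heads, reduction=None):
    """Merge the head axis of [num_nodes][num_heads][dim] features (hypothetical)."""
    if reduction is None:
        return node_heads
    if reduction == 'concat':
        return [[v for head in heads for v in head] for heads in node_heads]
    if reduction == 'sum':
        return [[sum(col) for col in zip(*heads)] for heads in node_heads]
    if reduction == 'mean':
        return [[sum(col) / len(col) for col in zip(*heads)] for heads in node_heads]
    raise ValueError(f'Unknown reduction: {reduction!r}')


# Equal raw weights: nodes with two incoming edges get 0.5 per edge.
normalized = softmax_by_destination([0.0] * 8, [0, 1, 2, 2, 3, 3, 4, 4])
print(normalized)

# One node with two heads of dimension 2.
heads = [[[1.0, 2.0], [3.0, 4.0]]]
print(reduce_heads(heads, 'concat'))
print(reduce_heads(heads, 'mean'))
```

With reduction='concat' the innermost dimension grows to num_heads * dim, which is likely why output_units is only relevant in that case.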

is_scalar(other=None, /)

Checks whether the GraphTensor instance is a “scalar”.

A “scalar” is loosely defined, but basically means that the GraphTensor instance is “unbatched”. The method should rarely be used. It is currently only used by the custom batch encoder, to flag that the GraphTensor instance should not be separated when encoded.

Returns: A boolean indicating whether the GraphTensor instance is a “scalar”.

is_ragged(other=None, /)

Checks whether the GraphTensor instance is in its ragged state.

Returns: A boolean indicating whether the GraphTensor instance is in its ragged state.

property spec

Spec of the GraphTensor instance.

Unlike _type_spec, spec specifies a more realistic (or rather useful) specification of the GraphTensor instance. In other words, the dimension corresponding to the size of the disjoint graph (i.e. number of nodes and edges) is None rather than a specific value. If exact specification is desired, use _type_spec or tf.type_spec_from_value instead.

Returns: the corresponding spec of the GraphTensor.
Return type: GraphTensor.Spec
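Why a None dimension is more useful than an exact one can be shown with a simple shape check. This is a plain-Python analogy, not TensorFlow's spec machinery: is_compatible is a hypothetical helper, and the shapes are illustrative.

```python
def is_compatible(shape, spec_shape):
    """Check a concrete shape against a spec shape; None matches any size."""
    return len(shape) == len(spec_shape) and all(
        spec_dim is None or spec_dim == dim
        for dim, spec_dim in zip(shape, spec_shape)
    )


exact_spec = (5, 2)       # analogous to an exact specification: 5 nodes
relaxed_spec = (None, 2)  # analogous to spec: any number of nodes

other_graph_shape = (7, 2)  # another disjoint graph with 7 nodes

print(is_compatible(other_graph_shape, exact_spec))
print(is_compatible(other_graph_shape, relaxed_spec))
```

Disjoint graphs built from different batches rarely have the same number of nodes and edges, so only the relaxed form accepts all of them.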

property shape

Partial shape of the GraphTensor instance.

Note: shape now returns a tf.TensorShape with the following dimensions, regardless of its state: (num_subgraphs, num_nodes, num_features)

Returns: the partial shape of the GraphTensor.
Return type: tf.TensorShape

property dtype

Partial dtype of the GraphTensor instance.

Returns: the partial dtype of the GraphTensor.
Return type: tf.DType

property rank

Partial rank of the GraphTensor instance.

Returns: the partial rank of the GraphTensor.
Return type: int

property data

Unpacks the nested data of the GraphTensor instance.

data corresponds to data_spec().

When working with values returned from data, make sure to also work with specs returned from data_spec of the associated Spec instance. If not, conflicts will likely occur.

This property is implemented to be more selective and have more control over what data should be unpacked from the GraphTensor instance.

Unfortunately, tf.nest ops cannot be used directly on the GraphTensor instance, as they will expand all composites, including tf.RaggedTensor values. E.g., tf.nest.map_structure will flatten a tf.RaggedTensor into its components, which is undesired behavior. By returning a dict of nested data, tf.nest ops can be used on the dict without specifying expand_composites=True, resulting in tf.RaggedTensor values not being flattened. Furthermore, it allows us to unpack the auxiliary data appropriately.

Returns: A dictionary with nested data values.

__getattr__(name)

Access data fields of the GraphTensor as attributes.

Only called when attribute lookup has not found attribute name in the usual places.

Parameters: name (str) – The data field to be extracted.
Returns: A tf.Tensor corresponding to the data field name.
Raises: AttributeError – if name is not a data field of the GraphTensor.
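The fallback behavior of __getattr__ can be sketched with a small class. This is an illustration of the Python mechanism only: ToyGraph is a hypothetical stand-in, and storing the fields in a single dict is an assumption of this sketch rather than a description of GraphTensor internals.

```python
class ToyGraph:
    """Hypothetical container exposing dict entries as attributes."""

    def __init__(self, **data):
        self._data = dict(data)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup has failed.
        # Go through __dict__ to avoid recursing into __getattr__.
        data = self.__dict__.get('_data', {})
        if name in data:
            return data[name]
        raise AttributeError(f'{name!r} is not a data field of ToyGraph.')


graph = ToyGraph(sizes=[2, 3], edge_src=[1, 0, 3, 4, 2, 4, 3, 2])
print(graph.sizes)

try:
    graph.edge_weight
except AttributeError as error:
    print(error)
```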

__getitem__(index)

Access subgraphs of the GraphTensor via indexing.

Parameters

index (slice, int, list[int]) – Indices or slice for accessing certain subgraphs of the GraphTensor instance.

Returns

A GraphTensor instance with the specified subgraphs.

Raises

KeyError – if index (str) does not exist in data spec.
tf.errors.InvalidArgumentError – if index (int, list[int]) is out of range.
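Indexing by subgraph can be sketched on a disjoint node list. This is a dependency-free illustration, not MolGraph's implementation: select_subgraphs is a hypothetical helper, edges are left out for brevity, and an out-of-range index raises Python's IndexError here rather than tf.errors.InvalidArgumentError.

```python
def select_subgraphs(sizes, node_feature, index):
    """Pick subgraphs by int, slice or list of ints (hypothetical helper)."""
    # offsets[g] is the index of the first node of subgraph g.
    offsets = [0]
    for size in sizes:
        offsets.append(offsets[-1] + size)

    graph_ids = list(range(len(sizes)))
    if isinstance(index, slice):
        chosen = graph_ids[index]
    elif isinstance(index, int):
        chosen = [graph_ids[index]]          # IndexError if out of range
    else:
        chosen = [graph_ids[i] for i in index]

    new_sizes = [sizes[g] for g in chosen]
    new_nodes = [
        row for g in chosen for row in node_feature[offsets[g]:offsets[g + 1]]
    ]
    return new_sizes, new_nodes


features = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

print(select_subgraphs([2, 3], features, 1))
print(select_subgraphs([2, 3], features, slice(0, 1)))
print(select_subgraphs([2, 3], features, [1, 0]))
```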