Graph Tensor
- class molgraph.tensors.graph_tensor.GraphTensor(tensorflow.experimental.BatchableExtensionType)[source]
A custom tensor encoding a graph.
A (molecular) graph, encoded as a GraphTensor instance, can encode a single subgraph (single molecule) or multiple subgraphs (multiple molecules). Furthermore, the GraphTensor can encode multiple molecules (molecular graphs) either as a single disjoint graph (nested tf.Tensor values) or as multiple subgraphs (nested tf.RaggedTensor values). It is recommended to encode a (molecular) graph as a disjoint graph as it is a significantly more efficient representation, both in terms of memory and runtime.
Note: every method that seemingly modifies the GraphTensor instance actually does not modify it. Instead, a new GraphTensor instance is constructed and returned by these methods. This is necessary to allow TF to properly track the GraphTensor instances. These methods include: propagate(), merge(), separate(), update(), remove(), etc.
- Parameters
sizes (tf.Tensor) – A 1-D or 0-D tf.Tensor specifying the sizes of the subgraphs.
node_feature (tf.Tensor, tf.RaggedTensor) – A 2-D tf.Tensor or 3-D tf.RaggedTensor encoding the features associated with the nodes of the graph.
edge_src (tf.Tensor, tf.RaggedTensor) – A 1-D tf.Tensor or 2-D tf.RaggedTensor encoding the source node indices of the edges of the graph. An entry with value i in edge_src refers to node i (index i of node_feature).
edge_dst (tf.Tensor, tf.RaggedTensor) – A 1-D tf.Tensor or 2-D tf.RaggedTensor encoding the destination (target) node indices of the edges of the graph. An entry with value i in edge_dst refers to node i (index i of node_feature).
edge_feature (tf.Tensor, tf.RaggedTensor, None) – A 2-D tf.Tensor or 3-D tf.RaggedTensor encoding the features associated with the edges of the graph. Index j corresponds to edge j (index j of edge_src and edge_dst). Edge features are optional, but commonly used for molecular graphs.
edge_weight (tf.Tensor, tf.RaggedTensor, None) – A 1-D tf.Tensor or 2-D tf.RaggedTensor encoding the weights associated with the edges of the graph. Index j corresponds to edge j (index j of edge_feature, edge_src and edge_dst). Edge weights are optional, but useful to encode e.g. attention coefficients.
node_position (tf.Tensor, tf.RaggedTensor, None) – A 2-D tf.Tensor or 3-D tf.RaggedTensor encoding the node positions (commonly Laplacian positional encodings) corresponding to the nodes. Index i corresponds to node i (index i of node_feature). Node positions are optional, but useful to better encode <3D molecular graphs wherein node positions are not encoded.
**auxiliary (tf.Tensor, tf.RaggedTensor) – Auxiliary graph data to be supplied to the GraphTensor instance. These are user-specified data fields and can be useful to supplement the graph with additional information. If the added data field should be associated with the edges or nodes of the graph, prepend ‘edge’ or ‘node’ to its name, respectively. If not, a single underscore (‘_’) needs to be prepended; an underscore indicates that the field is static and should not be manipulated (e.g. with merge(), separate()). A static field should not be used in a tf.data.Dataset instance as it requires the data fields to be non-static.
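The naming rule for auxiliary fields can be illustrated with a small sketch. The helper classify_field and the field names below are hypothetical and not part of molgraph; the sketch only mirrors the prefix convention described above, and rejecting names without a recognized prefix is an assumption of the sketch, not documented library behavior.

```python
def classify_field(name: str) -> str:
    """Classify an auxiliary field name by its prefix (illustrative only)."""
    if name.startswith("_"):
        # Static field: should not be manipulated by merge() or separate().
        return "static"
    if name.startswith("node"):
        # Expected to hold one entry per node.
        return "node"
    if name.startswith("edge"):
        # Expected to hold one entry per edge.
        return "edge"
    # Assumption of this sketch: any other name is rejected.
    raise ValueError(f"unsupported auxiliary field name: {name!r}")


for field_name in ("node_charge", "edge_order", "_dataset_name"):
    print(field_name, "->", classify_field(field_name))
```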
Example usage:
>>> import tensorflow as tf
>>> import molgraph
>>> graph_tensor = molgraph.GraphTensor(
...     sizes=[2, 3],
...     node_feature=[
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [0.0, 1.0]
...     ],
...     edge_src=[1, 0, 3, 4, 2, 4, 3, 2],
...     edge_dst=[0, 1, 2, 2, 3, 3, 4, 4],
... )
>>> gnn_model = tf.keras.Sequential([
...     molgraph.layers.GCNConv(32),
...     molgraph.layers.GCNConv(32)
... ])
>>> gnn_model.predict(graph_tensor, verbose=0).shape
TensorShape([2, None, 32])
- update(data=None, **data_as_kwargs)[source]
Update data field(s) of the GraphTensor instance.
This method either updates existing data fields or adds new data fields to the GraphTensor instance.
- Caution when adding new data fields:
If the name of a data field starts with ‘node’ or ‘edge’, it is assumed that the size of the corresponding values matches that of node_feature or edge_src, respectively. In other words, the new data need to encode the same number of nodes or edges, respectively.
If new data should not be associated with the nodes or edges of the GraphTensor instance, then the name of the data field should start with an underscore (‘_’). The underscore indicates that the corresponding values are static and should not be tampered with.
Caution when updating the GraphTensor instance with values of a different type, e.g. when updating a GraphTensor instance (encoding nested tf.RaggedTensor values) with tf.Tensor values:
A GraphTensor instance should only be updated with values originating from the existing values, or corresponding to the existing values. Reason: although very rare, tf.Tensor values coming from another graph structure may have the same size (namely, the same node or edge dimension), but different row lengths (namely, differently sized subgraphs). This will result in a silent error, where the GraphTensor instance is updated without error, but with wrongly partitioned values.
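The silent error described above can be reproduced without TensorFlow. The following is a minimal sketch in plain Python; the partition helper and the string labels are hypothetical stand-ins for how flat values are split into rows by subgraph size, not molgraph code.

```python
def partition(values, sizes):
    """Split a flat list into rows whose lengths are given by sizes."""
    if sum(sizes) != len(values):
        raise ValueError("sizes must sum to the number of values")
    rows, start = [], 0
    for size in sizes:
        rows.append(values[start:start + size])
        start += size
    return rows


# Sizes of the graph being updated: subgraphs with 2 and 3 nodes.
target_sizes = [2, 3]

# Hypothetical per-node values computed for a different graph whose
# subgraphs have 3 and 2 nodes; the total (5) happens to match.
other_sizes = [3, 2]
other_values = ["a0", "a1", "a2", "b0", "b1"]

intended = partition(other_values, other_sizes)
actual = partition(other_values, target_sizes)

print(intended)  # [['a0', 'a1', 'a2'], ['b0', 'b1']]
print(actual)    # [['a0', 'a1'], ['a2', 'b0', 'b1']]
```

No exception is raised because the total number of nodes matches, yet node a2 ends up in the wrong subgraph.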
Example usage:
>>> graph_tensor = molgraph.GraphTensor(
...     sizes=[2, 3],
...     node_feature=[
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [0.0, 1.0]
...     ],
...     edge_src=[1, 0, 3, 4, 2, 4, 3, 2],
...     edge_dst=[0, 1, 2, 2, 3, 3, 4, 4],
... )
>>> random_node_features = tf.random.uniform(
...     graph_tensor.node_feature.shape
... )
>>> random_edge_features = tf.random.uniform(
...     graph_tensor.edge_src.shape.concatenate([1])
... )
>>> graph_tensor = graph_tensor.update({
...     'node_feature': random_node_features,
...     'edge_feature': random_edge_features,
... })
- Parameters
data (dict) – Nested data. Specifically, a dictionary of tensors.
- Returns
A new updated GraphTensor instance.
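The "return a new instance instead of mutating" behavior can be sketched with a frozen dataclass. TinyGraph is a hypothetical stand-in used only to illustrate the pattern; it does not reflect how GraphTensor is implemented internally.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class TinyGraph:
    """Hypothetical immutable stand-in for a graph container."""
    sizes: tuple
    data: dict = field(default_factory=dict)

    def update(self, new_data=None, **new_data_as_kwargs):
        # Build a new dict instead of mutating self.data in place.
        merged = {**self.data, **(new_data or {}), **new_data_as_kwargs}
        # replace() constructs a fresh instance with the changed field.
        return replace(self, data=merged)


original = TinyGraph(sizes=(2, 3), data={"node_feature": [1.0] * 5})
updated = original.update({"node_feature": [0.5] * 5}, edge_weight=[1.0] * 8)

print(sorted(original.data))  # the original still has only its own fields
print(sorted(updated.data))
```

Because the original object is never modified, the result of update() has to be assigned to a name, as in the example above.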
- remove(fields)[source]
Removes data from the GraphTensor instance.
Example usage:
>>> graph_tensor = molgraph.GraphTensor(
...     sizes=[2, 3],
...     node_feature=[
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [0.0, 1.0]
...     ],
...     edge_feature=[
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...     ],
...     edge_src=[1, 0, 3, 4, 2, 4, 3, 2],
...     edge_dst=[0, 1, 2, 2, 3, 3, 4, 4],
... )
>>> graph_tensor = graph_tensor.remove(['edge_feature'])
- Parameters
fields (str, list[str]) – Data fields to be removed from the GraphTensor instance. Currently, edge_dst, edge_src, node_feature and sizes cannot be removed.
- Returns
An updated GraphTensor instance.
- Return type
GraphTensor
- separate(other=None, /)[source]
Converts the GraphTensor into a ragged state.
In other words, this method separates each subgraph of the GraphTensor instance, resulting in a new GraphTensor instance with each subgraph separated by rows:
>>> graph_tensor = molgraph.GraphTensor(
...     sizes=[2, 3],
...     node_feature=[
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [0.0, 1.0]
...     ],
...     edge_src=[1, 0, 3, 4, 2, 4, 3, 2],
...     edge_dst=[0, 1, 2, 2, 3, 3, 4, 4],
... )
>>> graph_tensor = graph_tensor.separate()
This method can optionally be used as a “static method” to separate another GraphTensor instance:
>>> graph_tensor = molgraph.GraphTensor(
...     sizes=[2, 3],
...     node_feature=[
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [0.0, 1.0]
...     ],
...     edge_src=[1, 0, 3, 4, 2, 4, 3, 2],
...     edge_dst=[0, 1, 2, 2, 3, 3, 4, 4],
... )
>>> ds = tf.data.Dataset.from_tensor_slices(graph_tensor)
>>> ds = ds.batch(2).map(molgraph.GraphTensor.separate)
Note: although not a common use case, the separate and merge methods are both implemented in this way to make it convenient to go between states with tf.data.Dataset.
- Parameters
other (None, GraphTensor) – A GraphTensor instance passed as a positional-only argument. If None, self will be separated. Default to None.
- Returns
A GraphTensor instance with its subgraphs separated into rows (nested ragged tensors).
- Return type
GraphTensor
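To make the row-wise separation concrete, here is a plain-Python sketch that assigns the edges of the example graph to their subgraphs. The helpers graph_indicator and separate_rows are hypothetical. Whether molgraph re-indexes edges per subgraph is not stated here, so the local indices at the end are shown only as one plausible convention and should be read as an assumption.

```python
from itertools import accumulate


def graph_indicator(sizes):
    """Return, for every node, the index of the subgraph it belongs to."""
    return [g for g, size in enumerate(sizes) for _ in range(size)]


def separate_rows(values, owner, num_graphs):
    """Group flat values into one row per subgraph; owner[i] is the row id."""
    rows = [[] for _ in range(num_graphs)]
    for value, g in zip(values, owner):
        rows[g].append(value)
    return rows


sizes = [2, 3]
edge_src = [1, 0, 3, 4, 2, 4, 3, 2]

node_owner = graph_indicator(sizes)  # one subgraph id per node
# An edge belongs to the subgraph of its source node.
edge_owner = [node_owner[s] for s in edge_src]

# Assumption: indices are made local to each subgraph by subtracting the
# number of nodes in all preceding subgraphs.
offsets = [0] + list(accumulate(sizes))[:-1]
local_src = [s - offsets[g] for s, g in zip(edge_src, edge_owner)]

print(separate_rows(edge_src, edge_owner, len(sizes)))
print(separate_rows(local_src, edge_owner, len(sizes)))
```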
- merge(other=None, /)[source]
Converts the GraphTensor into a non-ragged state.
In other words, this method merges the row-separated subgraphs into a single disjoint graph (all nodes and edges along the same dimension/row):
>>> graph_tensor = molgraph.GraphTensor(
...     sizes=[2, 3],
...     node_feature=[
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [0.0, 1.0]
...     ],
...     edge_src=[1, 0, 3, 4, 2, 4, 3, 2],
...     edge_dst=[0, 1, 2, 2, 3, 3, 4, 4],
... )
>>> graph_tensor = graph_tensor.separate()
>>> graph_tensor = graph_tensor.merge()
This is the preferred state of a GraphTensor instance as it is an efficient representation.
This method can optionally be used as a “static method” to merge another GraphTensor instance:
>>> graph_tensor = molgraph.GraphTensor(
...     sizes=[2, 3],
...     node_feature=[
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [0.0, 1.0]
...     ],
...     edge_src=[1, 0, 3, 4, 2, 4, 3, 2],
...     edge_dst=[0, 1, 2, 2, 3, 3, 4, 4],
... )
>>> graph_tensor = graph_tensor.separate()
>>> ds = tf.data.Dataset.from_tensor_slices(graph_tensor)
>>> ds = ds.batch(2).map(molgraph.GraphTensor.merge)
Note: although not a common use case, the separate and merge methods are both implemented in this way to make it convenient to go between states with tf.data.Dataset.
- Parameters
other (None, GraphTensor) – A GraphTensor instance passed as a positional-only argument. If None, self will be merged. Default to None.
- Returns
A GraphTensor instance with its subgraphs merged into a single disjoint graph (nested “rectangular” tensors).
- Return type
GraphTensor
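Conceptually, merging flattens the per-subgraph rows into one node dimension while the row lengths are kept as sizes. A minimal plain-Python sketch follows; merge_rows is a hypothetical helper that handles node features only and is not the library's implementation.

```python
def merge_rows(rows):
    """Flatten one-row-per-subgraph data and record each row's length."""
    sizes = [len(row) for row in rows]
    flat = [value for row in rows for value in row]
    return flat, sizes


ragged_node_feature = [
    [[1.0, 0.0], [1.0, 0.0]],              # subgraph 0: 2 nodes
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],  # subgraph 1: 3 nodes
]

node_feature, sizes = merge_rows(ragged_node_feature)

print(sizes)              # [2, 3]
print(len(node_feature))  # 5
```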
- propagate(mode='sum', normalize=False, reduction=None, residual=None, **kwargs)[source]
Propagates node features of the GraphTensor instance.
This is a helper method for passing information between nodes; specifically, it aggregates information (features) from source nodes to destination nodes. Roughly, this method uses three gnn_ops in sequence, the first and third being optional:
normalizes edge weights via softmax_edge_weights();
propagates node features via propagate_node_features();
reduces aggregated node features via reduce_features().
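The first two steps can be sketched in plain Python on the example graph used throughout this page. The functions below are hypothetical and are not the gnn_ops named above. In particular, grouping the softmax over the incoming edges of each destination node is an assumption of this sketch, and only the 'sum' and 'mean' modes are covered.

```python
import math


def softmax_by_destination(edge_weight, edge_dst, num_nodes):
    """Normalize edge weights so they sum to 1 over each destination node."""
    # No max-subtraction for numerical stability; kept short on purpose.
    exp_weight = [math.exp(w) for w in edge_weight]
    totals = [0.0] * num_nodes
    for w, dst in zip(exp_weight, edge_dst):
        totals[dst] += w
    return [w / totals[dst] for w, dst in zip(exp_weight, edge_dst)]


def propagate(node_feature, edge_src, edge_dst, edge_weight=None, mode="sum"):
    """Aggregate (optionally weighted) source features at destination nodes."""
    num_nodes, dim = len(node_feature), len(node_feature[0])
    if edge_weight is None:
        edge_weight = [1.0] * len(edge_src)
    out = [[0.0] * dim for _ in range(num_nodes)]
    degree = [0] * num_nodes
    for src, dst, w in zip(edge_src, edge_dst, edge_weight):
        degree[dst] += 1
        for k in range(dim):
            out[dst][k] += w * node_feature[src][k]
    if mode == "mean":
        # Nodes without incoming edges keep a zero vector.
        out = [[x / max(d, 1) for x in row] for row, d in zip(out, degree)]
    elif mode != "sum":
        raise ValueError("this sketch only covers 'sum' and 'mean'")
    return out


node_feature = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
edge_src = [1, 0, 3, 4, 2, 4, 3, 2]
edge_dst = [0, 1, 2, 2, 3, 3, 4, 4]

print(propagate(node_feature, edge_src, edge_dst))
# [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
```

With uniform raw edge weights, the softmax-normalized weighted sum coincides with mean aggregation, which is a quick way to sanity-check the normalization.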
Example usage:
>>> graph_tensor = molgraph.GraphTensor(
...     sizes=[2, 3],
...     node_feature=[
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [1.0, 0.0],
...         [0.0, 1.0]
...     ],
...     edge_src=[1, 0, 3, 4, 2, 4, 3, 2],
...     edge_dst=[0, 1, 2, 2, 3, 3, 4, 4],
... )
>>> graph_tensor = graph_tensor.propagate()
- Parameters
mode (str) – The type of aggregation to be performed, either of ‘sum’, ‘mean’, ‘min’ or ‘max’. If None, ‘sum’ will be used. Default to ‘sum’.
normalize (bool) – Whether the edge weights (if available) should be normalized (via softmax) before aggregation. Edge weights are usually the attention scores applied to each incoming (source) node feature.
reduction (None, str) – The type of reduction (“merging”) to be performed if the node features span another dimension (e.g. when using multiple attention heads in GATConv or GTConv). Either of ‘concat’, ‘mean’, ‘sum’ or None. Default to None.
residual (None, tf.Tensor) – Residual node features to be added to the output of the aggregated node features. Default to None.
**kwargs –
Valid (optional) keyword arguments are:
activation: The activation to be performed on the aggregated node features. Default to None.
exponentiate: Whether to exponentiate edge weights before softmax. Default to True.
clip_values: The clipping range that should be applied to the (potentially exponentiated) edge weights, for stability (default to True).
output_units: The output dimension (innermost dimension) after reshaping. Only relevant if reduction='concat'. Default to None.
reduce_axis: Axis to be reduced (“merged”). Ignored if None. Default to 1, which is the axis of the heads of GATConv, GTConv etc.
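The effect of the reduction options on a head axis can be shown with nested lists shaped [nodes][heads][features]. reduce_heads is a hypothetical helper for illustration; it always reduces axis 1 and does not model output_units or any reshaping done by the library.

```python
def reduce_heads(features, reduction=None):
    """Merge the head axis (axis 1) of features shaped [nodes][heads][dim]."""
    if reduction is None:
        return features
    reduced = []
    for heads in features:
        if reduction == "concat":
            # Lay the per-head vectors side by side.
            reduced.append([x for head in heads for x in head])
        elif reduction == "sum":
            reduced.append([sum(col) for col in zip(*heads)])
        elif reduction == "mean":
            reduced.append([sum(col) / len(heads) for col in zip(*heads)])
        else:
            raise ValueError(f"unknown reduction: {reduction!r}")
    return reduced


# One node, two heads, three features per head.
features = [[[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]]

print(reduce_heads(features, "concat"))  # [[1.0, 2.0, 3.0, 3.0, 4.0, 5.0]]
print(reduce_heads(features, "sum"))     # [[4.0, 6.0, 8.0]]
print(reduce_heads(features, "mean"))    # [[2.0, 3.0, 4.0]]
```

Note that 'concat' multiplies the innermost dimension by the number of heads, whereas 'sum' and 'mean' leave it unchanged.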
- is_scalar(other=None, /)[source]
Checks whether the GraphTensor instance is a “scalar”.
A “scalar” is loosely defined, but basically means that the GraphTensor instance is “unbatched”. The method should rarely be used. Currently only used for the custom batch encoder to flag that the GraphTensor instance should not be separated when encoded (in the custom batch encoder).
- Returns
A boolean indicating whether the GraphTensor instance is a “scalar”.
- is_ragged(other=None, /)[source]
Checks whether the GraphTensor instance is in its ragged state.
- Returns
A boolean indicating whether the GraphTensor instance is in its ragged state.
- property spec
Spec of the GraphTensor instance.
Unlike _type_spec, spec specifies a more realistic (or rather useful) specification of the GraphTensor instance. In other words, the dimension corresponding to the size of the disjoint graph (i.e. number of nodes and edges) is None rather than a specific value. If exact specification is desired, use _type_spec or tf.type_spec_from_value instead.
- Returns
The corresponding spec of the GraphTensor.
- Return type
GraphTensor.Spec
- property shape
Partial shape of the GraphTensor instance.
Note: shape now returns a tf.TensorShape with the following dimensions, regardless of its state: (num_subgraphs, num_nodes, num_features)
- Returns
The partial shape of the GraphTensor.
- Return type
tf.TensorShape
- property dtype
Partial dtype of the GraphTensor instance.
- Returns
The partial dtype of the GraphTensor.
- Return type
tf.DType
- property rank
Partial rank of the GraphTensor instance.
- Returns
The partial rank of the GraphTensor.
- Return type
int
- property data
Unpacks the nested data of the GraphTensor instance.
data corresponds to data_spec(). When working with values returned from data, make sure to also work with specs returned from data_spec of the associated Spec instance. If not, conflicts will likely occur.
This property is implemented to be more selective and have more control over what data should be unpacked from the GraphTensor instance.
Unfortunately, tf.nest ops cannot be used directly on the GraphTensor instance as they will expand all composites, including tf.RaggedTensor values. E.g., tf.nest.map_structure will flatten a tf.RaggedTensor into its components, which is undesired behavior. By returning a dict of nested data, tf.nest ops can be used on the dict without specifying expand_composites=True, resulting in tf.RaggedTensor values not being flattened. Furthermore, it allows us to unpack the auxiliary data appropriately.
- Returns
A dictionary with nested data values.
- __getattr__(name)[source]
Access data fields of the GraphTensor as attributes.
Only called when attribute lookup has not found attribute name in the usual places.
- Parameters
name (str) – The data field to be extracted.
- Returns
A tf.Tensor corresponding to the data field name.
- Raises
AttributeError – if name is not a data field of the GraphTensor.
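The "only called when normal lookup fails" behavior is standard Python and can be sketched independently of molgraph. FieldBag below is a hypothetical container, not the GraphTensor implementation; it just forwards unknown attribute names to an internal dictionary.

```python
class FieldBag:
    """Hypothetical container exposing dictionary entries as attributes."""

    def __init__(self, **data):
        # A regular instance attribute: it is found by normal lookup,
        # so accessing it does not go through __getattr__.
        self._data = dict(data)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup has failed.
        if name == "_data":
            # Guard against infinite recursion if _data was never set.
            raise AttributeError(name)
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(f"{name!r} is not a data field") from None


bag = FieldBag(node_feature=[[1.0, 0.0], [0.0, 1.0]])
print(bag.node_feature)
print(hasattr(bag, "edge_feature"))
```

Raising AttributeError (rather than KeyError) keeps built-ins such as hasattr and getattr with a default working as expected.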
- __getitem__(index)[source]
Access subgraphs of the GraphTensor via indexing.
- Parameters
index (slice, int, list[int]) – Indices or slice for accessing certain subgraphs of the GraphTensor instance.
- Returns
A GraphTensor instance with the specified subgraphs.
- Raises
KeyError – if index (str) does not exist in data spec.
tf.errors.InvalidArgumentError – if index (int, list[int]) is out of range.
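Selecting subgraphs from a disjoint representation amounts to slicing the flat node dimension by cumulative sizes. The sketch below uses a hypothetical helper, select_subgraphs, that handles node data only (edges are omitted) and raises IndexError for out-of-range indices; the exception type is a choice of this sketch and differs from the errors listed above.

```python
from itertools import accumulate


def select_subgraphs(node_feature, sizes, index):
    """Pick subgraphs from flat node data by int, slice or list of ints."""
    if isinstance(index, int):
        chosen = [index]
    elif isinstance(index, slice):
        chosen = list(range(*index.indices(len(sizes))))
    else:
        chosen = list(index)
    ends = list(accumulate(sizes))  # exclusive end offset of each subgraph
    new_nodes, new_sizes = [], []
    for g in chosen:
        if not -len(sizes) <= g < len(sizes):
            raise IndexError(f"subgraph index {g} is out of range")
        start = ends[g] - sizes[g]
        new_nodes.extend(node_feature[start:ends[g]])
        new_sizes.append(sizes[g])
    return new_nodes, new_sizes


sizes = [2, 3]
node_feature = ["a0", "a1", "b0", "b1", "b2"]

print(select_subgraphs(node_feature, sizes, 1))
print(select_subgraphs(node_feature, sizes, slice(0, 1)))
print(select_subgraphs(node_feature, sizes, [1, 0]))
```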