Each component can have a set of sparse acceleration features (SAFs) that allow the hardware to exploit sparsity in the data to reduce energy consumption and/or increase throughput.
Sparse optimizations are expressed in the following format:
- !Component
  sparse_optimizations:
    action_optimization: <Action optimization dictionary>
    representation_format: <Representation format dictionary>
    compute_optimization: <Compute optimization dictionary>
Alternatively, sparse optimizations may be expressed in a separate file by using the following format:
sparse_optimizations:
  version: 0.4
  targets:
  - target: <Name of component>
    action_optimization: <Action optimization dictionary>
    representation_format: <Representation format dictionary>
    compute_optimization: <Compute optimization dictionary>
  - target: <Name of component>
    ...
  - target: <Name of component>
    ...
By default, Timeloop assumes that all processing is done on dense data in an uncompressed format. The sparse_optimizations key specifies architectural features that optimize the behavior of the system to exploit sparsity. There are two types of SAFs that we can apply to storage components:
- action_optimization: key for specifying gating/skipping optimizations that explicitly reduce the number of actions that happen during the run.
- representation_format: key for specifying the format in which the sparse data is represented. A compressed format can help save storage space and accesses.
action_optimization key
To specify which type of explicit optimization is applied to the component, we use the type key under action_optimization. In the example below, we specify a gating optimization by assigning gating as the value of the type key. There are three potential values that we can specify for the type of the optimization: gating, skipping, and spatial-skipping. gating and skipping are applied temporally, i.e., storage accesses are reduced, but the required number of spatial instances stays the same as in the dense case, whereas spatial-skipping reduces the required number of spatial instances. For example, when only two nonzeros out of a block of four values are spatially sent to different PEs, applying spatial-skipping allows the analysis to realize that only two PEs are required. Please note that the current implementation does not support applying skipping/gating and spatial-skipping at the same time.
action_optimization:
  type: gating
  ...
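Spatial-skipping is specified with the same keys. As a hedged sketch (the target and condition-on keys are explained below, and the tensor names here are illustrative), skipping the spatial instances whose A operand is zero might look like:

```yaml
action_optimization:
- type: spatial-skipping
  target: B            # follower: spatial instances accessing B are skipped
  condition-on: [ A ]  # leader: check A for zeros
```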
At a high level, each action optimization is specified as a leader-follower intersection, which checks one operand and, if this operand is zero, avoids accessing the other operand. We call the checked operand the leader and the operand with the gated access the follower.
To specify the leader-follower intersection, we need to specify a target key and a condition-on key to indicate exactly how the optimization happens. More specifically, we use two special keys:
- target key: the dataspace whose accesses will be optimized away, i.e., the follower in a leader-follower intersection.
- condition-on key: takes a list as its value, which specifies which dataspaces we should check for zeros when performing the optimization, i.e., the leaders in a leader-follower intersection.
The example below essentially specifies "at the Buffer level, if A has a zero value, we gate the read access to B".
- !Component
  name: buffer
  sparse_optimizations:
    action_optimization:
    - type: gating
      target: B
      condition-on: [ A ]
    ...
...
To specify a double-sided intersection, e.g., where accesses to A and B are each gated based on the values of the other tensor, we can simply specify two complementary leader-follower intersections, as shown below. Note how the roles of leader and follower are switched in the second item in the list.
- !Component
  name: buffer
  sparse_optimizations:
    action_optimization:
    - type: gating
      target: B
      condition-on: [ A ]
    - type: gating
      target: A
      condition-on: [ B ]
    ...
...
representation_format key
The representation format specification is based on the fibertree concept, which is a format-agnostic way to represent a sparse tensor. In the fibertree notation, each dimension of the tensor is referred to as a rank, and each rank is assigned a per-rank representation format, such as UOP (uncompressed offset pair) or CP (coordinate payload). Please refer to Sparseloop paper Section III.A for more detailed explanations of how to use these per-rank formats to represent more sophisticated formats.
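For instance, combining these per-rank formats, a CSR-like compressed 2D tensor can be sketched as follows (this mapping of CSR's row pointers and column coordinates onto UOP and CP ranks is our interpretation, and the tensor name is illustrative):

```yaml
representation_format:
  data_spaces:
  - name: A          # an illustrative 2D tensor
    ranks:
    - format: UOP    # upper rank: uncompressed offset pairs (row-pointer-like)
    - format: CP     # lower rank: coordinate payloads (column coordinates)
```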
To specify a representation format, we use the data_spaces key to first specify a list. Each element in the list specifies the format for a specific tensor, which is identified using:
- name key: the tensor that the SAF is applied to.
- ranks key: specifies a list of formats, one for each rank in the fibertree-based representation.
For example:
- !Component
  name: buffer
  sparse_optimizations:
    representation_format:
      data_spaces:
      - name: A
        ranks:
        - format: CP
    ...
...
By default, the number of bits used for each rank's metadata is the same as the hardware metadata storage word width. However, users do have the option to specify the metadata bitwidth in the sparse optimization specification, as shown below. This specific example specifies that each coordinate metadata value is 4 bits.
sparse_optimizations:
  targets:
  - name: Buffer
    representation_format:
      data_spaces:
      - name: A
        ranks:
        - format: CP
          metadata_word_bits: 4
...
In our analysis, we require the tensors to be pre-tiled for compressed processing, i.e., the hardware should support metadata traversal for non-trivial for loops in the mapping. As a result, the representation format can have more ranks than the number of dimensions in the tensor, as each dimension can be tiled multiple times to fit in multiple storage levels. In a pre-tiled tensor, storage levels that are closer to the root of the storage hierarchy must have at least as many ranks as their child levels.
For example, in the mapping below, tensor B (indexed by K and N, i.e., a K-by-N tensor) is tiled along the N dimension at the BackingStorage with factor N1.
----- BackingStorage -----
for n1 in [0, N1)
  ----- Buffer -----
  for k in [0, K)
    for n0 in [0, N0=N/N1)
As a result, the representation format for tensor B in the BackingStorage needs to support three ranks: one for N1, one for K, and one for N0, whereas the Buffer only needs to support two ranks for B: K and N0.
In the example below, we show the sparse_optimizations specification as a separate file.
sparse_optimizations:
  version: 0.4
  targets:
  - name: BackingStorage
    representation_format:
      data_spaces:
      - name: B # pre-tiling is required, so B is represented with 3 ranks
        ranks:
        - format: UOP
        - format: UOP
        - format: CP
  - name: Buffer
    representation_format:
      data_spaces:
      - name: B
        ranks: # each tile of B requires only two ranks
        - format: UOP
        - format: CP
...
Here is an example:
sparse_optimizations:
  targets:
  - name: BackingStorage
    representation_format:
      data_spaces:
      - name: A
        ranks:
        - format: UOP
        - format: CP
      - name: B
        ranks:
        - format: UOP
        - format: CP
  - name: Buffer
    representation_format:
      data_spaces:
      - name: B
        ranks:
        - format: UOP # uncompressed offset pair
        - format: CP  # coordinate payload
      - name: A
        ranks:
        - format: UOP # uncompressed offset pair
        - format: CP  # coordinate payload
    action_optimization:
    - type: skipping
      options:
      - target: A
        condition-on: [ B ]
    - type: skipping
      options:
      - target: B
        condition-on: [ A ]
    - type: skipping
      options:
      - target: Z
        condition-on: [ A, B ]
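Finally, note that the templates at the top of this section also list a compute_optimization key, which targets compute components rather than storage. As a hedged sketch (the component name MAC and the availability of gating on computes are assumptions, not confirmed by the examples above), gating ineffectual computes might be specified as:

```yaml
- !Component
  name: MAC              # illustrative compute component name
  sparse_optimizations:
    compute_optimization:
    - type: gating       # gate computes whose operands are zero
```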