The optional sparse_optimizations top-level key specifies sparse acceleration features (SAFs), which allow the hardware to exploit sparsity in the data to improve performance or energy consumption.
By default, Timeloop assumes that all processing is done on dense data in an uncompressed format. The sparse_optimizations key specifies architectural features that optimize the behavior of the system to exploit sparsity.
The sparse_optimizations: key contains a version and a list of target records (component names and their associated SAFs).

version key

Each top-level YAML key used as a Timeloop/Accelergy input has a version. The current version for sparse_optimizations: is TBD, which is specified as follows:
sparse_optimizations:
  version: TBD
  ...
Note: The version key is currently ignored...
Just like the mapping/constraint specifications, sparse acceleration features (SAFs) are specified as a list, with each entry describing the SAFs applied to one architecture level (storage or compute).
targets key

The targets key lists all of the applied sparse optimizations, each targeting a specific level in the architecture.
sparse_optimizations:
  targets:
    ...
For each entry in the targets list, we first need to specify the name of the component that we want to apply the SAF to, using the name key:
sparse_optimizations:
  targets:
    - name: Buffer
      ...
Once we have the component name specified, there are two types of SAFs that we can apply to it:
- action-optimization: specifies gating/skipping optimizations that explicitly reduce the number of actions that happen during the run.
- representation-format: specifies the format in which the sparse data is represented. A compressed format can help save storage space and accesses.

action-optimization key
To specify which type of explicit optimization is applied to the component, we use the type key under action-optimization. In the example below, we specify a gating optimization by assigning gating as the value of the type key. There are three potential values for the optimization type: gating, skipping, and spatial-skipping. gating and skipping are applied temporally, i.e., storage accesses are reduced, but the required number of spatial instances stays the same as in the dense case, whereas spatial-skipping reduces the number of required spatial instances. For example, when only two nonzeros out of a block of four values are spatially sent to different PEs, applying spatial-skipping allows the analysis to recognize that only two PEs are required.
Please note that the current implementation does not support applying skipping/gating and spatial-skipping at the same time.
sparse_optimizations:
  targets:
    - name: Buffer
      action-optimization:
        type: gating
        ...
  ...
At a high level, each action optimization is specified as a leader-follower intersection, which checks one operand and, if that operand is zero, avoids accessing the other operand. We call the checked operand the leader and the operand with the gated access the follower.
To specify the leader-follower intersection, we specify a target key and a condition-on key to indicate exactly how the optimization happens. More specifically, we use two special keys:
- target: the dataspace whose accesses will be optimized away, i.e., the follower in a leader-follower intersection.
- condition-on: takes a list as its value, which specifies the dataspaces we should check for zeros when performing the optimization, i.e., the leaders in a leader-follower intersection.
The example below essentially specifies "at the Buffer level, if A has a zero value, we gate the read access to B".
sparse_optimizations:
  targets:
    - name: Buffer
      action-optimization:
        - type: gating
          target: B
          condition-on: [ A ]
      ...
  ...
To specify a double-sided intersection, e.g., where accesses to A and B are each gated based on the values of the other tensor, we can simply specify two complementary leader-follower intersections, as shown below. Note how the roles of leader and follower are swapped in the second item in the list.
sparse_optimizations:
  targets:
    - name: Buffer
      action-optimization:
        - type: gating
          target: B
          condition-on: [ A ]
        - type: gating
          target: A
          condition-on: [ B ]
      ...
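The same target/condition-on structure applies to spatial-skipping as well. Below is a minimal sketch (not one of the original examples): the component name PEArray is a hypothetical spatially-partitioned level, and the sketch skips the spatial instances that would compute Z when the corresponding A and B values are zero:

sparse_optimizations:
  targets:
    - name: PEArray  # hypothetical spatially-partitioned level
      action-optimization:
        - type: spatial-skipping
          target: Z
          condition-on: [ A, B ]
      ...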
representation-format key

The representation format specification is based on the fibertree concept, which is a format-agnostic way to represent a sparse tensor. In the fibertree notation, each dimension of the tensor is referred to as a rank, and each rank is assigned a per-rank representation format, e.g., UOP (uncompressed offset pairs) or CP (coordinate payload) in the examples below. Please refer to Sparseloop paper Section III.A for more detailed explanations of how these per-rank formats compose into more sophisticated tensor formats.
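As a concrete illustration (a sketch based on the standard fibertree interpretation, not Timeloop input syntax), stacking a UOP rank over a CP rank yields the classic CSR layout:

# Illustration only -- not Timeloop input syntax.
# A 3x4 matrix with nonzeros (row 0, col 1)=a, (row 0, col 3)=b,
# and (row 2, col 2)=c, stored with a UOP row rank over a CP column
# rank (i.e., the classic CSR layout):
#
#   UOP rank (rows):    offset pairs (0,2) (2,2) (2,3)
#   CP rank (columns):  (coordinate, payload) pairs (1,a) (3,b) (2,c)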
To specify a representation format, we use the data-spaces key, which takes a list. Each element in the list specifies the format for a specific tensor, identified using:
- name: the tensor (dataspace) whose format is being specified.
- ranks: a list of formats, one for each rank in the fibertree-based representation.

sparse_optimizations:
  targets:
    - name: Buffer
      representation-format:
        data-spaces:
          - name: A
            ranks:
              - format: CP
      ...
By default, the number of bits used for each rank's metadata is the same as the hardware metadata storage word width. However, users have the option to specify the metadata bitwidth in the sparse-optimization specification, as shown below. This example specifies that each coordinate metadata word is 4 bits.
sparse_optimizations:
  targets:
    - name: Buffer
      representation-format:
        data-spaces:
          - name: A
            ranks:
              - format: CP
                metadata-word-bits: 4
      ...
In our analysis, we require the tensors to be pre-tiled for compressed processing, i.e., the hardware should support metadata traversal for every non-trivial for-loop in the mapping. As a result, the representation format can have more ranks than the number of dimensions in the tensor, since each dimension can be tiled multiple times to fit in multiple storage levels. In a pre-tiled tensor, a storage level that is closer to the root of the storage hierarchy must have at least as many ranks as its child levels.
For example, in the mapping below, tensor B (indexed by K and N, i.e., a K-by-N tensor) is tiled along the N dimension at the BackingStorage level with factor N1.
----- BackingStorage -----
for n1 in [0, N1)
  ----- Buffer -----
  for k in [0, K)
    for n0 in [0, N0=N/N1)
As a result, the representation format for tensor B in the BackingStorage needs to support three ranks: one for N1, one for K, and one for N0. The Buffer, on the other hand, only needs to support two ranks for B: K and N0.
sparse_optimizations:
  targets:
    - name: BackingStorage
      representation-format:
        data-spaces:
          - name: B # pre-tiling is required, so B is represented with 3 ranks
            ranks:
              - format: UOP
              - format: UOP
              - format: CP
    - name: Buffer
      representation-format:
        data-spaces:
          - name: B
            ranks: # each tile of B requires only two ranks
              - format: UOP
              - format: CP
  ...
Finally, here is a complete example that combines representation formats and action optimizations across two levels:
sparse_optimizations:
  targets:
    - name: BackingStorage
      representation-format:
        data-spaces:
          - name: A
            ranks:
              - format: UOP
              - format: CP
          - name: B
            ranks:
              - format: UOP
              - format: CP
    - name: Buffer
      representation-format:
        data-spaces:
          - name: B
            ranks:
              - format: UOP # uncompressed offset pair
              - format: CP # coordinate payload
          - name: A
            ranks:
              - format: UOP # uncompressed offset pair
              - format: CP # coordinate payload
      action-optimization:
        - type: skipping
          options:
            - target: A
              condition-on: [ B ]
        - type: skipping
          options:
            - target: B
              condition-on: [ A ]
        - type: skipping
          options:
            - target: Z
              condition-on: [ A, B ]
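Note that this example nests each entry's target and condition-on keys under an options list, whereas the earlier examples place them directly under the type; both forms appear in this specification, with the options list presumably allowing multiple target/condition-on pairs per optimization type.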