Sparse Optimizations

The optional sparse_optimizations top-level key specifies the sparse acceleration features (SAFs), which provide a way for the hardware to exploit sparsity in the data to improve performance or energy consumption.

By default, Timeloop assumes that all processing is done on dense data in an uncompressed format. The sparse_optimizations key specifies architectural features that optimize the behavior of the system to exploit sparsity. These optimizations include:

  • Assigning a compressed tensor format to the data to save space
  • Gating of ineffectual operations to save energy
  • Skipping of ineffectual operations to save time and energy

The sparse_optimizations: key contains a version and a list of records, each containing a target (a subtree name) and its associated SAFs.

The version: key

Each top-level YAML key used as a Timeloop/Accelergy input has a version. The current version for sparse_optimizations: is TBD, which is specified as follows:

sparse_optimizations:
  version: TBD
  ...

Note: The version key is currently ignored...

Sparse Acceleration Features

Just like the mapping/constraint specifications, sparse acceleration features (SAFs) are specified as a list, with each entry describing the SAFs applied to an architecture level (storage or compute).

The targets key

The targets key lists all of the applied sparse optimizations, each targeting a specific level in the architecture.

sparse_optimizations:
  targets:
    ...

For each entry in the targets list, we first need to specify the name of the component to which we want to apply the SAF, using the name key:

sparse_optimizations:
  targets:
    - name: Buffer
      ...

Once we have the component name specified, there are two types of SAFs that we can apply:

  • action-optimization: key for specifying gating/skipping optimizations that explicitly reduce the number of actions that happen during the run.
  • representation-format: key for specifying the format in which the sparse data is represented. A compressed format can help save storage space and accesses.

The action-optimization key

To specify which type of explicit optimization is applied to the component, we use the type key under action-optimization. In the example below, we specify a gating optimization by assigning gating as the value of the type key. There are three potential values for the type of the optimization: gating, skipping, and spatial-skipping. gating and skipping are applied temporally, i.e., storage accesses are reduced, but the required number of spatial instances stays the same as in dense processing, whereas spatial-skipping reduces the required number of spatial instances. For example, when only two values out of a block of four that are sent spatially to different PEs are nonzero, applying spatial-skipping allows the analysis to realize that only two PEs are required.

Please note that the current implementation does not support applying skipping/gating and spatial-skipping at the same time.

sparse_optimizations:
  targets:
    - name: Buffer
      action-optimization:
        - type: gating
          ...
    ...
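
To make the temporal/spatial distinction above concrete, here is a small illustrative Python sketch (not part of the Timeloop toolchain; the numbers mirror the two-nonzeros-out-of-four example) that counts accesses and required PEs under each style of optimization:

```python
# Illustrative sketch (not Timeloop code): effect of temporal
# gating/skipping vs. spatial-skipping on a block of four values.

block = [0, 3, 0, 7]  # two nonzeros out of four

# Dense baseline: every value is accessed and every value needs a PE.
dense_accesses = len(block)
dense_pes = len(block)

# Temporal gating/skipping: ineffectual accesses are avoided, but the
# number of spatial instances (PEs) stays the same as in the dense case.
skipped_accesses = sum(1 for v in block if v != 0)
skipped_pes = dense_pes

# Spatial-skipping: only the nonzeros are sent to PEs, so the analysis
# can conclude that fewer PE instances are required.
spatial_pes = sum(1 for v in block if v != 0)

print(dense_accesses, dense_pes)      # 4 4
print(skipped_accesses, skipped_pes)  # 2 4
print(spatial_pes)                    # 2
```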

At a high level, each action optimization is specified as a leader-follower intersection, which checks one operand and, if this operand is zero, avoids accessing the other operand. We call the checked operand the leader and the operand with gated accesses the follower.

To specify the leader-follower intersection, we need to specify a target key and a condition-on key to indicate exactly how the optimization happens. More specifically, we use two special keys:

  • the target key: the dataspace whose accesses will be optimized away, i.e., the follower in a leader-follower intersection.
  • the condition-on key: takes a list as its value, which specifies the dataspaces we should check for zeros when performing the optimization, i.e., the leaders in a leader-follower intersection.

The example below essentially specifies "at the Buffer level, if A has a zero value, we gate the read access to B".

sparse_optimizations:
  targets:
    - name: Buffer
      action-optimization:
        - type: gating
          target: B
          condition-on: [ A ]
        ...
    ...
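
Conceptually, the gating in this specification behaves like the following sketch (hypothetical Python for illustration, not the analysis model itself): A is the leader listed under condition-on, and B is the target whose reads are gated:

```python
# Illustrative sketch of a leader-follower intersection (not Timeloop code).
# A is the leader (checked for zeros); B is the follower (its accesses
# are gated away whenever the corresponding A value is zero).

A = [0, 2, 0, 5]  # leader operand
B = [7, 1, 3, 0]  # follower operand

b_reads = 0
for a, b in zip(A, B):
    if a == 0:
        continue      # leader is zero: gate the read of the follower B
    b_reads += 1      # effectual access to B

print(b_reads)  # 2 of the 4 reads of B remain effectual
```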

To specify a double-sided intersection, e.g., where accesses to A and B are each gated based on the values of the other tensor, we can simply specify two complementary leader-follower intersections as shown below. Note how the roles of leader and follower are switched in the second item in the list.

sparse_optimizations:
  targets:
    - name: Buffer
      action-optimization:
        - type: gating
          target: B
          condition-on: [ A ]
        - type: gating
          target: A
          condition-on: [ B ]
    ...

The representation-format key

The representation format specification is based on the fibertree concept, a format-agnostic way to represent a sparse tensor. In the fibertree notation, each dimension of the tensor is referred to as a rank, and each rank is assigned a representation format. More specifically, each rank can be assigned one of the following formats:

  • Coordinate Payload (CP): the coordinates of each nonzero value are encoded with multiple bits. The payloads are either the nonzero value or a pointer to another dimension. CP explicitly lists the coordinates and the corresponding payloads.
  • Bitmask (B): a single bit is used to encode whether each coordinate is nonzero or not.
  • Run Length Encoding (RLE): multiple bits are used to encode the run length, which represents the number of zeros between nonzeros (e.g., an r-bit run length can encode a run of up to 2^r - 1 zeros).
  • Uncompressed Offset Pairs (UOP): multiple bits are used to encode the start (inclusive) and end (noninclusive) positions of nonzero values.

Please refer to the Sparseloop paper, Section III.A, for a more detailed explanation of how to use these per-rank formats to represent more sophisticated formats.
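
As a quick illustration of the per-rank formats, the sketch below (hypothetical Python, not Timeloop code) encodes a small 1-D fiber under CP, B, and RLE. UOP, which stores offset pairs pointing into a lower rank, is easiest to see in a multi-rank example and is omitted here:

```python
# Illustrative sketch (not Timeloop code): encoding a small 1-D fiber
# under the per-rank formats described above.

fiber = [0, 5, 0, 0, 3, 0, 0, 0]
nonzeros = [(i, v) for i, v in enumerate(fiber) if v != 0]

# Coordinate Payload (CP): explicit coordinates plus payloads.
cp_coords = [i for i, _ in nonzeros]    # [1, 4]
cp_payloads = [v for _, v in nonzeros]  # [5, 3]

# Bitmask (B): one bit per coordinate marking whether it is nonzero.
bitmask = [1 if v != 0 else 0 for v in fiber]  # [0, 1, 0, 0, 1, 0, 0, 0]

# Run Length Encoding (RLE): number of zeros preceding each nonzero.
rle = []
run = 0
for v in fiber:
    if v == 0:
        run += 1
    else:
        rle.append(run)  # run of zeros before this nonzero
        run = 0
# rle == [1, 2]

print(cp_coords, cp_payloads)
print(bitmask)
print(rle)
```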

To specify a representation format, we use the data-spaces key to specify a list. Each element in the list specifies the format for a specific tensor, which is identified using:

  • the name key: the tensor (data space) that the format applies to.
  • the ranks key: specifies the list of formats assigned to each rank in the fibertree-based representation.

sparse_optimizations:
  targets:
    - name: Buffer                
      representation-format:  
        data-spaces:
          - name: A            
            ranks:                
              - format: CP   
              ...

By default, the number of bits used for each rank's metadata is the same as the hardware metadata storage word width. However, users have the option to specify the metadata bitwidth in the sparse_optimizations specification as shown below. This specific example specifies that each coordinate metadata word is 4 bits wide.

sparse_optimizations:
  targets:
    - name: Buffer                
      representation-format:  
        data-spaces:
          - name: A            
            ranks:                
              - format: CP   
                metadata-word-bits: 4
                ...
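
As a back-of-the-envelope illustration (the numbers below are hypothetical, not from the spec), the metadata bitwidth determines both the coordinate range a CP rank can address and the metadata storage cost:

```python
# Hypothetical arithmetic sketch: implications of metadata-word-bits
# (values chosen for illustration only).

metadata_word_bits = 4

# A 4-bit CP coordinate can address positions 0..15, so the fiber this
# rank describes can hold at most 16 coordinates.
max_fiber_length = 2 ** metadata_word_bits  # 16

# With 10 nonzeros, a CP rank stores 10 coordinate words of metadata.
nnz = 10
cp_metadata_bits = nnz * metadata_word_bits  # 40 bits

# An RLE word of the same width can encode a run of up to 2^r - 1 zeros
# between nonzeros.
max_run = 2 ** metadata_word_bits - 1  # 15

print(max_fiber_length, cp_metadata_bits, max_run)
```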

In our analysis, we require the tensors to be pre-tiled for compressed processing, i.e., the hardware should support metadata traversal for every non-trivial for-loop in the mapping. As a result, the representation format can have more ranks than the number of dimensions in the tensor, since each dimension can be tiled multiple times to fit in multiple storage levels. In a pre-tiled tensor, storage levels that are closer to the root of the storage hierarchy must have at least as many ranks as their child levels.

For example, in the mapping below, tensor B (indexed by K and N, i.e., a K-by-N tensor) is tiled along the N dimension at the BackingStorage with factor N1.

 ----- BackingStorage -----
 for n1 in [0, N1)

   ----- Buffer -----
   for k in [0, K)
     for n0 in [0, N0 = N/N1)

As a result, the representation format for tensor B in the BackingStorage needs to support three ranks: one for N1, one for K, and one for N0, whereas the Buffer only needs to support two ranks for B: one for K and one for N0.

sparse_optimizations:
  targets:
    - name: BackingStorage
      representation-format:
        data-spaces:
          - name: B          # pre-tiling is required, so B is represented with 3 ranks
            ranks:
              - format: UOP
              - format: UOP
              - format: CP

    - name: Buffer
      representation-format:
        data-spaces:
          - name: B
            ranks:          # each tile of B requires two ranks only
              - format: UOP
              - format: CP
          ...

A Complete Example

Here is an example:

sparse_optimizations:
  targets:
    - name: BackingStorage
      representation-format:
        data-spaces:
          - name: A
            ranks:
              - format: UOP
              - format: CP
          - name: B
            ranks:
              - format: UOP
              - format: CP

    - name: Buffer
      representation-format:
        data-spaces:
          - name: B
            ranks:
              - format: UOP # uncompressed offset pair
              - format: CP  # coordinate payload
          - name: A
            ranks:
              - format: UOP # uncompressed offset pair
              - format: CP  # coordinate payload

      action-optimization:
        - type: skipping
          options:
            - target: A
              condition-on: [ B ]
        - type: skipping
          options:
            - target: B
              condition-on: [ A ]
        - type: skipping
          options:
            - target: Z
              condition-on: [ A, B ]