Sparse Optimizations

Each component can have a set of sparse acceleration features (SAFs) that allow the hardware to exploit sparsity in the data to reduce energy consumption and/or increase throughput.

Sparse optimizations are expressed in the following format:

- !Component
  sparse_optimizations:
    action_optimization: <Action optimization dictionary>
    representation_format: <Representation format dictionary>
    compute_optimization: <Compute optimization dictionary>

Alternatively, sparse optimizations may be expressed in a separate file by using the following format:

sparse_optimizations:
  version: 0.4
  targets:
  - target: <Name of component>
    action_optimization: <Action optimization dictionary>
    representation_format: <Representation format dictionary>
    compute_optimization: <Compute optimization dictionary>
  - target: <Name of component>
    ...
  - target: <Name of component>
    ...

By default Timeloop assumes that all processing is done on dense data in an uncompressed format. The sparse_optimizations key specifies architectural features that optimize the behavior of the system to exploit sparsity. These optimizations include:

  • Assigning a compressed tensor format to the data to save space
  • Gating of ineffectual operations to save energy
  • Skipping of ineffectual operations to save time and energy

There are two types of SAF that we can apply to storage components:

  • action_optimization: key for specifying gating/skipping optimizations that explicitly reduce the number of actions that happen during the run.
  • representation_format: key for specifying the format in which the sparse data is represented. A compressed format can help save storage space and accesses.

The action_optimization key

To specify which type of explicit optimization is applied to the component, we use the type key under action_optimization. In the example below, we specify a gating optimization by assigning gating as the value of the type key. There are three potential values for the type of the optimization: gating, skipping, and spatial-skipping. gating and skipping are applied temporally, i.e., storage accesses are reduced, but the required number of spatial instances stays the same as in the dense case, whereas spatial-skipping reduces the required number of spatial instances. For example, when only two nonzeros out of a block of four values are spatially sent to different PEs, applying spatial-skipping allows the analysis to realize that only two PEs are required.

Please note that the current implementation does not support applying skipping/gating and spatial-skipping at the same time.

action_optimization:  
    type: gating          
    ...
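
The distinction between the three types can be sketched with a short standalone Python example (illustrative only, not Timeloop code), using a block of four values with two nonzeros:

```python
# Illustrative sketch: the effect of each action-optimization type
# on a block of 4 operand values, 2 of which are nonzero.
block = [3, 0, 0, 7]
nonzeros = sum(1 for v in block if v != 0)   # 2

# dense baseline: every value costs a cycle and a storage access
dense_cycles, dense_accesses = len(block), len(block)        # 4, 4

# gating (temporal): cycles are still spent, but ineffectual
# accesses are gated, saving energy only
gating_cycles, gating_accesses = len(block), nonzeros        # 4, 2

# skipping (temporal): ineffectual accesses are skipped entirely,
# saving both time and energy
skipping_cycles, skipping_accesses = nonzeros, nonzeros      # 2, 2

# spatial-skipping: the block is distributed across PEs, so only
# as many spatial instances as there are nonzeros are required
required_pes = nonzeros                                      # 2 of 4 PEs
```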

At a high level, each action optimization is specified as a leader-follower intersection, which checks one operand and, if this operand is zero, avoids accessing the other operand. We call the checked operand the leader and the operand with the gated access the follower.

To specify the leader-follower intersection, we need to specify a target key and a condition-on key to indicate exactly how the optimization is performed. More specifically, we use two special keys:

  • the target key: the dataspace whose accesses will be optimized away, i.e., the follower in a leader-follower intersection.
  • the condition-on key: takes a list as its value, which specifies the dataspaces that should be checked for zeros when performing the optimization, i.e., the leaders in a leader-follower intersection.

The example below essentially specifies "at the Buffer level, if A has a zero value, we gate the read access to B".

- !Component
  name: buffer
  sparse_optimizations:
    action_optimization:   
        - type: gating           
          target: B                
          condition-on: [ A ]  
    ...
  ...
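
The behavior of this leader-follower intersection can be sketched in plain Python (illustrative only; the tensor values are made up):

```python
# A is the leader: it is checked for zeros.
# B is the follower: a read of B is gated wherever A is zero.
A = [1, 0, 2, 0]
B = [5, 6, 0, 8]

# only positions where the leader is nonzero trigger a read of B
reads_of_B = sum(1 for a in A if a != 0)   # 2 of the 4 reads survive
```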

To specify a double-sided intersection, e.g., where accesses to A and B are each gated based on the values of the other tensor, we can simply specify two complementary leader-follower intersections, as shown below. Note how the roles of leader and follower are switched in the second item in the list.

- !Component
  name: buffer
  sparse_optimizations:
    action_optimization:   
        - type: gating           
          target: B                
          condition-on: [ A ]  
        - type: gating           
          target: A                
          condition-on: [ B ]  
    ...
  ...
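
For an elementwise multiply, the effect of this double-sided intersection can be sketched as follows (plain Python with made-up values, not Timeloop code):

```python
A = [1, 0, 2, 0]
B = [5, 6, 0, 8]

reads_of_A = sum(1 for b in B if b != 0)   # A conditioned on B: 3 reads
reads_of_B = sum(1 for a in A if a != 0)   # B conditioned on A: 2 reads

# only positions where both operands are nonzero do useful work
effectual_multiplies = sum(1 for a, b in zip(A, B) if a != 0 and b != 0)  # 1
```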

The representation_format key

The representation format specification is based on the fibertree concept, which is a format-agnostic way to represent a sparse tensor. In the fibertree notation, each dimension of the tensor is referred to as a rank, and each rank is assigned a representation format. More specifically, each rank can be assigned one of the following formats:

  • Coordinate Payload (CP): the coordinates of each nonzero value are encoded with multiple bits. The payloads are either the nonzero value or a pointer to another dimension. CP explicitly lists the coordinates and the corresponding payloads.
  • Bitmask (B): a single bit is used to encode whether each coordinate is nonzero or not.
  • Run Length Encoding (RLE): multiple bits are used to encode the run length, which represents the number of zeros between nonzeros (e.g., an r-bit run length can encode a run of up to 2^r-1 zeros).
  • Uncompressed Offset Pairs (UOP): multiple bits are used to encode the start (inclusive) and end (noninclusive) positions of nonzero values.

Please refer to Sparseloop paper Section III.A for more detailed explanations of how to use these per-rank formats to represent more sophisticated formats.
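
As a concrete (simplified) illustration, the following Python sketch encodes one rank of the fiber [a, 0, 0, b, c, 0, 0, 0] under each of the four formats; the real formats carry additional structure:

```python
# one fiber with nonzeros at coordinates 0, 3, and 4
values = ['a', 0, 0, 'b', 'c', 0, 0, 0]

# Coordinate Payload (CP): explicit (coordinate, payload) pairs
cp = [(i, v) for i, v in enumerate(values) if v != 0]
# [(0, 'a'), (3, 'b'), (4, 'c')]

# Bitmask (B): one bit per coordinate
bitmask = [1 if v != 0 else 0 for v in values]
# [1, 0, 0, 1, 1, 0, 0, 0]

# Run Length Encoding (RLE): zeros between consecutive nonzeros
rle, run = [], 0
for v in values:
    if v == 0:
        run += 1
    else:
        rle.append(run)
        run = 0
# [0, 2, 0]

# Uncompressed Offset Pairs (UOP): [start, end) positions of this
# fiber's payloads in the rank below (here, a single fiber)
uop = (0, len(cp))
# (0, 3)
```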

To specify a representation format, we use the data_spaces key to specify a list. Each element in the list specifies the format for a specific tensor, which is identified using

  • the name key: the name of the tensor that the format is applied to.
  • the ranks key: specifies a list of formats, one assigned to each rank in the fibertree-based representation.

- !Component
  name: buffer
  sparse_optimizations:
    representation_format:   
      data_spaces:
        - name: A            
          ranks:                
            - format: CP   
            ...
  ...

By default, the number of bits used for each rank's metadata is the same as the hardware metadata storage word width. However, users have the option to specify the metadata bitwidth in the sparse-optimization specification, as shown below. This specific example specifies that each coordinate's metadata is 4 bits.

sparse_optimizations:
  targets:
    - name: Buffer                
      representation_format:  
        data_spaces:
          - name: A            
            ranks:                
              - format: CP   
                metadata_word_bits: 4
                ...
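
The metadata bitwidth bounds what a single metadata word can encode. With the 4-bit setting above, for example (a back-of-the-envelope sketch):

```python
metadata_word_bits = 4

# a 4-bit CP coordinate can address 2**4 = 16 positions in a fiber
max_cp_coordinates = 2 ** metadata_word_bits       # 16

# a 4-bit RLE run length can encode up to 2**4 - 1 = 15 zeros
max_rle_run = 2 ** metadata_word_bits - 1          # 15
```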

In our analysis, we require the tensors to be pre-tiled for compressed processing, i.e., the hardware should support metadata traversal for non-trivial for-loops in the mapping. As a result, the representation format can have more ranks than the number of dimensions in the tensor, since each dimension can be tiled multiple times to fit into multiple storage levels. In a pre-tiled tensor, storage levels that are closer to the root of the storage hierarchy must have at least as many ranks as their child levels.

For example, for the mapping below, tensor B (indexed by K and N, i.e., a K-by-N tensor) is tiled along the N dimension at the BackingStorage with factor N1.

 ----- BackingStorage -----
 for n1 in [0, N1)

 ----- Buffer -----
      for k in [0, K)
         for n0 in [0, N0=N/N1)

As a result, the representation format for tensor B in the BackingStorage needs to support three ranks: one for N1, one for K, and one for N0. The Buffer, in contrast, only needs to support two ranks for B: K and N0.
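
The rank structure of the pre-tiled tensor can be illustrated with a small Python sketch (the K, N, and N1 values here are arbitrary):

```python
K, N, N1 = 2, 4, 2
N0 = N // N1

# a dense K-by-N tensor B
B = [[k * N + n for n in range(N)] for k in range(K)]

# pre-tiled view at the BackingStorage: rank order (N1, K, N0) -- 3 ranks
B_pretiled = [[[B[k][n1 * N0 + n0] for n0 in range(N0)]
               for k in range(K)]
              for n1 in range(N1)]

# each tile handed to the Buffer has only 2 ranks: (K, N0)
tile = B_pretiled[0]   # [[0, 1], [4, 5]]
```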

In the below example, we show sparse_optimizations as a separate file.

sparse_optimizations:
  version: 0.4
  targets:
    - name: BackingStorage
      representation_format:
        data_spaces:
          - name: B          # pre-tiling is required, so B is represented with 3 ranks
            ranks:
              - format: UOP
              - format: UOP
              - format: CP

    - name: Buffer
      representation_format:
        data_spaces:
          - name: B
            ranks:          # each tile of B requires two ranks only
              - format: UOP
              - format: CP
          ...

A Complete Example

Here is a complete example that combines representation formats and action optimizations:

sparse_optimizations:
  targets:
    - name: BackingStorage
      representation_format:
        data_spaces:
          - name: A
            ranks:
              - format: UOP
              - format: CP
          - name: B
            ranks:
              - format: UOP
              - format: CP

    - name: Buffer
      representation_format:
        data_spaces:
          - name: B
            ranks:
              - format: UOP # uncompressed offset pair
              - format: CP  # coordinate payload
          - name: A
            ranks:
              - format: UOP # uncompressed offset pair
              - format: CP  # coordinate payload

      action_optimization:
        - type: skipping
          options:
            - target: A
              condition-on: [ B ]
        - type: skipping
          options:
            - target: B
              condition-on: [ A ]
        - type: skipping
          options:
            - target: Z
              condition-on: [ A, B ]