Mapping

A mapping describes how the computation and tensors in a workload are partitioned and scheduled across the hardware substrate. A mapping specification for Timeloop is effectively a YAML-based representation of an annotated loop nest. However, instead of writing out the full nest, the mapping in YAML is described as a list of directives, with each directive describing some aspect of the sub-nest mapped to a particular level in the hardware. The following keys apply to each item (i.e., directive) in the mapping.

The target: key

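This key refers to the hardware level being targeted by a mapping directive. This hardware level must have been instantiated in the Architecture specification. Spatial sub-nests must target the parent hardware level, that is, the level that fans out to the spatial instances (in the full example at the end of this section, the spatial directive targets GlobalBuffer, not RegisterFile). Example: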

mapping:
  - target: GlobalBuffer
    ...

  - target: PE
    ...

The type: key

This key identifies whether we are describing a temporal, spatial, or bypass directive.

The permutation: key

This key is only applicable to type: temporal and type: spatial directives. It describes the permutation (i.e., nesting order) of loops in the sub-nest at the target level. Timeloop requires each level's temporal and spatial sub-nests to have exactly one loop from each original problem dimension. The permutation is written as a string of problem dimensions listing the loops in little-endian (i.e., inner-to-outer) order, so the first character names the innermost loop. For example, the directive:

    permutation: RPK

results in the following loop nest:

for k in [0:K)
  for p in [0:P)
    for r in [0:R)

Note that this is not the complete loop nest in the mapping; it is only the sub-nest at the hardware level being targeted by this directive.
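
As a rough illustration of the inner-to-outer convention, the following Python sketch expands a permutation string into an outer-to-inner loop nest. The helper name render_permutation is hypothetical and not part of Timeloop; it only mirrors the rule described above.

```python
def render_permutation(permutation: str) -> str:
    """Expand a permutation string (inner-to-outer) into loop-nest text.

    Hypothetical helper for illustration; not a Timeloop API.
    """
    lines = []
    # The string lists loops inner-to-outer, so reverse it to emit the
    # outermost loop first.
    for depth, dim in enumerate(reversed(permutation)):
        indent = "  " * depth
        lines.append(f"{indent}for {dim.lower()} in [0:{dim})")
    return "\n".join(lines)


print(render_permutation("RPK"))
```

Calling it with "RPK" prints the same three-loop nest shown above, with k outermost and r innermost.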

The factors: key

This key is only applicable to type: temporal and type: spatial directives. It is used to set the bounds of each loop in the sub-nest at that level. As an example, stating:

    factors: R=3

results in a loop for r in [0:3). As another example, stating:

    factors: R=3 P=16

results in two loops: for r in [0:3) and for p in [0:16), but says nothing about how those loops must be ordered (i.e., nested) with respect to each other. The ordering is specified by the permutation key.

Timeloop requires each level's temporal and spatial sub-nests to have exactly one loop from each original problem dimension. For example, for a convolutional network layer with the 7 dimensions C, K, R, S, P, Q, N, the factors directive can state:

    factors: C=16 K=4 R=3 S=1 P=16 Q=1 N=1

Observe that some of the factors are set to 1. This could be because the original workload dimension is 1, or because the specific tiling being expressed does not expand the tile along that dimension at this level. Effectively, such a loop does not exist in the mapping. These unit-factor loops are dropped during analysis, but we sometimes write them out explicitly in hand-written mappings for clarity (and to prevent bugs). Leaving a factor unspecified causes Timeloop to assume it is a unit factor.
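
The defaulting behavior can be sketched in a few lines of Python. This is a minimal illustration that assumes perfect factors only; parse_factors is a hypothetical helper and not Timeloop's own parser.

```python
def parse_factors(spec: str, dimensions: str) -> dict:
    """Parse a factors string such as "R=3 P=16" into {dimension: bound}.

    Hypothetical helper for illustration; handles perfect factors only.
    """
    # Start from unit factors so that unspecified dimensions default to 1.
    factors = {dim: 1 for dim in dimensions}
    for token in spec.split():
        dim, value = token.split("=")
        if dim not in factors:
            raise ValueError(f"unknown problem dimension: {dim}")
        factors[dim] = int(value)
    return factors


factors = parse_factors("C=16 K=4 R=3 P=16", dimensions="CKRSPQN")
print(factors)
# Loops that remain once unit-factor loops are dropped.
print({dim: bound for dim, bound in factors.items() if bound > 1})
```

Here S, Q, and N are omitted from the string, so they end up with a bound of 1, matching the fully written-out directive above.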

Also observe that each factor (and permutation) descriptor uses the dimension name (e.g., C) without the numerical suffix denoting the tiling level (e.g., C2) that is traditionally used when describing mappings as loop nests. This is because the tiling level is explicitly called out using the target: key, and adding a numerical suffix is (a) unnecessary and (b) error-prone (for example, if an inner hardware level is added in the architecture spec, all numerical suffixes will have to be re-computed).

For each problem dimension, the product of the factors across all levels (spatial and temporal) must equal the size of that dimension in the full workload. This is known as perfect (or remainderless) factorization.
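
The product rule is easy to check mechanically. The sketch below is one possible check, not Timeloop's implementation; the function name, the workload sizes, and the three-directive mapping are all hypothetical and chosen only for illustration.

```python
from math import prod


def check_factorization(levels, workload):
    """Return the dimensions whose factor product differs from the workload size.

    levels: list of {dimension: factor} dicts, one per temporal or spatial directive.
    workload: {dimension: size}.
    """
    mismatches = {}
    for dim, size in workload.items():
        # Missing factors count as unit factors.
        product = prod(level.get(dim, 1) for level in levels)
        if product != size:
            mismatches[dim] = (product, size)
    return mismatches


# Hypothetical workload and factors, for illustration only.
workload = {"R": 3, "P": 16, "K": 32}
levels = [
    {"R": 3, "K": 8},  # outer temporal directive
    {"K": 4},          # spatial directive
    {"P": 16},         # inner temporal directive
]
print(check_factorization(levels, workload))      # no mismatches
print(check_factorization(levels[:2], workload))  # P is under-factorized
```

An empty result means every dimension is perfectly factorized; dropping the inner directive leaves P with a product of 1 instead of 16.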

In order to be legal, the mapping must also fit into the available resources on the hardware. This includes buffer capacities and spatial instances at each hardware level. Timeloop performs this check during evaluation and will throw an error if a mapping exceeds the available hardware resources. The error message will point out which hardware level caused the violation.

Imperfect factorization

Timeloop also supports imperfect factorization, in which a loop has a residual bound in addition to its regular bound. The residual bound is used if and only if all ancestor loops in the entire mapping (including loops at outer hardware levels) are in their final iteration. An imperfectly factorized loop is written with a comma separating the regular and residual bounds. For example, the directive:

    factors: R=3 P=16,5

states that the P loop runs as for p in [0:16) in all situations except when all ancestor P loops are in their final iteration, in which case it runs as for p in [0:5).

Note that the directive factors: P=16,16 is identical to factors: P=16, because an omitted residual bound is taken to be the same as the regular bound.
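
To make the residual rule concrete, here is a simplified Python model of a chain of loops over a single dimension. It is a sketch based only on the description above, restricted to ancestor loops of the same dimension, and not Timeloop's actual algorithm; parse_bound and covered_extent are hypothetical names, and the outer factor of 2 is an assumed value.

```python
def parse_bound(text):
    """Parse "16" or "16,5" into a (regular, residual) pair."""
    parts = [int(part) for part in text.split(",")]
    regular = parts[0]
    # An omitted residual bound equals the regular bound.
    residual = parts[1] if len(parts) > 1 else regular
    return regular, residual


def covered_extent(loops):
    """Count the points visited along one dimension.

    loops: (regular, residual) pairs for that dimension, outermost first.
    """
    def extent(level, ancestors_final):
        if level == len(loops):
            return 1
        regular, residual = loops[level]
        bound = residual if ancestors_final else regular
        total = 0
        for i in range(bound):
            # Children see "all ancestors final" only on the last iteration
            # of a loop whose own ancestors are all final.
            total += extent(level + 1, ancestors_final and i == bound - 1)
        return total

    # The outermost loop has no ancestors, so the condition holds vacuously.
    return extent(0, True)


outer = parse_bound("2")     # assumed outer P loop
inner = parse_bound("16,5")  # the P=16,5 directive from the text
print(covered_extent([outer, inner]))                 # 16 + 5 = 21
print(covered_extent([outer, parse_bound("16,16")]))  # 16 + 16 = 32
```

With an outer P loop of 2, the inner loop runs 16 iterations on the first outer iteration and 5 on the final one, so the nest covers 21 points rather than 32.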

The split: key

This key only applies to type: spatial directives. A particular hardware level may have a fanout along two dimensions, X and Y. The split key assigns each loop in that sub-nest to one of these dimensions. As Timeloop walks through the permutation string (inner-to-outer), it assigns loops to the hardware X-dimension. The split value is an unsigned integer giving the position in the permutation string at which assignment switches over to the hardware Y-dimension. For example:

    permutation: RPK

Given this permutation, here are all the legal split values along with the resultant assignments to hardware-X and hardware-Y:

  • split: 0 => RPK mapped to hardware-Y.
  • split: 1 => R mapped to hardware-X, PK mapped to hardware-Y.
  • split: 2 => RP mapped to hardware-X, K mapped to hardware-Y.
  • split: 3 => RPK mapped to hardware-X.

If left unspecified, the default split is the rank of the problem (3 in the above example), resulting in all loops being mapped to hardware-X.
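
The assignment rule amounts to slicing the permutation string at the split position. The following Python sketch reproduces the list above; assign_split is a hypothetical helper, and it assumes the permutation string names every problem dimension, so its length equals the rank of the problem.

```python
def assign_split(permutation, split=None):
    """Return (loops on hardware-X, loops on hardware-Y) for a spatial sub-nest.

    Hypothetical helper for illustration; not a Timeloop API.
    """
    if split is None:
        # Default: the rank of the problem, assumed here to equal the
        # length of the permutation string.
        split = len(permutation)
    if not 0 <= split <= len(permutation):
        raise ValueError(f"split must be between 0 and {len(permutation)}")
    # Loops before the split point go to X, the rest go to Y.
    return permutation[:split], permutation[split:]


for split in range(4):
    x_loops, y_loops = assign_split("RPK", split)
    print(f"split: {split} => X={x_loops or '-'} Y={y_loops or '-'}")
```

Omitting the split argument returns ("RPK", ""), which corresponds to the default of mapping all loops to hardware-X.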

The bypass: key

This key only applies to the type: bypass directive. It specifies which tensors are bypassed (i.e., not stored) at that level. The tensor names must have been defined in the problem specification. For example:

    bypass: [ Weights, Inputs ]

The keep: key

This key only applies to the type: bypass directive. It specifies which tensors are kept (i.e., stored) at that level. The tensor names must have been defined in the problem specification. For example:

    keep: [ Outputs ]

By default, Timeloop assumes that each tensor is stored at each level unless it is explicitly bypassed via a bypass: directive. Therefore, the keep: directive is completely optional, but we often include it in hand-written mappings for clarity.
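
The keep-by-default behavior can be modeled as a simple set difference. This Python sketch assumes a problem that defines exactly the three tensors named in the examples above (Weights, Inputs, Outputs); stored_tensors is a hypothetical helper, and the error on unknown names is an illustrative choice rather than documented Timeloop behavior.

```python
def stored_tensors(all_tensors, bypass=()):
    """Return the tensors stored at a level, given the bypassed ones.

    Hypothetical helper for illustration; not a Timeloop API.
    """
    unknown = set(bypass) - set(all_tensors)
    if unknown:
        raise ValueError(f"tensors not defined in the problem: {sorted(unknown)}")
    # Every tensor is kept unless it is explicitly bypassed.
    return [tensor for tensor in all_tensors if tensor not in bypass]


tensors = ["Weights", "Inputs", "Outputs"]  # assumed problem tensors
print(stored_tensors(tensors, bypass=["Weights", "Inputs"]))
print(stored_tensors(tensors))
```

Bypassing Weights and Inputs leaves only Outputs stored, which is what the keep: [ Outputs ] example states explicitly.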

Full Example

Here is a complete example of a mapping:

mapping:
  - target: MainMemory
    type: temporal
    factors: R=1 P=1 K=1 C=1
    permutation: PRKC
    
  - target: GlobalBuffer
    type: temporal
    factors: R=3 P=1 K=32 C=2
    permutation: PRKC

  - target: GlobalBuffer
    type: spatial
    factors: R=1 P=1 K=1 C=16
    permutation: PRKC
    
  - target: RegisterFile
    type: temporal
    factors: R=1 P=16 K=1 C=1
    permutation: RPKC
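
To see the loop nest this mapping describes, the directives can be expanded with a short Python sketch. The MAPPING list below transcribes the example above by hand; render_mapping is a hypothetical helper, and the parallel-for keyword is notation chosen here to mark spatial loops, not Timeloop syntax.

```python
MAPPING = [
    {"target": "MainMemory", "type": "temporal",
     "factors": "R=1 P=1 K=1 C=1", "permutation": "PRKC"},
    {"target": "GlobalBuffer", "type": "temporal",
     "factors": "R=3 P=1 K=32 C=2", "permutation": "PRKC"},
    {"target": "GlobalBuffer", "type": "spatial",
     "factors": "R=1 P=1 K=1 C=16", "permutation": "PRKC"},
    {"target": "RegisterFile", "type": "temporal",
     "factors": "R=1 P=16 K=1 C=1", "permutation": "RPKC"},
]


def render_mapping(mapping):
    """Expand directives (outermost level first) into loop-nest text."""
    lines = []
    depth = 0
    for directive in mapping:
        factors = dict(token.split("=") for token in directive["factors"].split())
        keyword = "parallel-for" if directive["type"] == "spatial" else "for"
        # Permutations are inner-to-outer, so reverse to emit outer loops first.
        for dim in reversed(directive["permutation"]):
            bound = int(factors.get(dim, 1))
            if bound == 1:
                continue  # unit-factor loops are dropped
            indent = "  " * depth
            lines.append(
                f"{indent}{keyword} {dim.lower()} in [0:{bound})  # {directive['target']}"
            )
            depth += 1
    return "\n".join(lines)


print(render_mapping(MAPPING))
```

Only five loops survive once unit factors are dropped: C=2, K=32, and R=3 at GlobalBuffer (temporal), C=16 at GlobalBuffer (spatial), and P=16 at RegisterFile. The per-dimension products are R=3, P=16, K=32, and C=2×16=32, which under the perfect-factorization rule would have to equal the workload's dimension sizes.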