A mapping describes how the computation and tensors in a workload are partitioned and scheduled across the hardware substrate. A mapping specification for Timeloop is effectively a YAML-based representation of an annotated loop nest. However, instead of writing out the full nest, the mapping in YAML is described as a list of directives, with each directive describing some aspect of the sub-nest mapped to a particular level in the hardware. The following keys apply to each item (i.e., directive) in the mapping.
target: key
This key refers to the hardware level being targeted by a mapping directive. This hardware level must have been instantiated in the Architecture specification. Spatial sub-nests must target the parent hardware level. Example:
mapping:
  - target: GlobalBuffer
    ...
  - target: PE
    ...
type: key
This key identifies whether we are describing a temporal, spatial, or bypass directive.
permutation: key
This key is only applicable to type: temporal and type: spatial directives. It is used to describe the permutation (i.e., nesting) of loops in the sub-nest at the target level. Timeloop requires each level's temporal and spatial sub-nests to have exactly 1 loop from each original problem dimension. The permutation is described as a string of problem dimensions representing the loops in little-endian (i.e., inner-to-outer) order. For example, the directive:
permutation: RPK
results in the following loop nest:
for k in [0:K)
  for p in [0:P)
    for r in [0:R)
Note that this is not the complete loop nest in the mapping; it is only the sub-nest at the hardware level being targeted by this directive.
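The inner-to-outer reading of a permutation string can be illustrated with a small Python sketch. The helper name render_sub_nest is hypothetical and is not part of Timeloop; the sketch also assumes single-character dimension names, as in the example above.

```python
def render_sub_nest(permutation: str) -> str:
    """Render a permutation string (inner-to-outer) as an indented loop nest.

    Hypothetical helper for illustration only; assumes each problem
    dimension is named by a single character (e.g., R, P, K).
    """
    lines = []
    # The string lists the innermost loop first, so walk it in reverse
    # to emit the outermost loop first.
    for depth, dim in enumerate(reversed(permutation)):
        lines.append("  " * depth + f"for {dim.lower()} in [0:{dim})")
    return "\n".join(lines)


print(render_sub_nest("RPK"))
```

Calling the helper with "RPK" prints the k loop outermost and the r loop innermost, matching the nest shown above.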
factors: key
This key is only applicable to type: temporal and type: spatial directives. It is used to set the bounds of each loop in the sub-nest at that level. As an example, stating:
factors: R=3
results in a loop for r in [0:3). As another example, stating:
factors: R=3 P=16
results in two loops, for r in [0:3) and for p in [0:16), but says nothing about how those loops must be ordered (i.e., nested) with respect to each other. The ordering is specified by the permutation key.
Timeloop requires each level's temporal and spatial sub-nests to have exactly 1 loop from each original problem dimension. For example, for a convolutional network layer with the 7 dimensions C, K, R, S, P, Q, N, the factors directive can state:
factors: C=16 K=4 R=3 S=1 P=16 Q=1 N=1
Observe that some of the factors are set to 1. This could be because the original workload dimension is 1, or because the specific tiling being expressed does not expand the tile along that dimension at this level. Effectively, such a loop does not exist in the mapping. These unit-factor loops are dropped during analysis, but we sometimes explicitly write them out in hand-written mappings for clarity (and to prevent bugs). Leaving a factor unspecified causes Timeloop to treat it as a unit factor.
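The unit-factor default can be sketched in Python as follows. This is a minimal illustration, not Timeloop's parser: parse_factors is a hypothetical name, the default dimension list is the 7-dimension convolution example above, and the sketch handles only plain DIM=BOUND tokens.

```python
def parse_factors(spec: str, dims: str = "CKRSPQN") -> dict:
    """Parse a factors string such as 'R=3 P=16' into a per-dimension dict.

    Hypothetical helper; `dims` lists the problem dimensions (assumed to be
    single characters). Dimensions missing from `spec` default to 1.
    """
    factors = {dim: 1 for dim in dims}  # unspecified factors are unit factors
    for token in spec.split():
        dim, bound = token.split("=")
        if dim not in factors:
            raise ValueError(f"unknown problem dimension: {dim}")
        factors[dim] = int(bound)
    return factors


# Both spellings describe the same sub-nest bounds.
explicit = parse_factors("C=16 K=4 R=3 S=1 P=16 Q=1 N=1")
implicit = parse_factors("C=16 K=4 R=3 P=16")
print(explicit == implicit)
```

Under these assumptions, writing out the unit factors and omitting them produce the same set of bounds.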
Also observe that each factor (and permutation) descriptor uses the dimension name (e.g., C) without the numerical suffix denoting the tiling level (e.g., C2) that is traditionally used when describing mappings as loop nests. This is because the tiling level is explicitly called out using the target: key, and adding a numerical suffix is (a) unnecessary and (b) error-prone (for example, if an inner hardware level is added in the architecture spec, all numerical suffixes will have to be re-computed).
For each problem dimension, the product of factors across all levels (spatial and temporal) must equal the workload's size along that dimension. This is known as perfect (or remainderless) factorization.
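The perfect-factorization rule amounts to a per-dimension product check. The following Python sketch illustrates it; check_perfect_factorization is a hypothetical helper, and the level dictionaries and workload sizes are made-up inputs rather than values taken from a real Timeloop run.

```python
from math import prod


def check_perfect_factorization(levels, workload):
    """Return {dim: (product_of_factors, workload_size)} for each mismatch.

    `levels` is a list of per-directive factor dicts (spatial and temporal
    alike); a dimension missing from a dict counts as a unit factor.
    Hypothetical helper for illustration only.
    """
    mismatches = {}
    for dim, size in workload.items():
        product = prod(level.get(dim, 1) for level in levels)
        if product != size:
            mismatches[dim] = (product, size)
    return mismatches


levels = [{"K": 32, "C": 2}, {"C": 16}, {"P": 16}]
print(check_perfect_factorization(levels, {"K": 32, "C": 32, "P": 16}))
print(check_perfect_factorization(levels, {"K": 32, "C": 30, "P": 16}))
```

An empty result means every dimension factorizes without remainder; any entry names a dimension whose factors do not multiply to the workload size.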
In order to be legal, the mapping must also fit into the available resources on the hardware. This includes buffer capacities and spatial instances at each hardware level. Timeloop performs this check during evaluation and will throw an error if a mapping exceeds the available hardware resources. The error message will point out which hardware level caused the violation.
Timeloop also allows for imperfect factorization. This allows each loop to have a residual bound. The residual bound is triggered if and only if all ancestor loops in the entire mapping (including loops at outer hardware levels) are in their final iteration. An imperfectly factorized loop is described by using a comma to separate the regular and residual bound. For example, the directive:
factors: R=3 P=16,5
states that the P loop will run as for p in [0:16) in all situations, except when all ancestor P loops are in their final iteration, in which case this loop will run as for p in [0:5).
Note that the directive factors: P=16,16 is identical to factors: P=16, i.e., the residual bound is the same as the regular bound.
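To make the residual-bound rule concrete, here is a minimal Python sketch. All names are hypothetical, and total_iterations assumes the simplest possible case: a single, perfectly factorized ancestor loop wrapping one imperfectly factorized loop. It is not a model of Timeloop's full analysis.

```python
def parse_bound(token: str):
    """Split 'P=16,5' into ('P', 16, 5); 'P=16' yields ('P', 16, 16)."""
    dim, value = token.split("=")
    parts = [int(p) for p in value.split(",")]
    regular = parts[0]
    # Without a comma, the residual bound equals the regular bound.
    residual = parts[1] if len(parts) > 1 else regular
    return dim, regular, residual


def loop_bound(regular: int, residual: int, ancestors_final: bool) -> int:
    """Pick the bound for one execution of the loop."""
    return residual if ancestors_final else regular


def total_iterations(outer_bound: int, regular: int, residual: int) -> int:
    """Iterations covered when one outer loop wraps the imperfect loop
    (assumption: the outer loop is the only ancestor)."""
    return sum(
        loop_bound(regular, residual, outer == outer_bound - 1)
        for outer in range(outer_bound)
    )


_, regular, residual = parse_bound("P=16,5")
print(total_iterations(4, regular, residual))  # 3 * 16 + 5 = 53
```

Under this assumption, an outer bound of 4 around P=16,5 covers 53 iterations of P, whereas P=16 (or P=16,16) covers 64.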
split: key
This key only applies to type: spatial directives. A particular hardware level may have a fanout along two dimensions, X and Y. The split directive assigns loops at that sub-nest to one of these dimensions. As Timeloop walks through the permutation string (inner-to-outer), it assigns loops to the hardware X-dimension. The split directive is an unsigned integer that describes the point at which it must switch over to the hardware Y-dimension. For example:
permutation: RPK
Given this permutation, here are all the legal split values along with the resultant assignments to hardware-X and hardware-Y:
split: 0 => RPK mapped to hardware-Y.
split: 1 => R mapped to hardware-X, PK mapped to hardware-Y.
split: 2 => RP mapped to hardware-X, K mapped to hardware-Y.
split: 3 => RPK mapped to hardware-X.
If left unspecified, the default split is the rank of the problem (3 in the above example), resulting in all loops being mapped to hardware-X.
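The split rule is equivalent to cutting the permutation string at the given index. The Python sketch below reproduces the assignments listed above; assign_split is a hypothetical helper name, and the sketch relies on the permutation containing one loop per problem dimension, so that its length equals the rank of the problem.

```python
def assign_split(permutation: str, split=None):
    """Return (loops on hardware-X, loops on hardware-Y).

    Hypothetical helper. The permutation is read inner-to-outer; the first
    `split` loops go to hardware-X and the remainder to hardware-Y.
    """
    if split is None:
        # Default: the rank of the problem, i.e., every loop goes to X.
        split = len(permutation)
    if not 0 <= split <= len(permutation):
        raise ValueError(f"split must be in [0, {len(permutation)}]")
    return permutation[:split], permutation[split:]


for s in range(4):
    x, y = assign_split("RPK", s)
    print(f"split: {s} => X={x or '-'} Y={y or '-'}")
```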
bypass: key
This key only applies to the type: bypass directive. It specifies which tensors are bypassed (i.e., not stored) at that level. The tensor names must have been defined in the problem specification. For example:
bypass: [ Weights, Inputs ]
keep: key
This key only applies to the type: bypass directive. It specifies which tensors are kept (i.e., stored) at that level. The tensor names must have been defined in the problem specification. For example:
keep: [ Outputs ]
By default Timeloop assumes that each tensor is stored at each level, unless explicitly bypassed via a bypass: directive. Therefore, the keep: directive is completely optional, but we often include it in hand-written mappings for clarity.
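The keep-by-default behavior can be sketched in Python as follows. The helper stored_tensors is hypothetical, and rejecting a tensor that appears in both lists is a choice made by this sketch, not documented Timeloop behavior.

```python
def stored_tensors(all_tensors, bypass=(), keep=()):
    """Return the tensors stored at a level, in problem-spec order.

    Hypothetical helper. Every tensor is stored unless it is bypassed;
    `keep` adds no information beyond documenting intent.
    """
    unknown = (set(bypass) | set(keep)) - set(all_tensors)
    if unknown:
        raise ValueError(f"tensors not in problem spec: {sorted(unknown)}")
    # Assumption of this sketch: listing a tensor in both is an error.
    conflict = set(bypass) & set(keep)
    if conflict:
        raise ValueError(f"tensors both kept and bypassed: {sorted(conflict)}")
    return [t for t in all_tensors if t not in bypass]


tensors = ["Weights", "Inputs", "Outputs"]
print(stored_tensors(tensors, bypass=["Weights", "Inputs"], keep=["Outputs"]))
print(stored_tensors(tensors))  # no directive: everything is stored
```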
Here is a complete example of a mapping:
mapping:
  - target: MainMemory
    type: temporal
    factors: R=1 P=1 K=1 C=1
    permutation: PRKC
  - target: GlobalBuffer
    type: temporal
    factors: R=3 P=1 K=32 C=2
    permutation: PRKC
  - target: GlobalBuffer
    type: spatial
    factors: R=1 P=1 K=1 C=16
    permutation: PRKC
  - target: RegisterFile
    type: temporal
    factors: R=1 P=16 K=1 C=1
    permutation: RPKC
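As a sanity check on the example above, the following Python sketch multiplies the factors across all four directives to recover the workload dimensions that the mapping implies under perfect factorization. The EXAMPLE list is transcribed by hand from the mapping above, and implied_dimensions is a hypothetical helper, not a Timeloop function.

```python
# (target, type, factors) for each directive, transcribed from the example.
EXAMPLE = [
    ("MainMemory", "temporal", "R=1 P=1 K=1 C=1"),
    ("GlobalBuffer", "temporal", "R=3 P=1 K=32 C=2"),
    ("GlobalBuffer", "spatial", "R=1 P=1 K=1 C=16"),
    ("RegisterFile", "temporal", "R=1 P=16 K=1 C=1"),
]


def implied_dimensions(directives):
    """Multiply each dimension's factors across all directives."""
    totals = {}
    for _target, _type, spec in directives:
        for token in spec.split():
            dim, bound = token.split("=")
            totals[dim] = totals.get(dim, 1) * int(bound)
    return totals


print(implied_dimensions(EXAMPLE))
```

For this mapping to be a perfect factorization, the workload would therefore need R=3, P=16, K=32, and C=32, with C being the only dimension that is also spread spatially (a factor of 16 at GlobalBuffer).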