The optional architecture_constraints: top-level key specifies constraints that limit the set of mappings allowed by the hardware.
By default, Timeloop assumes that the hardware specified by a given organization is completely unconstrained in terms of the mappings it supports. In other words, the hardware supports all factorizations of the problem dimensions and all loop orderings at each spatial and temporal dimension (as long as the tiles fit within the buffers and spatial instance counts).
In reality, most architectures only support a limited subset of these mappings, e.g., due to limitations in their state machines or interconnection networks. These are specified via architecture constraints. The language of constraints is very similar to the language used to describe a mapping. Therefore, we strongly encourage users to develop a thorough understanding of mapping specification before reading the remainder of this page.
Any factor or permutation described in the constraints is "baked-in" in the sense that the timeloop-mapper is not allowed to explore alternatives for it, and timeloop-model validates that any user-specified mapping does not violate any of the constraints.
In addition to constraints imposed because the hardware does not support a particular mapping, one can specify constraints that steer the mapspace search away from mappings that are known (or expected) to be sub-optimal. These constraints are specified with the mapspace_constraints: top-level key. The format of the information under this key is identical to that of the architecture_constraints: key, and the constraints imposed on a particular element of the design are the "union" of the constraints specified by the two keys.
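The "union" behaviour can be pictured with a small Python sketch. It is not Timeloop code: the helper name `constraints_by_target` is hypothetical, and it assumes that "union" simply means every record from either key applies to its target. The records reuse keys and values that appear elsewhere on this page.

```python
from collections import defaultdict


def constraints_by_target(architecture_constraints, mapspace_constraints):
    """Group constraint records from both keys by their target name.

    Assumption: the 'union' means that all records from either list apply.
    """
    merged = defaultdict(list)
    for record in [*architecture_constraints, *mapspace_constraints]:
        merged[record["target"]].append(record)
    return dict(merged)


# Hardware limitation: psum_spad only holds Outputs.
arch = [{"target": "psum_spad", "type": "bypass",
         "bypass": ["Inputs", "Weights"], "keep": ["Outputs"]}]
# Search-space pruning chosen by the user.
mapspace = [{"target": "psum_spad", "type": "temporal", "factors": "H=8 W<4"}]

effective = constraints_by_target(arch, mapspace)
for record in effective["psum_spad"]:
    print(record["type"])
```

Under this reading, psum_spad ends up subject to both the bypass record and the temporal record.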
The architecture_constraints: and mapspace_constraints: keys contain a version and a list of records of targets (subtree names and their associated constraints).
version: key
Each top-level YAML key used as a Timeloop/Accelergy input has a version. The current version for the mapspace_constraints: key is TBD, which is specified as follows:
mapspace_constraints:
  version: TBD
  ...
Note: The version key is currently ignored...
Each constraint record has two mandatory keys:
target - the name of a subtree or local element in the architecture
type - the type of constraint, one of: bypass, temporal, or spatial
Each type of constraint has additional type-specific keys described below.
Bypass type constraints describe the bypassing (or storage) of particular operands at a local: storage element. They have two additional keys:
bypass: - a list of operand names that bypass this element
keep: - a list of operand names that are kept in this element
For example, to specify that the Inputs and Weights bypass the psum_spad, but the Outputs are held there, one would write:
...
- target: psum_spad
  type: bypass
  bypass: [Inputs, Weights]
  keep: [Outputs]
...
Note that by specifying different bypass constraints for different storage elements in the same local: block, one can describe partitioned storage.
Temporal type constraints describe the permutations and factorizations of particular operand ranks allowed at a particular storage element. They have two additional keys:
permutation: (optional) - a string with an ordered sequence of rank names specifying a loop nest order (outermost loop last)
factors: (optional) - a string with rank names specifying arithmetic conditions on their loop limits at this level
For example, for an operand with ranks H and W, to specify that H is the outermost for loop in the loop nest one could specify:
- target: psum_spad
  type: temporal
  permutation: WH
Thus the mapping under this constraint would look like:
for h in [0, H):
  for w in [0, W):
    ...
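To make the "outermost loop last" convention concrete, here is a minimal Python sketch that turns a permutation string into the loop nest it implies. The function `loop_nest` is a hypothetical helper written for this page, not part of Timeloop, and it assumes single-letter rank names and that whitespace in the string carries no meaning.

```python
def loop_nest(permutation: str) -> str:
    """Render a permutation string as an indented loop nest.

    The string lists ranks innermost first, so the last rank in the
    string becomes the outermost loop.
    """
    # Assumption: each rank is one letter; whitespace is ignored.
    ranks = [r for r in permutation if not r.isspace()]
    lines = []
    for depth, rank in enumerate(reversed(ranks)):
        lines.append("  " * depth + f"for {rank.lower()} in [0, {rank}):")
    lines.append("  " * len(ranks) + "...")
    return "\n".join(lines)


print(loop_nest("WH"))
```

Calling `loop_nest("WH")` reproduces the nest shown above, with H as the outermost loop.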
TBD: Explain more complicated constraints...
To specify that the loop limit on rank H is 8 and that W must be less than 4, one could write:
- target: psum_spad
  type: temporal
  factors: H=8 W<4
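As an illustration of how such a factors string can be read, the following Python sketch parses it into (rank, operator, value) triples and checks a set of loop limits against them. Both `parse_factors` and `satisfies` are hypothetical helpers, not Timeloop's parser. The sketch accepts only the two operators shown on this page (= and <), assumes single-letter rank names, and reads every value literally, so it does not model any special meaning a value may have in Timeloop itself.

```python
import re

# Assumption: one letter per rank, and only the '=' and '<' operators.
_FACTOR = re.compile(r"^([A-Za-z])(=|<)(\d+)$")


def parse_factors(factors: str) -> list[tuple[str, str, int]]:
    """Split a factors string into (rank, operator, value) triples."""
    parsed = []
    for token in factors.split():
        match = _FACTOR.match(token)
        if match is None:
            raise ValueError(f"unrecognized factor constraint: {token!r}")
        rank, op, value = match.groups()
        parsed.append((rank, op, int(value)))
    return parsed


def satisfies(parsed, loop_limits: dict[str, int]) -> bool:
    """Check candidate loop limits against the parsed conditions."""
    checks = {"=": lambda a, b: a == b, "<": lambda a, b: a < b}
    return all(checks[op](loop_limits[rank], value)
               for rank, op, value in parsed)


conditions = parse_factors("H=8 W<4")
print(conditions)
print(satisfies(conditions, {"H": 8, "W": 2}))
print(satisfies(conditions, {"H": 8, "W": 4}))
```

With these conditions, loop limits of H=8, W=2 are accepted, while H=8, W=4 are rejected because W is not less than 4.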
Spatial type constraints describe the ... They have the following additional keys:
split: - Given a permutation, indices before the split index are mapped to the spatial X dimension, and indices from the split index onward are mapped to the spatial Y dimension. For example, if the permutation is ABCDEF and the split index is 2, then A and B are mapped to spatial X and CDEF are mapped to spatial Y.
permutation: (optional) - a string with an ordered sequence of rank names specifying a loop nest order (outermost loop last)
factors: (optional) - a string with rank names specifying arithmetic conditions on their loop limits at this level
no_link_transfer: (optional) - a list of operand names that cannot be link-transferred at this level. Used when an architecture does not support specific link transfer movements, e.g. no network exists for peer-to-peer communication of partial sums. By default, all datatypes can be link-transferred.
no_multicast_no_reduction: (optional) - a list of operand names that cannot be multicast or reduced at this level. Used when an architecture does not support multicast/reduction movements, e.g. no network exists for multicast of inputs. By default, all datatypes can be multicast/reduced.
TBD
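The split rule can be sketched in a few lines of Python. `split_permutation` is a hypothetical helper for illustration only. It assumes single-letter rank names and that any whitespace inside a permutation string is purely visual.

```python
def split_permutation(permutation: str, split: int) -> tuple[str, str]:
    """Return the (spatial X, spatial Y) rank strings for a split index."""
    # Assumption: whitespace in the permutation string is only visual.
    ranks = "".join(permutation.split())
    if not 0 <= split <= len(ranks):
        raise ValueError(f"split {split} is outside 0..{len(ranks)}")
    return ranks[:split], ranks[split:]


x_ranks, y_ranks = split_permutation("ABCDEF", 2)
print(x_ranks, y_ranks)
```

For the ABCDEF example with a split index of 2, this yields AB for spatial X and CDEF for spatial Y. A split index equal to the number of ranks places every rank on spatial X.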
Here is an example:
#
# The following constraints are limitations of the hardware architecture and dataflow
#
architecture_constraints:
  targets:
    # certain buffer only stores certain datatypes
    - target: psum_spad
      type: bypass
      bypass: [Inputs, Weights]
      keep: [Outputs]
    - target: weights_spad
      type: bypass
      bypass: [Inputs, Outputs]
      keep: [Weights]
    - target: ifmap_spad
      type: bypass
      bypass: [Weights, Outputs]
      keep: [Inputs]
    - target: DummyBuffer
      type: bypass
      bypass: [Inputs, Outputs, Weights]
    - target: shared_glb
      type: bypass
      bypass: [Weights]
      keep: [Inputs, Outputs]
    - target: DummyBuffer
      type: spatial
      split: 4
      permutation: NPQR SCM
      factors: N=1 P=1 Q=1 R=1 S=0
    # only allow fanout of M, Q out from glb
    - target: shared_glb
      type: spatial
      split: 7
      permutation: NCPRSQM
      factors: N=1 C=1 P=1 R=1 S=1
    # one ofmap position but of different output channels
    - target: psum_spad
      type: temporal
      permutation: NCPQRS M
      factors: N=1 C=1 R=1 S=1 P=1 Q=1
    # row stationary -> 1 row at a time
    - target: weights_spad
      type: temporal
      permutation: NMPQS CR
      factors: N=1 M=1 P=1 Q=1 S=1 R=0
    - target: ifmap_spad
      type: temporal
      permutation: NMCPQRS
      factors: N=1 M=1 C=1 P=1 Q=1 R=1 S=1
    # enforce the hardware limit of the bypassing everything
    - target: DummyBuffer
      type: temporal
      factors: N=1 M=1 C=1 P=1 Q=1 R=1 S=1