Constraints

The optional architecture_constraints: top-level key specifies constraints that limit the set of mappings allowed by the hardware.

By default, Timeloop assumes that the hardware specified by a given organization is completely unconstrained in terms of the mappings it supports. In other words, the hardware supports all factorizations of the problem dimensions and all loop orderings at each spatial and temporal dimension (as long as the tiles fit within the buffers and spatial instance counts).

In reality, most architectures only support a limited subset of these mappings, e.g., due to limitations in their state machines or interconnection networks. These are specified via architecture constraints. The language of constraints is very similar to the language used to describe a mapping. Therefore, we strongly encourage users to develop a thorough understanding of mapping specification before reading the remainder of this page.

Any factor or permutation described in the constraints is "baked in": timeloop-mapper is not allowed to explore alternatives for it, and timeloop-model validates that a user-specified mapping does not violate any of the constraints.

In addition to constraints imposed because the hardware does not support a particular mapping, one can specify constraints that steer the mapspace search away from mappings that are known (or expected) to be sub-optimal. These constraints are specified with the mapspace_constraints: top-level key. The format of the information under this key is identical to that of the architecture_constraints: key, and the constraints imposed on a particular element of the design are the "union" of the constraints specified by the two keys.

The architecture_constraints: and mapspace_constraints: keys each contain a version and a list of target records (subtree names and their associated constraints).
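As a minimal sketch of how the two keys sit side by side, the fragment below places a hardware restriction under architecture_constraints: and a search-pruning restriction under mapspace_constraints:, both on the same element. The target name psum_spad and the chosen factor are illustrative assumptions, the targets: list name follows the complete example at the end of this page, and the version: key is left out.

```yaml
# Hardware restriction: psum_spad holds only Outputs.
architecture_constraints:
  targets:
  - target: psum_spad
    type: bypass
    bypass: [Inputs, Weights]
    keep: [Outputs]

# Search pruning (illustrative assumption): fix one loop bound
# to shrink the mapspace that the mapper explores.
mapspace_constraints:
  targets:
  - target: psum_spad
    type: temporal
    factors: N=1
```

Under the "union" rule described above, psum_spad would be subject to both records.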

The version: key

Each top-level YAML key used as a Timeloop/Accelergy input has a version. The current version for the mapspace_constraints: key is TBD, and it is specified as follows:

mapspace_constraints:
  version: TBD
  ...

Note: The version key is currently ignored...

Target Constraints

Each constraint record has two mandatory keys:

  • target - the name of a subtree or local element in the architecture
  • type - the type of constraint, one of: bypass, temporal, or spatial

Each type of constraint has additional type-specific keys described below.

Bypass Constraints

Bypass type constraints describe the bypassing (or storage) of particular operands at a local: storage element. They have two additional keys:

  • bypass: - a list of operand names that bypass this element
  • keep: - a list of operand names that are kept in this element

For example, to specify that the Inputs and Weights bypass the psum_spad but the Outputs are held there, one would write:

...
   - target: psum_spad
     type: bypass
     bypass: [Inputs, Weights]
     keep: [Outputs]
...

Note: by specifying different bypass constraints for different storage elements in the same local: block, one can describe partitioned storage.
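A sketch of such a partitioned arrangement is shown below. It assumes, purely for illustration, that weights_spad and ifmap_spad are declared in the same local: block of the architecture; each element then keeps exactly one operand and bypasses the others.

```yaml
# Assumption: both targets live in the same local: block.
- target: weights_spad
  type: bypass
  bypass: [Inputs, Outputs]
  keep: [Weights]        # this partition stores only Weights
- target: ifmap_spad
  type: bypass
  bypass: [Weights, Outputs]
  keep: [Inputs]         # this partition stores only Inputs
```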

Temporal Constraints

Temporal type constraints describe the permutations and factorizations of particular operand ranks allowed at a particular storage element. They have two additional keys:

  • permutation: (optional) - a string with an ordered sequence of rank names specifying a loop nest order (outermost loop last)
  • factors: (optional) - a string with rank names specifying arithmetic conditions on their loop limits at this level

For example, for an operand with ranks H and W, to specify that H is the outermost for loop in the loop nest, one could write:

  - target: psum_spad
    type: temporal
    permutation: WH

Thus the mapping under this constraint would look like:

for h in [0, H):
   for w in [0, W):
      ...
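The same rule extends to more ranks. The sketch below adds a third rank, C, which is a hypothetical name chosen for illustration. Because the permutation string lists the outermost loop last, WHC places C outermost and W innermost.

```yaml
- target: psum_spad
  type: temporal
  # Innermost loop first, outermost loop last.
  permutation: WHC
  # Corresponding loop nest:
  #   for c in [0, C):
  #     for h in [0, H):
  #       for w in [0, W):
  #         ...
```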

TBD: Explain more complicated constraints...

To specify that the loop limit on rank H is 8 and that W must be less than 4 one could write:

  - target: psum_spad
    type: temporal
    factors: H=8 W<4
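Both keys can appear in one record, as the complete example at the end of this page does. The sketch below combines the two examples above; the bound name W1 in the comments is a hypothetical placeholder for whatever W limit is chosen.

```yaml
- target: psum_spad
  type: temporal
  permutation: WH        # H outermost, W innermost
  factors: H=8 W<4       # same conditions as in the example above
  # Resulting loop nest at this level:
  #   for h in [0, 8):
  #     for w in [0, W1):   # W1 is some bound satisfying W1 < 4
  #       ...
```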

Spatial Constraints

Spatial type constraints describe the ... They have the following additional keys:

  • split: - Given a permutation, ranks at positions before the split index are mapped to the spatial X dimension, and ranks at or after the split index are mapped to the spatial Y dimension (positions are counted from 0). For example, if the permutation is ABCDEF and the split index is 2, then A and B are mapped to spatial X and C, D, E, and F are mapped to spatial Y.
  • permutation: (optional) - a string with an ordered sequence of rank names specifying a loop nest order (outermost loop last)
  • factors: (optional) - a string with rank names specifying arithmetic conditions on their loop limits at this level
  • no_link_transfer: (optional) - a list of operand names that cannot be link-transferred at this level. Used when an architecture does not support specific link-transfer movements, e.g., no network exists for peer-to-peer communication of partial sums. By default, all datatypes can be link-transferred.
  • no_multicast_no_reduction: (optional) - a list of operand names that cannot be multicast or reduced at this level. Used when an architecture does not support multicast/reduction movements, e.g., no network exists for multicast of inputs. By default, all datatypes can be multicast/reduced.
  • TBD
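The keys above can be combined in a single record, sketched below. The rank names A through F are taken from the split example and are not real problem ranks, the target name is illustrative, and the bracketed list syntax for the last two keys is assumed to match that of the bypass: and keep: lists.

```yaml
- target: shared_glb
  type: spatial
  permutation: ABCDEF
  split: 2
  # Positions 0-1 (A, B) are mapped to spatial X.
  # Positions 2-5 (C, D, E, F) are mapped to spatial Y.
  no_link_transfer: [Outputs]          # e.g. no peer-to-peer network for partial sums
  no_multicast_no_reduction: [Inputs]  # e.g. no multicast network for inputs
```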

A Complete Example

The following example combines bypass, spatial, and temporal constraints for a single architecture:

#
# The following constraints are limitations of the hardware architecture and dataflow
#

architecture_constraints:
  targets:
  # certain buffer only stores certain datatypes
  - target: psum_spad
    type: bypass
    bypass: [Inputs, Weights]
    keep: [Outputs]
  - target: weights_spad
    type: bypass
    bypass: [Inputs, Outputs]
    keep: [Weights]
  - target: ifmap_spad
    type: bypass
    bypass: [Weights, Outputs]
    keep: [Inputs]
  - target: DummyBuffer
    type: bypass
    bypass: [Inputs, Outputs, Weights]
  - target: shared_glb
    type: bypass
    bypass: [Weights]
    keep: [Inputs, Outputs]
  - target: DummyBuffer
    type: spatial
    split: 4
    permutation: NPQR SCM
    factors: N=1 P=1 Q=1 R=1 S=0
  # only allow fanout of M, Q out from glb
  - target: shared_glb
    type: spatial
    split: 7
    permutation: NCPRSQM
    factors: N=1 C=1 P=1 R=1 S=1
  # one ofmap position but of different output channels
  - target: psum_spad
    type: temporal
    permutation: NCPQRS M
    factors: N=1 C=1 R=1 S=1 P=1 Q=1
  # row stationary -> 1 row at a time
  - target: weights_spad
    type: temporal
    permutation: NMPQS CR
    factors: N=1 M=1 P=1 Q=1 S=1 R=0
  - target: ifmap_spad
    type: temporal
    permutation: NMCPQRS
    factors: N=1 M=1 C=1 P=1 Q=1 R=1 S=1
  # enforce the hardware limit of bypassing everything
  - target: DummyBuffer
    type: temporal
    factors: N=1 M=1 C=1 P=1 Q=1 R=1 S=1