Regent is implemented as a Terra language extension. Every Regent source file must therefore start with import "regent".
This loads the Regent compiler and enables hooks to start Regent on certain keywords (task and fspace).
The top-level of a Regent source file executes in a Lua/Terra context, and Lua, Terra, and Regent constructs can be freely mixed at this level. For example:
Most Terra language features can also be used in Regent tasks, and compilation of Regent programs proceeds similarly to Terra. For example, Lua variables referenced in Regent tasks are specialized prior to type checking, and are effectively constant from the perspective of Regent.
The following Terra features are not supported in Regent:
o:f() does not automatically dereference Regent’s ptr type.
In general, use Terra’s raw pointer types (&T) with caution. Regent may execute tasks in a distributed environment, so a pointer created in one task might not be valid in another. As long as pointers stay within a task, it is ok to use raw pointers (and traditional C APIs like malloc and free). The same principle applies to process-local data such as file descriptors: they are ok to use within a task but should not be passed between tasks.
Tasks are the fundamental unit of execution in Regent. Tasks are similar to functions in most other programming languages: tasks take arguments and (optionally) return a value, and contain a body of statements which execute top-to-bottom. Unlike traditional functions, tasks must explicitly specify any interactions with the calling context through privileges, coherence modes, and constraints.
Regent programs execute in a top-level Lua/Terra context, but Regent tasks cannot be called from Lua/Terra. Instead, a Regent program may begin execution of tasks by calling regentlib.start with a task argument. This task becomes the top-level task in the Regent program, and may call other tasks as desired.
The call does not return, and is typically placed at the end of a Regent source file. At this time, the runtime is not reentrant, so even if the call did return, it would still not be possible to launch another top-level task.
Privileges describe how a task interacts with region-typed arguments. For example, reads is required in order to read from a region argument, and writes is required to modify a region. Reductions allow the application of certain commutative operators to regions. Note that privileges in general apply only to region-typed arguments, and not to immediate arguments passed by value (such as int, float, and ptr data types).
Privileges are most frequently seen in the where clause of a task.
Coherence modes specify a task’s expectations of isolation with respect to sibling tasks on the marked regions. Regent supports four coherence modes:
The modes behave as follows:
exclusive mode (the default) guarantees that tasks will execute in a manner that preserves the original sequential semantics of the code.
atomic mode allows tasks to be reordered in a manner that preserves serializability, similar to a transaction-based system. Atomicity is provided at the level of a task.
simultaneous mode allows tasks to run concurrently as long as they use the same physical instance for all simultaneous regions. This guarantees that the regions in question behave with shared memory semantics, similar to pthreads, etc.
relaxed mode allows marked tasks to run concurrently with no restrictions.
Coherence modes are most frequently seen in the where clause of a task.
Constraints specify the desired relationships between regions. Constraints are checked at compile time and must be satisfied by the caller. The supported constraints are disjointness (*) and subregion (<=).
Constraints are most frequently seen in the where clause of a task or field space.
Copy operations copy the contents of one region to another (for all or some subset of fields). The number and types of fields so named must match.
The following restrictions apply to the source S and destination D regions in a copy operation. Note that the restrictions depend on which branch of the Legion runtime is being used.
On master branch: D <= S, that is, D must be a subregion of S.
On nopaint branch (and branches derived from nopaint such as control_replication): D must contain a subset of the elements in S, but is not otherwise required to be related. That is, it must be valid to write the following code:
Fill operations replace the contents of a region (for all or some subset of fields) with a single specified value. The type of the value must match the named fields.
The attach and detach operations connect a region with an external resource, like a file on disk. Attaching a region overwrites the contents of the region and replaces it with the contents of the external resource. Note that for external resources such as files, attaching a region does NOT copy the contents of the file from disk into memory. Instead the region should be thought of as a view onto the contents of the on-disk file. Such a region is said to be restricted and must be acquired before it can be used by a task.
For example, using an external HDF5 file:
The detach operation is used to disassociate the region from an attached resource once it is no longer being used. The contents of the region are considered to be uninitialized following a detach operation, and the region is no longer restricted.
See below for detailed instructions on using file I/O with HDF5.
Important: attach and detach currently require that you not use the region within the task where you attach or detach. If you attempt to do so, the behavior is undefined. One way to guarantee that you do this correctly is to require the task to be inner, though this is a stricter condition than is required for correctness.
A region which is restricted (e.g. due to an attach operation) cannot be copied, and therefore cannot be directly accessed by a task if the original contents are e.g. on disk. The acquire operation is used to indicate that it is safe to make a copy of the region (e.g. into memory) so that it can be directly accessed by a task. After using acquire, the region is no longer considered restricted.
The release operation guarantees that any copies of a region made following an acquire operation are flushed back to their original location (e.g. disk).
Note that if the original contents of the region are on disk, any concurrent writes (e.g. by other processes running on the machine) to the file on disk may or may not be seen by tasks. In order to safely perform concurrent writes to the file, the region must be released prior to any external writes being made, and only re-acquired after the writes are complete. Similarly, any external process which reads the file must wait until after the region is released. The user is responsible for ensuring that the correct synchronization is used with any external processes that perform concurrent access to the file.
Regent supports file I/O via the HDF5 file format. Support for HDF5 can be enabled by passing the --hdf5 argument to install.py or setting the environment variable USE_HDF5=1. Note that a serial build of HDF5 is required, as parallel support in HDF5 depends on MPI.
Currently, Regent does not support creating HDF5 files directly. HDF5 files can be created either prior to running the Regent program, or can be created by calling the HDF5 C API directly from inside Regent. For an example of creating an HDF5 file in Regent, see this test program.
To read or write an existing HDF5 file, the attach operation is used to connect the region to the contents of the external file. Using attach effectively overwrites the region, and any existing contents will be lost.
Following an attach operation, the region should be thought of as a view onto the data stored on disk. Note that the contents of the file are NOT automatically copied from disk into memory. The acquire operation is subsequently used to permit the contents of the region to be copied into memory. In the example below, the copy will be issued prior to executing some_task.
The value regentlib.file_read_only can be used with attach if the file is to be read and not written.
The release and detach operations reverse the actions performed by acquire and attach, respectively. For more information on the semantics of these operations, see the documentation on attach and detach and acquire and release above.
More examples of using HDF5 file I/O can be found in the test suite.
Field spaces are sets of fields, and behave similarly to Terra structs. For example, field spaces may be instantiated by casting an anonymous struct to the appropriate type.
Field spaces differ from structs in that they may take region-typed arguments. Such arguments are useful for declaring recursive data types. References to field spaces with arguments must be escaped.
In the presence of partitions, it can be difficult to choose the right region to use as an argument to a field space. In these cases, it can be helpful to use the wild operator (which matches any region) in the declaration of the field space. Note that this currently exposes unsoundness in the type system; the user is responsible for making sure that the right regions are used when the field space is actually instantiated. (For those interested in the type theory behind this, see the DPL paper.)
For example, a quad-tree implementation might feature the following field space declaration:
Index types define points in an N-dimensional space, and are used to define the elements of index spaces. Regent supports 10 built-in index types: ptr, int1d, int2d, int3d, and so on up to int9d. Note that int4d and above require additional flags at compile-time and runtime (see below).
The fields of the built-in index types are called x, y, z, w, v, u, t, s, and r, respectively (up to the maximum number of dimensions of the particular index type).
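The naming rule above can be illustrated with a small plain-Python model. This is not Regent code; the helper names index_fields and make_point are hypothetical and exist only for this sketch.

```python
# Field names of the built-in intNd index types, in dimension order,
# as listed in the text above.
FIELD_NAMES = "xyzwvutsr"


def index_fields(dim):
    """Return the field names of a built-in index type with `dim` dimensions."""
    if not 1 <= dim <= len(FIELD_NAMES):
        raise ValueError("built-in index types have between 1 and 9 dimensions")
    return list(FIELD_NAMES[:dim])


def make_point(*coords):
    """Model an N-dimensional point as a dict from field name to coordinate."""
    return dict(zip(index_fields(len(coords)), coords))


# An int3d-like point has fields x, y and z; a 4-d point adds w.
p3 = make_point(1, 2, 3)
p4 = make_point(4, 5, 6, 7)
print(index_fields(3), p3["z"], p4["w"])
```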
When using int4d or above (or custom index types with 4 or more dimensions), Regent must be compiled and run with additional flags as shown below.
Custom index types can be defined in Regent with any number of dimensions by defining a struct and passing it to the function index_type.
Note that when creating custom index types with 4 or more dimensions, it is important to ensure that Regent is compiled and run with the necessary flags, shown above.
Index spaces are sets of indices, used most frequently to define the set of keys in a region. Index spaces may be unstructured (i.e. indices are opaque pointers), or structured (i.e. indices are N-dimensional points with an implied geometric relationship). Index spaces of either type are created with a size (this is an N-dimensional point for structured index spaces) and optional offset.
Currently this is only possible for structured index spaces:
Regions are the cross-product between an index space and a field space. The name of the region exists in the scope of the declaration, so recursive data types may refer to the region being defined.
Partitions subdivide regions into subregions, in order to more precisely specify the data used by tasks and to enable parallelism. Partitions in Regent may be:
disjoint or aliased.
disjoint subregions are non-overlapping, and therefore can be safely modified in parallel.
aliased subregions are permitted to overlap, but can only be used in parallel with reads or reduces privileges.
A partition is disjoint IFF all of its subregions are mutually disjoint.
complete or incomplete.
complete partitions cover their parent region (i.e., the union of all the subregions is equal to the parent region). This is useful for several optimizations, though it does not impact parallelism.
incomplete partitions do not cover their parent region.
Subregions should be thought of as views onto the original region. They do not contain their own data but instead reference the data contained by the parent region.
A given region can be partitioned multiple times, and the subregions can be partitioned recursively into finer regions.
The subregions of a partition are identified by points in a special index space called a color space. Subregions can be retrieved by their color within the partition.
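The disjoint/aliased and complete/incomplete properties can be made concrete with a plain-Python model, offered as a sketch rather than as Regent code. It assumes that a region is modelled as a set of indices and a partition as a dict from color to a set of indices; is_disjoint and is_complete are hypothetical helper names.

```python
from itertools import combinations


def is_disjoint(partition):
    """True if no two subregions of the partition share an element."""
    return all(a.isdisjoint(b) for a, b in combinations(partition.values(), 2))


def is_complete(parent, partition):
    """True if the union of all subregions equals the parent region."""
    return set().union(*partition.values()) == set(parent)


parent = set(range(8))

# Keys play the role of colors; p[color] retrieves a subregion.
halves = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}        # disjoint and complete
ghosts = {0: {0, 1, 2, 3, 4}, 1: {3, 4, 5, 6, 7}}  # aliased but complete
interior = {0: {1, 2}, 1: {5, 6}}                  # disjoint but incomplete

for name, p in [("halves", halves), ("ghosts", ghosts), ("interior", interior)]:
    print(name, is_disjoint(p), is_complete(parent, p))
```

The subregions in this model are only sets of indices, which mirrors the point that subregions are views and hold no data of their own.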
Regent provides a very expressive sub-language of partition operators for creating partitions, described in more detail below.
To pass a partition to a task, declare the region which is the parent of the partition first, so that the region variable can be used to define the partition type:
Note that r is referred to in the type of p so that the compiler can determine what region p is a partition of.
Produces roughly equal subregions, one for each color in the supplied color space. The resulting partition is guaranteed to be disjoint. If the number of elements in the region is evenly divisible by the requested number of subregions then they will be of equal size and contiguous; otherwise the exact way in which the remaining elements are partitioned is unspecified.
Equal partitions are always disjoint and complete by construction, so there is no need to specify this explicitly.
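A plain-Python sketch of what an equal partition produces may help here. It is a model, not the Regent or Legion implementation: the function name equal_partition is hypothetical, and giving the leftover elements to the first few colors is an assumption made only for this sketch, since the real placement of remainder elements is unspecified.

```python
def equal_partition(elements, colors):
    """Split `elements` into contiguous, roughly equal blocks, one per color.

    ASSUMPTION: when the element count does not divide evenly, the first
    `extra` colors receive one additional element each. Regent leaves this
    case unspecified.
    """
    elements = sorted(elements)
    colors = list(colors)
    base, extra = divmod(len(elements), len(colors))
    result, start = {}, 0
    for k, color in enumerate(colors):
        size = base + (1 if k < extra else 0)
        result[color] = set(elements[start:start + size])
        start += size
    return result


even = equal_partition(range(8), range(4))     # 8 elements, 4 colors
uneven = equal_partition(range(10), range(4))  # 10 elements, 4 colors

print(sorted(len(s) for s in even.values()))
print(sorted(len(s) for s in uneven.values()))
```

In both cases every element lands in exactly one subregion, which is why the result is disjoint and complete by construction.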
Partitions a region based on a coloring stored in a field of the region. The resulting partition is guaranteed to be disjoint.
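The semantics of partitioning by field can be sketched in plain Python. This is a model under stated assumptions, not Regent code: a region is a dict from index to a dict of field values, and partition_by_field and the field name color are hypothetical names chosen for the sketch.

```python
def partition_by_field(region, field, colors):
    """Group region elements by the color stored in `field`.

    Elements whose color is not in the color space are left out,
    so the result is disjoint but not necessarily complete.
    """
    result = {c: set() for c in colors}
    for index, fields in region.items():
        color = fields[field]
        if color in result:
            result[color].add(index)
    return result


region = {
    0: {"color": 0},
    1: {"color": 1},
    2: {"color": 0},
    3: {"color": 2},  # color 2 is outside the color space used below
}

p = partition_by_field(region, "color", colors=[0, 1])
covered = set().union(*p.values())
print(p, covered == set(region))
```

Each element has exactly one color, so no element can appear in two subregions. Element 3 is dropped because its color is not in the color space, which is how such a partition ends up incomplete.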
Partitions by field are always disjoint by construction, so this need not be specified. They are NOT always complete, as the color space may not cover all of the values of the color field (and therefore some points may not be included in the final partition). If the user wishes to specify that the partition is complete (i.e., that all color field values are in the color space), then this can be accomplished with an optional complete keyword:
Partitions a region by computing the image of each of the subregions of a partition through the supplied (pointer-typed) field of a region. The resulting partition is NOT guaranteed to be disjoint.
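As a rough illustration of the image operation, here is a plain-Python model (not Regent code). It assumes regions are sets or dicts keyed by index, partitions are dicts from color to index sets, and that pointers falling outside the parent region are discarded; the names image, edges and dst are hypothetical.

```python
def image(parent, source_partition, data_region, field):
    """For each color, collect the targets of `field` over the source subregion.

    ASSUMPTION: targets that fall outside `parent` are dropped.
    """
    parent = set(parent)
    return {
        color: {data_region[i][field] for i in sub} & parent
        for color, sub in source_partition.items()
    }


# Four edges, each pointing at a destination node through the field "dst".
edges = {0: {"dst": 0}, 1: {"dst": 1}, 2: {"dst": 1}, 3: {"dst": 2}}
nodes = {0, 1, 2}

# A disjoint partition of the edges.
edge_partition = {0: {0, 1}, 1: {2, 3}}

node_partition = image(nodes, edge_partition, edges, "dst")
print(node_partition)
print(node_partition[0] & node_partition[1])
```

Node 1 is reached from edges in both source subregions, so it appears in both result subregions. This is why an image is not guaranteed to be disjoint even when the source partition is.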
Disjointness and completeness can be specified via optional disjoint/complete keywords:
Partitions a region by computing the preimage of each of the subregions of a partition through the supplied (pointer-typed) field of a region. The resulting partition is guaranteed to be disjoint IF the supplied target partition is disjoint.
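The preimage operation can be modelled in plain Python in the same spirit. This is a sketch under assumptions, not Regent code: regions are index sets or dicts keyed by index, partitions are dicts from color to index sets, and the names preimage, edges and dst are hypothetical.

```python
def preimage(parent, target_partition, data_region, field):
    """For each color, collect the elements of `parent` whose `field`
    points into the corresponding target subregion."""
    return {
        color: {i for i in parent if data_region[i][field] in sub}
        for color, sub in target_partition.items()
    }


# Four edges, each pointing at a destination node through the field "dst".
edges = {0: {"dst": 0}, 1: {"dst": 1}, 2: {"dst": 1}, 3: {"dst": 2}}

# A disjoint partition of the nodes.
node_partition = {0: {0, 1}, 1: {2}}

edge_partition = preimage(set(edges), node_partition, edges, "dst")
print(edge_partition)
print(edge_partition[0].isdisjoint(edge_partition[1]))
```

Each edge has a single destination, so it can only fall into one result subregion when the target subregions do not overlap. That is the reason a preimage inherits disjointness from its target partition.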
Disjointness and completeness can be specified via optional disjoint/complete keywords:
Computes the zipped union of the subregions in the supplied partitions. The resulting partition is NOT guaranteed to be disjoint.
This can be thought to be equivalent to setting each subregion p[i] to be equal to lhs_partition[i] | rhs_partition[i].
Computes the zipped intersection of the subregions in the supplied partitions. The resulting partition is guaranteed to be disjoint IF either or both of the arguments are disjoint.
This can be thought to be equivalent to setting each subregion p[i] to be equal to lhs_partition[i] & rhs_partition[i].
Computes the zipped difference of the subregions in the supplied partitions. The resulting partition is guaranteed to be disjoint IF the left-hand-side partition is disjoint.
This can be thought to be equivalent to setting each subregion p[i] to be equal to lhs_partition[i] - rhs_partition[i].
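The three zipped operators (union, intersection and difference) follow the same per-color pattern, which can be sketched in plain Python. This is a model rather than Regent code; it assumes both partitions share the same color space and are represented as dicts from color to index sets, and zip_partitions is a hypothetical helper name.

```python
import operator


def zip_partitions(lhs, rhs, op):
    """Apply a set operator color by color: p[i] = op(lhs[i], rhs[i]).

    ASSUMPTION: lhs and rhs are defined over the same colors.
    """
    return {color: op(lhs[color], rhs[color]) for color in lhs}


lhs_partition = {0: {0, 1, 2}, 1: {3, 4, 5}}  # disjoint
rhs_partition = {0: {2, 3}, 1: {5, 6}}        # disjoint

union = zip_partitions(lhs_partition, rhs_partition, operator.or_)
intersection = zip_partitions(lhs_partition, rhs_partition, operator.and_)
difference = zip_partitions(lhs_partition, rhs_partition, operator.sub)

print(union)
print(intersection)
print(difference)
```

In this example the union is aliased (element 3 lands in both subregions) even though both inputs are disjoint, while the intersection and the difference stay disjoint because each result subregion is contained in the matching subregion of the disjoint left-hand side.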
A cross product is the cartesian product of two or more partitions of a single parent region, resulting in a tree of nested partitions defined by the intersections of the corresponding subregions.
In the general case of N partitions p1 through pN, a cross product can be created with:
Afterwards, the expression cp[i1][i2]...[iN] returns the subregion p1[i1] & p2[i2] & ... & pN[iN], where & is the intersection operator. Note that each index iK must be valid for the corresponding partition pK, and that empty sub-regions can be returned (if there are no elements in the resulting intersection).
For example, a cross product of two partitions might be defined as cp = cross_product(p, q). In this case, cp[i] is a partition defined by the intersection of p[i] with q, and cp[i][j] is p[i] & q[j].
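The nesting described above can be modelled in plain Python. This sketch is not the Regent cross_product operator; it reuses the name only for readability and assumes partitions are dicts from color to index sets over a common parent region.

```python
def cross_product(*partitions):
    """Build nested dicts so that cp[i1][i2]...[iN] equals
    p1[i1] & p2[i2] & ... & pN[iN]."""

    def build(prefix, rest):
        # `prefix` is the intersection accumulated so far.
        if not rest:
            return prefix
        head, tail = rest[0], rest[1:]
        return {color: build(prefix & sub, tail) for color, sub in head.items()}

    first, rest = partitions[0], partitions[1:]
    return {color: build(set(sub), rest) for color, sub in first.items()}


p = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}
q = {0: {0, 1}, 1: {2, 3, 4, 5, 6, 7}}

cp = cross_product(p, q)
print(cp[0][1])  # p[0] & q[1]
print(cp[1][0])  # p[1] & q[0], which is empty here

cp3 = cross_product(p, q, p)
print(cp3[0][1][0])
```

Here cp[i] is itself a dict of subregions, playing the role of the partition of p[i] by q, and cp[1][0] shows that empty subregions can occur.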
Regent makes it easy to write optimized code for GPUs. Currently, Regent supports NVIDIA GPUs (via CUDA) and AMD GPUs (via HIP).
Note that Regent GPU code can be written even when no GPUs are available on the system. This is because the applicability of GPU code generation optimizations is checked even when Regent has not been built with GPU support. However, in order to actually generate GPU code, Regent must be built with the corresponding GPU toolchain enabled.
In order to enable GPU code generation for a task, mark it with __demand(__cuda):
(Note that despite the name, __demand(__cuda) applies to both NVIDIA and AMD GPUs. In the future, we may change this name to something more generic.)
Within a GPU task, every top-level loop is automatically run on the GPU. For example:
This task will result in two GPU kernels being generated, one for each for loop. The first loop reads region r and performs a scalar reduction to the variable t. The second loop reads t and adds the value into the region s. Note that because the value of t is used inside the task, the task will block on the execution of the first GPU kernel before running the second. In most cases, Regent avoids blocking on the execution of GPU kernels as much as possible to enable overlap of compute and communication.
In order to run this code on a GPU, run Regent with the flag -fgpu cuda for CUDA or -fgpu hip for HIP. If an NVIDIA GPU is available on the current node, the architecture will be auto-detected by default. Otherwise (or if on AMD GPUs), the architecture must be specified manually with the -fgpu-arch flag or GPU_ARCH environment variable. (E.g., ampere for NVIDIA or gfx90a for AMD.)
If a GPU is not available on the current node, run with -fgpu-offline 1. In this mode, Regent can still perform GPU code generation to generate a GPU-enabled binary. The binary can then be run on a node with a working GPU.
To identify the GPU at runtime, the -ll:gpu flag must be used to instruct Legion to use the GPU. There are additional flags to specify GPU memory and other properties.
The Regent compiler provides a wide variety of optimizations. Most of these optimizations are automatic and require no user intervention. Even though they’re automatic, it may be desirable to ensure that optimizations are occurring (that is, the user hasn’t written any code that prevents the optimization from being applied). Annotations allow users to specify where optimizations are expected in a Regent codebase, so that the compiler can ensure the code is being optimized as expected.
Annotations can be applied to tasks, statements, or expressions, and come in two basic flavors:
__demand requests that the compiler throw an error if an optimization cannot be applied.
__forbid requires that the compiler not apply an optimization.
Note that in contrast to pragmas in languages like C++, annotations cannot be used to force the compiler to optimize code when it is not safe to do so. Instead, the effect of the __demand annotation is to force the compiler to issue an error if a given optimization cannot be applied. Thus, it is better to think of annotations as a defensive programming feature that allows the programmer to sanity check that the compiler is behaving as expected, rather than as a way to enable or force optimizations.
In some cases, annotations labeled as “experimental” may deviate from this behavior. These are described in a separate section below.
The __leaf annotation indicates that a task will not launch any subtasks, copies, or fills, and will not create regions or partitions. In certain cases such tasks can be executed more efficiently.
The __inner annotation indicates that a task will not directly access the contents of any regions. In certain cases such tasks can be executed more efficiently.
The __idempotent annotation indicates that a task will not perform I/O or any other action with externally-visible side effects. (Writing to regions is ok.) Currently this annotation has no effect, but will be used to enable optimizations in the future.
The __replicable annotation indicates that a task is control deterministic, i.e. that all tasks (and other operations) are issued with the same arguments. Idempotent tasks are replicable by definition, but a task need not be idempotent to be replicable. Currently this annotation has an effect only in the control_replication branch of Legion, where it enables dynamic control replication.
The __inline annotation indicates that calls to the marked task must (or must not) be inlined into the caller, and will cause the compiler to issue an error if this is not possible.
The __index_launch annotation on a for loop indicates that the marked loop must be converted into an index launch, and will cause the compiler to issue an error if this is not possible. Index launches of tasks can be analyzed in O(1) time instead of O(N) for N tasks.
The __vectorize annotation on a for loop indicates that the marked loop must be vectorized, and will cause the compiler to issue an error if this is not possible.
The __inline annotation on a task call expression indicates that the marked call must (or must not) be inlined into the caller, and will cause the compiler to issue an error if this is not possible. This annotation overrides any __inline annotations on the called task.
The __spmd annotation on a loop or block indicates that the marked loop or block must be optimized with static control replication, an optimization described in this paper. Control replicated programs are substantially more scalable than non-control replicated programs.
Currently the Regent compiler considers only statements marked with this annotation for the optimization.
This annotation can be used in conjunction with the __trace optimization via __demand(__spmd, __trace).
The __trace annotation on a loop indicates that the marked loop should be traced. This is only possible when the sequence of tasks called within the traced loop is identical on every trip through the loop.
Currently the Regent compiler considers only statements marked with this annotation for the optimization. The compiler does not currently have the ability to check whether traced loops are valid (i.e., will always execute the same sequence of tasks), so invalid loops will result in a runtime error.
This annotation can be used in conjunction with the __spmd optimization via __demand(__spmd, __trace).
Note that this traces the inside of the loop. That is, the above code is more or less equivalent to:
If you would like to trace an arbitrary block of code (instead of a loop), this can be accomplished with:
The __cuda annotation on a task indicates that the marked task should be considered for CUDA code generation. Any loops over regions inside the marked task must not contain loop-carried dependencies except for reductions via commutative and associative operators.
Currently the Regent compiler considers only statements marked with this annotation for the optimization.
The __openmp annotation on a for loop indicates that the marked loop should be considered for OpenMP code generation. The loop must not contain loop-carried dependencies except for reductions via commutative and associative operators.
Currently the Regent compiler considers only statements marked with this annotation for the optimization.
The __parallel annotation on a task indicates that the task should be considered for auto-parallelization. Any loops over regions inside the marked task must not contain loop-carried dependencies except for reductions via commutative and associative operators.
Regent supports Terra-style metaprogramming. Metaprogramming can be used to accomplish a variety of purposes:
More generally, Regent can be used as a full-featured code generator for Legion, in the same way that Terra is used (by Regent itself) as a code generator for LLVM.
For the most part, these features work the same as in Terra. (For example, types are still Lua expressions, and quotes can still be inserted with the escape operator [].) Regent-specific features are described below.
A symbol can be used as a variable or task parameter. To generate a fresh, unique symbol, call:
Regent provides an rquote operator which is analogous to Terra’s quote feature.
Regent provides an rexpr operator which is analogous to Terra’s ` operator. (Unfortunately, Regent is not able to overload punctuation operators at this time, making this somewhat more verbose than Terra.)
The example below shows how to generate a simple type-parametric task.
To inspect the contents of generated tasks, invoke Regent with the flag -fpretty 1. On the code above, this produces the following output.
This can also be used to determine what optimizations are being triggered. (For example, leaf optimization is enabled on the tasks above.)
Regent code can call C functions via Terra’s foreign function interface (FFI). For example, the following snippet calls the C standard library function printf:
For more information on Terra’s FFI, please see the FFI documentation.
In some cases, it can be useful to call Legion APIs directly. These work the same as any other C function. As a convenience, Regent exposes a standard set of headers via the variable regentlib.c. This corresponds to the Legion header file legion_c.h.
Certain Legion API calls may require a runtime and/or context. These can be obtained in Regent via the operators __runtime() and __context(). A full list of operators to obtain C API object handles is available below.
For example, the following code calls a Legion execution fence:
At this time, the best source of documentation on the C API is the source code of the legion_c.h header file. Note that in most cases, the functions of the C API correspond one-to-one with the C++ API, so most C APIs are documented simply by pointing to the corresponding methods in legion.h.
__runtime() returns the Legion runtime (legion_runtime_t).
__context() returns the Legion context (legion_context_t).
__physical(r.{f, g, ...}) returns an array of physical regions (legion_physical_region_t) for r, one per field, for fields f, g, etc. in the order that the fields are listed in the call.
__fields(r.{f, g, ...}) returns an array of the field IDs (legion_field_id_t) of r, one per field, for fields f, g, etc. in the order that the fields are listed in the call.
__raw(r) returns the C API object handle that corresponds to the given object, e.g. a legion_logical_region_t for a region or legion_logical_partition_t for a partition.
The operators below can be used to import C API handles for objects created in C/C++ so that they can be used in Regent.
Important: C API handles can only be imported once into Regent. Subsequent attempts to import C API handles will fail. All objects created by Regent are considered to be already imported and thus cannot be imported again using this mechanism. These restrictions guarantee that certain assumptions made by the Regent compiler are not violated, and are critical to ensuring that Regent’s type checker and optimizer work correctly.
raw_ispace is of type legion_index_space_t.
raw_region is of type legion_logical_region_t.
raw_field_id_array is of type legion_field_id_t[N] where N is the number of fields in field space fs. Field IDs are enumerated in the same order as listed in the original field space or struct. Note that field spaces and structs in Regent are recursively flattened, such that the field space fs below contains 3 fields (center.x, center.y and mass).
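The recursive flattening rule can be illustrated with a plain-Python model. This is a sketch, not Regent code: the nested dict stands in for a field space, the type strings are placeholders that are not taken from the source, and flatten_fields is a hypothetical helper name.

```python
def flatten_fields(fields, prefix=""):
    """Return dotted field paths in declaration order, descending into
    nested structs (modelled here as nested dicts)."""
    paths = []
    for name, field_type in fields.items():
        path = prefix + name
        if isinstance(field_type, dict):
            paths.extend(flatten_fields(field_type, path + "."))
        else:
            paths.append(path)
    return paths


# Model of a field space with a nested struct field and a scalar field.
# The "double" type strings are placeholders for illustration only.
fs = {
    "center": {"x": "double", "y": "double"},
    "mass": "double",
}

paths = flatten_fields(fs)
print(paths)
print(len(paths))
```

Under this model, entry k of an array of field IDs would correspond to paths[k], so the array for fs needs three entries.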
raw_partition_* are of type legion_logical_partition_t.
p and q are the source partitions of the cross product, colors is an array of uint32 containing one color for each source partition, and raw_cp is of type legion_terra_index_cross_product_t.
x is of type legion_future_t and f will have type T. If -ffuture 1 is enabled, this can be automatically optimized into a Regent future.