From 76f21adeb52880fb1bf1885b4a189d09157d979f Mon Sep 17 00:00:00 2001 From: Gordon Date: Sun, 26 Aug 2018 20:45:11 +0100 Subject: [PATCH 1/2] Issue #50: Make execution_resource iterable. * Replace execution_resource::resources with execution_resource::begin and execution_resource::end. * Add execution_resource::size. * Add execution_resource::operator[]. * Add aliases to execution_resource to support new interface. * Improve wording for execution resource and system topology. * Update examples to support change. * Rename this_system::resources to this_system::discover_topology. --- affinity/cpp-20/d0796r3.md | 123 +++++++++++++++++++++++++------------ 1 file changed, 83 insertions(+), 40 deletions(-) diff --git a/affinity/cpp-20/d0796r3.md b/affinity/cpp-20/d0796r3.md index d287939..56ca353 100644 --- a/affinity/cpp-20/d0796r3.md +++ b/affinity/cpp-20/d0796r3.md @@ -16,6 +16,10 @@ ### P0796r3 (SAN 2018) +* Make `execution_resource`s iterable by replacing `execution_resource::resources` with `execution_resource::begin` and `execution_resource::end`. +* Add `size` and `operator[]` for `execution_resource`. +* Rename `this_system::get_resources` to `this_system::discover_topology`. + ### P0796r2 (RAP 2018) * Introduce a free function for retrieving the execution resource underlying the current thread of execution. @@ -211,26 +215,39 @@ Below *(Listing 2)* is an example of executing a parallel task over 8 threads us ## Execution resource topology -### Execution resources +### System topology -An `execution_resource` is a lightweight structure which acts as an identifier to particular piece of hardware within a system. It can be queried for whether it can allocate memory via `can_place_memory`, whether it can execute work via `can_place_agents`, and for its name via `name`. An `execution_resource` can also represent other `execution_resource`s. We call these *members of* that `execution_resource`, and can be queried via `resources`. 
Additionally the `execution_resource` which another is a *member of* can be queried via `member_of`. An `execution_resource` can also be queried for the concurrency it can provide, the total number of *threads of execution* supported by that *execution_resource*, and all resources it represents. +The **system topology** is comprised of a directed acyclic graph (DAG) of **execution resources**, representing all unique hardware and software components available within the system capable of executing work. The root node of the DAG is the **system execution resource** and represents the entire system. Each **execution resource** within the DAG may have any number of child **execution resources** representing a finer granularity of the parent **execution resource**. Every **execution resource** within the **system topology** is exposed via an `execution_resource` object. -> [*Note:* Note that an execution resource is not limited to resources which execute work, but also a general resource where no execution can take place but memory can be allocated such as off-chip memory. *--end note*] +The **system topology** can be discovered by calling `this_system::discover_topology`. This will discover all **execution resources** available within the system and construct the **system topology** DAG, describing a read-only snapshot at the point of the call, and then return an `execution_resource` object exposing the **system execution resource**. -> [*Note:* The intention is that the actual implementation details of a resource topology are described in an execution context when required. This allows the execution resource objects to be lightweight objects that serve as identifiers that are only referenced. *--end note*] +> [*Note:* A call to `this_system::discover_topology` may invoke C++ library calls, system calls or third party library APIs required to discover certain **execution resources**. 
*--end note*] -### System topology +### Execution resources + +An `execution_resource` is a lightweight structure which identifies a particular **execution resource** within a snapshot of the **system topology**. It can be queried for whether the associated **execution resource** can allocate memory via `can_place_memory`, whether the associated **execution resource** can execute work via `can_place_agents`, and for a name via `name`. -The system topology is made up of a number of system-level `execution_resource`s, which can be queried through `this_system::get_resources` which returns a `std::vector`. A run-time library may initialize the `execution_resource`s available within the system dynamically. However, this must be done before `main` is called, given that after that point, the system topology may not change. +An `execution_resource` object can be queried for a pointer to it's parent `execution_resource` via `member_of`, and can also be iterated over for it's child `execution_resource`s via `begin` and `end`. -Below *(Listing 3)* is an example of iterating over the system-level resources and printing out their capabilities. +An `execution_resource` object can also be queried for the amount concurrency it can provide, the total number of **threads of execution** supported by the associated **execution resource**. + +> [*Note:* An **execution resource** is not limited to resources which execute work, but also a general resource where no execution can take place but memory can be allocated such as off-chip memory. *--end note*] + +Below *(Listing 3)* is an example of iterating over every **execution resource** within the **system topology** and printing out their capabilities. 
```cpp -for (auto res : execution::this_system::get_resources()) { - std::cout << res.name() `\n`; - std::cout << res.can_place_memory() << `\n`; - std::cout << res.can_place_agents() << `\n`; - std::cout << res.concurrency() << `\n`; +void print_topology(const execution::execution_resource &resource, int indent = 0) { + for (int i = 0; i < indent; i++) { std::cout << " "; } + std::cout << resource.name() << ": " << resource.can_place_memory() << ", " + << resource.can_place_agents() << ", " << resource.concurrency() << "\n"; + for (const execution::execution_resource child : resource) { + print_topology(child, indent + 1); + } +} + +int main(int argc, char * argv[]) { + auto systemResource = this_system::discover_topology(); + print_topology(systemResource); } ``` *Listing 3: Example of querying all the system level execution resources* @@ -242,14 +259,13 @@ The `affinity_query` class template provides an abstraction for a relative affin Below *(listing 4)* is an example of how to query the relative affinity between two `execution_resource`s. ```cpp -auto systemLevelResources = execution::this_system::get_resources(); -auto memberResources = systemLevelResources.resources(); +auto systemResource = this_system::discover_topology(); auto relativeLatency01 = execution::affinity_query(memberResources[0], memberResources[1]); + execution::affinity_metric::latency>(systemResource[0], systemResource[1]); auto relativeLatency02 = execution::affinity_query(memberResources[0], memberResources[2]); + execution::affinity_metric::latency>(systemResource[0], systemResource[2]); auto relativeLatency = relativeLatency01 > relativeLatency02; ``` @@ -264,16 +280,16 @@ The `execution_context` class provides an abstraction for managing a number of l Below *(Listing 5)* is an example of how this extended interface could be used to construct an *execution context* from an *execution resource* which is retrieved from the *system’s resource topology*. 
Once an *execution context* is constructed it can then still be queried for its *execution resource*, and that *execution resource* can be further partitioned. ```cpp -auto &resources = execution::this_system::get_resources(); +auto systemResource = std::this_system::discover_topology(); -execution::execution_context execContext(resources[0]); +execution::execution_context execContext(systemResource[0]); -auto &systemLevelResource = execContext.resource(); +auto &execResource = execContext.resource(); -// resource[0] should be equal to execResource +// systemResource[0] should be equal to execResource -for (auto res : systemLevelResource.resources()) { - std::cout << res.name() << `\n`; +for (const execution::execution_resource res : execResource()) { + std::cout << res.name() << "\n"; } ``` *Listing 5: Example of constructing an execution context from an execution resource* @@ -281,10 +297,10 @@ for (auto res : systemLevelResource.resources()) { When creating an `execution_context` from a given `execution_resource`, the executors and allocators associated with it are bound to that `execution_resource`. For example, when creating an `execution_resource` from a CPU socket resource, all executors associated with the given socket will spawn execution agents with affinity to the socket partition of the system *(Listing 6)*. 
```cpp -auto cList = std::execution::this_system::get_resources(); +auto systemResource = std::this_system::discover_topology(); // FindASocketResource is a user-defined function that finds a // resource that is a CPU socket in the given resource list -auto& socket = findASocketResource(cList); +auto& socket = findASocketResource(systemResource); execution_contextC{socket} // Associated with the socket auto executor = eC.executor(); // By transitivity, associated with the socket too auto socketAllocator = eC.allocator(); // Retrieve an allocator to the closest memory node @@ -326,6 +342,14 @@ A *thread of execution* can be requested to bind to a particular `execution_reso class execution_resource { public: + using pointer = execution_resource *; + using const_pointer = const execution_resource *; + using iterator = execution_resource *; + using const_iterator = const execution_resource *; + using reference = execution_resource &; + using const_reference = const execution_resource &; + using size_type = std::size_t; + execution_resource() = delete; execution_resource(const execution_resource &); execution_resource(execution_resource &&); @@ -333,11 +357,16 @@ A *thread of execution* can be requested to bind to a particular `execution_reso execution_resource &operator=(execution_resource &&); ~execution_resource(); - size_t concurrency() const noexcept; + size_type size() const noexcept; + + const_iterator begin() const noexcept; + const_iterator end() const noexcept; + + const_reference operator[](int child) const noexcept; - std::vector resources() const noexcept; + const_pointer member_of() const noexcept; - const execution_resource member_of() const noexcept; + size_t concurrency() const noexcept; std::string name() const noexcept; @@ -403,7 +432,7 @@ A *thread of execution* can be requested to bind to a particular `execution_reso /* This system */ namespace this_system { - std::vector resources() noexcept; + const execution_resource discover_topology(); } /* 
This thread */ @@ -447,7 +476,7 @@ The `execution_resource` class provides an abstraction over a system's hardware, ### `execution_resource` constructors - execution_resource(); + execution_resource() = delete; > [*Note:* An implementation of `execution_resource` is permitted to provide non-public constructors to allow other objects to construct them. *--end note*] @@ -468,13 +497,25 @@ The `execution_resource` class provides an abstraction over a system's hardware, *Returns:* The total concurrency available to this resource. More specifically, the number of *threads of execution* collectively available to this `execution_resource` and any resources which are *members of*, recursively. - std::vector resources() const noexcept; + size_type size() const noexcept; + +*Returns:* The number of child `execution_resource`s. + + const_iterator begin() const noexcept; -*Returns:* All `execution_resource`s which are *members of* this resource. +*Returns:* A const iterator to the beggining of the child `execution_resource`s. + + const_iterator end() const noexcept; + +*Returns:* A const iterator to the end of the child `execution_resource`s. + + const_reference operator[](int child) const noexcept; + +*Returns:* A const reference to the specified child `execution_resource`s. const execution_resource &member_of() const noexcept; -*Returns:* The `execution_resource` which this resource is a *member of*. +*Returns:* The parent `execution_resource`. std::string name() const noexcept; @@ -482,11 +523,11 @@ The `execution_resource` class provides an abstraction over a system's hardware, bool can_place_memory() const noexcept; -*Returns:* If this resource is capable of allocating memory with affinity, 'true'. +*Returns:* If the associated **execution resource* is capable of allocating memory with affinity, 'true'. bool can_place_agent() const noexcept; -*Returns:* If this resource is capable of execute with affinity, 'true'. 
+*Returns:* If the associated **execution resource* is capable of execute with affinity, 'true'. ## Class `execution_context` @@ -589,17 +630,19 @@ The `affinity_query` class template provides an abstraction for a relative affin ## Free functions -### `this_system::get_resources` +### `this_system::discover_topology` + +The free function `this_system::discover_topology` is provided for discovering the **system topology**. -The free function `this_system::get_resources` is provided for retrieving the `execution_resource`s which encapsulate the hardware platforms available within the system. We refer to these resources as the *system level resources*. + const execution_resource discover_topology(); - std::vector resources() noexcept; +*Returns:* An `execution_resource` object exposing the **system execution resource**. -*Returns:* An `std::vector` containing all *system level resources*. +*Requires:* If `this_system::discover_topology().size() > 0`, `this_system::discover_topology()[0]` be the `execution_resource` use by `std::thread`. Calls to `this_system::discover_topology()` may not introduce a data race with any other call to `this_system::discover_topology()`. -*Requires:* If `this_system::get_resources().size() > 0`, `this_system::get_resources()[0]` be the `execution_resource` use by `std::thread`. The value returned by `this_system::get_resources()` be the same at any point after the invocation of `main`. +*Effects:* Discovers all **execution resources** available within the system and constructs the **system topology** DAG, describing a read-only snapshot at the point of the call. -> [*Note:* Returning a `std::vector` allows users to potentially manipulate the container of `execution_resource`s after it is returned. We may want to replace this at a later date with an alternative type which is more restrictive, such as a range or span. *--end note*] +*Throws:* Any exception thrown as a result of **system topology** discovery. 
### `this_thread::bind` & `this_thread::unbind` From 92841d897455e75908fc4cafb67b6abf9f3dd0a1 Mon Sep 17 00:00:00 2001 From: Gordon Date: Thu, 4 Oct 2018 21:06:55 +0100 Subject: [PATCH 2/2] CP013: Make modifications based on feedback. * Add minor corrections. * Add requirement for execution_resource iterators to be random access. * Add section to background discussing the handling of partial errors in topology discovery. --- affinity/cpp-20/d0796r3.md | 37 ++++++++++++++++++++++++++++--------- 1 file changed, 28 insertions(+), 9 deletions(-) diff --git a/affinity/cpp-20/d0796r3.md b/affinity/cpp-20/d0796r3.md index 9d57cf1..4743cd6 100644 --- a/affinity/cpp-20/d0796r3.md +++ b/affinity/cpp-20/d0796r3.md @@ -165,7 +165,13 @@ From a historic perspective, programming models for traditional high-performance Some of these programming models also address *fault tolerance*. In particular, PVM has native support for this, providing a mechanism [[27]][pvm-callback] which can notify a program when a resource is added or removed from a system. MPI lacks a native *fault tolerance* mechanism, but there have been efforts to implement fault tolerance on top of MPI [[28]][mpi-post-failure-recovery] or by extensions[[29]][mpi-fault-tolerance]. -Due to the complexity involved in standardizing *dynamic resource discovery* and *fault tolerance*, these are currently out of the scope of this paper. However, we leave open the possibility of accommodating both in the future, by not overconstraining *resources*' lifetimes (see next section). +Due to the complexity involved in standardizing *dynamic resource discovery* and *fault tolerance*, these are currently out of the scope of this paper. However, we leave open the possibility of accommodating both in the future, by not over-constraining *resources*' lifetimes (see next section). 
+ ### Reporting errors in topology discovery + +As querying the topology of a system can invoke a number of different system and third-party library calls, we have to consider what will happen when one of these calls fails. Firstly, we want to be able to report this failure so that it can be logged or handled in user code. Secondly, as there will often be more than one source of topology discovery, we have to avoid short-circuiting the discovery on an error and preventing potentially valid topology information from being reported to users. For example, if a system were to report both Hwloc and OpenCL execution resources and one of these failed, we want the other to still be able to return its resources. + +A potential solution to this could be to support partial errors in topology discovery, where querying the system for its topology could be permitted to fail but still return a valid topology structure representing the topology that was discovered successfully. The way in which these errors are reported (i.e. exceptions or error values) would have to be decided. Exceptions could be problematic, as they could unwind the stack before important topology information is captured, so perhaps an error-value-based approach would be preferable. ### Resource lifetime @@ -286,7 +292,7 @@ An `execution_resource` object can be queried for a pointer to it's parent `exec An `execution_resource` object can also be queried for the amount concurrency it can provide, the total number of **threads of execution** supported by the associated **execution resource**. -> [*Note:* An **execution resource** is not limited to resources which execute work, but also a general resource where no execution can take place but memory can be allocated such as off-chip memory. *--end note*] +> [*Note:* An **execution resource** is not limited to resources which execute work, but also a general resource where no execution can take place but memory can be allocated, such as off-chip memory. 
*--end note*] Below *(Listing 3)* is an example of iterating over every **execution resource** within the **system topology** and printing out their capabilities. @@ -343,7 +349,7 @@ auto &execResource = execContext.resource(); // systemResource[0] should be equal to execResource -for (const execution::execution_resource res : execResource()) { +for (const execution::execution_resource &res : execResource) { std::cout << res.name() << "\n"; } ``` @@ -393,10 +399,11 @@ The `execution_resource` which underlies the current thread of execution can be class execution_resource { public: + using value_type = execution_resource; using pointer = execution_resource *; using const_pointer = const execution_resource *; - using iterator = execution_resource *; - using const_iterator = const execution_resource *; + using iterator = see-below; + using const_iterator = see-below; using reference = execution_resource &; using const_reference = const execution_resource &; using size_type = std::size_t; @@ -413,7 +420,7 @@ The `execution_resource` which underlies the current thread of execution can be const_iterator begin() const noexcept; const_iterator end() const noexcept; - const_reference operator[](int child) const noexcept; + const_reference operator[](std::size_t child) const noexcept; const_pointer member_of() const noexcept; @@ -522,6 +529,18 @@ The `execution_resource` class provides an abstraction over a system's hardware, > [*Note:* Creating an `execution_resource` may require initializing the underlying software abstraction when the `execution_resource` is constructed, in order to discover other `execution_resource`s accessible through it. However, an `execution_resource` is nonowning. *--end note*] +### `execution_resource` member types + + iterator + +*Requires:* `iterator` to model `RandomAccessIterator` with the value type `execution_resource::value_type`. 
+ + const_iterator + +*Requires:* `const_iterator` to model `RandomAccessIterator` with the value type `execution_resource::value_type`. + +iterator_traits<>iterator_category + ### `execution_resource` constructors execution_resource() = delete; @@ -551,13 +570,13 @@ The `execution_resource` class provides an abstraction over a system's hardware, const_iterator begin() const noexcept; -*Returns:* A const iterator to the beggining of the child `execution_resource`s. +*Returns:* A const iterator to the beginning of the child `execution_resource`s. const_iterator end() const noexcept; *Returns:* A const iterator to the end of the child `execution_resource`s. - const_reference operator[](int child) const noexcept; + const_reference operator[](std::size_t child) const noexcept; *Returns:* A const reference to the specified child `execution_resource`s. @@ -581,7 +600,7 @@ The `execution_resource` class provides an abstraction over a system's hardware, The `execution_context` class provides an abstraction for managing a number of lightweight execution agents executing work on an `execution_resource` and any `execution_resource`s encapsulated by it. The `execution_resource` which an `execution_context` encapsulates is referred to as the *contained resource*. -### `execution_context` types +### `execution_context` member types using executor_type = see-below;
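
The iterable `execution_resource` interface that this patch series specifies (`size`, `begin`, `end`, `operator[]` and random access iterators over child resources) can be tried out with a small stand-in type. The following is only a sketch under stated assumptions: `mock_resource`, `make_example_topology`, `print_topology` and `describe_topology` are hypothetical names invented for illustration, the topology is hard-coded rather than discovered, and `member_of`, `can_place_memory` and `can_place_agents` are omitted. It is not the proposed class and does not reflect how an implementation would store its children.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the proposed execution_resource (not the P0796 class).
class mock_resource {
 public:
  using value_type = mock_resource;
  // Raw pointers into contiguous storage satisfy the random access requirement.
  using const_iterator = const mock_resource *;
  using const_reference = const mock_resource &;
  using size_type = std::size_t;

  mock_resource(std::string name, size_type concurrency,
                std::vector<mock_resource> children = {})
      : name_(std::move(name)),
        concurrency_(concurrency),
        children_(std::move(children)) {}

  // Number of child resources.
  size_type size() const noexcept { return children_.size(); }

  // Iteration over the child resources.
  const_iterator begin() const noexcept { return children_.data(); }
  const_iterator end() const noexcept { return children_.data() + children_.size(); }

  // Indexed access to a child resource.
  const_reference operator[](size_type child) const noexcept {
    assert(child < children_.size());
    return children_[child];
  }

  const std::string &name() const noexcept { return name_; }
  size_type concurrency() const noexcept { return concurrency_; }

 private:
  std::string name_;
  size_type concurrency_;
  std::vector<mock_resource> children_;
};

// Builds a hard-coded two-socket, four-core tree (assumed layout, for illustration only).
mock_resource make_example_topology() {
  return mock_resource{
      "system", 4,
      {mock_resource{"socket0", 2, {mock_resource{"core0", 1}, mock_resource{"core1", 1}}},
       mock_resource{"socket1", 2, {mock_resource{"core2", 1}, mock_resource{"core3", 1}}}}};
}

// Recursively writes one line per resource, indented by two spaces per level.
void print_topology(const mock_resource &resource, std::ostream &out, std::size_t depth = 0) {
  out << std::string(depth * 2, ' ') << resource.name() << " (concurrency "
      << resource.concurrency() << ")\n";
  for (const mock_resource &child : resource) {
    print_topology(child, out, depth + 1);
  }
}

// Convenience wrapper that returns the printed tree as a string.
std::string describe_topology(const mock_resource &root) {
  std::ostringstream out;
  print_topology(root, out);
  return out.str();
}
```

Calling `print_topology(make_example_topology(), std::cout)` from `main` writes the tree with `system (concurrency 4)` on the first line and each socket and core indented beneath it. The range-based `for` loop relies only on `begin` and `end`, while `root[1][0]` style access relies on `operator[]`, which mirrors the two access paths the patch adds.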