Proposal for Reorg connect/destroy operations

DRI Terminology (much borrowed from MPI 1.1, propose incorporation into DRI glossary and API spec)

Use Case #1 (individual DRI_Reorg connect/disconnect)
 
Function | Blocking semantics | Locality semantics | Synchronous connect semantics (same side) | Synchronous connect semantics (other side) | Notes
DRI_Reorg_create | Block | Non-Local | Async | Async
  • could perform DRI_Reorg registration here
  • can be called at any time (after DRI_Init, before DRI_Finalize)
DRI_Reorg_connect | Block | Non-Local | Async | Sync (not barrier)
  • DRI_Reorg_create must have been called (by the calling process)
  • synchronous with respect to the completion of DRI_Reorg_create by other-side processes (i.e., not a barrier)
DRI_Reorg_iconnect | Block | Non-Local | Async | Async
  • all semantics same as DRI_Reorg_connect, except that the synchronization needed with other side's processes is deferred until the first call to DRI_Reorg_put/get
  • "immediate" connect method
  • probably requires a corresponding test method (e.g., DRI_Reorg_isconnected)
DRI_Reorg_get/put | Block (get), Nonblk (put) | Non-Local | Async | (depends on connect approach used)
  • The DRI_Reorg must be in the connected state (via DRI_Reorg_connect or _iconnect)
  • If DRI_Reorg_iconnect was called, perform the connection (synchronization) steps with other side processes now
  • If synchronous, it is with respect to either:
    • the completion of DRI_Reorg_connect by other side processes OR
    • the first call to DRI_Reorg_get/put by other side processes
DRI_Reorg_disconnect | Block | Non-Local | Async | Async
  • DRI_Reorg must be in connected state
  • disallows any subsequent calls to put/get
  • reclaims control of any buffers (DRI_Datapart objs) that are currently "checked out" by the application (via prior get call)
  • blocks, waiting for any put/get communications in-progress to finish
    • pending put operations may not be possible to complete if the disconnect indication from the other side arrives before the pre-acknowledgement to write to the destination buffers. The implementation will detect this condition and not send the data.
  • Restores DRI_Reorg object to its state following DRI_Reorg_create, and before DRI_Reorg_connect
DRI_Reorg_destroy | Block | Local | Async | Async
  • The DRI_Reorg object must be in the disconnected state. For example, here are legal scenarios:
    • created, but not yet connected
    • created, connected, possibly used with get/put, and then disconnected
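
The lifecycle implied by the Use Case #1 table can be sketched as a small state machine. This is only an illustrative model, not the DRI API: every name in it (mock_reorg, mock_connect, and so on) is hypothetical, no communication takes place, and the "other side" synchronization is reduced to a counter. It also assumes that a DRI_Reorg_isconnected-style test would report whether the deferred synchronization has completed, and that disconnect is legal after iconnect even if no put/get has happened yet; the proposal does not settle either point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: these names are not part of the DRI API,
 * and no communication takes place. */
typedef enum {
    MOCK_CREATED,    /* after create, and again after disconnect */
    MOCK_PENDING,    /* iconnect called; other-side sync deferred */
    MOCK_CONNECTED,  /* other-side sync done */
    MOCK_DESTROYED
} mock_state;

typedef struct {
    mock_state state;
    int checked_out;  /* buffers handed to the application by get */
    int sync_count;   /* times the other-side sync was performed */
} mock_reorg;

enum { MOCK_OK = 0, MOCK_ERR_STATE = -1 };

static void mock_create(mock_reorg *r) {
    assert(r != NULL);
    r->state = MOCK_CREATED;
    r->checked_out = 0;
    r->sync_count = 0;
}

/* Stand-in for synchronizing with the other side's processes. */
static void mock_sync_other_side(mock_reorg *r) {
    r->sync_count++;
    r->state = MOCK_CONNECTED;
}

/* connect: performs the other-side synchronization before returning. */
static int mock_connect(mock_reorg *r) {
    assert(r != NULL);
    if (r->state != MOCK_CREATED) return MOCK_ERR_STATE;
    mock_sync_other_side(r);
    return MOCK_OK;
}

/* iconnect: same precondition, but the synchronization is deferred. */
static int mock_iconnect(mock_reorg *r) {
    assert(r != NULL);
    if (r->state != MOCK_CREATED) return MOCK_ERR_STATE;
    r->state = MOCK_PENDING;
    return MOCK_OK;
}

/* Assumed meaning of a test method: has the synchronization completed? */
static bool mock_isconnected(const mock_reorg *r) {
    assert(r != NULL);
    return r->state == MOCK_CONNECTED;
}

/* get: completes a deferred connection on first use, then checks out
 * one buffer to the application. */
static int mock_get(mock_reorg *r) {
    assert(r != NULL);
    if (r->state == MOCK_PENDING) mock_sync_other_side(r);
    if (r->state != MOCK_CONNECTED) return MOCK_ERR_STATE;
    r->checked_out++;
    return MOCK_OK;
}

/* disconnect: reclaims checked-out buffers and restores the state that
 * followed create. */
static int mock_disconnect(mock_reorg *r) {
    assert(r != NULL);
    if (r->state != MOCK_CONNECTED && r->state != MOCK_PENDING)
        return MOCK_ERR_STATE;
    r->checked_out = 0;
    r->state = MOCK_CREATED;
    return MOCK_OK;
}

/* destroy: legal only in the disconnected state. */
static int mock_destroy(mock_reorg *r) {
    assert(r != NULL);
    if (r->state != MOCK_CREATED) return MOCK_ERR_STATE;
    r->state = MOCK_DESTROYED;
    return MOCK_OK;
}
```

In this model both legal destroy scenarios succeed (created but never connected; created, connected, used, then disconnected), while destroy on a connected object and get on an unconnected object return MOCK_ERR_STATE.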




Use Case #2 (collective DRI_Reorg connect/disconnect)

Function | Blocking semantics | Locality semantics | Synchronous connect semantics (same side) | Synchronous connect semantics (other side) | Notes
DRI_Reorg_create | Block | Non-Local | Async | Async
  • could perform DRI_Reorg registration here
  • can only be called when DRI library is in the disconnected state (either before DRI_connect or after DRI_disconnect)
DRI_connect | Block | Non-Local | Sync | Sync (barrier)
  • library must be in disconnected state
  • all DRI_Reorgs affiliated with the calling process are connected with this call
  • implementation cannot defer connection of DRI_Reorg objects until first use of put/get
  • synchronization extends only to the process groups with which the calling process will interact via DRI_Reorg_put/get (i.e., this is not necessarily a barrier sync across all processes in the DRI_Network scope)
DRI_Reorg_put/get | Block (get), Nonblk (put) | Non-Local | Async | Async
  • DRI_connect must have been called
  • DRI_disconnect must not have been called
DRI_disconnect | Block | Non-Local | Sync | Sync (barrier)
  • Can only be called when library is connected
  • Marks all connected DRI_Reorg objects as disconnected
  • Blocks, waiting for any put/get communications on all DRI_Reorg objects to finish
  • Restores the state of all DRI_Reorg objects in the program's scope to what it was following DRI_Reorg_create, and before DRI_connect
DRI_Reorg_destroy | Block | Local | Async | Async
  • The DRI library must be in the disconnected state (either before DRI_connect, or after DRI_disconnect)
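
The collective rules in the Use Case #2 table reduce to a single library-wide connected flag that gates every other call. The sketch below models only that gating, under stated assumptions: all names (coll_lib, coll_connect, and so on) are hypothetical and not part of the DRI API, handles are plain array indices, and the barrier synchronization among process groups is not modeled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: these names are not part of the DRI API. */
#define COLL_MAX_REORGS 8

enum {
    COLL_OK = 0,
    COLL_ERR_STATE = -1,   /* call not legal in the current library state */
    COLL_ERR_FULL = -2,    /* no free handle slots in this toy model */
    COLL_ERR_HANDLE = -3   /* handle never created, or already destroyed */
};

typedef struct {
    bool connected;               /* library-wide state */
    int  count;                   /* handles issued so far */
    bool alive[COLL_MAX_REORGS];  /* created and not yet destroyed */
} coll_lib;

static void coll_init(coll_lib *lib) {
    assert(lib != NULL);
    lib->connected = false;
    lib->count = 0;
    for (int i = 0; i < COLL_MAX_REORGS; i++) lib->alive[i] = false;
}

static bool coll_valid(const coll_lib *lib, int h) {
    return h >= 0 && h < lib->count && lib->alive[h];
}

/* create: only while the library is disconnected.
 * Returns a handle >= 0, or a negative error code. */
static int coll_reorg_create(coll_lib *lib) {
    assert(lib != NULL);
    if (lib->connected) return COLL_ERR_STATE;
    if (lib->count == COLL_MAX_REORGS) return COLL_ERR_FULL;
    lib->alive[lib->count] = true;
    return lib->count++;
}

/* connect: one call connects every live reorg; nothing is deferred. */
static int coll_connect(coll_lib *lib) {
    assert(lib != NULL);
    if (lib->connected) return COLL_ERR_STATE;
    lib->connected = true;
    return COLL_OK;
}

/* put: legal only between connect and disconnect. */
static int coll_put(const coll_lib *lib, int h) {
    assert(lib != NULL);
    if (!coll_valid(lib, h)) return COLL_ERR_HANDLE;
    return lib->connected ? COLL_OK : COLL_ERR_STATE;
}

/* disconnect: marks every reorg disconnected in one call. */
static int coll_disconnect(coll_lib *lib) {
    assert(lib != NULL);
    if (!lib->connected) return COLL_ERR_STATE;
    lib->connected = false;
    return COLL_OK;
}

/* destroy: only while the library is disconnected. */
static int coll_reorg_destroy(coll_lib *lib, int h) {
    assert(lib != NULL);
    if (!coll_valid(lib, h)) return COLL_ERR_HANDLE;
    if (lib->connected) return COLL_ERR_STATE;
    lib->alive[h] = false;
    return COLL_OK;
}
```

Compared with the per-object lifecycle of Use Case #1, the per-object state here collapses into the library flag, which is why create and destroy are rejected while the library is connected.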