Data Reorg meeting minutes, 6/8/2000 and 6/9/2000

Attendance
 
Name                      Organization
Ken Cain                  MITRE
Dimitris Christodoulou    Sky Computers
Dennis Cottel             SPAWARSYSCEN, S.D.
Zhenqian Cui              MPI Software Technology
Nathan Doss               Lockheed Martin GES
Jon Greene                Mercury Computer Systems
Arkady Kanevsky           Mercury Computer Systems
James Lebak               MIT Lincoln Laboratory
Steve Paavola             Sky Computers
Myra Prelle               Mercury Computer Systems



Regarding multiple levels of API specification and implementation (similar in spirit to VSIPL "CORE-Lite", etc.)

Regarding portability of applications:

Dimitris: There should be a null object for every object type defined in the API.
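By analogy with MPI's null handles (MPI_GROUP_NULL, MPI_COMM_NULL), Dimitris's suggestion might look like the C sketch below. All type names, the *_NULL constants, and dri_channel_destroy_sketch are hypothetical; the LIS API did not define them at this meeting. A null object gives portable code a well-defined value for "no object", e.g., for a handle that has already been destroyed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for two opaque LIS API object types. */
struct dri_group_s   { int id; };
struct dri_channel_s { int id; };
typedef struct dri_group_s   *DRI_Group;
typedef struct dri_channel_s *DRI_Channel;

/* One null object per defined object type (names are assumptions). */
#define DRI_GROUP_NULL   ((DRI_Group)NULL)
#define DRI_CHANNEL_NULL ((DRI_Channel)NULL)

/* Hypothetical destroy routine: resets the caller's handle to the null object,
   so a later double destroy is detected instead of corrupting state. */
int dri_channel_destroy_sketch(DRI_Channel *ch)
{
    assert(ch != NULL);
    if (*ch == DRI_CHANNEL_NULL)
        return -1;          /* hypothetical error code: handle is already null */
    /* ... release implementation resources here ... */
    *ch = DRI_CHANNEL_NULL;
    return 0;
}
```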
 


Myra: proposes that, in the LIS API, we might want to supply a lot of detailed (previously specified) information to dri_bufferset_create. In fact, per-process detailed partitioning information may even be needed (if layering on something like Mercury's PAS, for example). Options:

Group consensus for now is to keep memory allocation scoped locally, and to allow flexible alignment of the local data buffer, as discussed in prior meetings.
 



Local memory buffers, and their alignment:

A big problem is looming regarding data alignment between communication and computation library APIs. For example, buffers are created in DRI and the computation is then done in VSIPL. The DRI-created buffers could be admitted to VSIPL, but if the data being operated on by VSIPL isn't sufficiently aligned, the VSIPL implementation could perform memory copies under the hood.
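The alignment concern can be made concrete with a small C check. A computation library might test the start of admitted data with something like is_aligned() before choosing a fast path; that, and the 16-byte boundary used in the example (typical of SIMD vector loads), are assumptions for illustration, not a description of any VSIPL implementation. Even if DRI allocates the whole buffer on an aligned boundary, a sub-block starting at an arbitrary element offset can be misaligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p starts on an `align`-byte boundary (align must be a power of two). */
int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (uintptr_t)(align - 1)) == 0;
}

/* Bytes that would have to be skipped from p to reach the next `align` boundary. */
size_t align_padding(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (align - ((uintptr_t)p & (uintptr_t)(align - 1))) & (align - 1);
}
```

For example, with 4-byte floats, a block starting at element 3 of a 16-byte-aligned buffer sits 12 bytes past the boundary and is 4 bytes short of the next one.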

The general area of layouts and memory allocation needs to be revisited.

Strong desire by the group to allow users to admit their own buffers to Data Reorg.
 

API change proposals:

dri_bufferset_create (IN nbufs, IN dist, OUT bufset)
 - if nbufs is 0, creates an empty bufferset

dri_bufferset_add_buffer(IN bufaddress, INOUT bufset)
 - the user would call this once per buffer to be included in the bufferset

Alternative:
dri_bufferset_import (IN nbufs, IN [ ] buffers, OUT bufset)
 [ this avoids the user doing a dri_bufferset_add_buffer on a previously created bufferset from dri_bufferset_create ]
 

Group agrees that the dri_bufferset_import approach is probably more useful, and that it avoids overloading the intent of dri_bufferset_create (which performs implementation-provided memory allocation).
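The two proposals can be contrasted with a small mock in C. The bufferset struct and the functions below are hypothetical stand-ins, not the LIS API; the dist argument is omitted. Proposal 1 starts from an empty set (dri_bufferset_create with nbufs of 0) and adds buffers one call at a time; the import alternative admits all user buffers in one call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bufferset holding user-supplied buffer addresses. */
typedef struct {
    size_t nbufs;   /* buffers currently in the set */
    size_t cap;     /* allocated slots in bufs */
    void **bufs;    /* user buffer addresses, in the order they were added */
} bufferset;

void bufferset_free(bufferset *bs)
{
    free(bs->bufs);
    bs->bufs = NULL;
    bs->nbufs = bs->cap = 0;
}

/* Proposal 1: dri_bufferset_add_buffer(IN bufaddress, INOUT bufset),
   called once per buffer on a set created empty (nbufs == 0). */
int bufferset_add_buffer(void *bufaddress, bufferset *bs)
{
    if (bs->nbufs == bs->cap) {
        size_t ncap = bs->cap ? 2 * bs->cap : 4;
        void **nb = realloc(bs->bufs, ncap * sizeof *nb);
        if (nb == NULL)
            return -1;
        bs->bufs = nb;
        bs->cap = ncap;
    }
    bs->bufs[bs->nbufs++] = bufaddress;
    return 0;
}

/* Alternative: dri_bufferset_import(IN nbufs, IN [] buffers, OUT bufset),
   which admits all user buffers in a single call. */
int bufferset_import(size_t nbufs, void *const buffers[], bufferset *bs)
{
    *bs = (bufferset){0, 0, NULL};
    for (size_t i = 0; i < nbufs; i++) {
        if (bufferset_add_buffer(buffers[i], bs) != 0) {
            bufferset_free(bs);
            return -1;
        }
    }
    return 0;
}
```

The import form keeps creation and population in one step, so a bufferset is never observable in a partially filled state.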
 
 

Jon: proposes a new calc_size routine (a simpler name, and it returns more than just the size, perhaps an opaque object).
Ken: a minimal interface should allow querying the buffer size (already present) and the starting alignment (which needs to be introduced into the LIS API).
The BIG question here is: "Are these two integer quantities sufficient?"
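If the answer turns out to be yes, a user could allocate and admit their own buffer from just those two integers. A minimal C11 sketch follows; the buffer_requirements struct and alloc_user_buffer are hypothetical, and the query that fills in the struct is not shown because its name and signature were still open. C11 aligned_alloc expects the size to be a multiple of the alignment, so the size is rounded up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result of the proposed query: the two integers Ken listed. */
typedef struct {
    size_t size;       /* bytes needed for the local data buffer */
    size_t alignment;  /* required alignment of the buffer start (power of two) */
} buffer_requirements;

/* Allocate a user-owned buffer satisfying the queried requirements. */
void *alloc_user_buffer(buffer_requirements req)
{
    assert(req.alignment != 0 && (req.alignment & (req.alignment - 1)) == 0);
    size_t rounded = (req.size + req.alignment - 1) / req.alignment * req.alignment;
    return aligned_alloc(req.alignment, rounded);
}
```

Layouts that need padding or strides inside the buffer would not be described by these two numbers alone, which is one way the answer could turn out to be no.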
 


Co-layer and related issues discussion

What are the possibilities?


Decision: The group should acknowledge the wide spectrum of possible implementations of DR, perhaps by writing up the envisioned instantiations in the final report. The open question is which MPI co-layer form of DR we are going to produce in the final "product" of this activity (we have discussed producing such a specification as part of our work): a classical "co-layer" approach that preserves the early-binding focus, or one that preserves an MPI-like (late-binding) set of interfaces?

Resolution: The LIS API will consist of two categories:


List of "minimal, standalone" functions:
 
 
Function                        Related Object(s)
DRI_Init                        DRI_Group (DRI_GROUP_WORLD predefined group - see below)
DRI_Finalize
DRI_Group_create                DRI_Group
DRI_Group_get_rank              DRI_Group
DRI_Group_get_size              DRI_Group
DRI_Bufferset_system_create     DRI_Bufferset, DRI_Buffer_Id
DRI_Bufferset_user_create       DRI_Bufferset, DRI_Buffer_Id
DRI_Channel_create_send         DRI_Channel
DRI_Channel_create_recv         DRI_Channel
DRI_Channel_connect             DRI_Channel
DRI_Channel_connect_sendrecv    DRI_Channel
DRI_Channel_get                 DRI_Channel
DRI_Channel_put                 DRI_Channel

 

List of "DRI Core" functions - those that MUST be provided everywhere, regardless of underlying implementation approach:
 
 
 
Function                            Related Object(s)
DRI_Global_data_create              DRI_Global_data
DRI_Overlap_create                  DRI_Overlap
DRI_Partition_block_create          DRI_Partition
DRI_Partition_blockcyclic_create    DRI_Partition
DRI_Partition_whole_create          DRI_Partition
DRI_Distribution_create             DRI_Distribution
DRI_Layout_create                   DRI_Layout
  (or a comparable equivalent, when specified)
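For reference, the conventional ownership rules behind block and block-cyclic partitions of a one-dimensional index space can be sketched in C. These are the usual textbook (HPF/ScaLAPACK-style) formulas, assumed here for illustration; the LIS API's exact rules (e.g., for uneven division) were not specified at this meeting, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Block partition: each process gets one contiguous chunk of ceil(n/p) elements
   (trailing processes may get fewer). Returns the owning rank of global index i. */
size_t block_owner(size_t i, size_t n, size_t p)
{
    assert(p > 0 && i < n);
    size_t chunk = (n + p - 1) / p;
    return i / chunk;
}

/* Block-cyclic partition: blocks of nb elements are dealt round-robin to p processes. */
size_t blockcyclic_owner(size_t i, size_t nb, size_t p)
{
    assert(nb > 0 && p > 0);
    return (i / nb) % p;
}

/* Local index of global element i on its owner under the block-cyclic partition. */
size_t blockcyclic_local_index(size_t i, size_t nb, size_t p)
{
    assert(nb > 0 && p > 0);
    return (i / (nb * p)) * nb + i % nb;
}
```

For example, with 10 elements on 3 processes, a block partition gives ranks 0, 1 and 2 the index ranges 0-3, 4-7 and 8-9; a block-cyclic partition with block size 2 gives rank 0 elements 0, 1, 6 and 7, stored locally at indices 0 to 3.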

 
 
 

Focus on process sets:

Group agrees to provide a predefined DRI_GROUP_WORLD object, and to allow creation of sub-groups that are subsets of DRI_GROUP_WORLD.
The DRI_Init implementation could construct DRI_GROUP_WORLD as a side effect.

DRI_group_create (IN original_group, IN list_of_ranks, IN num_ranks, OUT new_group)
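The semantics suggested by this signature can be sketched in C by analogy with MPI_Group_incl: rank r of the new group is rank list_of_ranks[r] of the original group. The group struct, group_create and group_get_rank are hypothetical; a DRI_GROUP_WORLD built by DRI_Init would simply list members 0 to N-1. The sketch rejects out-of-range ranks but, for brevity, does not check for duplicates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical group representation: members[r] is the DRI_GROUP_WORLD rank
   of the process holding rank r in this group. */
typedef struct {
    size_t size;
    int *members;
} group;

/* Sketch of DRI_group_create(IN original_group, IN list_of_ranks, IN num_ranks, OUT new_group). */
int group_create(const group *original, const int *list_of_ranks,
                 size_t num_ranks, group *new_group)
{
    int *m = malloc((num_ranks ? num_ranks : 1) * sizeof *m);
    if (m == NULL)
        return -1;
    for (size_t r = 0; r < num_ranks; r++) {
        int old = list_of_ranks[r];
        if (old < 0 || (size_t)old >= original->size) {
            free(m);
            return -1;      /* rank not present in the original group */
        }
        m[r] = original->members[old];
    }
    new_group->size = num_ranks;
    new_group->members = m;
    return 0;
}

/* Rank of a DRI_GROUP_WORLD process within g, or -1 if it is not a member. */
int group_get_rank(const group *g, int world_rank)
{
    for (size_t r = 0; r < g->size; r++)
        if (g->members[r] == world_rank)
            return (int)r;
    return -1;
}
```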
 


Miscellaneous changes to some data types in the LIS API:

DECISION: DRI_distspec type is now renamed to DRI_Partition

DECISION: DRI_dist is now renamed to DRI_Distribution



DRI_Global_data_create()
 

**** CORRECTION needed: dri_global_data_create is listed twice: once for the create function proper and once for the destroy function (the entry was copied and pasted but not edited properly).
 


Specific API issues discussed
 

Discussion of the put/get channel interface and its interaction with buffersets. The issue of setting up efficient DMA transfers in advance was raised. Three areas of concern were discussed:


Group agrees that the following restrictions should be imposed to facilitate high performance:

NOTE: the number of buffers CAN differ between processes on opposite sides of the same data reorg channel. As long as both sides access their buffers in the same order, the middleware can establish a repeating cycle of buffer pairings, permitting the desired type of DMA transfer optimization.
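The buffer-pairing cycle can be quantified. Assuming each side uses its buffers round-robin in a fixed order (an assumption; the exact access policy was left open), transfer k pairs sender buffer k mod m with receiver buffer k mod n, and the pairing repeats every lcm(m, n) transfers. The middleware could therefore set up that many DMA transfers once and reuse them. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static size_t gcd_size(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Transfers before the (send buffer, receive buffer) pairing repeats: lcm(m, n). */
size_t pairing_cycle_length(size_t send_bufs, size_t recv_bufs)
{
    assert(send_bufs > 0 && recv_bufs > 0);
    return send_bufs / gcd_size(send_bufs, recv_bufs) * recv_bufs;
}

/* Buffer indices used by the k-th transfer (k = 0, 1, 2, ...), round-robin on each side. */
void pairing_for_transfer(size_t k, size_t send_bufs, size_t recv_bufs,
                          size_t *send_idx, size_t *recv_idx)
{
    *send_idx = k % send_bufs;
    *recv_idx = k % recv_bufs;
}
```

For example, with 2 send buffers and 3 receive buffers the cycle is 6 transfers long, and transfer 4 uses send buffer 0 and receive buffer 1.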

Dennis/Arkady: we need something similar to buffer iterator "policies" at bufferset creation time. This basically enforces an ordering on subsequent operations.

This topic area may need some additional thought offline.
 


Buffersets discussion:

Do we need buffersets? Could they simply be an object internal to channels? That would perhaps simplify the user API. However, if you want to share the internal bufferset among channels (e.g., on one process, using the same bufferset both for data received from "upstream" and for data sent "downstream" to another set of processes after some processing), then we would need some type of channel "clone" operation.
 


What are our products?
 


What's our near-term plan?




 

Mercury proposal regarding memory layout objects, and a new proposed "memory descriptor object" (MDO)

THE DETAILS OF THIS ARE YET TO BE OFFICIALLY PUT IN THE MINUTES. This requires some post-meeting coordination with the principals to verify that what was recorded is accurate.

Details will be published under separate cover, and presumably posted on www.data-re.org in the "Meetings" section, along with the official minutes.