DRI Forum Minutes, June 2001 Meeting

Attendance

Murali Beddhu (MPI Softtech)
Ken Cain (Mercury Computer Systems)
Dennis Cottel (SPAWAR)
Zhenqian Cui (MPI Softtech)
Randy Judd (SPAWAR)
Jeremy Kepner (MIT/LL)
Steve Paavola (Sky Computers)
Brian Sroka (MITRE) - host
Chris Young (MPI Softtech)
 

Legend: Official decisions made at this meeting are shown in bold and italic format

 

Agenda

  1. Discuss how the voting process (to occur soon) should take place
  2. New proposals
  3. Discuss "mini-proposals" (changes to document following May 2001 meeting that need review by the forum)
  4. Discuss the "big picture" issues surfacing for DRI Forum and DRI technology
  5. Overlapping process groups in data reorgs
  6. Do we need a bufferset object or not? (latest status is that recent activity has removed it from the spec)
  7. Data types in DRI (DRI_Dataspec)
  8. Accessor routines for the DRI_Blockinfo object
  9. Suggested intermediate goal: settle on data distribution and DRI_Blockinfo portions of the API, and limit "volatility" in the spec to data movement (buffering)

Discussion

Agenda item #1: discuss how the voting process (to occur soon) should take place

Agenda item #4: the big picture for DRI Forum

Agenda item #3: Discuss "mini-proposals" (changes to document following May 2001 meeting that need review by the forum)

Agenda item #2: discuss new proposals

Agenda item: making it legal (but not required) to allow overlapping process groups in a data reorganization

Agenda item: DRI_Dataspec types

Agenda item: DRI_Blockinfo accessors

Agenda item: Do we need a bufferset abstraction (or not)?

Misc topic: early vs. late binding, and being able to control it at a coarse level

Misc topic: DRI_Reorg_create functions and how the alloc and dealloc handlers are presented.

Misc topic: we're missing DRI_Buffer_get_blockcount

Misc topic: error code approach

Appendix: Shortcut proposal handout

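+ Note: I forgot to retrieve the DRI_Blockinfo in my examples.

+ Each example performs a clique 2D cornerturn (the same process group, drigroup, is used on both the send and receive sides).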

 

0. Yesterday's 2D cornerturn without "default" objects (June 8, 2001 version of DRI document, Annex D)

(see the DRI document! Note that this example is "out of date" because
 it does not use the built-in "default" objects that have been added
 to the API between January 2001 and today)
 

1. Today's 2D cornerturn using recently-added "default" objects

Specifically, using the following default objects:
 + DRI_PARTITION_BLOCK_DEFAULT
 + DRI_PARTITION_WHOLE
 + DRI_LAYOUT_DEFAULT
 + DRI_DIST_GROUPDIMS_ALL_DEFAULT

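#include <mpi.h>   /* MPI_Init, MPI_COMM_WORLD, MPI_Finalize */
#include "dri.h"   /* DRI declarations; header name assumed */

#define NDIMS 2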
int global_dims[NDIMS] = {1024, 256};

int layout_order_recvside[NDIMS] = {1, 0};

DRI_Network *drinet;

DRI_Global_Data *gdo;
DRI_Group *drigroup;

DRI_Partition parts_sendside[NDIMS];
DRI_Partition parts_recvside[NDIMS];

DRI_Distribution *dist_sendside;
DRI_Distribution *dist_recvside;

DRI_Layout *layout_sendside;
DRI_Layout *layout_recvside;

DRI_Reorg *reorg_sendside;
DRI_Reorg *reorg_recvside;
int reorg_numbufs = 2;

DRI_Buffer *buf_send;
DRI_Buffer *buf_recv;

complex_f *send_ptr;
complex_f *recv_ptr;

const char *chan_string = "transpose";

int main (int argc, char **argv) {

 MPI_Init (&argc, &argv);
 DRI_Init (&argc, &argv, &drinet);

 DRI_Group_import (MPI_COMM_WORLD, &drigroup);

 DRI_Global_Data_create(NDIMS, global_dims, &gdo);

 parts_sendside[0] = DRI_PARTITION_BLOCK_DEFAULT;
 parts_sendside[1] = DRI_PARTITION_WHOLE;

 parts_recvside[0] = parts_sendside[1];
 parts_recvside[1] = parts_sendside[0];

 layout_sendside = DRI_LAYOUT_DEFAULT;
 DRI_Layout_create_packed (NDIMS, layout_order_recvside, &layout_recvside);

 DRI_Distribution_create (gdo, drigroup, DRI_DIST_GROUPDIMS_ALL_DEFAULT,
                          parts_sendside, layout_sendside, &dist_sendside);

 DRI_Distribution_create (gdo, drigroup, DRI_DIST_GROUPDIMS_ALL_DEFAULT,
                          parts_recvside, layout_recvside, &dist_recvside);

 DRI_Reorg_create_system (drinet, DRI_REORG_SEND, chan_string, DRI_COMPLEX,
                          dist_sendside, reorg_numbufs, &reorg_sendside);

 DRI_Reorg_create_system (drinet, DRI_REORG_RECV, chan_string, DRI_COMPLEX,
                          dist_recvside, reorg_numbufs, &reorg_recvside);

 DRI_connect(drinet);

 /***** Entering the operational loop *****/
 for (; ;) {
   DRI_Reorg_get_buffer (reorg_sendside, &buf_send);
   DRI_Buffer_get_ptr (buf_send, &send_ptr);

   /* produce data in the send buffer by using send_ptr */

   DRI_Reorg_put_buffer (reorg_sendside, buf_send);
   DRI_Reorg_get_buffer (reorg_recvside, &buf_recv);
   DRI_Buffer_get_ptr (buf_recv, &recv_ptr);

   /* consume data in the receive buffer by using recv_ptr */
   DRI_Reorg_put_buffer (reorg_recvside, buf_recv);
 }

 DRI_Finalize(drinet);
 MPI_Finalize();

} /* end of main() */
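The cornerturn above redistributes the 1024 x 256 array from row bands (dim 0 BLOCK, dim 1 WHOLE) to column bands (dim 0 WHOLE, dim 1 BLOCK). As a rough illustration of the resulting communication pattern, the sketch below computes how many elements each sending rank delivers to each receiving rank. It is standalone C, not DRI code: block_range, block_extent and cornerturn_count are hypothetical names, and the rule for splitting a dimension that does not divide evenly (the first n % nprocs ranks get one extra element) is an assumption, not taken from the DRI document.

```c
#include <assert.h>

/* Hypothetical: one rank's share of a dimension of length n split into
   contiguous blocks over nprocs ranks. Assumed rule (not from the DRI
   spec): the first n % nprocs ranks each get one extra element. */
typedef struct {
    int offset; /* first global index owned by this rank */
    int count;  /* number of indices owned by this rank */
} block_range;

static block_range block_extent(int n, int nprocs, int rank)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    int base = n / nprocs;
    int extra = n % nprocs;
    block_range r;
    r.count = base + (rank < extra ? 1 : 0);
    r.offset = rank * base + (rank < extra ? rank : extra);
    return r;
}

/* Elements that rank src (send side: dim 0 BLOCK, dim 1 WHOLE) delivers
   to rank dst (receive side: dim 0 WHOLE, dim 1 BLOCK) in a clique 2D
   cornerturn of a dims[0] x dims[1] array. src owns a band of rows and
   dst owns a band of columns; the data moved is their intersection. */
static int cornerturn_count(const int dims[2], int nprocs, int src, int dst)
{
    block_range rows = block_extent(dims[0], nprocs, src);
    block_range cols = block_extent(dims[1], nprocs, dst);
    return rows.count * cols.count;
}
```

For example, with four processes each rank sends cornerturn_count(global_dims, 4, src, dst) = 256 x 64 = 16384 elements to every rank, itself included, since in a clique reorg the send and receive groups coincide.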
 

2. What could we do with NEW canned multi-dimensional layout and partitioning default parameters?

Using these EXISTING default objects:
 + DRI_DIST_GROUPDIMS_ALL_DEFAULT

Using these PROPOSED new defaults:
 + DRI_PARTITION_BLOCK_2D_10
 + DRI_PARTITION_BLOCK_2D_01
 + DRI_LAYOUT_PACKED_2D_10
 + DRI_LAYOUT_PACKED_2D_01
 + NOTE: These could be implemented in the following way:

     const DRI_Partition DRI_PARTITION_BLOCK_2D_10[2] = {DRI_PARTITION_BLOCK,
                                                         DRI_PARTITION_WHOLE};

     const DRI_Partition DRI_PARTITION_BLOCK_2D_01[2] = {DRI_PARTITION_WHOLE,
                                                         DRI_PARTITION_BLOCK};

     (*** AND WE COULD DO SIMILAR DEFINITIONS FOR 3D, 4D objects! ***)

     DRI_Layout *DRI_LAYOUT_PACKED_2D_10;
       (at DRI_Init time, the library calls DRI_Layout_create_packed
        with ndims=2, order[] = {1, 0})

     DRI_Layout *DRI_LAYOUT_PACKED_2D_01;
       (at DRI_Init time, the library calls DRI_Layout_create_packed
        with ndims=2, order[] = {0, 1})
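The proposed DRI_LAYOUT_PACKED_2D_10 and DRI_LAYOUT_PACKED_2D_01 objects differ only in the order[] array passed to DRI_Layout_create_packed. As a hedged sketch of what such an order might mean for addressing, the helpers below derive element strides for a packed (gap-free) layout. The names packed_strides and packed_offset are hypothetical, and the convention that order[] lists dimensions from slowest- to fastest-varying (so {0, 1} is C row-major) is an assumption; the handout does not say which way DRI reads it. In practice the extents would be a process's local block extents rather than the global ones.

```c
#include <assert.h>

#define MAX_DIMS 8

/* Hypothetical: strides of a packed layout. ASSUMPTION: order[] lists
   dimensions from slowest- to fastest-varying, so {0, 1} is row-major. */
static void packed_strides(int ndims, const int dims[], const int order[],
                           long strides[])
{
    assert(ndims > 0 && ndims <= MAX_DIMS);
    long stride = 1;
    /* walk from the fastest-varying dimension outward */
    for (int i = ndims - 1; i >= 0; i--) {
        int d = order[i];
        assert(d >= 0 && d < ndims);
        strides[d] = stride;
        stride *= dims[d];
    }
}

/* Linear offset (in elements) of the element at index[] in that layout. */
static long packed_offset(int ndims, const long strides[], const int index[])
{
    long off = 0;
    for (int d = 0; d < ndims; d++)
        off += strides[d] * index[d];
    return off;
}
```

Under this assumption, order {1, 0} makes dimension 0 contiguous (strides {1, 1024} for a 1024 x 256 block), matching the receive-side order used in example 1.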
 

Assumptions: default layout order follows gdo, no overlap

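#include <mpi.h>   /* MPI_Init, MPI_COMM_WORLD, MPI_Finalize */
#include "dri.h"   /* DRI declarations; header name assumed */

#define NDIMS 2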
int global_dims[NDIMS] = {1024, 256};

DRI_Network *drinet;

DRI_Global_Data *gdo;
DRI_Group *drigroup;

DRI_Distribution *dist_sendside;
DRI_Distribution *dist_recvside;

DRI_Reorg *reorg_sendside;
DRI_Reorg *reorg_recvside;
int reorg_numbufs = 2;

DRI_Buffer *buf_send;
DRI_Buffer *buf_recv;

complex_f *send_ptr;
complex_f *recv_ptr;

const char *chan_string = "transpose";

int main (int argc, char **argv) {

 MPI_Init (&argc, &argv);
 DRI_Init (&argc, &argv, &drinet);

 DRI_Group_import (MPI_COMM_WORLD, &drigroup);

 DRI_Global_Data_create(NDIMS, global_dims, &gdo);
 

 DRI_Distribution_create (gdo, drigroup, DRI_DIST_GROUPDIMS_ALL_DEFAULT,
                          DRI_PARTITION_BLOCK_2D_10, DRI_LAYOUT_PACKED_2D_10,
                          &dist_sendside);

 DRI_Distribution_create (gdo, drigroup, DRI_DIST_GROUPDIMS_ALL_DEFAULT,
                          DRI_PARTITION_BLOCK_2D_01, DRI_LAYOUT_PACKED_2D_01,
                          &dist_recvside);

 DRI_Reorg_create_system (drinet, DRI_REORG_SEND, chan_string, DRI_COMPLEX,
                          dist_sendside, reorg_numbufs, &reorg_sendside);

 DRI_Reorg_create_system (drinet, DRI_REORG_RECV, chan_string, DRI_COMPLEX,
                          dist_recvside, reorg_numbufs, &reorg_recvside);

 DRI_connect(drinet);

 /***** Entering the operational loop *****/
 for (; ;) {
   DRI_Reorg_get_buffer (reorg_sendside, &buf_send);
   DRI_Buffer_get_ptr (buf_send, &send_ptr);

   /* produce data in the send buffer by using send_ptr */

   DRI_Reorg_put_buffer (reorg_sendside, buf_send);
   DRI_Reorg_get_buffer (reorg_recvside, &buf_recv);
   DRI_Buffer_get_ptr (buf_recv, &recv_ptr);

   /* consume data in the receive buffer by using recv_ptr */
   DRI_Reorg_put_buffer (reorg_recvside, buf_recv);
 }

 DRI_Finalize(drinet);
 MPI_Finalize();
} /* end of main() */
 

3. Folding the GDO creation step into the DRI_Distribution_create calls

Using these EXISTING shortcut default objects:
 + NONE!

Using these PROPOSED new defaults:
 + DRI_PARTITION_BLOCK_2D_10
 + DRI_PARTITION_BLOCK_2D_01
 + DRI_LAYOUT_PACKED_2D_10
 + DRI_LAYOUT_PACKED_2D_01

and REMOVING DRI_Global_Data_create, moving its input parameters into
an alternative DRI_Distribution_create_simple routine instead
 
 

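#include <mpi.h>   /* MPI_Init, MPI_COMM_WORLD, MPI_Finalize */
#include "dri.h"   /* DRI declarations; header name assumed */

#define NDIMS 2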
int global_dims[NDIMS] = {1024, 256};

DRI_Network *drinet;
DRI_Group *drigroup;

DRI_Distribution *dist_sendside;
DRI_Distribution *dist_recvside;

DRI_Reorg *reorg_sendside;
DRI_Reorg *reorg_recvside;
int reorg_numbufs = 2;

DRI_Buffer *buf_send;
DRI_Buffer *buf_recv;

complex_f *send_ptr;
complex_f *recv_ptr;

const char *chan_string = "transpose";

int main (int argc, char **argv) {

 MPI_Init (&argc, &argv);
 DRI_Init (&argc, &argv, &drinet);

 DRI_Group_import (MPI_COMM_WORLD, &drigroup);

 DRI_Distribution_create_simple (NDIMS, global_dims, drigroup,
                                 DRI_PARTITION_BLOCK_2D_10,
                                 DRI_LAYOUT_PACKED_2D_10,
                                 &dist_sendside);

 DRI_Distribution_create_simple (NDIMS, global_dims, drigroup,
                                 DRI_PARTITION_BLOCK_2D_01,
                                 DRI_LAYOUT_PACKED_2D_01,
                                 &dist_recvside);

 DRI_Reorg_create_system (drinet, DRI_REORG_SEND, chan_string, DRI_COMPLEX,
                          dist_sendside, reorg_numbufs, &reorg_sendside);

 DRI_Reorg_create_system (drinet, DRI_REORG_RECV, chan_string, DRI_COMPLEX,
                          dist_recvside, reorg_numbufs, &reorg_recvside);

 DRI_connect(drinet);

 /***** Entering the operational loop *****/
 for (; ;) {
   DRI_Reorg_get_buffer (reorg_sendside, &buf_send);
   DRI_Buffer_get_ptr (buf_send, &send_ptr);

   /* produce data in the send buffer by using send_ptr */

   DRI_Reorg_put_buffer (reorg_sendside, buf_send);
   DRI_Reorg_get_buffer (reorg_recvside, &buf_recv);
   DRI_Buffer_get_ptr (buf_recv, &recv_ptr);

   /* consume data in the receive buffer by using recv_ptr */
   DRI_Reorg_put_buffer (reorg_recvside, buf_recv);
 }

 DRI_Finalize(drinet);
 MPI_Finalize();
} /* end of main() */
 

4. What could we do to simplify SPMD cases?

Using these EXISTING shortcut default objects:
 + NONE!

Using these PROPOSED new defaults:
 + DRI_PARTITION_BLOCK_2D_10
 + DRI_PARTITION_BLOCK_2D_01
 + DRI_LAYOUT_PACKED_2D_10
 + DRI_LAYOUT_PACKED_2D_01

REMOVING DRI_Global_Data_create, moving its input parameters into an
alternative DRI_Distribution_create_simple routine instead

A SINGLE DRI_Distribution creation call to create both "sides"
   (DRI_Distribution_create_simple_SPMD)

A SINGLE DRI_Reorg creation call to create both "sides"
   (DRI_Reorg_create_SPMD)

A SINGLE transfer call that puts the send buffer and gets the receive buffer
   (DRI_Reorg_SPMD_getput; "transfer" might be a better name than "getput")
 

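#include <mpi.h>   /* MPI_Init, MPI_COMM_WORLD, MPI_Finalize */
#include "dri.h"   /* DRI declarations; header name assumed */

#define NDIMS 2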
int global_dims[NDIMS] = {1024, 256};

DRI_Network *drinet;
DRI_Group *drigroup;

DRI_Distribution *dist_sendside;
DRI_Distribution *dist_recvside;

DRI_Reorg *reorg_sendside;
DRI_Reorg *reorg_recvside;
int reorg_numbufs = 2;

DRI_Buffer *buf_send;
DRI_Buffer *buf_recv;

complex_f *send_ptr;
complex_f *recv_ptr;

const char *chan_string = "transpose";

int main (int argc, char **argv) {

 MPI_Init (&argc, &argv);
 DRI_Init (&argc, &argv, &drinet);

 DRI_Group_import (MPI_COMM_WORLD, &drigroup);

 DRI_Distribution_create_simple_SPMD (NDIMS, global_dims, drigroup,
                                      DRI_PARTITION_BLOCK_2D_10,
                                      DRI_LAYOUT_PACKED_2D_10,
                                      DRI_PARTITION_BLOCK_2D_01,
                                      DRI_LAYOUT_PACKED_2D_01,
                                      &dist_sendside, &dist_recvside);

 DRI_Reorg_create_SPMD (drinet, chan_string, DRI_COMPLEX,
                        reorg_numbufs,
                        dist_sendside, dist_recvside,
                        &reorg_sendside, &reorg_recvside);

 DRI_connect(drinet);

 /***** Entering the operational loop *****/
 for (; ;) {
   DRI_Reorg_get_buffer (reorg_sendside, &buf_send);
   DRI_Buffer_get_ptr (buf_send, &send_ptr);

   /* produce data in the send buffer by using send_ptr */

   DRI_Reorg_SPMD_getput (reorg_sendside, buf_send, reorg_recvside, &buf_recv);

   DRI_Buffer_get_ptr (buf_recv, &recv_ptr);

   /* consume data in the receive buffer by using recv_ptr */
   DRI_Reorg_put_buffer (reorg_recvside, buf_recv);
 }

 DRI_Finalize(drinet);
 MPI_Finalize();
} /* end of main() */
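As proposed, DRI_Reorg_SPMD_getput hands the filled send buffer to the reorg and returns the next receive buffer in one call. The sketch below models only the data effect of that step in the degenerate one-process case: each side owns the whole array and, assuming the send side stores it row-major and the receive side column-major, the cornerturn reduces to a transposing copy. cornerturn_getput_1proc is a hypothetical name, float stands in for the complex element type, and this is not how a DRI implementation would move data between processes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-process model of "put send buffer, get receive
   buffer" for the 2D cornerturn. Assumes the send side is row-major and
   the receive side column-major, so the reorg is a transposing copy. */
static void cornerturn_getput_1proc(const float *send, float *recv,
                                    size_t rows, size_t cols)
{
    assert(send != NULL && recv != NULL && send != recv);
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            /* element (i, j): row-major on the send side,
               column-major on the receive side */
            recv[j * rows + i] = send[i * cols + j];
}
```

For a 2 x 3 send buffer {0, 1, 2, 3, 4, 5}, the receive buffer becomes {0, 3, 1, 4, 2, 5}.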