---------------------------------------------------
Return-Path: matthai@franklin.cs.washington.edu
Received: from franklin.cs.washington.edu (franklin.cs.washington.edu [128.95.2.103]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id SAA01775 for ; Wed, 15 Jan 1997 18:01:55 -0800
Received: from localhost (localhost [127.0.0.1]) by franklin.cs.washington.edu (8.8.3+CSE/7.2ws+) with SMTP id SAA13795 for ; Wed, 15 Jan 1997 18:01:54 -0800 (PST)
Message-Id: <199701160201.SAA13795@franklin.cs.washington.edu>
X-Mailer: exmh version 1.5.3 12/28/94
To: bershad@franklin.cs.washington.edu
Subject: ActiveMessages 552-reading summary
Reply-to: Matthai Philipose
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Wed, 15 Jan 1997 18:01:54 PST
From: Matthai Philipose

This paper introduces a low-level communication mechanism called Active Messages. Active messages are a way to send short messages frequently, quickly, and asynchronously from one address space to another on a multiprocessor machine, where security and flexible naming are less important than low latency. In functionality, the active message seems comparable to the __send+receive part__ of the SRC RPC. The main differences are:

1) In RPC, the address of the handler is not known by the sender (there's no SPMD model); any server thread can be woken up and delivered the message, after paying the corresponding scheduling overhead. In the Active Message world, it seems there's only a single process running on the destination processor, and the sending process can name a "handler" in the user-space of this process that puts the data where it belongs. No expensive handler thread wake-up here, because there are no threads.
2) In RPC, the destination buffer for the message is not pre-determined. Re-use of buffers (as in SRC RPC) means that the buffers in the server-thread's user space are generally pre-allocated; but nevertheless, some overhead is paid in buffer management.
3) The Active Message interrupt software does not seem to do much (software) checking of headers or checksumming, unlike RPC. This is probably because the system runs on dedicated hardware, in a trusted environment.

The above three factors contributed heavily to the cost of the send/receive part of SRC RPC; active messages avoid these overheads because of the simpler domain they work in.

The main achievement of the paper seems to be to realize that in an important class of applications, i.e. SPMD-ish parallel programs running in a trusted environment, the communication model is constrained enough that message passing can be optimized by taking advantage of the constraints (such as the above three). As evidenced by the native communication primitives they compare themselves to, others apparently hadn't made full use of the constraints.

This seemed like a parallel programming paper to me (homogeneous, trusted, uniform fixed naming, SPMD-ish environment), rather than a distributed systems paper. It seems difficult to transfer the tricks they talk about to the distributed-system world; for instance, I found their compiler tricks interesting, but they seem to be oriented towards an SPMD world.
---------------------------------------------------
Return-Path: yasushi@silk
Received: from silk.cs.washington.edu (silk.cs.washington.edu [128.95.2.238]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id WAA04085 for ; Wed, 15 Jan 1997 22:34:50 -0800
Received: (yasushi@localhost) by silk.cs.washington.edu (8.7.2/7.2ws+) id WAA24544; Wed, 15 Jan 1997 22:34:50 -0800 (PST)
Date: Wed, 15 Jan 1997 22:34:50 -0800 (PST)
From: yasushi@silk
Message-Id: <199701160634.WAA24544@silk.cs.washington.edu>
To: Brian Bershad
Subject: ActiveMessages 552-reading summary

The basic idea behind active messages is to execute an application-specific handler directly in the interrupt context in response to incoming messages. There are two benefits.
One is that the handler can read the message without any buffering. The other is that there is no context switching overhead. The paper also proposes a language called Split-C, which is C augmented with asynchronous remote memory reads and writes. Using this language, the authors were able to write a matrix multiplication program that achieves 95% of the maximum possible throughput. Finally, the paper describes what hardware support is needed to exploit active messages fully.

I think the idea of active messages is not new; many embedded systems allow user-defined interrupt handling. What's new is that the paper links this idea to high-performance computing, and actually shows that it can achieve response times an order of magnitude better than other standard mechanisms.

The shortcoming of this idea is that it's not generic. It can be used only in a parallel program in which the same code image is copied to all the processors. Also, it is applicable only in a situation where synchronization between message handlers and the main computation is not required.
---------------------------------------------------
Return-Path: sparekh@crocus
Received: from crocus.cs.washington.edu (crocus.cs.washington.edu [128.95.1.67]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with SMTP id XAA04616 for ; Wed, 15 Jan 1997 23:43:14 -0800
Received: (sparekh@localhost) by crocus.cs.washington.edu (8.6.12/7.2ws+) id XAA02422; Wed, 15 Jan 1997 23:43:09 -0800
Date: Wed, 15 Jan 1997 23:43:09 -0800
Message-Id: <199701160743.XAA02422@crocus.cs.washington.edu>
From: Sujay Parekh
To: bershad@cs
Subject: ActiveMessages 552-reading summary

Active messages is a new communication mechanism for interprocessor communication in multiprocessor machines. Standard message-passing architectures support a send/receive programming model to exchange data between processors.
However, they have extremely high communication overheads (due to the co-ordination required between the processors) which do not permit effective overlap of computation and communication. Consequently, either processor or network bandwidth (or both) ends up being wasted.

In message-based architectures like the J-Machine or Monsoon, support for message handling is explicitly built into the processor. However, their model has the incoming message handlers doing significant computations. This ties up precious network buffer resources. Additionally, this model causes the computation progress to be highly dependent on message arrival, so that each message arrival enables a small amount of computation. Again, little overlap of communication and computation is exploited.

In the Active Message model, the communication exchanged between processors consists of some data and a user-defined handler for dealing with the data. The handler's job is to quickly remove the message from the network buffers and integrate it into the computation on that node. Thus, network resources are not tied up for arbitrary periods of time, communication overheads are minimized and communication can be better overlapped with computation.

The authors demonstrate efficient implementations of Active Messages on both kinds of multiprocessors, and it is encouraging to note that they get good performance even on hardware designed for a different programming paradigm. However, there are several hardware improvements that would further enhance the efficiency of Active Messages. Some network interface improvements include DMA support for large messages and well-designed message registers. Processor support includes ideas like fast polling, user-level trap handling for network interrupts and a dedicated CPU at each node for running message handlers.
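
To make the "data plus a user-defined handler" idea concrete, here is a minimal single-process sketch in Python. It is not the paper's implementation: `Node`, `send`, `poll`, and both handlers are hypothetical names, a Python function object stands in for the handler address that the real message header carries, and a deque stands in for the network interface buffers.

```python
from collections import deque


class Node:
    """A simulated processing node (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.network_in = deque()  # stands in for the network interface buffers
        self.memory = {}           # data belonging to the ongoing computation

    def poll(self):
        """Drain the network: run each message's handler at once, to completion."""
        while self.network_in:
            handler, args = self.network_in.popleft()
            handler(self, *args)   # no receive buffer, no thread wake-up


def send(dest, handler, *args):
    """Send an active message: the message itself names its handler."""
    dest.network_in.append((handler, args))


def store_handler(node, key, value):
    """Handler: move the payload straight into the computation's data."""
    node.memory[key] = value


def accumulate_handler(node, key, value):
    """Handler: fold the payload into a running sum."""
    node.memory[key] = node.memory.get(key, 0) + value


receiver = Node("n1")
send(receiver, store_handler, "x", 3)
send(receiver, accumulate_handler, "total", 10)
send(receiver, accumulate_handler, "total", 5)
receiver.poll()
print(receiver.memory)
```

Each handler does a small, bounded amount of work and returns, so in this sketch the input queue is emptied as fast as messages are polled, which mirrors the point above about not tying up network resources.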
---------------------------------------------------
Return-Path: rgrimm@cs.washington.edu
Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.2.4]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id AAA04750 for ; Thu, 16 Jan 1997 00:07:09 -0800
Received: from [128.95.8.129] (h129.dyn.cs.washington.edu [128.95.8.129]) by june.cs.washington.edu (8.8.3+CSE/7.2ju) with SMTP id AAA06132 for ; Thu, 16 Jan 1997 00:07:07 -0800
X-Sender: rgrimm@june.cs.washington.edu
Message-Id:
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Thu, 16 Jan 1997 00:04:51 -0800
To: bershad@cs
From: rgrimm@cs.washington.edu (Robert Grimm)
Subject: ActiveMessages 552-reading summary

Von Eicken et al. observe a mismatch between the traditional programming model on large-scale multiprocessors and the underlying hardware. This mismatch can be attributed to some combination of complex communication protocols and patterns, blocking communication primitives (for synchronous send/receive operations) and excessive message buffering (for asynchronous send/receive operations). The result is poor overlap between computation and communication.

To present a better alternative, von Eicken et al. introduce a new and simple communication mechanism, called "active messages." Essentially, an active message adds the address of a user-space message handler to the header of each message. This short, non-blocking handler is executed at interrupt time and is responsible for injecting the rest of the message data into the computation at the receiving machine.

Von Eicken et al. present a simple programming model, implemented on top of active messages in an extension to the C programming language called Split-C, that allows for better overlap of computation and communication. They also introduce a new compilation target that allows for the fine-grained scheduling of small computations and builds on top of active messages as the communication mechanism.
Using measurements on micro-benchmarks, the authors show that active messages provide just the right balance between software communication mechanism and general purpose computing hardware in a design space whose extremes are marked by the traditional programming model, which favors computation at the cost of communication, and message driven architectures such as the J-Machine and Monsoon, which favor communication at the cost of general purpose computation.

However, two points remain unclear: First, to achieve optimal overlap between computation and communication, it seems necessary to hard code the performance characteristics of a given computer platform into the program (or, alternatively, some form of library would have to dynamically determine the communication latencies and then set the appropriate program parameters) which would adversely affect portability (even from one generation of one platform to another). Second, no high-level application benchmarks are presented that compare the different programming models and architectures and clearly show the benefit of active messages.
---------------------------------------------------
Return-Path: ddion@june
Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.2.4]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id BAA05040 for ; Thu, 16 Jan 1997 01:03:51 -0800
Received: (ddion@localhost) by june.cs.washington.edu (8.8.3+CSE/7.2ju) id BAA08563; Thu, 16 Jan 1997 01:03:50 -0800
From: ddion@june (David Dion)
Message-Id: <199701160903.BAA08563@june.cs.washington.edu>
Subject: ActiveMessages 552-reading summary
To: bershad@cs
Date: Thu, 16 Jan 1997 01:03:49 -0800 (PST)
Cc: ddion@june (David Dion)
X-Mailer: ELM [version 2.4 PL23]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Existing message-passing machines focus on raw processor power, rather than network performance or interaction between processors and the network.
As a result, there is poor overlap between communication and computation, and processor usage is degraded. Active Messages seeks to integrate computation and communication, laying the foundations of a new programming model for message-passing multiprocessors.

Active Messages contain in their headers the address of a handler to be invoked upon message arrival. The handler extracts the contents of the message from the network packet and integrates it into computation on the processing node. The handler must execute quickly and to completion. It is not preempted and it cannot block, as it runs at interrupt level. Active Messages have several advantages over traditional send/receive message-passing. For instance, complex buffering is not required on the receive end, since packet data is extracted immediately by the handler.

Active Messages is a mechanism which supports several programming models. Split-C provides split-phase remote memory operations for C by adding two operations: PUT, which copies a block of local memory to remote memory, and GET, which retrieves a block of remote memory and makes a local copy. Both operations are asynchronous, allowing computation to continue while they take place. In order to maximize this benefit, the programmer must estimate the operation latency so that data is ready when needed.

A second programming model is based on TAM (Threaded Abstract Machine). TAM is used as a compilation target for parallel languages. It organizes threads into activation frames for each function call, creating a two-level scheduler. Handlers deliver messages directly to executing activations or to activations waiting on the ready queue.

Active Messages can be improved by hardware support in network interfaces and processors.
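
The split-phase PUT and GET described above can be sketched on top of a toy active-message layer. This is an illustration in Python under stated assumptions, not Split-C syntax: `Node`, `send`, `get`, `put`, the handler names, and the `outstanding` counter are all hypothetical, and two in-process objects stand in for two processors.

```python
from collections import deque


class Node:
    """A simulated processor with local memory (hypothetical, for illustration)."""

    def __init__(self, node_id, memory):
        self.node_id = node_id
        self.memory = memory     # local memory, modeled as a dict
        self.inbox = deque()     # pending active messages
        self.outstanding = 0     # GETs issued but not yet answered

    def poll(self):
        while self.inbox:
            handler, args = self.inbox.popleft()
            handler(self, *args)


def send(dest, handler, *args):
    dest.inbox.append((handler, args))


def get_request_handler(node, requester, remote_key, local_key):
    """Runs on the data's owner: reply with the requested block."""
    send(requester, get_reply_handler, local_key, node.memory[remote_key])


def get_reply_handler(node, local_key, value):
    """Runs on the requester: store the block and record completion."""
    node.memory[local_key] = value
    node.outstanding -= 1


def put_handler(node, key, value):
    """Runs on the destination: store the block."""
    node.memory[key] = value


def get(requester, owner, remote_key, local_key):
    """Split-phase GET: issue the request and return immediately."""
    requester.outstanding += 1
    send(owner, get_request_handler, requester, remote_key, local_key)


def put(owner, key, value):
    """Asynchronous PUT: hand the block to the destination's handler."""
    send(owner, put_handler, key, value)


a = Node(0, {})
b = Node(1, {"row": [1, 2, 3]})

get(a, b, "row", "row_copy")                 # phase 1: request is in flight
local_work = sum(i * i for i in range(10))   # computation overlapped with the GET

while a.outstanding:                         # phase 2: wait only when data is needed
    b.poll()
    a.poll()

put(b, "result", sum(a.memory["row_copy"]))
b.poll()
print(local_work, a.memory["row_copy"], b.memory["result"])
```

The gap between issuing `get` and checking `outstanding` is where the programmer's latency estimate matters: the more useful local work fits in that gap, the less time is spent waiting in the loop.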
---------------------------------------------------
Return-Path: tian@wally
Received: from wally.cs.washington.edu (wally.cs.washington.edu [128.95.2.122]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id CAA05783 for ; Thu, 16 Jan 1997 02:42:51 -0800
Received: (tian@localhost) by wally.cs.washington.edu (8.8.3+CSE/7.2ws+) id CAA28377; Thu, 16 Jan 1997 02:42:50 -0800 (PST)
Date: Thu, 16 Jan 1997 02:42:50 -0800 (PST)
From: Tian Lim
To: bershad@cs
Subject: ActiveMessages 552-reading summary
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Active messages are an asynchronous messaging mechanism based on a fundamental observation: the traditional "send/receive" model does not match the way current machines process network packets, which is essentially "send/get an interrupt". The authors exploit this native hardware bias in order to overlap computation and communication, and to reduce processing overhead.

Active messages simply add a pointer to a user-specified handler in the header of each message. This handler should not block and is responsible only for moving the message from the packet into the ongoing computation. As a result, no complex buffering is necessary. The authors present Split-C, an extension to C that implements asynchronous remote memory accesses via active messages, and demonstrate how a matrix multiply using this model may achieve 95% of peak performance on large nCUBE/2's.

The authors show that this mechanism is superior to message-driven architectures such as the J-Machine and Monsoon. Because the handlers in these schemes may block, interrupt-level scheduling is handled by hardware. In addition, the short-lived handler contexts lead to less locality. To demonstrate that active messages are a good mechanism upon which to build a message-driven model, the authors present TAM, which essentially uses active messages to trigger fine-grained scheduling.
The primary drawbacks to this mechanism are that it ignores security and requires a single executable image be present on all nodes. It seems that beyond the coscheduling ideas, and extensions to the general active message idea (e.g. active networks), there is little we can leverage from this paper in distributed systems.
---------------------------------------------------
Return-Path: sungeun@wormwood
Received: from wormwood.cs.washington.edu (wormwood.cs.washington.edu [128.95.2.107]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id IAA08332 for ; Thu, 16 Jan 1997 08:48:45 -0800
Received: (sungeun@localhost) by wormwood.cs.washington.edu (8.8.3+CSE/7.2ws+) id IAA31547; Thu, 16 Jan 1997 08:48:44 -0800 (PST)
Date: Thu, 16 Jan 1997 08:48:44 -0800 (PST)
Message-Id: <199701161648.IAA31547@wormwood.cs.washington.edu>
From: Sung-Eun Choi
To: bershad@cs
Subject: ActiveMessages 552-reading summary
Reply-To: sungeun@cs.washington.edu

Active Messages is a low-level communication mechanism designed to minimize overhead as well as enable overlap of communication with computation (the two do not necessarily go hand-in-hand). Programmers are not expected to use Active Messages directly, rather more "user-friendly" communication primitives or actual language support are expected to be built on top of AM. It appears that using AM requires a SPMD program (as the handler address needs to be known by the sender of the message), though perhaps it can be extended to true MIMD using a handler "name server" or file.

The fundamental ideas behind Active Messages come directly from message driven execution. The notion that the message itself triggers some action to occur is the same, but the definition of a message and the action itself is restricted. At a lower level, this model is closely related to the model implemented in modern message passing computers.
Specifically, the processor (or co-processor) is notified when a message arrives and a special message handler is executed. Note that the Cray T3D provides similar abstractions in its user-level communication routines, though the ability to provide a user-level handler is not provided. Programmers find such abstractions difficult to use and often resort to utilizing parallelizing compilers, implicitly parallel languages or message passing libraries built on top of such primitives.
---------------------------------------------------
Return-Path: govindk@shasta.ee.washington.edu
Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.2.4]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id IAA08371 for ; Thu, 16 Jan 1997 08:50:39 -0800
Received: from shasta.ee.washington.edu (shasta.ee.washington.edu [128.95.28.11]) by june.cs.washington.edu (8.8.3+CSE/7.2ju) with SMTP id IAA26714 for ; Thu, 16 Jan 1997 08:50:38 -0800
Received: from andes.faulty (andes.ee.washington.edu) by shasta.ee.washington.edu (4.1/SMI-4.1) id AA01736; Thu, 16 Jan 97 08:51:44 GMT
Received: by andes.faulty (SMI-8.6/SMI-SVR4) id IAA13846; Thu, 16 Jan 1997 08:51:31 -0800
Date: Thu, 16 Jan 1997 08:51:31 -0800
From: govindk@shasta.ee.washington.edu (Govindarajan K)
Message-Id: <199701161651.IAA13846@andes.faulty>
To: bershad@cs
Subject: ActiveMessages 552-reading summary

The authors present a novel communication scheme which aims at maximizing the overlap between computation and communication, and minimizing the communication overhead in a distributed system. They do this by proposing a communication mechanism called Active Messages. The "active" part of the term comes from the fact that the header contains the address of the user-level instruction sequence that will extract the message from the network and integrate it with the ongoing computation. Active messages is a mechanism to implement the traditional send/receive paradigm for communication.
The features of the Active Messages mechanism are:
(i) Messages are not buffered except for network transport. This reduces the buffer management to the minimum required for data transport.
(ii) Active Message handlers execute immediately on message arrival, cannot suspend, and terminate quickly, and therefore do not back up the network.
(iii) The mechanism relies on the SPMD programming model.

The authors have demonstrated the efficiency of active messages on several architectures and show that they get good performance. Using a new language called Split-C (C + ability to do remote memory operations) they show, using a matrix multiplication application, the capability of achieving up to 95% of peak performance on large nCUBEs. The authors show, using TAM (Threaded Abstract Machine), a methodology for fine-grained scheduling.

The authors conclude the paper by presenting methods for improving the hardware support for Active Messages. This includes improvements to the
(i) Network interface: including DMA support for large messages and reuse of message data.
(ii) Processor: including having two processors, one for computation and one for message handling, and user-level interrupts.
---------------------------------------------------
Return-Path: echris@merganser
Received: from merganser.cs.washington.edu (merganser.cs.washington.edu [128.95.2.192]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id IAA08449 for ; Thu, 16 Jan 1997 08:56:21 -0800
Received: (echris@localhost) by merganser.cs.washington.edu (8.8.3+CSE/7.2ws+) id IAA12886; Thu, 16 Jan 1997 08:56:20 -0800 (PST)
Date: Thu, 16 Jan 1997 08:56:20 -0800 (PST)
From: echris@merganser
Message-Id: <199701161656.IAA12886@merganser.cs.washington.edu>
To: bershad@cs
Subject: ActiveMessages 552-reading summary
Reply-to: echris@cs.washington.edu

This paper presents a communication mechanism, called active messages.
The mechanism is motivated by the observation that conventional send/recv operations on message passing machines do not map efficiently to the hardware (i.e., they permit a mismatch between programming model and hardware functionality). Though arguably a convenient and high-level mechanism (yeah, right), send/recv operations suffer in that they often make it difficult to overlap communication and computation, and their semantics often require excessive buffering. Active messages hope to remedy both deficiencies by providing a simple mechanism that maps directly to the hardware: an active message is a one-way message wherein the sender specifies a handler that the receiver is to use to process the message. Yeah, it's low level, but send/recv is hardly any better.

The Split-C language and its active message-based GET/PUT communication mechanism are described, and it is argued that it performs well on matrix/matrix multiply.

Message-driven architectures, such as the J-Machine and Monsoon, are also discussed. The argument is that message-driven architectures are fundamentally flawed (lack of locality due to short contexts) and that active messages can be used to simulate the message-driven model on conventional hardware. The Threaded Abstract Machine (TAM) is a model designed for this purpose.

Finally, the paper offers a few notes on how changes to the network interface and processor hardware could further improve active message performance.
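
As a rough illustration of how message handlers can trigger fine-grained scheduling in software, the following Python sketch keeps a count of missing operands in each activation frame and lets the handler enable the frame once the count reaches zero. It is loosely inspired by the TAM description in these summaries and is not TAM itself: `Frame`, `inlet`, `ready_queue`, `run_scheduler`, and `add_thread` are hypothetical names chosen for this example.

```python
from collections import deque

ready_queue = deque()  # frames whose thread is enabled and waiting to run


class Frame:
    """Activation frame: operand slots plus a count of values still missing."""

    def __init__(self, name, thread, needed):
        self.name = name
        self.thread = thread    # function to run once all operands are present
        self.slots = {}
        self.missing = needed
        self.result = None


def inlet(frame, slot, value):
    """Message handler: deposit one operand, enable the frame when complete."""
    frame.slots[slot] = value
    frame.missing -= 1
    if frame.missing == 0:
        ready_queue.append(frame)  # short and non-blocking: just post the work


def run_scheduler():
    """Run every enabled frame's thread to completion, in arrival order."""
    while ready_queue:
        frame = ready_queue.popleft()
        frame.result = frame.thread(frame.slots)


def add_thread(slots):
    return slots["a"] + slots["b"]


f = Frame("f", add_thread, needed=2)
inlet(f, "a", 2)
enabled_after_first = len(ready_queue)   # still waiting for operand "b"
inlet(f, "b", 40)
run_scheduler()
print(enabled_after_first, f.result)
```

The handler only stores a value and updates a counter; the computation itself runs later from the ready queue, which is one way to keep handlers short while still letting message arrival drive scheduling.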
---------------------------------------------------
Return-Path: mernst@columbine
Received: from columbine.cs.washington.edu (columbine.cs.washington.edu [128.95.1.66]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with SMTP id JAA08540 for ; Thu, 16 Jan 1997 09:01:56 -0800
Received: (mernst@localhost) by columbine.cs.washington.edu (8.6.12/7.2ws+) id JAA20188; Thu, 16 Jan 1997 09:01:46 -0800
Date: Thu, 16 Jan 1997 09:01:46 -0800
Message-Id: <199701161701.JAA20188@columbine.cs.washington.edu>
From: Michael Ernst
To: bershad@cs
Subject: ActiveMessages 552-reading summary

Active messages: a mechanism for integrated communication and computation

An Active Message is one which contains, instead of traditional control information at its head, an address of a user-level message handler. Upon receipt of the message, the processor jumps to that instruction sequence, which does whatever is necessary (perform computation, copy data, invoke threads, etc.). This is a very efficient mechanism which exhibits order-of-magnitude improvements over traditional techniques; observed performance backs up their claims.

Some downsides (many of which can be worked around):
* All processors must be running the same image (so that they can intelligibly speak of addresses in other local memories).
* Message handlers must
* A mistake is more fatal than it might be in another system.
* The communication is more explicit than in other programming models.

The authors argue that Active Messages, a software mechanism, are superior to specialized hardware. (Such specialized hardware tends not to balance communication and computation.) They show how active messages can emulate two ends of the design space: CM-5/nCUBE and J-Machine/Monsoon. They also suggest hardware mechanisms that could speed up active messages; they consider this a better use of hardware design resources than previous efforts at building distributed and parallel machines.

The programming model (implemented in Split-C) is get/put.
After a get request, a thread can become inactive until the PC jumps back to the put handler, which picks up right after the get. There's no busy-waiting, as is the case in three-phase commit protocols. Active messages are intended to be small; the handler must execute quickly and to completion, to avoid tying up the network. No buffering of active messages is done.