---------------------------------------------------
Return-Path: matthai@franklin.cs.washington.edu
Received: from franklin.cs.washington.edu (franklin.cs.washington.edu [128.95.2.103]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id SAA01775 for <bershad@whistler.cs.washington.edu>; Wed, 15 Jan 1997 18:01:55 -0800
Received: from localhost (localhost [127.0.0.1]) by franklin.cs.washington.edu (8.8.3+CSE/7.2ws+) with SMTP id SAA13795 for <bershad>; Wed, 15 Jan 1997 18:01:54 -0800 (PST)
Message-Id: <199701160201.SAA13795@franklin.cs.washington.edu>
X-Mailer: exmh version 1.5.3 12/28/94
To: bershad@franklin.cs.washington.edu
Subject: ActiveMessages 552-reading summary 
Reply-to: Matthai Philipose <matthai@cs.washington.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Wed, 15 Jan 1997 18:01:54 PST
From: Matthai Philipose <matthai@franklin.cs.washington.edu>

This paper introduces a low-level communication mechanism called
Active Messages. Active messages are a way to send short messages,
often, quickly and asynchronously from one address space to another on a 
multiprocessor machine, where security and flexible naming are less
important than low latency. 

In functionality, the active message seems comparable to the __send+receive 
part__ of the SRC RPC. The main differences are:
1)In RPC, the address of the handler is not known by the sender (there's
  no SPMD model); any server thread can be woken up and delivered the message, 
  after paying the corresponding scheduling overhead. 

  In the Active Message world, it seems there's only a single process running
  on the destination processor, and the sending process can name a "handler" in the
  user-space of this process that puts the data where it belongs. No expensive
  handler thread wake-up here, because there are no threads.
 
2)In RPC, the destination buffer for the message is not pre-determined. 
  Re-use of buffers (as in SRC RPC) means that the buffers in the 
  server-thread's user space are generally pre-allocated; but nevertheless, 
  some overhead is paid in buffer management.

3)The Active Message interrupt software does not seem to do much (software)
  checking of headers or checksumming, unlike RPC. This is probably because
  the system runs on dedicated hardware, in a trusted environment.

The above three factors contributed heavily to the cost of the send/receive
part of SRC RPC; active messages avoid these overheads because of the simpler
domain they work in. The main achievement of the paper seems to be the
realization that in an important class of applications, i.e. SPMD-ish parallel
programs running in a trusted environment, the communication model is
constrained enough that message passing can be optimized by taking advantage
of the constraints (such as the above three). As evidenced by the native
communication primitives they compare themselves to, others apparently hadn't
made full use of the constraints.

This seemed like a parallel programming paper to me (homogeneous, trusted,
uniform fixed naming, SPMD-ish environment), rather than a distributed systems
paper. It seems difficult to transfer the tricks they talk about to the
distributed-system world; for instance, I found their compiler tricks
interesting, but
they seem to be oriented towards a SPMD world.

---------------------------------------------------
Return-Path: yasushi@silk
Received: from silk.cs.washington.edu (silk.cs.washington.edu [128.95.2.238]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id WAA04085 for <bershad@whistler.cs.washington.edu>; Wed, 15 Jan 1997 22:34:50 -0800
Received: (yasushi@localhost) by silk.cs.washington.edu (8.7.2/7.2ws+) id WAA24544; Wed, 15 Jan 1997 22:34:50 -0800 (PST)
Date: Wed, 15 Jan 1997 22:34:50 -0800 (PST)
From: yasushi@silk
Message-Id: <199701160634.WAA24544@silk.cs.washington.edu>
To: Brian Bershad <bershad@lace.cs.washington.edu>
Subject: ActiveMessages 552-reading summary

The basic idea behind the active messages is to execute an application
specific handler directly in the interrupt context in response to
incoming messages. There are two benefits. One is that the handler can
read the message without any buffering. The other is that there is no
context switching overhead. The paper also proposes a language called
Split-C, which is C augmented with asynchronous remote memory reads
and writes. Using this language, the authors were able to write a
matrix multiplication program that achieves 95% of the maximum
possible throughput.  Finally, the paper describes what hardware
support is needed to exploit active messages fully.

I think the idea of active messages is not new; many embedded systems
allow user defined interrupt handling. What's new is that the paper
linked this idea to high performance computing, and has actually
shown that it can achieve response times an order of magnitude better
than other standard mechanisms.

The shortcoming of this idea is that it's not generic. It can be used
only in a parallel program in which the same code image is copied to all
the processors. Also, it is applicable only in a situation where
synchronization between message handlers and the main computation is
not required.
---------------------------------------------------
Return-Path: sparekh@crocus
Received: from crocus.cs.washington.edu (crocus.cs.washington.edu [128.95.1.67]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with SMTP id XAA04616 for <bershad@whistler.cs.washington.edu>; Wed, 15 Jan 1997 23:43:14 -0800
Received: (sparekh@localhost) by crocus.cs.washington.edu (8.6.12/7.2ws+) id XAA02422; Wed, 15 Jan 1997 23:43:09 -0800
Date: Wed, 15 Jan 1997 23:43:09 -0800
Message-Id: <199701160743.XAA02422@crocus.cs.washington.edu>
From: Sujay Parekh <sparekh@cs.washington.edu>
To: bershad@cs
Subject: ActiveMessages 552-reading summary


Active messages is a new communication mechanism for interprocessor
communication in multiprocessor machines.  Standard message-passing
architectures support a send/receive programming model to exchange
data between processors.  However, they have extremely high
communication overheads (due to the co-ordination required between the
processors) which do not permit effective overlap of computation and
communication.  Consequently, either processor or network bandwidth
(or both) end up being wasted.  In message-based architectures like
the J-machine or Monsoon, support for message handling is explicitly
built into the processor.  However, their model has the incoming
message handlers doing significant computations.  This ties up
precious network buffer resources.  Additionally, this model causes
the computation progress to be highly dependent on message arrival, so
that each message arrival enables a small amount of computation.
Again, little overlap of communication and computation is exploited.

In the Active Message model, the communication exchanged between
processors consists of some data and a user-defined handler for
dealing with the data.  The handler's job is to quickly remove the
message from the network buffers and integrate it into the computation
on that node.  Thus, network resources are not tied up for arbitrary
periods of time, communication overheads are minimized and
communication can be better overlapped with computation.  
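
The dispatch described above can be sketched as a small simulation. This is
only an illustration under stated assumptions, not the paper's
implementation: the names Message, Node, HANDLERS, store_handler and
add_handler are hypothetical, and an index into a shared handler table
stands in for the handler's code address.

```python
from collections import namedtuple

# Hypothetical wire format: a handler index plus a small payload.
# In an SPMD program every node has the same handler table, so an index
# (standing in for a code address) means the same thing on every node.
Message = namedtuple("Message", ["handler", "payload"])


class Node:
    def __init__(self, size):
        self.memory = [0] * size   # state of the node's ongoing computation
        self.received = 0          # number of messages integrated so far

    def deliver(self, msg):
        # Models the receive interrupt: run the named handler at once,
        # to completion, with no intermediate receive buffer.
        HANDLERS[msg.handler](self, *msg.payload)


def store_handler(node, addr, value):
    # Moves the data out of the "network" into its final location.
    node.memory[addr] = value
    node.received += 1


def add_handler(node, addr, value):
    # Folds the data into a running sum kept in node memory.
    node.memory[addr] += value
    node.received += 1


# The same table exists on every node (the SPMD assumption).
HANDLERS = [store_handler, add_handler]
STORE, ADD = 0, 1

node = Node(4)
node.deliver(Message(STORE, (2, 10)))
node.deliver(Message(ADD, (2, 5)))
print(node.memory)  # [0, 0, 15, 0]
```

Both handlers do a constant amount of work and return, which is the property
that keeps the message from occupying network resources for long.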

The authors demonstrate efficient implementations of Active messages
on both kinds of multiprocessors, and it is encouraging to note that
they get good performance even on hardware designed for a different
programming paradigm.  However, there are several hardware
improvements that would further enhance the efficiency of Active
Messages.  Some network interface improvements include DMA support for
large messages and well-designed message registers.  Processor support
includes ideas like fast polling, user-level trap handling for network
interrupts and a dedicated CPU at each node for running
message-handlers.
---------------------------------------------------
Return-Path: rgrimm@cs.washington.edu
Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.2.4]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id AAA04750 for <bershad@whistler.cs.washington.edu>; Thu, 16 Jan 1997 00:07:09 -0800
Received: from [128.95.8.129] (h129.dyn.cs.washington.edu [128.95.8.129]) by june.cs.washington.edu (8.8.3+CSE/7.2ju) with SMTP id AAA06132 for <bershad@cs>; Thu, 16 Jan 1997 00:07:07 -0800
X-Sender: rgrimm@june.cs.washington.edu
Message-Id: <v02140b00af03918acdc7@[128.95.8.121]>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Thu, 16 Jan 1997 00:04:51 -0800
To: bershad@cs
From: rgrimm@cs.washington.edu (Robert Grimm)
Subject: ActiveMessages 552-reading summary

Von Eicken et al. observe a mismatch between the traditional
programming model on large-scale multiprocessors and the underlying
hardware. This mismatch can be attributed to some combination of
complex communication protocols and patterns, blocking communication
primitives (for synchronous send/receive operations) and excessive
message buffering (for asynchronous send/receive operations). The
result is poor overlap between computation and communication. To
present a better alternative, von Eicken et al. introduce a new and
simple communication mechanism, called "active messages."
Essentially, an active message adds the address of a user-space
message handler to the header of each message. This short, non-blocking
handler is executed at interrupt time and is responsible for injecting
the rest of the message data into the computation at the receiving
machine.  Von Eicken et al. present a simple programming model,
implemented on top of active messages in an extension to the C
programming language called Split-C, that allows for the better
overlapping of computation and communication. They also introduce a new
compilation target that allows for the fine-grained scheduling of small
computations and builds on top of active messages as the communication
mechanism.

Using measurements on micro-benchmarks, the authors show that active
messages provide just the right balance between software communication
mechanism and general purpose computing hardware in a design space
whose extremes are marked by the traditional programming model, which
favors computation at the cost of communication, and message driven
architectures such as the J-Machine and Monsoon, which favor
communication at the cost of general purpose computation. However, two
points remain unclear: First, to achieve optimal overlap between
computation and communication, it seems necessary to hard code the
performance characteristics of a given computer platform into the
program (or, alternatively, some form of library would have to
dynamically determine the communication latencies and then set the
appropriate program parameters) which would adversely affect
portability (even from one generation of one platform to another).
Second, no high-level application benchmarks are presented that
compare the different programming models and architectures and clearly
show the benefit of active messages.
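
The portability concern in the first point can be made concrete with a small
calculation. This is a sketch only: prefetch_distance is a hypothetical
helper and the latency and compute times are made-up numbers, not
measurements from the paper. It shows how the number of blocks that must be
requested ahead of time depends directly on the platform's latency.

```python
import math


def prefetch_distance(latency_us, compute_us_per_block):
    """Blocks to request ahead so that a reply arrives before it is needed.

    Assumes a fixed round-trip latency and a fixed compute time per block;
    both are hypothetical inputs that would have to be measured per machine.
    """
    return max(1, math.ceil(latency_us / compute_us_per_block))


# Same program, two hypothetical platforms with different compute speeds.
print(prefetch_distance(50.0, 20.0))  # 3
print(prefetch_distance(50.0, 5.0))   # 10
```

A program that hard codes the first value would stall on the second
platform, which is why the parameter has to be either retuned or determined
at run time.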


---------------------------------------------------
Return-Path: ddion@june
Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.2.4]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id BAA05040 for <bershad@whistler.cs.washington.edu>; Thu, 16 Jan 1997 01:03:51 -0800
Received: (ddion@localhost) by june.cs.washington.edu (8.8.3+CSE/7.2ju) id BAA08563; Thu, 16 Jan 1997 01:03:50 -0800
From: ddion@june (David Dion)
Message-Id: <199701160903.BAA08563@june.cs.washington.edu>
Subject: ActiveMessages 552-reading summary
To: bershad@cs
Date: Thu, 16 Jan 1997 01:03:49 -0800 (PST)
Cc: ddion@june (David Dion)
X-Mailer: ELM [version 2.4 PL23]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Existing message-passing machines focus on raw processor power, rather
than network performance or interaction between processors and the
network.  As a result, there is poor overlap between communication and
computation, and processor usage is degraded.  Active Messages seeks
to integrate computation and communication, laying the foundations of
a new programming model for message-passing multiprocessors.

Active Messages contain in their headers the address of a handler to
be invoked upon message arrival.  The handler extracts the contents of
the message from the network packet and integrates it into computation
on the processing node.  The handler must execute quickly and to
completion.  It is not preempted and it cannot block, as it runs at
interrupt level.  Active Messages have several advantages over
traditional send/receive message-passing.  For instance, complex
buffering is not required on the receive end, since packet data is
extracted immediately by the handler.

Active Messages is a mechanism which supports several programming
models.  Split-C provides split-phase remote memory operations for C
by adding two operations: PUT, which copies a block of local memory to remote
memory, and GET, which retrieves a block of remote memory and makes a
local copy.  Both operations are asynchronous, allowing computation to
continue while they take place.  In order to maximize this benefit,
the programmer must estimate the operation latency so that data is
ready when needed.  A second programming model is based on TAM
(Threaded Abstract Machine).  TAM is used as a compilation target for
parallel languages.  It organizes threads into activation frames for
each function call, creating a two-level scheduler.  Handlers deliver
messages directly to executing activations or to activations waiting
on the ready queue.  
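
The split-phase PUT and GET operations described above can be sketched as a
toy simulation. Everything here is an assumption made for illustration: the
Network class, the handler names, and the pending counter are hypothetical
and are not Split-C's actual interface. The network is an in-memory queue
that delivers one message per poll() call.

```python
from collections import deque


class Node:
    def __init__(self, node_id, memory):
        self.id = node_id
        self.memory = list(memory)
        self.pending = 0          # GETs issued but not yet answered


class Network:
    """Toy network: messages wait in a queue until poll() delivers them."""

    def __init__(self, nodes):
        self.nodes = {n.id: n for n in nodes}
        self.in_flight = deque()

    def send(self, dest, handler, *args):
        self.in_flight.append((dest, handler, args))

    def poll(self):
        # Deliver one message by running its handler on the destination.
        dest, handler, args = self.in_flight.popleft()
        handler(self, self.nodes[dest], *args)


def put_handler(net, node, addr, values):
    # Runs on the receiving node: copy the data straight into place.
    node.memory[addr:addr + len(values)] = values


def get_handler(net, node, requester, remote_addr, count, local_addr):
    # Runs on the node that owns the data: read it and send a reply.
    data = node.memory[remote_addr:remote_addr + count]
    net.send(requester, get_reply_handler, local_addr, data)


def get_reply_handler(net, node, local_addr, values):
    # Runs on the requester: store the data and note the GET is complete.
    node.memory[local_addr:local_addr + len(values)] = values
    node.pending -= 1


def put(net, dest, addr, values):
    net.send(dest, put_handler, addr, list(values))


def get(net, me, dest, remote_addr, count, local_addr):
    me.pending += 1
    net.send(dest, get_handler, me.id, remote_addr, count, local_addr)


def sync(net, me):
    # Wait until every outstanding GET issued by this node has completed.
    while me.pending > 0:
        net.poll()


a = Node(0, [0, 0, 0, 0])
b = Node(1, [7, 8, 9, 0])
net = Network([a, b])

get(net, a, 1, 0, 3, 0)        # returns immediately; data is not here yet
before = list(a.memory)        # still all zeros
local_work = sum(range(100))   # computation overlapped with the GET
put(net, 1, 3, [42])           # also asynchronous
sync(net, a)                   # data is needed now, so wait for it

print(a.memory)  # [7, 8, 9, 0]
print(b.memory)  # [7, 8, 9, 42]
```

The gap between get() and sync() is where the programmer's latency estimate
matters: the more useful work placed there, the less time sync() spends
waiting.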

Active Messages can be improved by hardware support in network
interfaces and processors.
---------------------------------------------------
Return-Path: tian@wally
Received: from wally.cs.washington.edu (wally.cs.washington.edu [128.95.2.122]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id CAA05783 for <bershad@whistler.cs.washington.edu>; Thu, 16 Jan 1997 02:42:51 -0800
Received: (tian@localhost) by wally.cs.washington.edu (8.8.3+CSE/7.2ws+) id CAA28377; Thu, 16 Jan 1997 02:42:50 -0800 (PST)
Date: Thu, 16 Jan 1997 02:42:50 -0800 (PST)
From: Tian Lim <tian@wally.cs.washington.edu>
To: bershad@cs
Subject: ActiveMessages 552-reading summary
Message-ID: <Pine.OSF.3.91.970116023128.28041A-100000@wally.cs.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Active messages are an asynchronous messaging mechanism based on a 
fundamental observation: the traditional "send/receive" model does not 
match the way current machines process network packets, which is 
essentially "send/get an interrupt".  The authors exploit this native 
hardware bias in order to overlap computation and communication, and to 
reduce processing overhead.

Active messages simply add a pointer to a user specified handler in the 
header of each message.  This handler should not block and is responsible 
only for moving the message from the packet into the ongoing 
computation.  As a result, no complex buffering is necessary.  The 
authors present Split-C, an extension to C that implements asynchronous 
remote memory accesses via active messages, and demonstrate how a matrix 
multiply using this model may achieve 95% of peak performance on large 
nCUBE/2's.

The authors show that this mechanism is superior to message-driven 
architectures such as the J-Machine and Monsoon.  Because the handlers in 
these schemes may block, interrupt level scheduling is handled by 
hardware.  In addition, the short-lived handler contexts lead to less 
locality.  To demonstrate that active messages are a good mechanism upon 
which to build a message driven model, the authors present TAM which 
essentially uses active messages to trigger fine grained scheduling.

The primary drawbacks to this mechanism are that it ignores security and
requires a single executable image be present on all nodes.  It seems that
beyond the coscheduling ideas, and extensions to the general active
message idea (e.g. active networks), there is little we can leverage from
this paper in distributed systems. 

---------------------------------------------------
Return-Path: sungeun@wormwood
Received: from wormwood.cs.washington.edu (wormwood.cs.washington.edu [128.95.2.107]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id IAA08332 for <bershad@whistler.cs.washington.edu>; Thu, 16 Jan 1997 08:48:45 -0800
Received: (sungeun@localhost) by wormwood.cs.washington.edu (8.8.3+CSE/7.2ws+) id IAA31547; Thu, 16 Jan 1997 08:48:44 -0800 (PST)
Date: Thu, 16 Jan 1997 08:48:44 -0800 (PST)
Message-Id: <199701161648.IAA31547@wormwood.cs.washington.edu>
From: Sung-Eun Choi <sungeun@cs.washington.edu>
To: bershad@cs
Subject: ActiveMessages 552-reading summary
Reply-To: sungeun@cs.washington.edu

Active Messages is a low-level communication mechanism designed to
minimize overhead as well as enable overlap of communication with
computation (the two do not necessarily go hand-in-hand).  Programmers
are not expected to use Active Messages directly, rather more
"user-friendly" communication primitives or actual language support
are expected to be built on top of AM.  It appears that using AM
requires a SPMD program (as the handler address needs to be known by
the sender of the message), though perhaps it can be extended to true
MIMD using a handler "name server" or file.
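
One way the handler "name server" idea might look is sketched below. This is
purely speculative and not something the paper describes: the Node class and
its register, lookup and deliver methods are hypothetical. Each node keeps
its own handler table, a sender resolves a handler name to that node's local
index once, and later messages carry only the index.

```python
class Node:
    def __init__(self):
        self.handlers = []   # local handler table; layout may differ per node
        self.names = {}      # handler name -> index into self.handlers
        self.total = 0

    def register(self, name, handler):
        self.names[name] = len(self.handlers)
        self.handlers.append(handler)

    def lookup(self, name):
        # The "name server" query: done once, off the critical path.
        return self.names[name]

    def deliver(self, index, *args):
        # Fast path: the same indexed jump as in the SPMD case.
        self.handlers[index](self, *args)


def accumulate(node, value):
    node.total += value


def ignore(node, value):
    pass


# The receiver runs a different program from the sender, so the sender
# cannot assume where "accumulate" lives in the receiver's table.
server = Node()
server.register("ignore", ignore)
server.register("accumulate", accumulate)

cache = {"accumulate": server.lookup("accumulate")}  # one-time resolution
for v in (1, 2, 3):
    server.deliver(cache["accumulate"], v)

print(server.total)  # 6
```

The lookup adds a setup cost per handler and per destination, but the
per-message path stays as short as in the SPMD case.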

The fundamental ideas behind Active Messages come directly from
message driven execution.  The notion that the message itself triggers
some action to occur is the same, but the definition of a message and
the action itself is restricted.  At a lower level, this model is
closely related to the model implemented in modern message passing
computers.  Specifically, the processor (or co-processor) is notified
when a message arrives and a special message handler is executed.
Note that the Cray T3D provides similar abstractions in their user
level communication routines, though the ability to provide a user
level handler is not provided.  Programmers find such abstraction
difficult to use and often resort to utilizing parallelizing
compilers, implicitly parallel languages or message passing libraries
built on top of such primitives.

---------------------------------------------------
Return-Path: govindk@shasta.ee.washington.edu
Received: from june.cs.washington.edu (june.cs.washington.edu [128.95.2.4]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id IAA08371 for <bershad@whistler.cs.washington.edu>; Thu, 16 Jan 1997 08:50:39 -0800
Received: from shasta.ee.washington.edu (shasta.ee.washington.edu [128.95.28.11]) by june.cs.washington.edu (8.8.3+CSE/7.2ju) with SMTP id IAA26714 for <bershad@cs>; Thu, 16 Jan 1997 08:50:38 -0800
Received: from andes.faulty (andes.ee.washington.edu) by shasta.ee.washington.edu (4.1/SMI-4.1)
	id AA01736; Thu, 16 Jan 97 08:51:44 GMT
Received: by andes.faulty (SMI-8.6/SMI-SVR4)
	id IAA13846; Thu, 16 Jan 1997 08:51:31 -0800
Date: Thu, 16 Jan 1997 08:51:31 -0800
From: govindk@shasta.ee.washington.edu (Govindarajan K)
Message-Id: <199701161651.IAA13846@andes.faulty>
To: bershad@cs
Subject: ActiveMessages 552-reading summary

The authors present a novel communication scheme that aims at
maximizing the overlap between computation and communication,
and minimizing the communication overhead in a distributed system. 
They do this by proposing a communication mechanism called
Active Messages. The "active" part of the term comes from the 
fact that the header contains the address of the user-level instruction
sequence that will extract the message from the network and integrate
it with the ongoing computation. Active messages is a mechanism to 
implement the traditional send/receive paradigm for communication.

The features of the Active messages mechanism are:

(i) Messages are not buffered except for network transport. This
reduces the buffer management to the minimum required for data transport.
(ii) Active Message handlers execute immediately on message arrival, cannot
suspend, and terminate quickly, and therefore do not back up the network.
(iii) The mechanism relies on the SPMD programming model.

The authors have demonstrated the efficiency of active messages
on several architectures and show that they get good performance.
Using a new language called Split-C (C + the ability to do remote memory
operations), they show, using a matrix multiplication application,
the capability of achieving up to 95% of peak performance on large nCUBEs.

The authors show, using TAM (Threaded Abstract Machine), a methodology for
fine-grained scheduling.
The authors conclude the paper by presenting methods for improving
the hardware support for Active Messages.
This includes improvements to the
(i) Network interface: including DMA support for 
large messages and reuse of message data.
(ii) Processor: including having two processors, one for 
computation and one for message handling, and user-level interrupts.
---------------------------------------------------
Return-Path: echris@merganser
Received: from merganser.cs.washington.edu (merganser.cs.washington.edu [128.95.2.192]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with ESMTP id IAA08449 for <bershad@whistler.cs.washington.edu>; Thu, 16 Jan 1997 08:56:21 -0800
Received: (echris@localhost) by merganser.cs.washington.edu (8.8.3+CSE/7.2ws+) id IAA12886; Thu, 16 Jan 1997 08:56:20 -0800 (PST)
Date: Thu, 16 Jan 1997 08:56:20 -0800 (PST)
From: echris@merganser
Message-Id: <199701161656.IAA12886@merganser.cs.washington.edu>
To: bershad@cs
Subject: ActiveMessages 552-reading summary 
Reply-to: echris@cs.washington.edu

This paper presents a communication mechanism, called active messages.
The mechanism is motivated by the observation that conventional
send/recv operations on message passing machines do not map efficiently
to the hardware (i.e., they permit a mismatch between programming model
and hardware functionality).  Though arguably a convenient and
high-level mechanism (yeah, right), send/recv operations suffer in that
they often make difficult the task of overlapping communication and
computations, and their semantics often require excessive buffering.
Active messages hopes to right both deficiencies by providing a simple
mechanism that maps directly to the hardware: an active message is a
one-way message wherein the sender specifies a handler that the receiver
is to use to process the message.  Yeah, it's low level, but send/recv
is not much, if any, better.

The Split-C language and its active-message-based GET/PUT communication
mechanism are described, and it is argued that it performs well on
matrix/matrix multiply.  Message driven architectures, such as J-Machine
and Monsoon, are also discussed.  The argument is that message driven
architectures are fundamentally flawed (lack of locality due to short
contexts) and that active messages can be used to simulate the message
driven model on conventional hardware.  The Threaded Abstract Machine
(TAM) is a model designed for this purpose.  Finally, there are a few notes on how
changes to the network interface and processor hardware could further
improve active message performance.
---------------------------------------------------
Return-Path: mernst@columbine
Received: from columbine.cs.washington.edu (columbine.cs.washington.edu [128.95.1.66]) by whistler.cs.washington.edu (8.8.3/7.2ws+) with SMTP id JAA08540 for <bershad@whistler.cs.washington.edu>; Thu, 16 Jan 1997 09:01:56 -0800
Received: (mernst@localhost) by columbine.cs.washington.edu (8.6.12/7.2ws+) id JAA20188; Thu, 16 Jan 1997 09:01:46 -0800
Date: Thu, 16 Jan 1997 09:01:46 -0800
Message-Id: <199701161701.JAA20188@columbine.cs.washington.edu>
From: Michael Ernst <mernst@cs.washington.edu>
To: bershad@cs
Subject: ActiveMessages 552-reading summary 

Active messages:  a mechanism for integrated communication and computation

An Active Message is one which contains, instead of traditional control
information at its head, an address of a user-level message handler.  Upon
receipt of the message, the processor jumps to that instruction sequence,
which does whatever is necessary (perform computation, copy data, invoke
threads, etc.).  This is a very efficient mechanism which exhibits
order-of-magnitude improvements over traditional techniques; observed
performance backs up their claims.

Some downsides (many of which can be worked around):
 * All processors must be running the same image (so that they can
   intelligibly speak of addresses in other local memories).
 * Message handlers must 
 * A mistake is more fatal than it might be in another system.
 * The communication is more explicit than in other programming models.

The authors argue that Active Messages, a software mechanism, are superior
to specialized hardware.  (Such specialized hardware tends not to balance
communication and computation.)  They show how active messages can emulate
two ends of the design space:  CM-5/nCUBE and J-Machine/Monsoon.  They also
suggest hardware mechanisms that could speed up active messages; they
consider this a better use of hardware design resources than previous
efforts at building distributed and parallel machines.

The programming model (implemented in Split-C) is get/put.  After a get
request, a thread can become inactive until the PC jumps back to the put
handler, which picks up right after the get.  There's no busy-waiting, as
is the case in three-phase send/receive protocols.

Active messages are intended to be small; the handler must execute quickly
and to completion, to avoid tying up the network.  No buffering of active
messages is done.