Smart redundancy for distributed computation

Brun, Yuriy; Edwards, George; Bang, Jae young; Medvidovic, Nenad

doi:10.1109/ICDCS.2011.25

by Yuriy Brun, George Edwards, Jae young Bang, Nenad Medvidovic

Abstract:

Many distributed software systems allow participation by large numbers of untrusted, potentially faulty components on an open network. As faults are inevitable in this setting, these systems utilize redundancy and replication to achieve fault tolerance. In this paper, we present a novel smart redundancy technique called iterative redundancy, which ensures efficient replication of computation and data given finite processing and storage resources, even when facing Byzantine faults. Iterative redundancy is more efficient and more adaptive than comparable state-of-the-art techniques that operate in environments with unknown system resource reliability. We show how systems that solve computational problems using a network of independent nodes can benefit from iterative redundancy. We present a formal analytical analysis and an empirical analysis, demonstrate iterative redundancy on a real-world volunteer-computing system, and compare it to existing methods.

View PDF

Citation:

Yuriy Brun, George Edwards, Jae young Bang, and Nenad Medvidovic, Smart redundancy for distributed computation, in Proceedings of the 31st International Conference on Distributed Computing Systems (ICDCS), 2011, pp. 665–676.

A previous version appeared as University of Southern California, Center for Software Engineering technical report USC-CSSE-2009-510.

Bibtex:

@inproceedings{Brun11icdcs,
  author = {Yuriy Brun and George Edwards and Jae young Bang and Nenad
  Medvidovic},
  title = {\href{http://people.cs.umass.edu/brun/pubs/pubs/Brun11icdcs.pdf}{Smart
  redundancy for distributed computation}},
  booktitle = {Proceedings of the 31st International Conference on Distributed
  Computing Systems (ICDCS)},
  venue = {ICDCS},
  address = {Minneapolis, {MN}, {USA}},
  date = {20--24},
  month = {June},
  pages = {665--676},
  year = {2011},
  accept = {$\frac{87}{565} \approx 15\%$},
  doi = {10.1109/ICDCS.2011.25},

  note = {A previous version appeared as University of Southern California,
  Center for Software Engineering technical report USC-CSSE-2009-510.
  \href{https://doi.org/10.1109/ICDCS.2011.25}{DOI: 10.1109/ICDCS.2011.25}},

  previous = {A previous version appeared as University of Southern California,
  Center for Software Engineering technical report USC-CSSE-2009-510.},

  abstract = {Many distributed software systems allow participation by large
  numbers of untrusted, potentially faulty components on an open network. As
  faults are inevitable in this setting, these systems utilize redundancy and
  replication to achieve fault tolerance. In this paper, we present a novel
  smart redundancy technique called iterative redundancy, which ensures
  efficient replication of computation and data given finite processing and
  storage resources, even when facing Byzantine faults. Iterative redundancy
  is more efficient and more adaptive than comparable state-of-the-art
  techniques that operate in environments with unknown system resource
  reliability. We show how systems that solve computational problems using a
  network of independent nodes can benefit from iterative redundancy. We
  present a formal analytical analysis and an empirical analysis, demonstrate
  iterative redundancy on a real-world volunteer-computing system, and compare
  it to existing methods.},

  fundedBy = {NSF CNS-0937060 to the CRA for the CIFellows Project, 
  NSF 0820170, NSF 0905665},
}