University of Washington Computer Science & Engineering
 Hubble: Monitoring Internet Reachability in Real-Time

 Ethan Katz-Bassett
 Harsha Madhyastha
 John P. John
 Tom Anderson
 Arvind Krishnamurthy
 David Wetherall
   

Overview

Global reachability -- when every address is reachable from every other address -- is the most basic goal of the Internet. It was specified as a top priority in the original design of the Internet protocols, ahead of high performance or good quality of service, with the philosophy that "there is only one failure, and it is complete partition." However, this goal is not always met in practice; traffic may disappear into black holes and consistently fail to reach the destination. This is especially problematic when outages are not simply transient, because an operator generally has little visibility into other ASes to discern the nature of an outage and little ability to check whether the problem exists from other vantage points.

We present Hubble, a system that operates continuously to find Internet reachability problems in which routes exist to a destination but packets are unable to reach the destination. Hubble allows us to characterize global Internet reachability by identifying how many prefixes are reachable from some vantage points and not others, how often these problems occur, and how long they persist. Whereas previous work focused on reachability within the narrower context of an AS, testbed, or set of clients, or obtained breadth by monitoring routes only via BGP, Hubble monitors the data path to prefixes that cover 89% of the Internet's edge address space at a 15-minute granularity. Key enabling techniques include a hybrid passive/active monitoring approach and the synthesis of multiple information sources, including historical data and spoofed probes to isolate failures.
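
To make "reachable from some vantage points and not others" concrete, here is a minimal Python sketch. The function name, the input layout, and the example prefixes (taken from the documentation address ranges) are illustrative assumptions, not Hubble's actual code or data.

```python
from typing import Dict


def classify_reachability(results: Dict[str, bool]) -> str:
    """Classify one prefix from per-vantage probe outcomes.

    `results` maps a vantage point name to True if a probe from that
    vantage reached the prefix. This layout is hypothetical.
    """
    if not results:
        return "unknown"
    reached = sum(results.values())
    if reached == len(results):
        return "reachable"
    if reached == 0:
        return "unreachable"
    return "partial"  # reachable from some vantage points but not others


# Made-up observations for three prefixes and three vantage points.
observations = {
    "192.0.2.0/24": {"vp-a": True, "vp-b": True, "vp-c": True},
    "198.51.100.0/24": {"vp-a": True, "vp-b": False, "vp-c": False},
    "203.0.113.0/24": {"vp-a": False, "vp-b": False, "vp-c": False},
}
summary = {
    prefix: classify_reachability(result)
    for prefix, result in observations.items()
}
```

The "partial" case is the one of interest here: a route exists and some vantage points get through, yet others consistently fail.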

Papers

  • Studying Black Holes in the Internet with Hubble [pdf, html]
    E. Katz-Bassett, H. V. Madhyastha, J. P. John, A. Krishnamurthy, D. Wetherall, T. Anderson.
    USENIX Symposium on Networked Systems Design & Implementation (NSDI), 2008.

Talks

  • Monitoring Internet Reachability Problems with Hubble
    E. Katz-Bassett.
    Invited talk, Gnomedex, August 2008.
  • Studying Black Holes on the Internet with Hubble
    E. Katz-Bassett.
    Invited talk, 10th CAIDA-WIDE Workshop, August 2008.
  • Hubble: Monitoring Internet Reachability in Real Time
    E. Katz-Bassett.
    Invited talk, Réseaux IP Européens (RIPE) 56, May 2008.
  • Real-time Blackhole Analysis with Hubble
    E. Katz-Bassett.
    North American Network Operators Group, June 2007.
    Video and PDF slides available.


System Description

Hubble consists of three high-level components, each of which employs various network measurements and techniques:
  1. Target Identification - Using both active and passive monitoring, Hubble identifies prefixes likely to be experiencing problems as targets for further investigation.
    • Distributed monitors running on PlanetLab report when a previously responsive IP stops responding to pings.
    • The system monitors RouteViews BGP updates and reports prefixes experiencing path changes at multiple RouteViews peers.
  2. Reachability Analysis - Hubble assesses the reachability of the identified target prefixes.
    • The system launches traceroutes to destinations in the prefixes from PlanetLab sites around the world.
    • It compares these traceroutes to current BGP snapshots from RouteViews and to iPlane alias information to determine at which router, prefix, and AS each probe terminates.
    • It schedules any prefixes it finds to be experiencing problems for re-probing in the next round.
  3. Problem Classification - To aid operators and others in understanding the problem, Hubble automatically classifies problems according to three questions:
    • Which AS contains the problem?
      • The system groups the failed traceroutes and determines in which AS(s) a substantial number terminate.
    • Which routers might be causing the problem?
      • The system assesses whether all traceroutes that reach the AS terminate, or only those through certain routers.
      • For each suspect router, the system examines its historical records to see if the prefix used to be reachable through the router. If so, it checks if the next router along the path responds to pings.
    • Which destinations are affected?
      • Internet routes are often asymmetric, differing in the forward and reverse directions. A failed traceroute signals that at least one direction is not functioning, but on its own it makes it difficult or impossible to infer which one.
      • We employ an innovative technique using spoofed probes to isolate the direction of failure five times more frequently than previous techniques.
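
The first classification question above (which AS contains the problem?) can be sketched as a simple grouping step. This is an illustration only: the `Traceroute` record, the `suspect_ases` function, and the 50% threshold standing in for "a substantial number" are all assumptions, not Hubble's implementation. The AS numbers come from the range reserved for documentation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Traceroute:
    vantage: str                 # name of the probing vantage point
    reached_destination: bool    # True if the probe reached the target
    last_hop_asn: Optional[int]  # AS of the last responsive hop, if mapped


def suspect_ases(traces: List[Traceroute], min_fraction: float = 0.5) -> List[int]:
    """Return ASes in which at least `min_fraction` of failed traceroutes end.

    `min_fraction` is an assumed threshold chosen for illustration.
    """
    failed = [
        t for t in traces
        if not t.reached_destination and t.last_hop_asn is not None
    ]
    if not failed:
        return []
    counts = Counter(t.last_hop_asn for t in failed)
    return sorted(
        asn for asn, n in counts.items() if n / len(failed) >= min_fraction
    )


# Made-up traceroutes: three fail, two of them inside AS 64500.
traces = [
    Traceroute("vp-a", False, 64500),
    Traceroute("vp-b", False, 64500),
    Traceroute("vp-c", False, 64501),
    Traceroute("vp-d", True, 64502),
]
suspects = suspect_ases(traces)
```

In this example two of the three failed traceroutes terminate in AS 64500, so it is the only AS flagged. A fuller version would go on to the router-level and direction-of-failure questions, which need historical records and spoofed probes and are not modeled here.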


Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX