CSE logo University of Washington Department of Computer Science & Engineering
 CSE 590z and 590zz - Theory Seminars, Winter 2002
  CSE Home     Previous Quarters  About Us    Search    Contact Info 

590z meets Tuesdays, 1:30-2:20pm in GWN 301. 590zz meets Wednesdays, 2:30-3:20pm in MOR 220.

Click here to view the combined email archive for cse590z@cs and theory-group@cs.

The plan for 590z this quarter is to continue presenting material from Proofs from The Book, although people who would like to talk about something else from the current theory literature are welcome to do so. There will also be a few outside speakers talking about their recent research in theoretical computer science. 590zz, as usual, will focus on the current research work of students and faculty. This can be anything from half-baked to fully developed ideas. The point is to let everyone in the theory group know what you are working on and perhaps get some useful insights from others into problems that you are stuck at. Graduating students are also welcome to present their job talks.

Schedule

Note for speakers: Please email me (ashish@cs.washington.edu) the details of your talk as soon as you decide. Thanks!

Day Speaker(s) Title
Jan 15 * Jeremy Barbay, University of France Intersection and t-Threshold Problem on Sorted Sets, Lower Bound
and Algorithms for Indexed Search Engines (abstract)
Jan 16 Stefan Saroiu Content Distribution for Peer-to-Peer Systems (abstract)
Jan 22 Jason Hartline, Ed Hong, Alex Mohr, William Pentney Anti-Persistence: History Independent Data Structures (abstract)
Jan 23 Frank McSherry Data Mining through Spectral Analysis (abstract)
Jan 29 Justin Campbell Coding Theory, An Introduction (abstract)
Jan 30 Erik Vee Lower bounds for the Nearest Neighbor Problem (abstract)
Feb 5 Kevin Sikorski, Adrien Treuille, Zizhen Yao Six Proofs of the Infinity of Primes, PFTB Chapter 1
Feb 6 Theory Night Group History Independent Data Structures (abstract)
Feb 12 Kostub Deshmukh, Mausam Representing Numbers as a Sum of Two Squares, PFTB chapter 4
Feb 13 Ashish Sabharwal The Monotone Circuit Complexity of 3-Clique (abstract)
Feb 19 Nilesh Dalvi, Karim Filali The Slope Problem, PFTB Chapter 9 (abstract)
Feb 20 Jared Saia Censorship Resistant Peer-to-peer Content Addressable Networks (abstract)
Feb 26 - No seminar because of Affiliates Talks
Feb 27 Paul Beame A sharp threshold in proof complexity yields lower bounds for satisfiability search (absatract)
Mar 5 Gidon Shavit New Coins from Old: Computing with Unknown Bias (abstract)
Mar 6 Ed Hong Extended Golomb Codes for Binary Markov Sources (abstract)
Mar 12 * Michael Saks Space lower bounds for distance approximation in the data stream model (abstract)
Mar 13 Richard Ladner Windows Scheduling Problems for Broadcast Systems (abstract)

Proofs from The Book
FOCS
STOC
SODA


Talk Abstracts

Speaker Jeremy Barbay, University of Paris
Title Intersection and t-Threshold Problem on Sorted Sets, Lower Bound and Algorithms for Indexed Search Engines
Abstract Consider the problem of computing the intersection of k sorted sets. In the comparison model, we prove a new lower bound which depends on the non-deterministic complexity of the instance, and implies that the algorithm of Demaine, López-Ortiz and Munro is optimal in this "adaptive" sense (for k much smaller than n). We extend the lower bound and the algorithm to the t-Threshold Problem, which consists in finding the elements which are in at least t of the k sets. These problems will be presented at the conference SODA02.

These problems are motivated by boolean queries in indexed text database systems such as Google. Two variants of the intersection problem, Bidimensional Intersection and InterUnion Problem, are studied in the context of the FFSS project, a new sharing file system protocol, associated with an indexed search engine. This protocol is currently tested on the local student network of the Federation AURORE, at the university of Paris-Sud.

A generalisation of the t-Threshold Set Problem, the optimal Threshold Set problem would be of interest in indexed search engine and is also under study.

---------------

Speaker Stefan Saroiu
Title Content Distribution for Peer-to-Peer Systems
Abstract Recently, many proposals of peer-to-peer system designs and protocols have emerged, all providing similar functionality: content name lookup, that is resolving a query for a piece of content into the address of the peer storing the content. While content name lookup is vital to their success, peer-to-peer systems have another, equally difficult, challenge to overcome: distributing content efficiently.

The characteristics of peer-to-peer systems pose a novel, unique environment to the problem of distributing content. First, in these systems, hosts have homogeneous and symmetric roles, acting as both clients and servers. Thus, in contrast to classic client-server architectures, distributing content in peer-to-peer can be done more efficiently, since any host can participate in the distribution. Second, a recent study has found that most of the participating peers are home-users, using asymmetric, low-speed (broadband and modem) Internet connections. Thus, bandwidths of transfers between peers are limited by the peer's connection speeds and a natural ordering of these speeds exists in practice.

In this talk, we start by formalizing the content distribution problem for distributed systems in general and we describe two practical time-based objective functions. Based on this formal description, we find that most content distribution problems trivially reduce to or are known to be NP-hard. Surprisingly, however, in the case of distributing content from well-provisioned nodes in peer-to-peer, the content distribution problem has a linear time optimal algorithm for one of the proposed objective functions. Finally, we present an evaluation of this optimal algorithm and directions for future work.

This is joint work with Krishna Gummadi, Jared Saia, Steve Gribble and Anna Karlin.

---------------

Speakers Jason Hartline, Ed Hong, Alex Mohr, William Pentney
Title Anti-Persistence: History Independent Data Structures
Abstract Paper by: Moni Naor and Vanessa Teague
Many data structures give away much more information than they were intended to. Whenever privacy is important, we need to be concerned that it might be possible to infer information from the memory representation of a data structure that is not available through its legitimate interface. Word processors that quietly maintain old versions of a document are merely the most egregious example of a general problem.

We deal with data structures whose current memory representation does not reveal their history. We focus on dictionaries, where this means revealing nothing about the order of insertions or deletions. Our first algorithm is a hash table based on open addressing, allowing O(1) insertion and search. We also present a history independent dynamic perfect hash table that uses space linear in the number of elements inserted and has expected amortized insertion and deletion time O(1). To solve the dynamic perfect hashing problem we devise a general scheme for history independent memory allocation. For fixed-size records this is quite efficient, with insertion and deletion both linear in the size of the record. Our variable-size record scheme is efficient enough for dynamic perfect hashing but not for general use. The main open problem we leave is whether it is possible to implement a variable-size record scheme with low overhead.

---------------

Speaker Frank McSherry
Title Data Mining through Spectral Analysis
Abstract The heading Data Mining covers many problems, from web searching to clustering to data prediction. In this talk, we will see how spectral analysis, the use of matrix eigenstructure, can be applied to these problems. We will see how the three problems mentioned above can be cast in a common framework, and why spectral techniques are sucessful.

Joint work with lots of people, none of whom should be held responsible for the quality of the talk :)
Amos Fiat / Yossi Azar at Tel Aviv University, Israel
Dimitris Achlioptas at MSR
Anna Karlin and Jared Saia at UW

---------------

Speaker Justin Campbell
Title Coding Theory, An Introduction
Abstract I will give an introduction to one of the current "hot" topics, Coding Theory. This will include, but is not limited to: background, basic terminology, trade-offs, linear codes, Hamming codes, Hamming bound, Singleton bound and possibly Reed-Solomon codes. My intent is for this to be understood by everyone who's had a (very) little linear algebra. Questions are encouraged.

---------------

Speaker Erik Vee
Title Lower bounds for the Nearest Neighbor Problem
Abstract Suppose you are given a set of m points in {0,1}^d, and you wish to construct a database such that given any query from {0,1}^d, you can quickly find the nearest point in your database. We investigate the decision version of this problem, called Lambda-Nearest Neigbor: Given the query point, decide whether any point in the database is within distance lambda of the query. We show several different time-space tradeoff lower bounds for the problem. Under the right assumptions, these bounds relate the amount of space needed to store the database vs. the amount of time needed to process a query to the database.

This is joint work with Paul Beame. The paper will appear in STOC02.

---------------

Speakers Theory Night Group
Title History Independent Data Structures
Abstract We will present recent results from our study of History Independent Data Structures. The talk will begin with a brief review of history independence. We will make a few observations about history independence and space(memory) usage of data structures. We will give some general results about dynamically resizing and history independent data structures as adding the typical doubling techniques to a history independent data structure breaks their history independence. We also make some arguments as to the necessity of canonical representations for strong history independence.

---------------

Speaker Ashish Sabharwal
Title The Monotone Circuit Complexity of 3-Clique
Abstract This is a "half-baked" idea that I worked on 4 years ago with a bunch of people at IIT Kanpur but never took to completion. I recently started thinking about it again for fun!

One can easily construct circuits based on AND, OR and NOT gates to compute any Boolean (i.e. 0/1) function. In 1949, Shannon proved that almost all such functions require exponential size circuits. However, demonstrating even one specific function that indeed requires even super-linear circuit size is still an open problem. Such lower bounds are known only for circuits with various restrictions, such as monotonicity (not using any NOT gates) and bounded depth.

3-Clique is the problem of detecting whether a graph input in the form of (n \choose 2) edge variables has a 3 clique or not. This has a trivial monotone circuit using n^3 gates. An \Omega(n^3 / log^4 n) lower bound is also known, using somewhat complex techniques. We present a very simple combinatorial approach that we believe would get a better, simpler lower bound of \Omega(n^3 / log^2 n). The tangible results so far are for circuits of depth 3 or 4 only.

---------------

Speakers Nilesh Dalvi, Karim Filali
Title The Slope Problem, PFTB Chapter 9
Abstract We'll prove that any n >= 3 points in the plane that are not co-linear determine at least n-1 different slopes. The proof uses a nice reduction to a combinatorial model.

---------------

Speaker Jared Saia
Title Censorship Resistant Peer-to-peer Content Addressable
Abstract We present the first peer-to-peer network which is provably robust to adversarial attack. Our network is robust in the sense that even after adversarial removal of half the nodes in the network, an arbitrarily large fraction of the remaining nodes can still access an arbitrarily large fraction of the original data items. We also give a variant of our scheme that has the property that it is highly spam resistant: an adversary can have complete control of a constant fraction of the peers and yet will still be unable to generate spam. Our network is fully distributed and has time and resource bounds which are competitive with other proposed peer-to-peer systems (e.g. Chord, CAN and Tapestry) which are not provably robust to adversarial attack.

This work appears in the Symposium on Discrete Algorithms, 2002, the Journal of Algorithms (by invitation), and the First International Workshop on Peer-to-peer Systems, 2002. The work is joint with Amos Fiat, Steve Gribble, Anna Karlin and Stefan Saroiu.

---------------

Speaker Paul Beame
Title A sharp threshold in proof complexity yields lower bounds for satisfiability search
Abstract It is well known that for random 3-CNF formulas there is a sharp threshold in the ratio of the number of clauses to the number of variables, below which formulas are almost certainly satisfiable but above which they are almost certainly unsatisfiable. There is substantial empirical and heuristic evidence that this satisfiability threshold is at roughly 4.2 but its precise value is unknown. There has been considerable work aimed at finding upper and lower bounds on this value.

We prove that random 3-CNF formulas, at clause-to-variable ratios below 4 and thus presumed to be satisfiable almost certainly, almost certainly cause natural backtracking search algorithms (also known as Davis-Putnam or DPLL algorithms) to take exponential time to find satisfying assignments.

We derive these results by giving the first example of a sharp threshold in proof complexity. More precisely, we show that for any sufficiently small epsilon>0 and D > 2.28, random formulas consisting of (1-epsilon)n 2-clauses and D n 3-clauses, which are known to be unsatisfiable almost certainly, almost certainly require resolution and Davis-Putnam proofs of unsatisfiability of exponential size, whereas it is easily seen that random formulas with (1+epsilon)n 2-clauses (and D n 3-clauses) have linear size proofs of unsatisfiability almost certainly.

(Joint work with Dimitris Achlioptas and Mike Molloy)

---------------

Speaker Gidon Shavit
Title New Coins from Old: Computing with Unknown Bias
Abstract The paper I'll be presenting is New Coins from Old: Computing with Unknown Bias by Elchanan Mossel (MSR) and Yuval Peres (UCB). The paper itself hasn't been published yet, and I'll be presenting from a draft. In addition, I'll take a somewhat major excursion into The Algebraic Theory of Context-Free Languages by N. Chomsky and M.P.Schutzenberger (1963).

---------------

Speaker Ed Hong
Title Extended Golomb Codes for Binary Markov Sources
Abstract We evaluate the performance of elementary Golomb coding on a binary two-state Markov source. In addition to the simple context independent Golomb coding method that treats the source as memoryless, we also consider two other methods: sequential coding, and interleaved coding. Sequential coding is a context-dependent method that sequentially codes the source, choosing the order of the elementary Golomb code based on the last bit seen. Interleaved coding codes the even-numbered bits before the odd-numbered bits using elementary Golomb codes of several different orders. Of these methods, we show that no one method is best on all Markov sources, but that the best method depends upon the transition probabilities in the Markov source.

---------------

Speaker Michael Saks
Title Space lower bounds for distance approximation in the data stream model
Abstract We consider the problem of approximating the distance of two $d$-dimensional vectors $x$ and $y$ in the data stream model. In this model, the $2d$ coordinates are presented as a ``stream'' of data in some arbitrary order, where each data item includes the index and value of some coordinate and a bit that identifies the vector ($x$ or $y$) to which it belongs. The goal is to minimize the amount of memory needed to approximate the distance. For the case of $L^p$-distance with $p \in [1,2]$, there are good approximation algorithms that run in polylogarithmic space in $d$ (here we assume that each coordinate is an integer with $O(\log d)$ bits). Here we prove that they do not exist for $p>2$. In particular, we prove an optimal approximation-space tradeoff of approximating $L^\infty$ distance of two vectors. We show that any randomized algorithm that approximates $L^\infty$ distance of two length $d$ vectors within factor of $d^\delta$ requires $\Omega(d^{1-4\delta})$ space. As a consequence we show that for $p>2/(1-4\delta)$, any randomized algorithm that approximate $L^p$ distance of two length $d$ vectors within a factor $d^{\delta}$ requires $\Omega(d^{1-\frac 2p-4\delta})$ space.

The lower bound follows from a lower bound on the two-party one-round communication complexity of this problem. This lower bound is proved using a combination of information theory and Fourier analysis.

This is joint work with Xiaodong Sun, and will appear in STOC 2002.

---------------

Speaker Richard Ladner
Title Windows Scheduling Problems for Broadcast Systems
Abstract The windows scheduling problem is defined by the positive integers h and w_1,w_2,...,w_n. The window w_i is associated with page i and h is the number of slotted channels available for broadcasting the pages. A schedule that solves the problem assigns pages to slots such that the gap between any two consecutive appearances of page i is at most w_i slots. We investigate two optimization problems.
  1. The optimal windows scheduling problem: given w_1,w_2,...,w_n find a schedule in which h is minimized.
  2. The optimal harmonic windows scheduling problem: given h find a schedule for the windows w_i = i in which n is maximized.
The former is a formulation of the problem of minimizing the bandwidth in push systems that support guaranteed delay and the latter is a formulation of the problem of minimizing the startup delay in media-on-demand systems. For the optimal windows scheduling problem we present an algorithm that constructs asymptotically close to optimal schedules and for the optimal harmonic windows scheduling problem, we show how to achieve the largest known n's for all values of h.



CSE logo Department of Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[Comments to Ashish Sabharwal, ashish]