Given a set of n random d-dimensional boolean vectors with the promise that two of them are r-correlated with each other, how quickly can one find the two correlated vectors? We present a surprising and simple algorithm which, for any constant r > 0 runs in (expected) time dn^(3w/4))poly(1/r) < dn^(1.8) poly(1/r), where w < 2.4 is the exponent of matrix multiplication. This is the first subquadratic time algorithm for this problem with polynomial dependence on 1/r and improves upon O(dn^(2 - O(r)) )given by Paturi et al., the `Locality Sensitive Hashing' approach of Indyk et al., and the `Bucketing Codes' approach of Dubiner.
This basic algorithm also applies to the adversarial setting, and extensions of this algorithm yield improved algorithms for several other problems, including learning sparse parities with noise, and learning juntas both with, and without noise.
For a broad class of practically relevant distribution properties, which includes entropy and support size, nearly all of the proposed estimators have an especially simple form. Given a set of independent samples from a discrete distribution, these estimators tally the vector of summary statistics---the number of domain elements seen once, twice, etc. in the sample---and output the dot product between these summary statistics, and a fixed vector of coefficients. We term such estimators linear. This historical proclivity towards linear estimators is slightly perplexing, since, despite many efforts over nearly 60 years, all proposed such estimators have significantly suboptimal convergence, compared to the bounds shown in [Valiant and Valiant, 2011].
Our main result, in some sense vindicating this insistence on linear estimators, is that for any property in this broad class, there exists a near-optimal linear estimator. Additionally, we give a practical and polynomial-time algorithm for constructing such estimators for any given parameters.
While this result does not yield explicit bounds on the sample complexities of these estimation tasks, we leverage the insights provided by this result, to give explicit constructions of near-optimal linear estimators for three properties: entropy, L1 distance to uniformity, and for pairs of distributions, L1 distance.
Our entropy estimator, when given O(n/c log n) independent samples from a distribution of support at most n, will estimate the entropy of the distribution to within accuracy c, with probability of failure o(1/poly(n)). From the recent lower bounds given in [Valiant and Valiant, 2011], this estimator is optimal, to constant factor, both in its dependence on n, and its dependence on c. In particular, the inverse-linear convergence rate of this estimator resolves the main open question of [Valiant and Valiant, 2011], which left open the possibility that the error decreased only with the square root of the number of samples.
Our distance to uniformity estimator, when given O(m/c^2 log m)$ independent samples from any distribution, returns an c-accurate estimate of the L1 distance to the uniform distribution of support m. This is the first sublinear-sample estimator for this problem, and is constant-factor optimal, for constant c.
Finally, our framework extends naturally to properties of pairs of distributions, including estimating the L1 distance and KL-divergence between pairs of distributions. We give an explicit linear estimator for estimating L1 distance to accuracy c using O(n/c^2 log n) samples from each distribution, which is constant-factor optimal, for constant c.
@inproceedings{linEstVV,
title = "The Power of Linear Estimators",
author = "Gregory Valiant and Paul Valiant",
booktitle="the 52nd Annual IEEE Symposium on the Foundations of Computer Science (FOCS)",
year = "2011",
}
We introduce a new approach to characterizing the unobserved portion of a distribution, which provides sublinear-sample additive estimators for a class of properties that includes entropy and distribution support size. Together with the lower bounds proven in the companion paper [VV'10a], this settles the longstanding question of the sample complexities of these estimation problems (up to constant factors). Our algorithm estimates these properties up to an arbitrarily small additive constant, using O(n log n) samples; [VV'10a] shows that no algorithm on o(n log n) samples can achieve this (where n is a bound on the support size, or in the case of estimating the support size, 1/n is a lower bound the probability of any element of the domain). Previously, no explicit sublinear-sample algorithms for either of these problems were known. Additionally, our algorithm runs in time linear in the number of samples used.
@article{unseenVV,
title = "Estimating the unseen: a sublinear-sample canonical estimator of distributions",
author = "Gregory Valiant and Paul Valiant",
year = "2010",
}
We prove two new multivariate central limit theorems; the first relates the sum of independent distributions to the multivariate Gaussian of corresponding mean and covariance, under the earthmover distance matric (also known as the Wasserstein metric). We leverage this central limit theorem to prove a stronger but more specific central limit theorem for “generalized multinomial” distributions—a large class of discrete distributions, parameterized by matrices, that generalize binomial and multinomial distributions, and describe many distributions encountered in computer science. This central limit theorem relates a generalized multinomial distribution to a multivariate Gaussian distribution, discretized by rounding to the nearest lattice points. In contrast to the metric of our first central limit theorem, this bound is in terms of statistical distance, which immediately implies that any algorithm with input drawn from a generalized multinomial distribution behaves essentially as if the input were drawn from a discretized Gaussian with the same mean and covariance. Such tools in the multivariate setting are rare, and we hope this new tool will be of use to the community.
In the second part of the paper, we employ this central limit theorem to establish a lower bound of Omega(n/ log n) on the sample complexity of additively estimating the entropy or support size of a distribution (where 1/n is a lower bound on the probability of any element in the domain). Together with the canonical estimator constructed in the companion paper [VV'10b], this settles the longstanding open question of the sample complexities of these estimation problems, up to constant factors. In particular, for any constant c, there is a pair of distributions D, D' each of whose elements occurs with probability at least 1/n, and whose entropies satisfy H(D)-H(D') > c, such that no algorithm on o( n / c log n) samples can distinguish D from D' with probability greater than 2/3, and analogously for the problem of estimating the support size. The previous lower-bounds on these sample complexities were n/2^Theta(Sqrt(log n)), for constant c. We explicitly exhibit such a pair of distributions via a Laguerre polynomial construction that may be of independent interest.
@article{cltVV,
title = "A CLT and tight lower bounds for estimating entropy",
author = "Gregory Valiant and Paul Valiant",
year = "2010",
}
We introduce a new approach to characterizing the unobserved portion of a distribution, which provides sublinear--sample estimators achieving arbitrarily small additive constant error for a class of properties that includes entropy and distribution support size. Additionally, we show new matching lower bounds. Together, this settles the longstanding question of the sample complexities of these estimation problems, up to constant factors.
@inproceedings{unseenVV,
title = "Estimating the Unseen: An n/log(n)-sample Estimator for Entropy and Support Size, Shown Optimal via New CLTs",
author = "Gregory Valiant and Paul Valiant",
booktitle="the 43rd ACM Symposium on Theory of Computing (STOC)",
year = "2011",
}
Given data drawn from a mixture of multivariate Gaussians, a basic problem is to accurately estimate the mixture parameters. We give an algorithm for this problem that has a running time, and data requirement polynomial in the dimension and the inverse of the desired accuracy, with provably minimal assumptions on the Gaussians. Our algorithm recovers parameters such that each recovered component is close in statistical distance to one of the original components--a condition that is stronger than simply additive approximations of the parameters. As simple consequences of our learning algorithm, we can perform near-optimal clustering of the sample points and density estimation for mixtures of k Gaussians, efficiently. The building blocks of our algorithm are based on the work Kalai et al. [STOC 2010] that gives an efficient algorithm for learning mixtures of two Gaussians by considering a series of projections down to one dimension, and applying the method of moments to each univariate projection. A major technical hurdle in Kalai et al. is showing that one can efficiently learn univariate mixtures of two Gaussians. In contrast, because pathological scenarios can arise when considering univariate projections of mixtures of more than two Gaussians, the bulk of the work in this paper concerns how to leverage an algorithm for learning univariate mixtures (of many Gaussians) to yield an efficient algorithm for learning in high dimensions. Our algorithm employs hierarchical clustering and rescaling, together with delicate methods for backtracking and recovering from failures that can occur in our univariate algorithm. Finally, while the running time and data requirements of our algorithm depend exponentially on the number of Gaussians in the mixture, we prove that such a dependence is necessary.
@inproceedings{MV:many_gaussians,
title = "Settling the Polynomial Learnability of Mixtures of Gaussians",
author = "Ankur Moitra and Gregory Valiant",
booktitle = "the 51st Annual Symposium on the Foundations of Computer Science (FOCS)",
year = "2010",
}
Given data drawn from a mixture of multivariate Gaussians, a basic problem is to accurately estimate the mixture parameters. We provide a polynomial-time algorithm for this problem for the case of two Gaussians in n dimensions (even if they overlap), with provably minimal assumptions on the Gaussians, and polynomial data requirements. In statistical terms, our estimator converges at an inverse polynomial rate, and no such estimator (even exponential time) was known for this problem (even in one dimension). Our algorithm reduces the n-dimensional problem to the one-dimensional problem, where the method of moments is applied. One technical challenge is proving that noisy estimates of the first six moments of a univariate mixture suffice to recover accurate estimates of the mixture parameters, as conjectured by Pearson (1894), and in fact these estimates converge at an inverse polynomial rate. As a corollary, we can efficiently perform near-optimal clustering: in the case where the overlap between the Gaussians is small, one can accurately cluster the data, and when the Gaussians have partial overlap, one can still accurately cluster those data points which are not in the overlap region. A second consequence is a polynomial-time density estimation algorithm for arbitrary mixtures of two Gaussians, generalizing previous work on axis-aligned Gaussians (Feldman et al, 2006).
@inproceedings{KMV:two_gaussians,
title = "Efficiently Learning Mixtures of Two Gaussians",
author = "Adam Tauman Kalai and Ankur Moitra and Gregory Valiant",
booktitle = "the 42nd ACM Symposium on Theory of Computing (STOC)",
year = "2010",
}
This paper extends the work of Gottlob, Lee, and Valiant (PODS 2009)[GLV], and considers worst-case bounds for the size of the result Q(D) of a conjunctive query Q to a database D given an arbitrary set of functional dependencies. The bounds in [GLV] are based on a "coloring" of the query variables. In order to extend the previous bounds to the setting of arbitrary functional dependencies, we leverage tools from information theory to formalize the original intuition that each color used represents some possible entropy of that variable, and bound the maximum possible size increase via a linear program that seeks to maximize how much more entropy is in the result of the query than the input. This new view allows us to precisely characterize the entropy structure of worst-case instances for conjunctive queries with simple functional dependencies (keys), providing new insights into the results of [GLV]. We extend these results to the case of general functional dependencies, providing upper and lower bounds on the worst-case size increase. We identify the fundamental connection between the gap in these bounds and a central open question in information theory. Finally, we show that, while both the upper and lower bounds are given by exponentially large linear programs, one can distinguish in polynomial time whether the result of a query with an arbitrary set of functional dependencies can be any larger than the input database.
@inproceedings{VV:size_bds,
title = "Size Bounds for Conjunctive Queries with General Functional Dependencies",
author = "Gregory Valiant and Paul Valiant",
year = "2010",
booktitle = "http://arxiv.org/abs/0909.2030v2",
}
This paper provides new worst-case bounds for the size and treewith of the result Q(D) of a conjunctive query Q to a database D. We derive bounds for the result size |Q(D)| in terms of structural properties of Q, both in the absence and in the presence of keys and functional dependencies. These bounds are based on a novel “coloring” of the query variables that associates a coloring number C(Q) to each query Q. Using this coloring number, we derive tight bounds for the size of Q(D) in case (i) no functional dependencies or keys are specified, and (ii) simple (one-attribute) keys are given. These results generalize recent size bounds for join queries obtained by Atserias, Grohe, and Marx (FOCS 2008). An extension of our coloring technique also gives a lower bound for |Q(D)| in the general setting of a query with arbitrary functional dependencies. Our new coloring scheme also allows us to precisely characterize (both in the absence of keys and with simple keys) the treewidth-preserving queries—-the queries for which the output treewidth is bounded by a function of the input treewidth. Finally we characterize the queries that preserve the sparsity of the input in the general setting with arbitrary functional dependencies.
@inproceedings{GLV:size_bounds,
title = "Size and Treewidth Bounds for Conjunctive Queries",
author = "Georg Gottlob and Stephanie Tien Lee and Gregory Valiant",
year = "2009",
booktitle = "PODS 2009",
}
We present a new framework for auction design and analysis that we term “best-response auctions”. We use this frame work to show that the simple and myopic best-response dy namics converge to the VCG outcome and are incentive compatible in several well-studied auction environments (Generalized Second Price auctions, and auctions with unit-demand bidders). Thus, we establish that in these environments, given that all other bidders are repeatedly best-responding, the best course of action for a bidder is to also repeatedly best-respond. Our results generalize classical results in economics regarding convergence to equilibrium and incentive compatibility of ascending-price English auctions. In addition, our findings provide new game-theoretic justifications for some well-studied auction rules. Best-response auctions provide a way to bridge the gap between the full-information equilibrium concept and the usual private-information auction theory.
@inproceedings{NSVZ:BRA,
title = "Best-Response Auctions",
author = "Noam Nisan and Michael Schapira and Gregory Valiant and Aviv Zohar",
booktitle = "EC 2011.",
}
We revisit price of anarchy in network routing, in a new model in which routing decisions are made by self-interested components of the network, as opposed to by the flows as in the work of Roughgarden and Tardos [RT02]. This significant departure from previous work on the problem seeks to model Internet routing more accurately. We propose two models: the latency model in which the network edges seek to minimize the average latency of the flow through them on the basis of knowledge of latency conditions in the whole network, and the pricing model in which network edges advertise pricing schemes to their neighbors and seek to maximize their profit. We show two rather surprising results: the price of stability in the latency model is unbounded, even with linear latencies (as compared with 4/3 in [RT02] for the case in which routing decisions are made by the flows themselves). However, in the pricing model in which edges advertise pricing schemes—how the price varies as a function of the total amount of flow—we show that, under a condition ruling out monopolistic situations, all Nash equilibria have optimal flows; that is, the price of anarchy in this model is one, in the case of linear latencies with no constant term.
@inproceedings{PV:selfish_routers,
title = "A New Look at Selfish Routing",
author = "Christos Papadimitriou and Gregory Valiant",
year = "2010",
booktitle = "Innovations in Computer Science (ICS2010)",
}
Under many protocols—in computerized settings and in economics settings—participants repeatedly “best respond” to each others’ actions until the system “converges” to an equilibrium point. We ask when does such myopic “local rationality” imply “global rationality”, i.e., when is it best for a player, given that the others are repeatedly best-responding, to also repeatedly best-respond? We exhibit a class of games where this is indeed the case. We identify several environments of interest that fall within our class: models of the Border Gateway Protocol (BGP), that handles routing on the Internet, and of the Transmission Control Protocol (TCP), and also stable-roommates and cost-sharing, that have been extensively studied in economic theory.
@inproceedings{NSVZ:best_reply,
title = "Best-Response Mechanisms",
author = "Noam Nisan and Michael Schapira and Gregory Valiant and Aviv Zohar",
booktitle = "Innovations in Computer Science (ICS 2011), 2011",
}
Can learning algorithms find a Nash equilibrium? This is a natural question for several reasons. Learning algorithms resemble the behavior of players in many naturally arising games, and thus results on the convergence or non- convergence properties of such dynamics may inform our understanding of the applicability of Nash equilibria as a plausible solution concept in some settings. A second reason for asking this question is in the hope of being able to prove an impossibility result, not dependent on complexity assumptions, for computing Nash equilibria via a restricted class of reasonable algorithms. In this work, we begin to answer this question by considering the dynamics of the standard multiplicative weights update learning algorithms (which are known to converge to a Nash equilibrium for zero-sum games). We revisit a 3x3 game defined by Shapley in the 1950s in order to establish that fictitious play does not converge in general games. For this simple game, we show via a potential function argument that in a variety of settings the multiplicative updates algorithm impressively fails to find the unique Nash equilibrium, in that the cumulative distributions of players produced by learning dynamics actually drift away from the equilibrium.
@inproceedings{DFPPV10,
title = "On Learning Algorithms for Nash Equilibria",
author = "C. Daskalakis and R. Frongillo and C. H. Papadimitriou and G. Pierrakos and G. Valiant",
year = "2010",
booktitle = "SAGT",
}
In light of much recent interest in finding a model of multi-player multi-action games that allows for efficient computation of Nash equilibria yet remains as expressive as possible, we investigate the computational complexity of Nash equilibria in the recently proposed model of action- graph games (AGGs). AGGs, introduced by Bhat and Leyton-Brown, are succinct representations of games that encapsulate both local dependencies as in graphical games, and partial indifference to other agents' identities as in anonymous games, which occur in many natural settings such as financial markets. This is achieved by specifying a graph on the set of actions, so that the payoff of an agent for selecting a strategy depends only on the number of agents playing each of the neighboring strategies in the action graph. We present a simple Fully Polynomial Time Approximation Scheme for computing mixed Nash equilibria of AGGs with constant degree, constant treewidth and a constant number of agent types (but an arbitrary number of strategies), and extend this algorithm to a broader set of instances. However, the main results of this paper are negative, showing that when either of the latter conditions are relaxed the problem becomes intractable. In particular, we show that even if the action graph is a tree but the number of agent-types is unconstrained, it is NP--complete to decide the existence of a pure-strategy Nash equilibrium and PPAD--complete to compute a mixed Nash equilibrium (even an approximate one). Similarly for AGGs with a constant number of agent types but unconstrained treewidth. These hardness results suggest that, in some sense, our FPTAS is as strong a positive result as one can expect. In the broader context of trying to pin down the boundary where the equilibria of multi-player games can be computed efficiently, these results complement recent hardness results for graphical games and algorithmic results for anonymous games.
@inproceedings{DSVV:actiongraph08,
title = "On the Complexity of Nash Equilibria of Action-Graph Games",
author = "Constantinos Daskalakis and Grant Schoenebeck and Gregory Valiant and Paul Valiant",
year = "2009",
booktitle = "SODA",
}
Designing and deploying a network protocol determines the rules by which end users interact with each other and with the network. We consider the problem of designing a protocol to optimize the equilibrium behavior of a network with selfish users. We consider network cost--sharing games, where the set of Nash equilibria depends fundamentally on the choice of an edge cost-sharing protocol. Previous research focused on the Shapley protocol, in which the cost of each edge is shared equally among its users. We systematically study the design of optimal cost-sharing protocols for undirected and directed graphs, single-sink and multicommodity networks, and different measures of the inefficiency of equilibria. Our primary technical tool is a precise characterization of the cost-sharing protocols that induce only network games with pure-strategy Nash equilibria. We use this characterization to prove, among other results, that the Shapley protocol is optimal in directed graphs and that simple priority protocols are essentially optimal in undirected graphs.
@article{CRV:good equilibria,
author = "Ho-Lin Chen and Tim Roughgarden and Gregory Valiant",
title = "Designing Network Protocols for Good Equilibria",
journal="SIAM Journal on Computing",
volume="39",
number ="5",
pages="1799--1832",
year = "2010",
}
Braess's Paradox is the counterintuitive but well-known fact that removing edges from a network with "selfsh routing" can decrease the latency incurred by traffic in an equilibrium flow. Despite the large amount of research motivated by Braess's Paradox since its discovery in 1968, little is known about whether it is a common real-world phenomenon, or a mere theoretical curiosity. In this paper, we show that Braess's Paradox is likely to occur in a natural random network model. More precisely, with high probability, (as the number of vertices goes to infinity), there is a traffic rate and a set of edges whose removal improves the latency of trafficc in an equilibrium flow by a constant factor. Our proof approach is robust and shows that the "global" behavior of an equilibrium flow in a large random network is similar to that in Braess's original four-node example.
@artticle{VR_2006,
author = "Gregory Valiant and Tim Roughgarden",
title = "Braess's paradox in large random graphs",
booktitle = "Random Structures and Algorithms".
volume="37",
number = "4",
pages ="495--515",
year = "2010",
}
During undergrad I did some work in Planetary Science, studying Martian impact craters.
The geometry of simple impact craters reflects the properties of the target materials, and the diverse range of fluidized morphologies observed in Martian ejecta blankets are controlled by the near-surface composition and the climate at the time of impact. Using the Mars Orbiter Laser Altimeter (MOLA) data set, quantitative information about the strength of the upper crust and the dynamics of Martian ejecta blankets may be derived from crater geometry measurements. Here we present the results from geometrical measurements of fresh craters 3.50 km in rim diameter in selected highland (Lunae and Solis planitiae) and lowland (Acidalia, Isidis, and Utopia planitiae) terrains. We find large, resolved differences between the geometrical properties of the freshest highland and lowland craters. Simple lowland craters are 1.5-2.0 times deeper with >50% larger cavities compared to highland craters of the same diameter. Rim heights and the volume of material above the preimpact surface are slightly greater in the lowlands over most of the size range studied. The different shapes of simple highland and lowland craters indicate that the upper ~6.5 km of the lowland study regions are significantly stronger than the upper crust of the highland plateaus. Lowland craters collapse to final volumes of 45-70% of their transient cavity volumes, while highland craters preserve only 25-50%. The effective yield strength of the upper crust in the lowland regions falls in the range of competent rock, approximately 9.12 MPa, and the highland plateaus may be weaker by a factor of 2 or more, consistent with heavily fractured Noachian layered deposits. The measured volumes of continuous ejecta blankets and uplifted surface materials exceed the predictions from standard crater scaling relationships and Maxwellfs Z model of crater excavation by a factor of 3. The excess volume of fluidized ejecta blankets on Mars cannot be explained by the concentration of ejecta through nonballistic emplacement processes and/or bulking. The observations require a modification of the scaling laws and are well fit using a scaling factor of ~1.4 between the transient crater surface diameter to the final crater rim diameter and excavation flow originating from one projectile diameter depth with Z = 2.7. The refined excavation model provides the first observationally constrained set of initial parameters for study of the formation of fluidized ejecta blankets on Mars.
@article{Stewart_Valiant_2006,
author = "S. T. Stewart and G. J. Valiant",
title = "Martian subsurface properties and crater formation processes inferred from fresh impact crater geometries",
journal = "Meteoritics and Planetary Sciences",
year = 2006,
volume = 41,
number = 10,
pages = "1509-1537",
}