Algorithms and lower bounds for testing properties of structured distributions

Nikishkin, Vladimir

Repository landing page

oai:era.ed.ac.uk:1842/25406

Algorithms and lower bounds for testing properties of structured distributions

Authors: Vladimir Nikishkin
Publication date: 29 November 2016
Publisher: The University of Edinburgh

Abstract

In this doctoral thesis we consider various property testing problems for structured distributions. A distribution is said to be structured if it belongs to a certain class which can be simply described in approximation terms. Such distributions often arise in practice, e.g. log-concave distributions, easily approximated by polynomials (see [Bir87a]), often appear in econometric research. For structured distributions, testing a property often requires far less samples than for general unrestricted distributions. In this thesis we prove that this is indeed the case for several distance-related properties. Namely, we give explicit sub-linear time algorithms for L1 and L2 distance testing between two structured distributions for the cases when either one or both of them are available as a “black box”. We also prove that the given algorithms have the best possible asymptotic complexity by proving matching lower bounds in the form of explicit problem instances (albeit constructed using randomized techniques) demanding at least a specified amount of data to be tested successfully. As the main numerical result, we prove that testing that total variation distance to an explicitly given distribution is at least e requires O(√k/e²) samples, where k is an approximation parameter, dependent on the class of distribution being tested and independent of the support size. Testing that the total variation distance between two “black box” distributions is at least e requires O(k⁴/⁵e⁶/⁵). In some cases, when k ~ n, this result may be worse than using an unrestricted testing algorithm (which requires O( n²/3/e² ) samples where n is the domain size). To address this issue, we develop a third algorithm, which requires O(k²/³e⁴/³ log⁴/³(n/k) log log(n/k)) and serves as a bridge between the cases of small and large domain sizes

Similar works

Full text

Open in the Core reader

Download PDF

Edinburgh Research Archive

oai:era.ed.ac.uk:1842/25406

Last time updated on 07/06/2021

This paper was published in Edinburgh Research Archive.

Having an issue?

Is data on this page outdated, violates copyrights or anything else? Report the problem now and we will take corresponding actions after reviewing your request.