A Separation Logic for Negative Dependence (POPL 2022 - POPL Research Papers)

Who

Jialu Bao, Marco Gaboardi, Justin Hsu, Joseph Tassarotti

Track

POPL 2022

Time Zone

The program is currently displayed in (GMT-05:00) Eastern Time (US & Canada).

Use conference time zone: (GMT-05:00) Eastern Time (US & Canada)Select other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Wed 19 Jan 2022 16:40 - 17:05 at Salon III - Reasoning about Probabilistic Programs and Algorithms Chair(s): Armando Solar-Lezama

Abstract

Hashing plays a fundamental role in many probabilistic data structures, from basic hash tables to more sophisticated schemes such as Bloom filters. In these applications, a \emph{hash function} $h$ maps a \emph{universe} of possible values, typically large, to a set of \emph{buckets}, typically much smaller. Hash-based data structures satisfy a variety of probabilistic guarantees. For instance, we may be interested in the \emph{false positive rate}: what is the probability that a data structure mistakenly identifies an element as being stored in a collection, when it was never stored before? We may also be interested in \emph{load properties}, such as: what is the chance that some bucket in the data structure overflows? A typical way to prove these properties is to treat random hash functions as a \emph{balls-into-bins} process: randomly throw $N$ balls into $B$ bins one at a time, where each bin is chosen from some known distribution, and report the number of balls in each bin at the end of the process. For example, hashing $N$ unique elements into $B$ bins can be modeled as throwing $N$ balls into $B$ bins, where each bin is drawn uniformly at random.

While this modeling is convenient, one complication is that the counts of the elements in the different buckets are not probabilistically independent: one bin containing many elements makes it more likely that other bins contain few elements. The lack of independence makes it difficult to reason about multiple bins, for instance bounding the number of empty bins. Moreover, many common tools for analyzing probabilistic processes, like concentration bounds, usually require independence. This subtlety has also been a source of problems in pen-and-paper analyses of probabilistic data structures. For instance, the standard analysis of the Bloom filter bounds the number of occupied bins in order to bound the false positive rate. The original version of this analysis~\citep{DBLP:journals/cacm/Bloom70}, and also repeated in many papers, assumes that the bin counts are independent. However, \citet{DBLP:journals/ipl/BoseGKMMMST08} pointed out that this assumption is incorrect, and in fact the claimed upper-bound on the false-positive rate is actually a \emph{lower} bound; proving a correct bound on the false-positive rate required a substantially more complicated argument. Recently, \citet{DBLP:conf/cav/GopinathanS20} mechanized a correct, but complex proof in Coq.

We aim to develop a simpler method to formally reason about hash-based data structures and balls-into-bins processes, drawing on a key concept in probability theory: \emph{negative dependence}.

DOI

https://doi.org/10.1145/3498719

Jialu Bao

Cornell University

United States

Marco Gaboardi

Boston University

United States

Justin Hsu

Cornell University

United States

Joseph Tassarotti

Boston College

United States

Media

Time Zone

The program is currently displayed in (GMT-05:00) Eastern Time (US & Canada).

Use conference time zone: (GMT-05:00) Eastern Time (US & Canada)Select other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

Display full programSpecify a time band

Save

Session Program

Wed 19 Jan
Displayed time zone: Eastern Time (US & Canada) change

16:40 - 17:30	Reasoning about Probabilistic Programs and AlgorithmsPOPL at Salon III Chair(s): Armando Solar-Lezama Massachusetts Institute of Technology

16:40 25m Research paper		A Separation Logic for Negative DependenceRemote POPL Jialu Bao Cornell University, Marco Gaboardi Boston University, Justin Hsu Cornell University, Joseph Tassarotti Boston College DOI Media Attached
17:05 25m Research paper		Reasoning about “Reasoning about Reasoning”: Semantics and Contextual Equivalence for Probabilistic Programs with Nested Queries and RecursionRemote POPL Yizhou Zhang University of Waterloo, Nada Amin Harvard University DOI Media Attached