Verbatim++: Verified, Optimized, and Semantically Rich Lexing with Derivatives (CPP 2022)

Sun 16 - Fri 28 January 2022 Philadelphia, Pennsylvania, United States

Who

Derek Egolf, Sam Lasser, Kathleen Fisher

Track

CPP 2022

Time Zone

Times are shown in (GMT-05:00) Eastern Time (US & Canada), the conference time zone.

When

Mon 17 Jan 2022, 13:55 - 14:20, at Salon III
Session: Semantics and Program Verification
Chair(s): Benjamin Delaware

Abstract

Lexers and parsers are attractive targets for attackers because they often sit at the boundary between a software system’s internals and the outside world. Formally verified lexers can reduce the attack surface of these systems, thus making them more secure.

One recent step in this direction is the development of Verbatim, a verified lexer based on the concept of Brzozowski derivatives. Two limitations restrict the tool’s usefulness. First, its running time is quadratic in the length of its input string. Second, the lexer produces tokens with a simple “tag and string” representation, which limits the tool’s ability to integrate with parsers that operate on more expressive token representations.
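For readers unfamiliar with Brzozowski derivatives, the following is a minimal, unverified Python sketch of derivative-based maximal-munch lexing that emits "tag and string" tokens. It is not Verbatim's Coq implementation; every name here (`Empty`, `Eps`, `Chr`, `Alt`, `Seq`, `Star`, `nullable`, `deriv`, `longest_prefix`, `lex`, `any_of`) is hypothetical and chosen for illustration. The sketch performs no regex simplification, and `longest_prefix` scans to the end of the input for every token. Rescanning of this kind is a familiar source of quadratic behavior in maximal-munch lexers in general; the abstract does not say whether it is the cause in Verbatim.

```python
from dataclasses import dataclass

# Regex syntax tree (hypothetical names, not Verbatim's definitions).
@dataclass(frozen=True)
class Empty: pass          # matches no string

@dataclass(frozen=True)
class Eps: pass            # matches only the empty string

@dataclass(frozen=True)
class Chr:
    c: str                 # matches exactly one character

@dataclass(frozen=True)
class Alt:
    l: object
    r: object

@dataclass(frozen=True)
class Seq:
    l: object
    r: object

@dataclass(frozen=True)
class Star:
    r: object

def nullable(r):
    """True if r matches the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, Alt):
        return nullable(r.l) or nullable(r.r)
    return nullable(r.l) and nullable(r.r)  # Seq

def deriv(r, c):
    """Brzozowski derivative: a regex matching w exactly when r matches c + w."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == c else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.l, c), deriv(r.r, c))
    if isinstance(r, Star):
        return Seq(deriv(r.r, c), r)
    left = Seq(deriv(r.l, c), r.r)  # Seq case
    return Alt(left, deriv(r.r, c)) if nullable(r.l) else left

def longest_prefix(r, s, start):
    """End index of the longest match of r starting at `start`, or None."""
    best = start if nullable(r) else None
    for i in range(start, len(s)):
        r = deriv(r, s[i])
        if nullable(r):
            best = i + 1
    return best

def lex(rules, s):
    """Maximal munch: the longest match wins; ties go to the earlier rule."""
    tokens, pos = [], 0
    while pos < len(s):
        best_tag, best_end = None, pos
        for tag, r in rules:
            end = longest_prefix(r, s, pos)
            if end is not None and end > best_end:
                best_tag, best_end = tag, end
        if best_tag is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_tag, s[pos:best_end]))  # "tag and string" token
        pos = best_end
    return tokens

def any_of(chars):
    """Alternation over single characters."""
    r = Empty()
    for c in chars:
        r = Alt(r, Chr(c))
    return r

digit = any_of("0123456789")
rules = [("NUM", Seq(digit, Star(digit))), ("WS", Chr(" ")), ("PLUS", Chr("+"))]
print(lex(rules, "12 +3"))
# [('NUM', '12'), ('WS', ' '), ('PLUS', '+'), ('NUM', '3')]
```

Each token here is only a pair of a tag and the matched text, so a downstream parser would have to re-interpret `'12'` as a number itself.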

In this work, we present a suite of extensions to Verbatim that overcomes these limitations while preserving the tool’s original correctness guarantees. The lexer achieves effectively linear performance on a JSON benchmark through a combination of optimizations that, to our knowledge, has not been previously verified. The enhanced version of Verbatim also enables users to augment their lexical specifications with custom semantic actions, and it uses these actions to produce semantically rich tokens—i.e., tokens that carry values with arbitrary, user-defined types. All extensions were implemented and verified with the Coq Proof Assistant.
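The idea of semantic actions attached to lexical rules can be illustrated with a small, unverified Python sketch. To stay short it uses Python's standard `re` module for matching rather than derivatives, and it says nothing about how Verbatim represents actions in Coq. The `Token` class, the `RULES` table, and the JSON-like rule set are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Token:
    tag: str
    value: Any  # result of the rule's semantic action, not the raw lexeme

# Hypothetical rule table: (tag, pattern, semantic action on the lexeme).
RULES = [
    ("INT",   re.compile(r"-?\d+"),     int),
    ("TRUE",  re.compile(r"true"),      lambda _: True),
    ("STR",   re.compile(r'"[^"\\]*"'), lambda s: s[1:-1]),  # strip the quotes
    ("PUNCT", re.compile(r"[\[\],]"),   str),
    ("WS",    re.compile(r"\s+"),       lambda _: None),
]

def lex(s):
    """Pick the longest match at each position, then apply that rule's action."""
    tokens, pos = [], 0
    while pos < len(s):
        best = None
        for tag, pattern, action in RULES:
            m = pattern.match(s, pos)
            if m and m.end() > pos and (best is None or m.end() > best[0]):
                best = (m.end(), tag, action)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        end, tag, action = best
        if tag != "WS":  # whitespace is matched but not emitted
            tokens.append(Token(tag, action(s[pos:end])))
        pos = end
    return tokens

for tok in lex('[12, "hi", true]'):
    print(tok)
```

With this rule table, the `INT` token carries the Python integer `12` and the `STR` token carries `hi` without its quotes, so a parser consuming these tokens does not need to convert lexemes itself.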

Link to Publication

https://dl.acm.org/doi/abs/10.1145/3497775.3503694

Derek Egolf

Northeastern University

Sam Lasser

Tufts University

United States

Kathleen Fisher