top of page
Writer's picturequeprofkabternconc

Dharma – Generation-based Context-free Grammar Fuzzing Tool: A Case Study and Success Story



This chapter also introduces a grammar toolbox with several helper functions that ease the writing of grammars, such as using shortcut notations for character classes and repetitions, or extending grammars




Dharma – Generation-based Context-free Grammar Fuzzing Tool




The middle ground between regular expressions and Turing machines is covered by grammars. Grammars are among the most popular (and best understood) formalisms to formally specify input languages. Using a grammar, one can express a wide range of the properties of an input language. Grammars are particularly great for expressing the syntactical structure of an input, and are the formalism of choice to express nested or recursive inputs. The grammars we use are so-called context-free grammars, one of the easiest and most popular grammar formalisms.


In such a grammar, if we start with and then expand one symbol after another, randomly choosing alternatives, we can quickly produce one valid arithmetic expression after another. Such grammar fuzzing is highly effective as it comes to produce complex inputs, and this is what we will implement in this chapter.


Railroad diagrams, also called syntax diagrams, are a graphical representation of context-free grammars. They are read left to right, following possible "rail" tracks; the sequence of symbols encountered on the track defines the language. To produce railroad diagrams, we implement a function syntax_diagram().


(If you find that there is redundancy ("Robustness and Robustness") in here: In our chapter on coverage-based fuzzing, we will show how to cover each expansion only once. And if you like some alternatives more than others, probabilistic grammar fuzzing will be there for you.)


One way around this is to attach constraints to grammars, as we will discuss later in this book. Another possibility is to put together the strengths of grammar-based fuzzing and mutation-based fuzzing. The idea is to use the grammar-generated inputs as seeds for further mutation-based fuzzing. This way, we can explore not only valid inputs, but also check out the boundaries between valid and invalid inputs. This is particularly interesting as slightly invalid inputs allow finding parser errors (which are often abundant). As with fuzzing in general, it is the unexpected which reveals errors in programs.


As one of the foundations of human language, grammars have been around as long as human language existed. The first formalization of generative grammars was by Dakṣiputra Pāṇini in 350 BC [Dakṣiputra Pāṇini, 350 BCE]. As a general means to express formal languages for both data and programs, their role in computer science cannot be overstated. The seminal work by Chomsky [Chomsky et al, 1956] introduced the central models of regular languages, context-free grammars, context-sensitive grammars, and universal grammars as they are used (and taught) in computer science as a means to specify input and programming languages ever since.


The CSmith tool [Yang et al, 2011] specifically targets C programs, starting with a C grammar and then applying additional steps, such as referring to variables and functions defined earlier or ensuring integer and type safety. Their authors have used it "to find and report more than 400 previously unknown compiler bugs."


Dharma is a generation-based, context-free grammar fuzzer created in 2015 by Christoph Diehl from Mozilla Security team. The goal of this tool is to generate files (like Javascript and/or HTML) based on a given grammar description. This concept look maybe familiar to you if you already played with Domato by Ivan Fratric from Google Project Zero.


The previous dharma grammar is really specific to WebAssembly APIs and should be improved to handle more generic JavaScript methods/objects (Array, UIntArray, Number, etc.). Also, another grammar should be created to generate valid WebAssembly module bytecode, stored inside ArrayBuffer or TypedBuffer and provided to the WebAssembly.Module() constructor.


Grammar-based fuzzing is a fuzzing technique that uses a formal language grammar to define the structure of the data to be generated. These grammars are typically represented in plain-text and use a combination of symbols and constants to represent the data. The fuzzer can then parse the grammar and use it to generate fuzzed output.


First and foremost, the WebIDL specification defines a standardized grammar to which these definitions must adhere. This lets us leverage existing tools, such as WebIDL2.js, for parsing the raw WebIDL definitions and converting them into an abstract syntax tree (AST). Then this AST can be interpreted by the fuzzer to generate testcases.


Out of this need, we developed another tool named GrIDL. GrIDL leverages the WebIDL2.js library for converting our IDL definitions into an AST. It also makes several optimizations to the AST to better support its use as a fuzzing grammar. 2ff7e9595c


2 views0 comments

Recent Posts

See All

Comments


bottom of page