
Commit a384797

design/draft-fuzzing.md: update fuzzing draft design
This updates the draft design to more closely match the more recent decisions during implementation.

Change-Id: I716f4e07431612bcf15fcbde8409e8db5b3b37a9
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/289809
Trust: Katie Hockman <[email protected]>
Run-TryBot: Katie Hockman <[email protected]>
Reviewed-by: Jay Conrod <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
1 parent abbf42e commit a384797

1 file changed, +120 -115 lines changed

design/draft-fuzzing.md

+120 -115
@@ -138,7 +138,7 @@ support these functions within yyy_test.go files.
 
 ## Implementation
 
-There are several components to this proposal which are described below.
+There are several components to this design draft which are described below.
 The big pieces to be supported in the MVP are: support for fuzzing built-in
 types, structs, and types which implement the BinaryMarshaler and
 BinaryUnmarshaler interfaces or the TextMarshaler and TextUnmarshaler
@@ -169,8 +169,8 @@ test](#go-command)</code> by default, and can provide a starting point for a
 [mutation engine](#fuzzing-engine-and-mutator) if fuzzing.
 The testing portion of the fuzz target is a function within an `f.Fuzz`
 invocation.
-This function runs like a standard unit test with `testing.T` for each input in
-the seed corpus.
+This function runs much like a standard unit test with `testing.T` for each
+input in the seed corpus.
 If the developer is fuzzing this target with the new `-fuzz` flag with `go
 test`, then a [generated corpus](#generated-corpus) will be managed by the
 fuzzing engine, and a mutator will generate new inputs to run against the
@@ -182,14 +182,12 @@ With the new support, a fuzz target could look like this:
 ```
 func FuzzMarshalFoo(f *testing.F) {
 // Seed the initial corpus
-inputs := []string{"cat", "DoG", "!mouse!"}
-for _, input := range inputs {
-f.Add(input, big.NewInt(0))
-}
+f.Add("cat", big.NewInt(1341))
+f.Add("!mouse", big.NewInt(0))
 
 // Run the fuzz test
 f.Fuzz(func(t *testing.T, a string, num *big.Int) {
-t.Parallel()
+t.Parallel() // seed corpus tests can run in parallel
 if num.Sign() <= 0 {
 t.Skip() // only test positive numbers
 }
@@ -226,10 +224,11 @@ Functions that are new and only apply to `testing.F` are listed below.
 // those in the Fuzz function.
 func (f *F) Add(args ...interface{})
 
-// Fuzz runs the fuzz function, ff, for fuzz testing. It runs ff in a separate
-// goroutine. Only one call to Fuzz is allowed per fuzz target, and any
-// subsequent calls will panic. If ff fails for a set of arguments, those
-// arguments will be added to the seed corpus.
+// Fuzz runs the fuzz function, ff, for fuzz testing. While fuzzing with -fuzz,
+// the fuzz target and ff may be run in multiple worker processes that don't
+// share global state within the process. Only one call to Fuzz is allowed per
+// fuzz target, and any subsequent calls will panic. If ff fails for a set of
+// arguments, those arguments will be added to the seed corpus.
 func (f *F) Fuzz(ff interface{})
 ```
 
@@ -238,20 +237,26 @@ func (f *F) Fuzz(ff interface{})
 A fuzz target has two main components: 1) seeding the corpus and 2) the `f.Fuzz`
 function which is executed for items in the corpus.
 
-1. The corpus generation is done first, and builds a seed corpus with
-interesting input values for the fuzz test. This should be fairly quick,
-thus able to run before the fuzz testing begins, every time it’s run. These
-inputs are run by default with `go test`.
-1. The `f.Fuzz(...)` function is executed for each item in the corpus. If this
-target is being fuzzed, then new inputs will be generated and continously
-tested against the `f.Fuzz(...)` function.
-
-The arguments to `f.Add(...)` and the function in `f.Fuzz(...)` must be the same
-type within the target, and there must be at least one argument specified.
+1. Defining the seed corpus and any necessary setup work is done before the
+`f.Fuzz` function, to prepare for fuzzing.
+These inputs, as well as those in `testdata/corpus/FuzzTarget`, are run by
+default with `go test`.
+1. The `f.Fuzz(...)` function is executed for each item in the seed corpus.
+If this target is being fuzzed, then new inputs will be generated and
+continously tested using the `f.Fuzz(...)` function.
+
+The arguments to `f.Add(...)` and the fuzzing arguments in the `f.Fuzz` function
+must be the same type within the target, and there must be at least one argument
+specified.
 This will be ensured by a vet check.
+
 Fuzzing of built-in types (e.g. simple types, maps, arrays) and types which
 implement the BinaryMarshaler and TextMarshaler interfaces are supported.
 
+In the future, structs that do not implement the BinaryMarshaler and
+TextMarshaler interfaces may be supported by building them based on their
+exported fields.
+
 Interfaces, functions, and channels are not appropriate types to fuzz, so will
 never be supported.
 
@@ -264,50 +269,77 @@ package, as well as a set of regression inputs for any newly discovered bugs
 identified by the fuzzing engine.
 This set of inputs is also used to “seed” the corpus used by the fuzzing engine
 when mutating inputs to discover new code coverage.
+A good seed corpus can save the mutation engine a lot of work (for example
+adding a new key type to a key parsing function).
 
-The seed corpus can be populated programmatically using `f.Add` within the
-fuzz target.
-Programmatic seed corpuses make it easy to add new entries when support for new
-things are added (for example adding a new key type to a key parsing function)
-saving the mutation engine a lot of work.
-These can also be more clear for the developer when they break the build when
-something changes.
+Each fuzz target will always look in the package’s `testdata/corpus/FuzzTarget`
+directory for an existing seed corpus to use, if one exists.
+New crashes will also be written to this directory.
 
-The fuzz target will always look in the package’s testdata/ directory for an
-existing seed corpus to use as well, if one exists.
-This seed corpus will be in a directory of the form `testdata/<target_name>`,
-with a file for each unit that can be unmarshaled for testing.
+The seed corpus can be populated programmatically using `f.Add` within the fuzz
+target.
 
 _Examples:_
 
-1: A fuzz target’s `f.Fuzz` function takes three arguments
+1: A fuzz target’s `f.Fuzz` function takes a single `[]byte`.
 
 ```
-f.Fuzz(func(t *testing.T, a string, b myStruct, num *big.Int) {...})
+f.Fuzz(func(t *testing.T, b []byte) {...})
+```
 
-type myStruct struct {
-A, B string
-num int
-}
+This is the typical “non-structured fuzzing” approach, and only the single
+[]byte will be mutated while fuzzing.
+
+2: A fuzz target’s `f.Fuzz` function takes two arguments.
+
+```
+f.Fuzz(func(t *testing.T, a string, num *big.Int) {...})
 ```
 
-In this example, string is a built-in type, so can be decoded directly.
-`*big.Int` implements `UnmarshalText`, so can also be unmarshaled directly.
-However, `myStruct` does not implement `UnmarshalBinary` or `UnmarshalText` so
-the struct is pieced together recursively from its exported types. That would
-mean two sets of bytes will be written for this type, one for each of A and B.
-In total, four files would be written, and four inputs can be mutated when
-fuzzing.
+This example uses string, which is a built-in type, and as such can be decoded directly.
+`*big.Int` implements `UnmarshalText`, so can also be unmarshaled using that
+method.
+The mutator will alter the bytes of both the string and the *big.Int while
+seeking new code coverage.
+
+### Corpus file encoding
 
-2: A fuzz target’s `f.Fuzz` function takes a single `[]byte`
+The `testdata/corpus` directory will hold corpus files which act as the seed
+corpus as well as a set of regression tests for identified crashers.
+Corpus files must be encoded to support multiple fuzzing arguments.
 
+The first line of the corpus file indicates the encoding "version" of this file,
+e.g. "version 1". This is to indicate how the file was encoded, which allows for
+new, improved encodings in the future.
+
+For version 1, each subsequent line represents the value of each type making up
+the corpus entry. Each line is copy-pastable directly into Go code. The only
+case where the line would require editing is for imported struct types, in which
+case the import path would be removed when used in code.
+
+For example:
 ```
-f.Fuzz(func(t *testing.T, b []byte) {...})
+encV1
+float(45.241)
+int(12345)
+[]byte("ABC\xa8\x8c\xb3G\xfc")
+example.com/foo.Bar.UnmarshalText("\xfe\x99Uh\xb4\xe29\xed")
 ```
 
-This is the typical “non-structured fuzzing” approach.
-There is only one set of bytes to be provided by the mutator, so only one file
-will be written.
+A tool will be provided that can convert between binary files and corpus files
+(in both directions).
+This tool would serve two main purposes.
+It would allow binary files, such as images, or files from other fuzzers, to be
+ported over into seed corpus for Go fuzzing.
+It would also convert otherwise indecipherable hex bytes into a binary format
+which may be easier to read and edit.
+
+To make it easier to understand new crashes, each crash found by the fuzzing
+engine will be written to a binary file in $GOCACHE.
+This file should not be checked in, as the crash will have already been written
+to a corpus file in testdata within the module.
+Instead, this file is a way to quickly get an idea about the input which caused
+the crash, without requiring a tool to decode it.
 
 ### Fuzzing Engine and Mutator
 
@@ -345,56 +377,49 @@ For other types, this can be done using either
 or
 <code>[UnmarshalText](https://pkg.go.dev/encoding?tab=doc#TextUnmarshaler)</code>
 if implemented on the type.
-If building a struct, it can also build exported fields recursively as needed.
+In the future, it may support fuzzing struct types which don't implement these
+marshalers by building it through its exported fields.
 
 #### Generated corpus
 
 A generated corpus will be managed by the fuzzing engine and will live outside
-the module.
-New items can be added to this corpus in several ways, e.g. as part of the seed
-corpus, or by the fuzzing engine (e.g. because of new code coverage).
+the module in a subdirectory of $GOCACHE.
+This generated corpus will grow as the fuzzing engine discovers new coverage.
 
 The details of how the corpus is built and processed should be unimportant to
 users.
 This should be a technical detail that developers don’t need to understand in
 order to seed a corpus or write a fuzz target.
-Any existing files that a developer wants to include in the fuzz test should be
-added to the seed corpus directory, `testdata/<target_name>`.
-
-
-#### Minification + Pruning
-
-Corpus entries will be minified to the smallest input that causes the failure
-where possible, and pruned wherever possible to remove corpus entries that don’t
-add additional coverage.
-If a developer manually adds input files to the corpus directory, the fuzzing
-engine may change the file names in order to help with this.
+Any existing files that a developer wants to include in the fuzz test may be
+added to the seed corpus.
 
 ### Crashers
 
-A **crasher** is a panic or failure in `f.Fuzz(...)`, or a race condition.
+A **crasher** is a panic or failure in `f.Fuzz(...)`, or a race condition, which
+was found while fuzzing.
 By default, the fuzz target will stop after the first crasher is found, and a
 crash report will be provided.
 Crash reports will include the inputs that caused the crash and the resulting
 error message or stack trace.
-The crasher inputs will be written to the package's testdata/ directory as a
-seed corpus entry.
+The crasher inputs will be written to the package's testdata/corpus directory as
+after being minified where possible.
 
-Since this crasher is added to testdata/, which will then be run by default as
-part of the seed corpus for the fuzz target, this can act as a test for the new
-failure.
+Since this crasher is added to testdata/corpus, which will then be run by
+default as part of the seed corpus for the fuzz target, this can act as a test
+for the new failure.
 A user experience may look something like this:
 
 1. A user runs `go test -fuzz=FuzzFoo`, and a crasher is found while fuzzing.
-1. The arguments that caused the crash are added to a testdata directory within
-the package automatically.
-1. A subsequent run of `go test` (even without `-fuzz=FuzzFoo`) will then hit
-this newly discovering failing condition, and continue to fail until the bug
-is fixed.
+1. The arguments that caused the crash are added to the testdata/corpus
+directory of that package.
+1. A subsequent run of `go test` (without needing `-fuzz=FuzzFoo`) will then
+reproduce this crash, and continue to fail until the bug is fixed.
+A user could also run `go test -run=FuzzFoo/<filename>` to only run a
+specific file in the testdata/corpus directory when debugging.
 
 ### Go command
 
-Fuzz testing will only be supported in module mode, and if run in GOPATH mode
+Fuzz testing will only be supported in module mode, and if run in GOPATH mode,
 the fuzz targets will be ignored.
 
 Fuzz targets will be in *_test.go files, and can be in the same file as Test and
@@ -403,12 +428,9 @@ These test files can exist wherever *_test.go files can currently live, and do
 not need to be in any fuzz-specific directory or have a fuzz-specific file name
 or build tag.
 
-A new environment variable will be added, `$GOFUZZCACHE`, which will default to
-an appropriate cache directory on the developer's machine.
-This directory will hold the generated corpus.
-For example, the corpus for each fuzz target will be managed in a subdirectory
-called `<module_name>/<pkg>/@corpus/<target_name>` where `<module_name>` will
-follow module case-encoding and include the major version.
+The generated corpus will be in a new directory within `$GOCACHE`, in the form
+$GOCACHE/fuzz/$pkg/$test/$name, where $pkg is the package path containing the
+fuzz target, $test is the target name, and $name is the name of the file.
 
 The default behavior of `go test` will be to build and run the fuzz targets
 using the seed corpus only.
@@ -437,45 +459,28 @@ The following flags will be added or have modified meaning:
 -keepfuzzing
 Keep running the target if a crasher is found. (default false)
 -parallel
-Allow parallel execution of f.Fuzz functions that call f.Parallel.
-The value of this flag is the maximum number of f.Fuzz functions to run
-simultaneously within the given fuzz target. (default GOMAXPROCS)
+Allow parallel execution of f.Fuzz functions that call t.Parallel when
+running the seed corpus.
+While fuzzing with -fuzz, the value of this flag is the maximum number of
+workers to run the fuzz function simultaneously; by default, it is set to
+the value of GOMAXPROCS.
+Note that -parallel only applies within a single test binary.
 -race
-Enable data race detection while fuzzing. (default true)
+Enable data race detection while fuzzing. (default false)
+-run
+Run only those tests, examples, and fuzz targets matching the regular
+expression.
+For testing a single seed corpus entry for a target, the regular
+expression can be in the form $target/$name, where $target is the name of
+the fuzz target, and $name is the name of the file (ignoring file
+extensions) to run.
 ```
 
 `go test` will not respect `-p` when running with `-fuzz`, as it doesn't make
 sense to fuzz multiple packages at the same time.
 
 ## Open issues and future work
 
-### Naming scheme for corpus files
-
-There are several naming schemes for the corpus files which may be appropriate,
-and the final decision is still undecided.
-
-Take the following example:
-
-```
-f.Fuzz(func(t *testing.T, a string, b myStruct, num *big.Int) {...})
-
-type myStruct struct {
-A, B string
-num int
-}
-```
-
-For two corpus entries, this could be structured as follows:
-* 0000001.string
-* 0000001.myStruct.string
-* 0000001.myStruct.string
-* 0000001.big_int
-* 0000002.string
-* 0000002.myStruct.string
-* 0000002.myStruct.string
-* 0000002.big_int
-
-
 ### Options
 
 There are options that developers often need to fuzz effectively and safely.
