# Getting started with MojoLPM

*** note
**Note:** Using MojoLPM to fuzz your Mojo interfaces is intended to be simple,
but there are edge cases that may require a detailed understanding of the
Mojo implementation to fix. If you run into problems that you can't readily
understand, send an email to [markbrand@google.com], cc `fuzzing@chromium.org`,
and we'll try to help.

**Prerequisites:** Knowledge of [libfuzzer] and basic understanding
of [Protocol Buffers] and [libprotobuf-mutator]. Basic understanding of
[testing in Chromium].
***

This document will walk you through:
* An overview of MojoLPM and what it's used for.
* Adding a fuzzer to an existing Mojo interface using MojoLPM.

[TOC]

## Overview of MojoLPM

MojoLPM is a toolchain for automatically generating structure-aware fuzzers for
Mojo interfaces using libprotobuf-mutator as the fuzzing engine.

This tool works by using the existing "grammar" for the interface provided by
the .mojom files, and translating that into a Protocol Buffer format that can be
fuzzed by libprotobuf-mutator. These protocol buffers are then interpreted by
a generated runtime as a sequence of mojo method calls on the targeted
interface.

The intention is that using the generated code should be as simple as plugging
it into the existing unittests for those interfaces - so if you've already
implemented the necessary mocks to unittest your code, the majority of the work
needed to get quite effective fuzzing of your interfaces is already complete!

## Choose the Mojo interface(s) to fuzz

If you're a developer looking to add fuzzing support for an interface that
you're developing, then this should be very easy for you!

If not, then a good starting point is to search for [interfaces] in codesearch.
The most interesting interfaces from a security perspective are those which are
implemented in the browser process and exposed to the renderer process, but
there isn't a very simple way to enumerate these, so you may need to look
through some of the source code to find an interesting one.

For the rest of this guide, we'll write a new fuzzer for
`blink.mojom.CodeCacheHost`, which is defined in
`third_party/blink/public/mojom/loader/code_cache.mojom`.

We then need to find the relevant GN build target for this mojo interface so
that we know how to refer to it later - in this case that is
`//third_party/blink/public/mojom:mojom_platform`.

## Find the implementations of the interfaces

If you are developing these interfaces, then you already know where to find the
implementations.

Otherwise a good starting point is to search for references to
"public blink::mojom::CodeCacheHost". Usually there is only a single
implementation of a given Mojo interface (there are a few exceptions where the
interface abstracts platform specific details, but this is less common). This
leads us to `content/browser/renderer_host/code_cache_host_impl.h` and
`CodeCacheHostImpl`.

## Find the unittest for the implementation

Unfortunately, it doesn't look like `CodeCacheHostImpl` has a unittest, so we'll
have to go through the process of understanding how to create a valid instance
ourselves in order to fuzz this interface.

Since this interface runs in the Browser process, and is part of `/content`,
we're going to create our new fuzzer in `/content/test/fuzzer`.

## Add our testcase proto

First we'll add a proto source file, `code_cache_host_mojolpm_fuzzer.proto`,
which is going to define the structure of our testcases. This is basically
boilerplate, but it allows creating fuzzers which interact with multiple Mojo
interfaces to uncover more complex issues. For our case, this will be a simple
file:

```
syntax = "proto2";

package content.fuzzing.code_cache_host.proto;

import "third_party/blink/public/mojom/loader/code_cache.mojom.mojolpm.proto";

message NewCodeCacheHost {
  required uint32 id = 1;
}

message RunUntilIdle {
  enum ThreadId {
    IO = 0;
    UI = 1;
  }

  required ThreadId id = 1;
}

message Action {
  oneof action {
    NewCodeCacheHost new_code_cache_host = 1;
    RunUntilIdle run_until_idle = 2;
    mojolpm.blink.mojom.CodeCacheHost.RemoteMethodCall code_cache_host_call = 3;
  }
}

message Sequence {
  repeated uint32 action_indexes = 1 [packed=true];
}

message Testcase {
  repeated Action actions = 1;
  repeated Sequence sequences = 2;
  repeated uint32 sequence_indexes = 3 [packed=true];
}
```

This specifies all of the actions that the fuzzer will be able to take - it
will be able to create a new `CodeCacheHost` instance, perform sequences of
interface calls on those instances, and wait for various threads to be idle.
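
To make the indirection in `Testcase` concrete, here is a minimal standalone
sketch of how `sequence_indexes`, `sequences` and `actions` could resolve to a
flat execution order. The structs are plain stand-ins for the generated
protobuf classes, `ResolveActionOrder` is a hypothetical helper, and the modulo
wrapping follows the `NextAction` code shown later in this document. The
early-return guard for empty inputs is an addition made for this sketch.

```c++
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain stand-ins for the proto messages; the real fuzzer uses the classes
// generated from code_cache_host_mojolpm_fuzzer.proto instead.
struct Sequence {
  std::vector<uint32_t> action_indexes;
};

struct Testcase {
  std::size_t actions_size = 0;  // Only the number of actions matters here.
  std::vector<Sequence> sequences;
  std::vector<uint32_t> sequence_indexes;
};

// Returns the positions in `actions` that would run, in order. Indexes wrap
// with modulo, so any mutated value still selects a valid entry.
std::vector<std::size_t> ResolveActionOrder(const Testcase& testcase) {
  std::vector<std::size_t> order;
  // Guard added for this sketch: avoid a modulo by zero on empty inputs.
  if (testcase.actions_size == 0 || testcase.sequences.empty())
    return order;
  for (uint32_t sequence_idx : testcase.sequence_indexes) {
    const Sequence& sequence =
        testcase.sequences[sequence_idx % testcase.sequences.size()];
    for (uint32_t action_idx : sequence.action_indexes)
      order.push_back(action_idx % testcase.actions_size);
  }
  return order;
}
```

Because every index is reduced modulo the size of the list it refers to, the
mutator can change any index freely without producing an invalid testcase.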

In order to build this proto file, we'll need to copy it into the out/ directory
so that it can reference the proto files generated by MojoLPM - this will be
handled for us by the `mojolpm_fuzzer_test` build rule.

## Add our fuzzer source

Now we're ready to create the fuzzer C++ source file,
`code_cache_host_mojolpm_fuzzer.cc`, and the fuzzer build target. This
target is going to depend on both our proto file and the C++ source file.
Most of the necessary dependencies will be handled for us, but we do still need
to add some directly.

Note especially the dependency on `mojom_platform_mojolpm` in blink. This is an
autogenerated target: the target containing the generated fuzzer protocol
buffer descriptions is named after the mojom target, with `_mojolpm`
appended.

```
mojolpm_fuzzer_test("code_cache_host_mojolpm_fuzzer") {
  sources = [
    "code_cache_host_mojolpm_fuzzer.cc"
  ]

  proto_source = "code_cache_host_mojolpm_fuzzer.proto"

  deps = [
    "//base/test:test_support",
    "//content/browser:for_content_tests",
    "//content/public/browser:browser_sources",
    "//content/test:test_support",
    "//services/network:test_support",
    "//storage/browser:test_support",
  ]

  proto_deps = [
    "//third_party/blink/public/mojom:mojom_platform_mojolpm",
  ]
}
```

Now, the minimal source code to load our testcases:

```c++
// Copyright 2020 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

#include <stdint.h>
#include <utility>

#include "code_cache_host_mojolpm_fuzzer.pb.h"
#include "mojo/core/embedder/embedder.h"
#include "third_party/blink/public/mojom/loader/code_cache.mojom-mojolpm.h"
#include "third_party/libprotobuf-mutator/src/src/libfuzzer/libfuzzer_macro.h"

DEFINE_BINARY_PROTO_FUZZER(
    const content::fuzzing::code_cache_host::proto::Testcase& testcase) {
}
```

You should now be able to build and run this fuzzer (it, of course, won't do
very much) to check that everything is lined up right so far.

## Handle global process setup

Now we need to add some basic setup code so that our process has something that
mostly resembles a normal Browser process. If you look in the fuzzer source
file, this is `CodeCacheHostFuzzerEnvironment`, which adds a global environment
instance that handles setting up this basic environment. The instance is reused
for all of our testcases, since starting threads is expensive and slow.
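
The "set up once, reuse for every testcase" pattern can be sketched in
standalone C++ as below. This is only an illustration under stated assumptions:
`FuzzerEnvironment`, `GetEnvironment`, `RunTestcase` and the setup counter are
hypothetical names, and the constructor merely counts invocations where a real
environment would start threads and initialise Mojo.

```c++
#include <cstddef>
#include <string>

// Counts how many times the expensive setup has run (illustration only).
int g_environment_setup_count = 0;

// Hypothetical stand-in for the process-wide fuzzer environment. A real one
// would start the browser threads and initialise Mojo in its constructor.
struct FuzzerEnvironment {
  FuzzerEnvironment() { ++g_environment_setup_count; }
};

// Function-local static: constructed on first use, then shared by every
// later call for the lifetime of the process.
FuzzerEnvironment& GetEnvironment() {
  static FuzzerEnvironment environment;
  return environment;
}

// Per-testcase state is created and destroyed inside each run, so nothing
// testcase-specific is stored in the shared environment.
std::size_t RunTestcase(const std::string& input) {
  FuzzerEnvironment& environment = GetEnvironment();
  (void)environment;  // The sketch only needs the setup side effect.
  std::string per_testcase_state = input;
  return per_testcase_state.size();
}
```

Calling `RunTestcase` repeatedly constructs the environment only on the first
call, which is the property we want from the global setup.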

## Handle per-testcase setup

We next need to handle the necessary setup to instantiate `CodeCacheHostImpl`,
so that we can actually run the testcases. At this point, we realise that we
will likely want multiple instances of `CodeCacheHostImpl` with different
`render_process_id` values and different backing origins, so we need to modify
our proto file to reflect this:

```
message NewCodeCacheHost {
  enum OriginId {
    ORIGIN_A = 0;
    ORIGIN_B = 1;
    ORIGIN_OPAQUE = 2;
    ORIGIN_EMPTY = 3;
  }

  required uint32 id = 1;
  required uint32 render_process_id = 2;
  required OriginId origin_id = 3;
}
```

Note that we're using an enum to represent the origin, rather than a string;
it's unlikely that the true value of the origin is going to be important, so
we've instead chosen a few select values based on the cases mentioned in the
source.
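
One way to turn such an enum back into concrete values inside the fuzzer is a
small lookup function. The following is a minimal standalone sketch, not the
code used by the real fuzzer: the enum is a plain C++ mirror of the proto enum,
`SerializedOriginFor` is a hypothetical helper, and the `a.test` and `b.test`
hosts are illustrative placeholders.

```c++
#include <string>

// Plain C++ mirror of the OriginId enum from the proto file.
enum class OriginId {
  kOriginA = 0,
  kOriginB = 1,
  kOriginOpaque = 2,
  kOriginEmpty = 3,
};

// Maps each enum value to a fixed serialized origin. The hosts are
// placeholders chosen for this sketch; only their distinctness matters.
std::string SerializedOriginFor(OriginId origin_id) {
  switch (origin_id) {
    case OriginId::kOriginA:
      return "https://a.test";
    case OriginId::kOriginB:
      return "https://b.test";
    case OriginId::kOriginOpaque:
      return "null";  // An opaque origin serializes as "null".
    case OriginId::kOriginEmpty:
      return "";
  }
  return "";  // Unreachable for valid enum values.
}
```

Keeping the set of origins this small means the mutator spends its effort on
the interface calls rather than on guessing meaningful origin strings.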

The first thing that we need to do is set up the basic Browser process
environment; this is what `ContentFuzzerEnvironment` is doing - this has a basic
setup suitable for fuzzing interfaces in `/content`. There are a few things to
be careful of: we need to make sure that `mojo::core::Init()` is called exactly
once, and we probably want as much freedom as possible in terms of scheduling,
so we want to use slightly different threading options than the average
unittest. This is a singleton type that will live for the entire duration of
the fuzzer process, so we don't want to be holding any testcase-specific data
here.

The next thing that we need to do is to figure out the basic setup needed to
instantiate the interface we're interested in. Looking at the constructor for
`CodeCacheHostImpl`, we need three things: a valid `render_process_id`, an
instance of `CacheStorageContextImpl` and an instance of
`GeneratedCodeCacheContext`. `CodeCacheHostFuzzerContext` is our container for
these per-testcase instances, and will handle creating and binding the instances
of the Mojo interfaces that we're going to fuzz. The most important thing to be
careful of here is that everything happens on the correct thread/sequence. Many
Browser-process objects have specific expectations, and will end up with very
different behaviour if they are created or used from the wrong context.

## Integrate with the generated MojoLPM fuzzer code

Finally, we need to do a little bit more plumbing, to rig up this infrastructure
that we've built together with the autogenerated code that MojoLPM gives us to
interpret and run our testcases. This is the `CodeCacheHostTestcase`, and the
part where the magic happens is here:

```c++
void CodeCacheHostTestcase::NextAction() {
  if (next_idx_ < testcase_.sequence_indexes_size()) {
    auto sequence_idx = testcase_.sequence_indexes(next_idx_++);
    const auto& sequence =
      testcase_.sequences(sequence_idx % testcase_.sequences_size());
    for (auto action_idx : sequence.action_indexes()) {
      if (!testcase_.actions_size() || ++action_count_ > MAX_ACTION_COUNT) {
        return;
      }
      const auto& action =
        testcase_.actions(action_idx % testcase_.actions_size());
      switch (action.action_case()) {
        case content::fuzzing::code_cache_host::proto::Action::kNewCodeCacheHost: {
          cch_context_.AddCodeCacheHost(
            action.new_code_cache_host().id(),
            action.new_code_cache_host().render_process_id(),
            action.new_code_cache_host().origin_id());
        } break;

        case content::fuzzing::code_cache_host::proto::Action::kRunUntilIdle: {
          if (action.run_until_idle().id()) {
            content::RunUIThreadUntilIdle();
          } else {
            content::RunIOThreadUntilIdle();
          }
        } break;

        case content::fuzzing::code_cache_host::proto::Action::kCodeCacheHostCall: {
          mojolpm::HandleRemoteMethodCall(action.code_cache_host_call());
        } break;

        case content::fuzzing::code_cache_host::proto::Action::ACTION_NOT_SET:
          break;
      }
    }
  }
}
```

The key line here in integration with MojoLPM is the last case,
`kCodeCacheHostCall`, where we're asking MojoLPM to treat this incoming proto
entry as a call to a method on the `CodeCacheHost` interface.
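
The overall shape of this dispatch loop (one `case` per `oneof` field, plus a
cap on the number of actions) can be tried out in isolation with the standalone
sketch below. Everything in it is a hypothetical stand-in: `ActionCase` and
`Action` replace the generated proto classes, the log strings replace the real
calls, and the budget is passed in as a parameter instead of being a constant.

```c++
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the value returned by the generated action_case() accessor.
enum class ActionCase {
  kNotSet,
  kNewCodeCacheHost,
  kRunUntilIdle,
  kCodeCacheHostCall,
};

// Stand-in for the Action message; only the fields the sketch needs.
struct Action {
  ActionCase action_case;
  bool ui_thread;  // Only meaningful for kRunUntilIdle.
};

// Handles actions in order and stops once the budget is exceeded. Each
// handled action is recorded as a string instead of doing real work.
std::vector<std::string> RunActions(const std::vector<Action>& actions,
                                    std::size_t max_action_count) {
  std::vector<std::string> log;
  std::size_t action_count = 0;
  for (const Action& action : actions) {
    if (++action_count > max_action_count)
      return log;
    switch (action.action_case) {
      case ActionCase::kNewCodeCacheHost:
        log.push_back("new_code_cache_host");
        break;
      case ActionCase::kRunUntilIdle:
        log.push_back(action.ui_thread ? "run_ui_until_idle"
                                       : "run_io_until_idle");
        break;
      case ActionCase::kCodeCacheHostCall:
        log.push_back("code_cache_host_call");
        break;
      case ActionCase::kNotSet:
        break;  // Counts towards the budget but does nothing.
    }
  }
  return log;
}
```

The budget bounds the running time of a single testcase, and an unset action
is simply skipped rather than treated as an error, so mutated inputs never
abort the run early.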

There's just a little bit more boilerplate in the bottom of the file to tidy up
concurrency loose ends, making sure that the fuzzer components are all running
on the correct threads; those are more-or-less common to any fuzzer using
MojoLPM.

## Test it!

Make a corpus directory and fire up your shiny new fuzzer!

```
 ~/chromium/src% out/Default/code_cache_host_mojolpm_fuzzer /dev/shm/corpus
INFO: Seed: 3273881842
INFO: Loaded 1 modules   (1121912 inline 8-bit counters): 1121912 [0x559151a1aea8, 0x559151b2cd20),
INFO: Loaded 1 PC tables (1121912 PCs): 1121912 [0x559151b2cd20,0x559152c4b4a0),
INFO:      146 files found in /dev/shm/corpus
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: seed corpus: files: 146 min: 2b max: 268b total: 8548b rss: 88Mb
#147  INITED cov: 4633 ft: 10500 corp: 138/8041b exec/s: 0 rss: 91Mb
#152  NEW    cov: 4633 ft: 10501 corp: 139/8139b lim: 4096 exec/s: 0 rss: 91Mb L: 98/268 MS: 8 Custom-ChangeByte-Custom-EraseBytes-Custom-ShuffleBytes-Custom-Custom-
#154  NEW    cov: 4634 ft: 10510 corp: 140/8262b lim: 4096 exec/s: 0 rss: 91Mb L: 123/268 MS: 3 CustomCrossOver-ChangeBit-Custom-
#157  NEW    cov: 4634 ft: 10512 corp: 141/8384b lim: 4096 exec/s: 0 rss: 91Mb L: 122/268 MS: 3 CustomCrossOver-Custom-CustomCrossOver-
#158  NEW    cov: 4634 ft: 10514 corp: 142/8498b lim: 4096 exec/s: 0 rss: 91Mb L: 114/268 MS: 1 CustomCrossOver-
#159  NEW    cov: 4634 ft: 10517 corp: 143/8601b lim: 4096 exec/s: 0 rss: 91Mb L: 103/268 MS: 1 Custom-
#160  NEW    cov: 4634 ft: 10526 corp: 144/8633b lim: 4096 exec/s: 0 rss: 91Mb L: 32/268 MS: 1 Custom-
#164  NEW    cov: 4634 ft: 10528 corp: 145/8851b lim: 4096 exec/s: 0 rss: 91Mb L: 218/268 MS: 4 CustomCrossOver-Custom-CustomCrossOver-Custom-
```

## Wait for it...

Let the fuzzer run for a while, and check in periodically in case it has
fallen over. It's likely you'll have made a few mistakes somewhere along the
way, but hopefully soon you'll have the fuzzer running 'clean' for a few hours.

If your coverage isn't going up at all, then you've probably made a mistake and
it likely isn't managing to actually interact with the interface you're trying
to fuzz - try using the code coverage output from the next step to debug what's
going wrong.

## (Optional) Run coverage

In many cases it's useful to check the code coverage to see if we can benefit
from adding some manual testcases to get deeper coverage. For this example I
used the following command:

```
python tools/code_coverage/coverage.py code_cache_host_mojolpm_fuzzer -b out/Coverage -o ManualReport -c "out/Coverage/code_cache_host_mojolpm_fuzzer -ignore_timeouts=1 -timeout=4 -runs=0 /dev/shm/corpus" -f content
```

With the CodeCacheHost, looking at the coverage after a few hours we could see
that there's definitely some room for improvement:

```c++
/* 55       */ base::Optional<GURL> GetSecondaryKeyForCodeCache(const GURL& resource_url,
/* 56 53.6k */ int render_process_id) {
/* 57 53.6k */    if (!resource_url.is_valid() || !resource_url.SchemeIsHTTPOrHTTPS())
/* 58 53.6k */      return base::nullopt;
/* 59 0     */
/* 60 0     */    GURL origin_lock =
/* 61 0     */        ChildProcessSecurityPolicyImpl::GetInstance()->GetOriginLock(
/* 62 0     */            render_process_id);
```

## (Optional) Improve corpus manually

It's fairly easy to improve the corpus manually, since our corpus files are just
protobuf files that describe the sequence of interface calls to make.

There are a couple of approaches that we can take here - we'll try building a
small manual seed corpus that we'll use to kick-start our fuzzer. Since it's
easier to edit text protos, MojoLPM can automatically convert our seed corpus
from text protos to binary protos during the build, making this slightly less
painful for us, and letting us store our corpus in-tree in a readable format.

So, we'll create a new folder to hold this seed corpus, and craft our first
file:

```
actions {
  new_code_cache_host {
    id: 1
    render_process_id: 0
    origin_id: ORIGIN_A
  }
}
actions {
  code_cache_host_call {
    remote {
      id: 1
    }
    m_did_generate_cacheable_metadata {
      m_cache_type: CodeCacheType_kJavascript
      m_url {
        new {
          id: 1
          m_url: "http://aaa.com/test"
        }
      }
      m_data {
        new {
          id: 1
          m_bytes {
          }
        }
      }
      m_expected_response_time {
      }
    }
  }
}
sequences {
  action_indexes: 0
  action_indexes: 1
}
sequence_indexes: 0
```

We can then add some new entries to our build target to have the corpus
converted to binary proto directly during build.

```
  testcase_proto_kind = "content.fuzzing.code_cache_host.proto.Testcase"

  seed_corpus_sources = [
    "code_cache_host_mojolpm_fuzzer_corpus/did_generate_cacheable_metadata.textproto",
  ]
```

We can now run a new coverage report using this single-file seed corpus. Note
that the binary corpus files will be written to your output directory, in this
case as `code_cache_host_mojolpm_fuzzer_seed_corpus.zip`:

```
autoninja -C out/Coverage chrome
rm -rf /tmp/corpus; mkdir /tmp/corpus; unzip out/Coverage/code_cache_host_mojolpm_fuzzer_seed_corpus.zip -d /tmp/corpus
python tools/code_coverage/coverage.py code_cache_host_mojolpm_fuzzer -b out/Coverage -o ManualReport -c "out/Coverage/code_cache_host_mojolpm_fuzzer -ignore_timeouts=1 -timeout=4 -runs=0 /tmp/corpus" -f content
```

We can see that we're now getting some more coverage:

```c++
/* 118   */ void CodeCacheHostImpl::DidGenerateCacheableMetadata(
/* 119   */     blink::mojom::CodeCacheType cache_type,
/* 120   */     const GURL& url,
/* 121   */     base::Time expected_response_time,
/* 122 2 */       mojo_base::BigBuffer data) {
/* 123 2 */     if (!url.SchemeIsHTTPOrHTTPS()) {
/* 124 0 */       mojo::ReportBadMessage("Invalid URL scheme for code cache.");
/* 125 0 */       return;
/* 126 0 */     }
/* 127 2 */
/* 128 2 */     DCHECK_CURRENTLY_ON(BrowserThread::UI);
/* 129 2 */
/* 130 2 */     GeneratedCodeCache* code_cache = GetCodeCache(cache_type);
/* 131 2 */     if (!code_cache)
/* 132 0 */       return;
/* 133 2 */
/* 134 2 */     base::Optional<GURL> origin_lock =
/* 135 2 */         GetSecondaryKeyForCodeCache(url, render_process_id_);
/* 136 2 */     if (!origin_lock)
/* 137 0 */       return;
/* 138 2 */
/* 139 2 */     code_cache->WriteEntry(url, *origin_lock, expected_response_time,
/* 140 2 */                            std::move(data));
/* 141 2 */ }
```

Much better!

[markbrand@google.com]: mailto:markbrand@google.com?subject=[MojoLPM%20Help]:%20&cc=fuzzing@chromium.org
[libfuzzer]: https://source.chromium.org/chromium/chromium/src/+/master:testing/libfuzzer/getting_started.md
[Protocol Buffers]: https://developers.google.com/protocol-buffers/docs/cpptutorial
[libprotobuf-mutator]: https://source.chromium.org/chromium/chromium/src/+/master:testing/libfuzzer/libprotobuf-mutator.md
[testing in Chromium]: https://source.chromium.org/chromium/chromium/src/+/master:docs/testing/testing_in_chromium.md
[interfaces]: https://source.chromium.org/search?q=interface%5Cs%2B%5Cw%2B%5Cs%2B%7B%20f:%5C.mojom$%20-f:test