Integrating continuous fuzzing by way of OSS-Fuzz #771
Comments
Hi there. Thanks for the offer / sending the PR. So, all of jsonschema's code is pure Python at the minute, so I'd be curious whether OSS-Fuzz could say anything interesting without at least some information about how to generate JSON Schema-like objects. But I'm happy to see what pops out too. The email address in the SECURITY.md file is a decent place to send these. CC @Zac-HD in case you're interested or have opinions :) And thanks for raising this!
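For a concrete sense of what "some information about how to generate JSON Schema-like objects" could mean, here is a minimal sketch that builds random schema-like dicts from a hand-picked subset of keywords (`type`, `minimum`, `minLength`, `items`, `properties`, `required`). The helper `random_schema` and the keyword subset are assumptions for illustration only; they are not part of jsonschema, Hypothesis, or OSS-Fuzz.

```python
import json
import random

# Hypothetical, deliberately tiny keyword subset chosen for illustration only.
SCALAR_TYPES = ["null", "boolean", "integer", "number", "string"]


def random_schema(rng: random.Random, depth: int = 0, max_depth: int = 3) -> dict:
    """Return a random schema-like dict built from a few JSON Schema keywords."""
    # Container types are only offered while we are below the depth limit.
    kinds = SCALAR_TYPES + (["array", "object"] if depth < max_depth else [])
    kind = rng.choice(kinds)
    schema = {"type": kind}
    if kind in ("integer", "number"):
        if rng.random() < 0.5:
            schema["minimum"] = rng.randint(-10, 10)
    elif kind == "string":
        if rng.random() < 0.5:
            schema["minLength"] = rng.randint(0, 5)
    elif kind == "array":
        schema["items"] = random_schema(rng, depth + 1, max_depth)
    elif kind == "object":
        names = rng.sample(["a", "b", "c", "d"], k=rng.randint(0, 3))
        schema["properties"] = {
            name: random_schema(rng, depth + 1, max_depth) for name in names
        }
        if names:
            # "required" only ever names properties that were declared above.
            schema["required"] = rng.sample(names, k=rng.randint(0, len(names)))
    return schema


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so runs are reproducible
    for _ in range(3):
        print(json.dumps(random_schema(rng)))
```

Seeding `random.Random` keeps runs reproducible; a coverage-guided fuzzer would instead derive these choices from its input bytes.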
I didn't know this, though surely @Zac-HD will, but it looks like OSS-Fuzz supports generating data via Hypothesis. Something like that would be a big improvement over random dictionary poking, I think. Zac may already be doing that himself as part of hypothesis-jsonschema? But if not, @DavidKorczynski, I think that'd be the right kind of integration with OSS-Fuzz.
I'm happy to set up a property-based approach with Hypothesis if you're happy to integrate with OSS-Fuzz! If I go ahead with the integration, I can submit the fuzzers upstream to this repository instead of keeping them in OSS-Fuzz, and then we can also get the property-based testing going. Does that sound good?
Sounds good to me!
I do indeed have opinions! (And I wrote the integration docs over the weekend 😉) The real trick here is that Hypothesis supports driving arbitrary tests using a traditional fuzzer, which naturally includes OSS-Fuzz's various backends. In short, I'd be very surprised if you discovered interesting bugs by parsing strings into JSON and nothing else, though I can imagine this working OK if you had all the JSON tokens ( Fortunately though, The main tricks will be that:
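A rough illustration of driving structured tests from a traditional fuzzer: the engine supplies raw bytes, and a decoder spends them on choices (which JSON type, how long, which characters), so coverage feedback shapes the structure of the value rather than its JSON syntax. This is a toy, not Hypothesis's internals; `ByteSource` and `json_value` are hypothetical names. In Hypothesis itself, a `@given` test exposes `test.hypothesis.fuzz_one_input` for this purpose.

```python
import json


class ByteSource:
    """Hands out small choices from a fuzzer-provided byte buffer."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def draw(self, n: int) -> int:
        """Return an int in range(n) from the next byte, or 0 once the buffer is exhausted."""
        if self.pos >= len(self.data):
            return 0
        byte = self.data[self.pos]
        self.pos += 1
        return byte % n


def json_value(src: ByteSource, depth: int = 0, max_depth: int = 4):
    """Decode one JSON-compatible value; containers are only chosen below max_depth."""
    kind = src.draw(7 if depth < max_depth else 5)
    if kind == 0:
        return None
    if kind == 1:
        return bool(src.draw(2))
    if kind == 2:
        return src.draw(256) - 128  # integers in [-128, 127]
    if kind == 3:
        return (src.draw(256) - 128) / 4  # finite floats only, so no NaN/Infinity
    if kind == 4:
        length = src.draw(8)
        return "".join(chr(ord("a") + src.draw(26)) for _ in range(length))
    if kind == 5:
        return [json_value(src, depth + 1, max_depth) for _ in range(src.draw(4))]
    # kind == 6: an object with single-letter keys (later duplicates overwrite earlier ones)
    obj = {}
    for _ in range(src.draw(4)):
        key = chr(ord("a") + src.draw(26))
        obj[key] = json_value(src, depth + 1, max_depth)
    return obj


if __name__ == "__main__":
    value = json_value(ByteSource(bytes([6, 2, 0, 4, 2, 1, 2, 1, 1, 1])))
    print(json.dumps(value))
```

Once the buffer runs out, every draw returns 0, which decodes to `None` or other minimal values, so short inputs map to simple examples.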
I'm also happy to collaborate and split any integration reward, or direct it to charity (e.g. the PSF or one of https://www.givewell.org/charities/top-charities).
@Zac-HD I think I followed that (and it was very helpful, as usual). I can't yet say I know what sorts of fuzzing seem useful here, but as you say, you've certainly found issues via it before, so maybe there are more gaps to fill...
This sounds great to me too, yeah. We should maybe take it offline to discuss, but you know I still have a soft spot for PyPy, so throwing some dollars at them to make Hypothesis+PyPy support even better is attractive :) but so is the PSF.
My general view on refining the fuzzer to rely on more structural approaches is that the need for it should be verified empirically. My reasoning is that the coverage-guided aspects of the fuzzing engine will be great at coming up with inputs that satisfy the various input structures of the target application. I think this is particularly true in a case like this, where the execution speed is high, the structural complexity of JSON is relatively low (say, compared to PDFs or image formats), and OSS-Fuzz will throw significant CPU power at it. The original fuzzer starts reaching into jsonschema within seconds. Given this, my personal view is to refine only after we have empirical results: if the fuzzer finds bugs, that's great, and if not, we should refine. Naturally I respect the view of the maintainers, but my personal advice would be to either not refine at first or to have both. The original fuzzer simply follows the pattern described at https://pypi.org/project/jsonschema/, i.e. the comment in the code.
The question to me, though, is what results we're expecting. For a normal fuzzing process run by OSS-Fuzz, it often seems to me that the expectation is "the software doesn't crash", especially when fuzzing code in memory-unsafe languages. If a fuzzer is to say anything useful about JSON Schema, though (and this library If I'm understanding your comment, I think you're saying that we should test one invariant "valid JSON doesn't blow up I think if I follow @Zac-HD's comment:
That looks more like what I'd expect: if the key invariants are "valid pairs of schemas and instances produce successful jsonschema output" and "invalid pairs of schemas and instances produce unsuccessful jsonschema output", there's likely more bang for the buck there. But as I say, I'm also willing to go with what the experts say :) @DavidKorczynski, you may be more familiar with OSS-Fuzz, and I know @Zac-HD is more familiar with property testing in general, so I'd be willing to defer.
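The "valid pairs succeed, invalid pairs fail" invariant can be sketched by constructing each (schema, instance) pair so that its expected validity is known up front, then checking that the validator agrees. `toy_is_valid` is a hypothetical stand-in that handles only `{"type": "integer", "minimum": m}`; in practice the validator under test would be jsonschema's, and a tool such as hypothesis-jsonschema's `from_schema` could generate valid instances for far richer schemas.

```python
import random


def toy_is_valid(instance, schema: dict) -> bool:
    """Hypothetical stand-in validator for one schema shape: {"type": "integer", "minimum": m}."""
    if schema.get("type") != "integer":
        raise NotImplementedError("this toy only handles integer schemas")
    # JSON Schema does not count booleans as integers, although Python's bool subclasses int.
    if isinstance(instance, bool) or not isinstance(instance, int):
        return False
    return "minimum" not in schema or instance >= schema["minimum"]


def make_pair(rng: random.Random, want_valid: bool):
    """Build a (schema, instance) pair whose validity is known by construction."""
    minimum = rng.randint(-100, 100)
    schema = {"type": "integer", "minimum": minimum}
    if want_valid:
        instance = minimum + rng.randint(0, 50)
    else:
        # Each candidate violates the schema: too small, a string, a bool, or null.
        instance = rng.choice([minimum - rng.randint(1, 50), str(minimum), True, None])
    return schema, instance


def find_counterexample(validator, seed: int = 0, trials: int = 1000):
    """Return the first (schema, instance, expected) the validator gets wrong, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        want_valid = rng.random() < 0.5
        schema, instance = make_pair(rng, want_valid)
        if validator(instance, schema) != want_valid:
            return schema, instance, want_valid
    return None


if __name__ == "__main__":
    print(find_counterexample(toy_is_valid))  # None: the toy agrees on every pair
    # A validator that ignores "minimum" and treats booleans as integers:
    print(find_counterexample(lambda instance, schema: isinstance(instance, int)))
```

A validator that ignores `minimum` or accepts booleans as integers is caught within a handful of trials.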
Ah, right, now I understand. The bugs I am after are unhandled exceptions.
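A minimal sketch of a harness for that bug class, following the original fuzzer's pattern of parsing the input as JSON and validating it against a fixed schema: inputs that are not JSON, and instances the validator rejects, are normal outcomes, and any other exception escapes as a finding. `run_one_input`, `InstanceRejected` and `buggy_validate` are hypothetical stand-ins; with jsonschema the expected exception would be `jsonschema.exceptions.ValidationError` (raised by `jsonschema.validate`), and on OSS-Fuzz a Python entry point like this would be handed to a fuzzing engine such as Atheris.

```python
import json


class InstanceRejected(Exception):
    """Hypothetical stand-in for a validator's "instance is invalid" error."""


def run_one_input(data: bytes, validate, schema, expected=(InstanceRejected,)) -> None:
    """Run one fuzz input; any exception outside `expected` escapes and counts as a finding."""
    try:
        instance = json.loads(data)
    except ValueError:
        # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError:
        # the bytes were not JSON, which is uninteresting for this harness.
        return
    try:
        validate(instance, schema)
    except expected:
        # Rejecting an invalid instance is a normal outcome, not a bug.
        return
    # Anything else propagates to the fuzzing engine, which reports it as a crash.


def buggy_validate(instance, schema) -> None:
    """Hypothetical validator with a planted bug, used to exercise the harness."""
    if not isinstance(instance, dict):
        raise InstanceRejected("instance is not an object")
    _ = instance["name"]  # planted bug: KeyError when "name" is missing


if __name__ == "__main__":
    for sample in (b"not json", b"[1, 2]", b'{"name": "Eggs"}', b"{}"):
        try:
            run_one_input(sample, buggy_validate, schema={})
            print(sample, "-> ok")
        except Exception as exc:  # the harness surfaced an unhandled exception
            print(sample, "-> finding:", type(exc).__name__)
```

Deeply nested input (for example thousands of `[` characters) can make `json.loads` itself raise `RecursionError`; whether the harness reports that or filters it out is a design choice.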
Probably you know this, but

```python
class Foo(dict):
    def __getitem__(self, key):
        if key == "12":
            raise ZeroDivisionError()
        # Look the key up in the dict's own items (self.__dict__ holds attributes, not items).
        return super().__getitem__(key)
```

you indeed may get But for suitable subsets of objects, ones I assume you'll use as fuzzing input, then yeah. (And fair enough, maybe we start there.)
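One way to pin down "suitable subsets of objects" is to only feed the validator values built from the exact types `json.loads` returns, which excludes pathological mappings such as `Foo`. `is_plain_json` is a hypothetical helper sketched for illustration; it compares exact types, so any subclass is rejected.

```python
import json

# The exact scalar types json.loads produces; exact-type comparison below means
# subclasses (including bool-like or dict-like user classes) do not slip through.
JSON_SCALAR_TYPES = (type(None), bool, int, float, str)


def is_plain_json(value) -> bool:
    """True if value is built only from the exact types that json.loads returns."""
    kind = type(value)
    if kind in JSON_SCALAR_TYPES:
        return True
    if kind is list:
        return all(is_plain_json(item) for item in value)
    if kind is dict:
        return all(type(key) is str and is_plain_json(item) for key, item in value.items())
    # Subclasses (like a dict subclass overriding __getitem__), tuples, sets, etc.
    return False


if __name__ == "__main__":
    class Foo(dict):
        def __getitem__(self, key):
            if key == "12":
                raise ZeroDivisionError()
            return super().__getitem__(key)

    print(is_plain_json(json.loads('{"a": [1, 2.5, null, true]}')))  # True
    print(is_plain_json(Foo(a=1)))  # False
```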
Re: donation of any integration reward
I'd be very happy to direct it to PyPy for use at their discretion, and to include a note suggesting that efficient code coverage would be great for fuzzers 😉
Fuzzing with
Hi,
I was thinking it would be nice to set up continuous fuzzing of jsonschema by way of OSS-Fuzz. In this PR, google/oss-fuzz#4996, I have done exactly that: created the logic needed from an OSS-Fuzz perspective to integrate jsonschema. This includes developing initial fuzzers as well as integrating them into OSS-Fuzz.
Essentially, OSS-Fuzz is a free service run by Google that performs continuous fuzzing of important open source projects. The only expectation of integrating into OSS-Fuzz is that bugs will be fixed. This is not a "hard" requirement, in that no one enforces it; the main point is that if bugs are not fixed, running the fuzzers wastes resources, which we would like to avoid.
If you would like to integrate, could I please have one or more email addresses that should get access to the data produced by OSS-Fuzz, such as bug reports, coverage reports and other stats? Note that the emails affiliated with the project will be public in the OSS-Fuzz repo, as they will be part of a configuration file.