Revise schema and formats facility #183

@lukpueh

Description of issue or feature request:
Here are several loosely ordered observations regarding securesystemslib's schema/formats facility:

Current behavior:

  • Some schemas sound more specific than they are, e.g. RELPATH_SCHEMA == PATH_SCHEMA == AnyString. This is misleading when assessing how well a function sanitizes its inputs.

  • Some schemas are an odd replacement for constants, e.g. schemas that define hard-coded strings, like "rsassa-pss-sha256", and are then used to check the very same strings, which are hard-coded as function default arguments.

  • Schema validation seems generally overused. In TUF nearly every function runs check_match for each argument. IMHO argument sanitizing is mostly important in public-facing interfaces. Programming errors, on the other hand, should be caught through extensive testing and code review.
    Using it everywhere bloats the code, not least because of the many generic comments describing the checks and the obligatory, equally generic "FormatError, if <arg> is not properly formatted" blocks in the <Exceptions> section of docstrings.

  • Schema checking sometimes makes execution branches unreachable (see also "Unreachable else best practice", code-style-guidelines#18):

    X_OR_Y_SCHEMA.check_match(arg)
    
    if arg == X:
      # ...
    elif arg == Y:
      # ...
    else:
      raise WillNeverBeRaisedError()

    I've even seen cases where WillNeverBeRaisedError is also listed in the docstring as an exception that might be raised (documentation rot).

  • The error messages raised by the check_match method are often not helpful, because they usually lack context or don't show the value of the checked variable.
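To illustrate the unreachable-branch observation above, here is a minimal sketch of one possible alternative: drop the up-front schema check and let the final else perform the validation itself, with a message that shows the offending value. The names X, Y and handle are placeholders made up for this example, and ValueError is just one reasonable choice of exception.

```python
# Placeholder values standing in for X and Y from the snippet above.
X = "x"
Y = "y"


def handle(arg):
  """Dispatch on arg; the final else doubles as the input check."""
  if arg == X:
    return "handled X"
  elif arg == Y:
    return "handled Y"
  else:
    # Reachable, because no earlier schema check has rejected the value.
    # The message names the accepted values and shows what was passed.
    raise ValueError(
        "arg must be one of {!r}, got {!r}".format((X, Y), arg))
```

With this shape the exception listed in the docstring is one that can actually be raised, so the documentation and the code cannot drift apart in the way described above.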

Expected behavior:

  • Review existing schemas for their validity/strictness, especially when chained.

  • Only use schema checks in public-facing interfaces (open for discussion).
    This would also require a clearer definition of which functions are public interfaces, which, by the way, would greatly benefit TUF/in-toto integrators.

  • At the very least, don't blindly add <SCHEMA>.check_match(<arg>) to every function. Coordinate the check with the rest of the function and its purpose.

  • Make sure the error message is helpful, e.g. by:

    • supporting an error message override argument in check_match, or by
    • using matches plus raise FormatError (or maybe even just ValueError) with a custom message instead of check_match.

  • Disambiguate schemas and constants.
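The last two suggestions could look roughly like the sketch below. It is deliberately self-contained and does not use the real library: FormatError and OneOfStrings are simplified stand-ins, the msg argument of check_match is the proposed override and is not claimed to exist in securesystemslib, and create_signature, DEFAULT_SCHEME, SCHEME_SCHEMA and "example-other-scheme" are hypothetical names chosen for illustration.

```python
class FormatError(Exception):
  """Stand-in for the library's FormatError, to keep the sketch runnable."""


class OneOfStrings:
  """Hypothetical, minimal stand-in for a schema class (not the real API)."""

  def __init__(self, *allowed):
    self.allowed = allowed

  def matches(self, obj):
    """Return True if obj is one of the allowed strings."""
    return isinstance(obj, str) and obj in self.allowed

  def check_match(self, obj, msg=None):
    """Raise FormatError if obj does not match.

    'msg' is the proposed override argument (an assumption of this sketch).
    Without it, the default message still shows the rejected value.
    """
    if not self.matches(obj):
      raise FormatError(
          msg or "expected one of {!r}, got {!r}".format(self.allowed, obj))


# A plain constant serves as the default; the schema is only for validation.
DEFAULT_SCHEME = "rsassa-pss-sha256"
SCHEME_SCHEMA = OneOfStrings(DEFAULT_SCHEME, "example-other-scheme")


def create_signature(data, scheme=DEFAULT_SCHEME):
  """Hypothetical public-facing function that validates its 'scheme' input."""
  # 'matches' plus an explicit raise gives full control over the message.
  if not SCHEME_SCHEMA.matches(scheme):
    raise ValueError(
        "unsupported signature scheme {!r}, expected one of {!r}".format(
            scheme, SCHEME_SCHEMA.allowed))
  return (scheme, data)
```

Either variant puts the rejected value and the accepted alternatives into the message. Which exception type to raise (FormatError or ValueError) is left open here, as it is in the list above.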

Labels: legacy (Issues related to legacy interfaces (obsolete with #731))
