Performance issues with referencing  #1088

@robherring

I'm working on moving dtschema over to use the referencing Registry instead of RefResolver. With the change, the validation time has gone from ~12 seconds to 6 minutes.

I have ~3k schema files, most of which define a single object. They are all loaded into a Registry. Each validation run iterates over hundreds of instances with all the schemas. I'm

Profiling puts almost all the time spent in Registry.combine():

 4324    0.016    0.000   10.630    0.002 _core.py:457(combine)                                        
 8648    8.496    0.001    8.514    0.001 {method 'update' of 'rpds.HashTrieMap' objects}              
 4323    2.014    0.000    2.014    0.000 {method 'update' of 'rpds.HashTrieSet' objects}              

(The run was interrupted with Ctrl-C after 12 seconds.)
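For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of gathering a profile sorted by cumulative time using only the standard library. The `combine` and `validate_all` functions are hypothetical stand-ins for an expensive merge inside a validation loop; they are not the referencing or dtschema code.

```python
import cProfile
import io
import pstats


def combine(registry: dict, extra: dict) -> dict:
    """Hypothetical stand-in for an expensive merge; not the referencing API."""
    merged = dict(registry)
    merged.update(extra)
    return merged


def validate_all(n_instances: int) -> int:
    """Hypothetical workload that merges two mappings once per instance."""
    registry = {f"schema-{i}": i for i in range(3000)}
    extra = {f"meta-{i}": i for i in range(20)}
    total = 0
    for _ in range(n_instances):
        total += len(combine(registry, extra))
    return total


def profile_report(n_instances: int = 50, limit: int = 10) -> str:
    """Profile the workload and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    validate_all(n_instances)
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report())
```

The columns in the report are, in order: call count, total time, time per call, cumulative time, cumulative time per call, and the function location. That matches the layout of the excerpt above.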

The root problem seems to be that every new Validator instance created in 'descend' creates a new Registry in order to add the json-schema.org schemas to my Registry. Commenting out the converter restores the performance:

diff --git a/jsonschema/validators.py b/jsonschema/validators.py
index 23ea17c878d4..783a931f9dc2 100644
--- a/jsonschema/validators.py
+++ b/jsonschema/validators.py
@@ -202,7 +202,7 @@ def create(
         # TODO: include new meta-schemas added at runtime
         _registry: referencing.jsonschema.SchemaRegistry = field(
             default=SPECIFICATIONS,
-            converter=SPECIFICATIONS.combine,  # type: ignore[misc]
+            #converter=SPECIFICATIONS.combine,  # type: ignore[misc]
             kw_only=True,
             repr=False,
         )
    

This is obviously not the right fix. The combine needs to happen once, rather than on every 'evolve'.
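To illustrate the mechanism and one possible direction, here is a rough sketch using only standard-library stand-ins. Everything in it is hypothetical: `Registry` is a plain dict, `Validator` only mimics a class whose converter runs in `__init__`, and `CombineOnce` is an invented wrapper name. It is not jsonschema or referencing code, and it is not necessarily how this should be fixed upstream.

```python
from typing import Callable, Dict, Optional

Registry = Dict[str, dict]  # hypothetical stand-in for a schema registry

# Hypothetical stand-in for the bundled json-schema.org meta-schemas.
SPECIFICATIONS: Registry = {"https://json-schema.org/draft/2020-12/schema": {}}

combine_calls = 0  # counts how often the merge actually runs


def combine(user_registry: Registry) -> Registry:
    """Merge the bundled schemas with the user's registry (the costly step)."""
    global combine_calls
    combine_calls += 1
    merged = dict(SPECIFICATIONS)
    merged.update(user_registry)
    return merged


class CombineOnce:
    """Wrap a converter so a registry it already converted is passed through."""

    def __init__(self, convert: Callable[[Registry], Registry]) -> None:
        self._convert = convert
        self._last_input: Optional[Registry] = None
        self._last_output: Optional[Registry] = None

    def __call__(self, registry: Registry) -> Registry:
        # Identity checks are safe because we hold references to both objects.
        if self._last_output is not None and (
            registry is self._last_output or registry is self._last_input
        ):
            return self._last_output
        self._last_input = registry
        self._last_output = self._convert(registry)
        return self._last_output


class Validator:
    """Hypothetical validator whose converter runs on every construction."""

    def __init__(self, schema: dict, registry: Registry,
                 converter: Callable[[Registry], Registry]) -> None:
        self.schema = schema
        self._converter = converter
        self.registry = converter(registry)

    def evolve(self, schema: dict) -> "Validator":
        # Builds a new instance, so the converter is invoked again.
        return Validator(schema, self.registry, self._converter)


def count_combines(converter: Callable[[Registry], Registry], depth: int) -> int:
    """Create one validator, evolve it `depth` times, and count the merges."""
    global combine_calls
    combine_calls = 0
    validator = Validator({}, {"urn:example:a": {}}, converter)
    for _ in range(depth):
        validator = validator.evolve({})
    return combine_calls


if __name__ == "__main__":
    print(count_combines(combine, 100))               # merges on every evolve
    print(count_combines(CombineOnce(combine), 100))  # merges only once
```

With the plain converter the merge runs once per construction, so the count grows with the number of 'evolve' calls. With the wrapper, an already-combined registry is recognized by identity and returned as-is. A real fix would also need to handle a user passing a genuinely different registry, which this sketch covers only by re-running the merge.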

Labels: Bug (Something doesn't work the way it should.)
