Description
I'm working on moving dtschema over to use referencing Registry instead of RefResolver. With the change, the validation time has gone from ~12 sec to 6 minutes.
I have ~3k schema files, most of which define a single object. They are all loaded into a Registry. Each validation run iterates over hundreds of instances, checking each against all the schemas.
Profiling shows that almost all the time is spent in Registry.combine():
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     4324    0.016    0.000   10.630    0.002 _core.py:457(combine)
     8648    8.496    0.001    8.514    0.001 {method 'update' of 'rpds.HashTrieMap' objects}
     4323    2.014    0.000    2.014    0.000 {method 'update' of 'rpds.HashTrieSet' objects}
(This run was interrupted with Ctrl-C after 12 secs.)
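For anyone wanting to collect a similar profile, here is a minimal sketch using only the standard library's cProfile and pstats. The helper name `profile_call` and the `busy_work` workload are hypothetical stand-ins, not part of dtschema or jsonschema; swap in the real validation entry point. It keeps whatever was collected if the run is interrupted with Ctrl-C.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, limit=10, **kwargs):
    """Run func under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    try:
        profiler.runcall(func, *args, **kwargs)
    except KeyboardInterrupt:
        # Keep the samples gathered before Ctrl-C instead of losing them.
        pass
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()


def busy_work(n=10_000):
    # Hypothetical workload standing in for a validation run.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    print(profile_call(busy_work))
```

Sorting by cumulative time is what makes a caller such as combine() show up at the top even when its own time is small.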
The root problem seems to be that every new Validator instance created in 'descend' creates a new Registry in order to add the json-schema.org schemas to my Registry. Commenting out the converter gets the performance back:
diff --git a/jsonschema/validators.py b/jsonschema/validators.py
index 23ea17c878d4..783a931f9dc2 100644
--- a/jsonschema/validators.py
+++ b/jsonschema/validators.py
@@ -202,7 +202,7 @@ def create(
         # TODO: include new meta-schemas added at runtime
         _registry: referencing.jsonschema.SchemaRegistry = field(
             default=SPECIFICATIONS,
-            converter=SPECIFICATIONS.combine,  # type: ignore[misc]
+            #converter=SPECIFICATIONS.combine,  # type: ignore[misc]
             kw_only=True,
             repr=False,
         )
Obviously this is not the right fix. The combine needs to happen once, rather than on every 'evolve'.
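To illustrate the pattern, here is a minimal stdlib-only model. It is a sketch under stated assumptions, not jsonschema's actual code: `dataclasses.replace` stands in for attrs' `evolve`, `__post_init__` stands in for the field converter, frozensets stand in for registries, and the `_combined` flag is a hypothetical way to skip the repeated work. It is not a proposal for the real fix.

```python
from dataclasses import dataclass, replace

# Stand-in for jsonschema's SPECIFICATIONS registry (hypothetical contents).
SPECS = frozenset({"draft-07", "2019-09", "2020-12"})
combine_calls = {"n": 0}


def combine(registry: frozenset) -> frozenset:
    """Stand-in for SPECIFICATIONS.combine; counts how often it runs."""
    combine_calls["n"] += 1
    return SPECS | registry


@dataclass(frozen=True)
class NaiveValidator:
    schema: dict
    registry: frozenset = frozenset()

    def __post_init__(self):
        # Plays the role of the converter: runs on every construction.
        object.__setattr__(self, "registry", combine(self.registry))

    def descend(self, subschema: dict) -> "NaiveValidator":
        # replace() re-runs __init__, so the combine happens again here.
        return replace(self, schema=subschema)


@dataclass(frozen=True)
class CombineOnceValidator:
    schema: dict
    registry: frozenset = frozenset()
    _combined: bool = False  # hypothetical flag, not part of jsonschema

    def __post_init__(self):
        if not self._combined:
            object.__setattr__(self, "registry", combine(self.registry))
            object.__setattr__(self, "_combined", True)

    def descend(self, subschema: dict) -> "CombineOnceValidator":
        # The flag is copied along, so the combined registry is reused.
        return replace(self, schema=subschema)


def count_combines(validator_cls, depth: int) -> int:
    """Create one validator, descend `depth` times, return combine() calls."""
    combine_calls["n"] = 0
    validator = validator_cls(schema={}, registry=frozenset({"my-schema"}))
    for _ in range(depth):
        validator = validator.descend({})
    return combine_calls["n"]


if __name__ == "__main__":
    print(count_combines(NaiveValidator, 100))        # 101
    print(count_combines(CombineOnceValidator, 100))  # 1
```

In the naive model the number of combines grows with the number of descents, which matches the call counts in the profile above. One limitation of the flag approach: swapping in a different registry via `replace` would also need to reset the flag.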