Commit f49d8c5

Use Python types to declare document fields (#1845) (#1847)
* basic type annotation support
* support for optional, list, and other type hints
* additional typing support
* dataclass-like behavior for Document and InnerDoc
* unit tests
* support InstrumentedField in Search class
* documentation
* Update docs/persistence.rst
  Co-authored-by: Quentin Pradet <[email protected]>
* Update elasticsearch_dsl/document_base.py
  Co-authored-by: Quentin Pradet <[email protected]>
* addressed review feedback
* better docs for Optional
* fix optional in test

---------

Co-authored-by: Quentin Pradet <[email protected]>
(cherry picked from commit 0c3ffcd)
Co-authored-by: Miguel Grinberg <[email protected]>
1 parent aafccf5 commit f49d8c5

13 files changed

+870
-72
lines changed

docs/persistence.rst

+174-6
@@ -9,7 +9,7 @@ layer for your application.
 For more comprehensive examples have a look at the examples_ directory in the
 repository.

-.. _examples: https://github.com/elastic/elasticsearch-dsl-py/tree/master/examples
+.. _examples: https://github.com/elastic/elasticsearch-dsl-py/tree/main/examples

 .. _doc_type:

@@ -66,14 +66,14 @@ settings in elasticsearch (see :ref:`life-cycle` for details).
 Data types
 ~~~~~~~~~~

-The ``Document`` instances should be using native python types like
+The ``Document`` instances use native python types like ``str`` and
 ``datetime``. In case of ``Object`` or ``Nested`` fields an instance of the
-``InnerDoc`` subclass should be used just like in the ``add_comment`` method in
-the above example where we are creating an instance of the ``Comment`` class.
+``InnerDoc`` subclass is used, as in the ``add_comment`` method in the above
+example where we are creating an instance of the ``Comment`` class.

 There are some specific types that were created as part of this library to make
-working with specific field types easier, for example the ``Range`` object used
-in any of the `range fields
+working with some field types easier, for example the ``Range`` object used in
+any of the `range fields
 <https://www.elastic.co/guide/en/elasticsearch/reference/current/range.html>`_:

 .. code:: python
@@ -103,6 +103,174 @@ in any of the `range fields
     # empty range is unbounded
     Range().lower # None, False

+Python Type Hints
+~~~~~~~~~~~~~~~~~
+
+Document fields can be defined using standard Python type hints if desired.
+Here are some simple examples:
+
+.. code:: python
+
+    from typing import Optional
+
+    class Post(Document):
+        title: str # same as title = Text(required=True)
+        created_at: Optional[datetime] # same as created_at = Date(required=False)
+        published: bool # same as published = Boolean(required=True)
+
+It is important to note that when using ``Field`` subclasses such as ``Text``,
+``Date`` and ``Boolean``, they must be given in the right-side of an assignment,
+as shown in examples above. Using these classes as type hints will result in
+errors.
+
+Python types are mapped to their corresponding field type according to the
+following table:
+
+.. list-table:: Python type to DSL field mappings
+   :header-rows: 1
+
+   * - Python type
+     - DSL field
+   * - ``str``
+     - ``Text(required=True)``
+   * - ``bool``
+     - ``Boolean(required=True)``
+   * - ``int``
+     - ``Integer(required=True)``
+   * - ``float``
+     - ``Float(required=True)``
+   * - ``bytes``
+     - ``Binary(required=True)``
+   * - ``datetime``
+     - ``Date(required=True)``
+   * - ``date``
+     - ``Date(format="yyyy-MM-dd", required=True)``
+
+To type a field as optional, the standard ``Optional`` modifier from the Python
+``typing`` package can be used. The ``List`` modifier can be added to a field
+to convert it to an array, similar to using the ``multi=True`` argument on the
+field object.
+
+.. code:: python
+
+    from typing import Optional, List
+
+    class MyDoc(Document):
+        pub_date: Optional[datetime] # same as pub_date = Date()
+        authors: List[str] # same as authors = Text(multi=True, required=True)
+        comments: Optional[List[str]] # same as comments = Text(multi=True)
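
The mapping rules described here (the type table plus the ``Optional`` and ``List`` modifiers) can be sketched with the standard ``typing`` introspection helpers. This is only an illustration and not the library's implementation: ``FIELD_FOR_TYPE`` and ``describe_hint`` are hypothetical names, field types are represented as plain strings, and the ``format`` argument that the table lists for ``date`` is left out.

```python
from datetime import date, datetime
from typing import List, Optional, Union, get_args, get_origin

# Hypothetical lookup table mirroring the documented mapping. The values are
# plain strings, not the library's field classes.
FIELD_FOR_TYPE = {
    str: "Text",
    bool: "Boolean",
    int: "Integer",
    float: "Float",
    bytes: "Binary",
    datetime: "Date",
    date: "Date",
}


def describe_hint(hint):
    """Return (field_name, required, multi) for a supported type hint."""
    required, multi = True, False
    # Optional[X] is Union[X, None]: unwrap it and mark the field optional.
    if get_origin(hint) is Union:
        non_none = [a for a in get_args(hint) if a is not type(None)]
        if len(non_none) != 1 or len(get_args(hint)) != 2:
            raise TypeError(f"unsupported union: {hint!r}")
        hint, required = non_none[0], False
    # List[X] marks a multi-valued field.
    if get_origin(hint) is list:
        (hint,) = get_args(hint)
        multi = True
    if hint not in FIELD_FOR_TYPE:
        raise TypeError(f"no field mapping for {hint!r}")
    return FIELD_FOR_TYPE[hint], required, multi


print(describe_hint(Optional[List[str]]))
```

Unwrapping ``Optional`` before ``List`` matches the nesting used in the ``comments`` example, where the whole list is optional rather than its items.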
+
+A field can also be given a type hint of an ``InnerDoc`` subclass, in which
+case it becomes an ``Object`` field of that class. When the ``InnerDoc``
+subclass is wrapped with ``List``, a ``Nested`` field is created instead.
+
+.. code:: python
+
+    from typing import List
+
+    class Address(InnerDoc):
+        ...
+
+    class Comment(InnerDoc):
+        ...
+
+    class Post(Document):
+        address: Address # same as address = Object(Address, required=True)
+        comments: List[Comment] # same as comments = Nested(Comment, required=True)
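
The ``Object`` versus ``Nested`` decision can be sketched the same way. The ``InnerDoc`` class below is an empty stand-in defined only for this example, and ``object_or_nested`` is a hypothetical helper, so this shows the rule as documented rather than how the library applies it.

```python
from typing import List, get_args, get_origin


class InnerDoc:
    """Empty stand-in for the library's InnerDoc (assumption for this sketch)."""


class Address(InnerDoc):
    ...


class Comment(InnerDoc):
    ...


def object_or_nested(hint):
    """Return ("Object", cls) for an InnerDoc subclass, ("Nested", cls) for a List of one."""
    # A bare InnerDoc subclass becomes a single embedded object.
    if isinstance(hint, type) and issubclass(hint, InnerDoc):
        return "Object", hint
    # List[InnerDoc subclass] becomes a nested field of that class.
    if get_origin(hint) is list:
        (inner,) = get_args(hint)
        if isinstance(inner, type) and issubclass(inner, InnerDoc):
            return "Nested", inner
    raise TypeError(f"not an InnerDoc hint: {hint!r}")


print(object_or_nested(Address)[0], object_or_nested(List[Comment])[0])
```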
+
+Unfortunately it is impossible to have Python type hints that uniquely
+identify every possible Elasticsearch field type. To choose a field type that
+is different than the ones in the table above, the field instance can be added
+explicitly as a right-side assignment in the field declaration. The next
+example creates a field that is typed as ``Optional[str]``, but is mapped to
+``Keyword`` instead of ``Text``:
+
+.. code:: python
+
+    class MyDocument(Document):
+        category: Optional[str] = Keyword()
+
+This form can also be used when additional options need to be given to
+initialize the field, such as when using custom analyzer settings or changing
+the ``required`` default:
+
+.. code:: python
+
+    class Comment(InnerDoc):
+        content: str = Text(analyzer='snowball', required=True)
+
+When using type hints as above, subclasses of ``Document`` and ``InnerDoc``
+inherit some of the behaviors associated with Python dataclasses, as defined by
+`PEP 681 <https://peps.python.org/pep-0681/>`_ and the
+`dataclass_transform decorator <https://typing.readthedocs.io/en/latest/spec/dataclasses.html#dataclass-transform>`_.
+To add per-field dataclass options such as ``default`` or ``default_factory``,
+the ``mapped_field()`` wrapper can be used on the right side of a typed field
+declaration:
+
+.. code:: python
+
+    class MyDocument(Document):
+        title: str = mapped_field(default="no title")
+        created_at: datetime = mapped_field(default_factory=datetime.now)
+        published: bool = mapped_field(default=False)
+        category: str = mapped_field(Keyword(required=True), default="general")
+
+When using the ``mapped_field()`` wrapper function, an explicit field type
+instance can be passed as a first positional argument, as the ``category``
+field does in the example above.
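
For readers less familiar with the dataclass options named here, the standard library's ``dataclasses.field`` gives a rough analogue of ``default`` and ``default_factory``. The sketch assumes ``mapped_field()`` follows comparable semantics, which the text implies but this example does not verify. ``MyDocumentShape`` is a plain dataclass invented for illustration, not a ``Document``.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class MyDocumentShape:
    # A plain default is shared as-is by every instance.
    title: str = "no title"
    # A default_factory is called once per instance, at construction time.
    created_at: datetime = field(default_factory=datetime.now)
    published: bool = False
    category: str = "general"


first = MyDocumentShape()
second = MyDocumentShape(title="hello", published=True)

print(first.title, second.title)
print(second.created_at >= first.created_at)
```

``default_factory`` is the option to reach for when the default must be computed at creation time, as with a timestamp, or must be a fresh mutable object per instance.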
+
+Static type checkers such as `mypy <https://mypy-lang.org/>`_ and
+`pyright <https://github.com/microsoft/pyright>`_ can use the type hints and
+the dataclass-specific options added to the ``mapped_field()`` function to
+improve type inference and provide better real-time suggestions in IDEs.
+
+One situation in which type checkers can't infer the correct type is when
+using fields as class attributes. Consider the following example:
+
+.. code:: python
+
+    class MyDocument(Document):
+        title: str
+
+    doc = MyDocument()
+    # doc.title is typed as "str" (correct)
+    # MyDocument.title is also typed as "str" (incorrect)
+
+To help type checkers correctly identify class attributes as such, the ``M``
+generic must be used as a wrapper to the type hint, as shown in the next
+examples:
+
+.. code:: python
+
+    from elasticsearch_dsl import M
+
+    class MyDocument(Document):
+        title: M[str]
+        created_at: M[datetime] = mapped_field(default_factory=datetime.now)
+
+    doc = MyDocument()
+    # doc.title is typed as "str"
+    # doc.created_at is typed as "datetime"
+    # MyDocument.title is typed as "InstrumentedField"
+    # MyDocument.created_at is typed as "InstrumentedField"
+
+Note that the ``M`` type hint does not provide any runtime behavior and its use
+is not required, but it can be useful to eliminate spurious type errors in IDEs
+or type checking builds.
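
One common way to give a type checker two different answers for class-level and instance-level access is a generic descriptor with overloaded ``__get__`` signatures. The sketch below illustrates that general technique only. ``Mapped`` and ``FieldProxy`` are hypothetical names, and unlike ``M``, which is described above as having no runtime behavior, this descriptor does real work at runtime so that both access paths can be observed.

```python
from typing import Generic, Optional, TypeVar, Union, overload

T = TypeVar("T")


class FieldProxy:
    """Hypothetical stand-in for the proxy returned on class-level access."""

    def __init__(self, name: str) -> None:
        self.name = name


class Mapped(Generic[T]):
    """Hypothetical descriptor: a proxy on the class, the value on an instance."""

    def __set_name__(self, owner: type, name: str) -> None:
        self.name = name

    @overload
    def __get__(self, instance: None, owner: type) -> FieldProxy: ...

    @overload
    def __get__(self, instance: object, owner: type) -> T: ...

    def __get__(
        self, instance: Optional[object], owner: type
    ) -> Union[FieldProxy, T]:
        if instance is None:
            # Accessed as MyDocument.title: no instance, so return the proxy.
            return FieldProxy(self.name)
        # Accessed as doc.title: return the stored value.
        return instance.__dict__[self.name]

    def __set__(self, instance: object, value: T) -> None:
        instance.__dict__[self.name] = value


class MyDocument:
    title: Mapped[str] = Mapped()

    def __init__(self, title: str) -> None:
        self.title = title


doc = MyDocument("hello")
print(type(MyDocument.title).__name__, doc.title)
```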
+
+The ``InstrumentedField`` objects returned when fields are accessed as class
+attributes are proxies for the field instances that can be used anywhere a
+field needs to be referenced, such as when specifying sort options in a
+``Search`` object:
+
+.. code:: python
+
+    # sort by creation date descending, and title ascending
+    s = MyDocument.search().sort(-MyDocument.created_at, MyDocument.title)
+
+When specifying sorting order, the ``+`` and ``-`` unary operators can be used
+on the class field attributes to indicate ascending and descending order.
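
Unary operators on a field proxy can be supported with the ``__pos__`` and ``__neg__`` special methods. The following is a minimal sketch of that idea, not the library's ``InstrumentedField``. ``SortableField`` is a hypothetical class, and rendering ascending order as the bare field name is an assumption made for this example.

```python
class SortableField:
    """Hypothetical field proxy that can render itself as a sort key."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __pos__(self) -> str:
        # +field: ascending order, rendered here as the bare name (assumption).
        return self.name

    def __neg__(self) -> str:
        # -field: descending order, rendered with a leading minus sign.
        return f"-{self.name}"


created_at = SortableField("created_at")
title = SortableField("title")

# Creation date descending, then title ascending.
sort_keys = [-created_at, +title]
print(sort_keys)
```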
+
 Note on dates
 ~~~~~~~~~~~~~

elasticsearch_dsl/__init__.py

+3-1
@@ -19,7 +19,7 @@
 from .aggs import A
 from .analysis import analyzer, char_filter, normalizer, token_filter, tokenizer
 from .document import AsyncDocument, Document
-from .document_base import InnerDoc, MetaField
+from .document_base import InnerDoc, M, MetaField, mapped_field
 from .exceptions import (
     ElasticsearchDslException,
     IllegalOperation,
@@ -148,6 +148,7 @@
     "Keyword",
     "Long",
     "LongRange",
+    "M",
     "Mapping",
     "MetaField",
     "MultiSearch",
@@ -178,6 +179,7 @@
     "char_filter",
     "connections",
     "construct_field",
+    "mapped_field",
     "normalizer",
     "token_filter",
     "tokenizer",

elasticsearch_dsl/_async/document.py

+3-1
@@ -18,10 +18,11 @@
 import collections.abc

 from elasticsearch.exceptions import NotFoundError, RequestError
+from typing_extensions import dataclass_transform

 from .._async.index import AsyncIndex
 from ..async_connections import get_connection
-from ..document_base import DocumentBase, DocumentMeta
+from ..document_base import DocumentBase, DocumentMeta, mapped_field
 from ..exceptions import IllegalOperation
 from ..utils import DOC_META_FIELDS, META_FIELDS, merge
 from .search import AsyncSearch
@@ -62,6 +63,7 @@ def construct_index(cls, opts, bases):
         return i


+@dataclass_transform(field_specifiers=(mapped_field,))
 class AsyncDocument(DocumentBase, metaclass=AsyncIndexMeta):
     """
     Model-like class for persisting documents in elasticsearch.

elasticsearch_dsl/_sync/document.py

+3-1
@@ -18,10 +18,11 @@
 import collections.abc

 from elasticsearch.exceptions import NotFoundError, RequestError
+from typing_extensions import dataclass_transform

 from .._sync.index import Index
 from ..connections import get_connection
-from ..document_base import DocumentBase, DocumentMeta
+from ..document_base import DocumentBase, DocumentMeta, mapped_field
 from ..exceptions import IllegalOperation
 from ..utils import DOC_META_FIELDS, META_FIELDS, merge
 from .search import Search
@@ -60,6 +61,7 @@ def construct_index(cls, opts, bases):
         return i


+@dataclass_transform(field_specifiers=(mapped_field,))
 class Document(DocumentBase, metaclass=IndexMeta):
     """
     Model-like class for persisting documents in elasticsearch.
