|
15 | 15 | // specific language governing permissions and limitations |
16 | 16 | // under the License. |
17 | 17 |
|
18 | | -//! Signature module contains foundational types that are used to represent signatures, types, |
19 | | -//! and return types of functions in DataFusion. |
| 18 | +//! Function signatures: [`Volatility`], [`Signature`] and [`TypeSignature`] |
20 | 19 |
|
21 | 20 | use std::fmt::Display; |
22 | 21 | use std::hash::Hash; |
@@ -44,42 +43,90 @@ pub const TIMEZONE_WILDCARD: &str = "+TZ"; |
44 | 43 | /// valid length. It exists to avoid the need to enumerate all possible fixed size list lengths. |
45 | 44 | pub const FIXED_SIZE_LIST_WILDCARD: i32 = i32::MIN; |
46 | 45 |
|
47 | | -/// A function's volatility, which defines the functions eligibility for certain optimizations |
| 46 | +/// How a function's output changes with respect to a fixed input |
| 47 | +/// |
| 48 | +/// The volatility of a function determines eligibility for certain |
| 49 | +/// optimizations. You should always define your function to have the strictest |
| 50 | +/// possible volatility to maximize performance and avoid unexpected |
| 51 | +/// results. |
| 52 | +/// |
48 | 53 | #[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone, Copy, Hash)] |
49 | 54 | pub enum Volatility { |
50 | | - /// An immutable function will always return the same output when given the same |
51 | | - /// input. DataFusion will attempt to inline immutable functions during planning. |
| 55 | + /// Always returns the same output when given the same input. |
| 56 | + /// |
| 57 | + /// DataFusion will inline immutable functions during planning. |
| 58 | + /// |
| 59 | + /// For example, the `abs` function is immutable, so `abs(-1)` will be |
| 60 | + /// evaluated and replaced with `1` during planning rather than invoking |
| 61 | + /// the function at runtime. |
52 | 62 | Immutable, |
53 | | - /// A stable function may return different values given the same input across different |
54 | | - /// queries but must return the same value for a given input within a query. An example of |
55 | | - /// this is the `Now` function. DataFusion will attempt to inline `Stable` functions |
56 | | - /// during planning, when possible. |
57 | | - /// For query `select col1, now() from t1`, it might take a while to execute but |
58 | | - /// `now()` column will be the same for each output row, which is evaluated |
59 | | - /// during planning. |
| 63 | + /// May return different values given the same input across different |
| 64 | + /// queries but must return the same value for a given input within a query. |
| 65 | + /// |
| 66 | + /// For example, the `now()` function is stable, because the query `select |
| 67 | + /// col1, now() from t1` will return different results each time it is run, |
| 68 | + /// but within the same query, the output of the `now()` function has the |
| 69 | + /// same value for each output row. |
| 70 | + /// |
| 71 | + /// DataFusion will inline `Stable` functions when possible. For example, |
| 72 | + /// `Stable` functions are inlined when planning a query for execution, but |
| 73 | + /// not in View definitions or prepared statements. |
60 | 74 | Stable, |
61 | | - /// A volatile function may change the return value from evaluation to evaluation. |
62 | | - /// Multiple invocations of a volatile function may return different results when used in the |
63 | | - /// same query. An example of this is the random() function. DataFusion |
64 | | - /// can not evaluate such functions during planning. |
65 | | - /// In the query `select col1, random() from t1`, `random()` function will be evaluated |
66 | | - /// for each output row, resulting in a unique random value for each row. |
| 75 | + /// May change the return value from evaluation to evaluation. |
| 76 | + /// |
| 77 | + /// Multiple invocations of a volatile function may return different results |
| 78 | + /// when used in the same query on different rows. An example of this is the |
| 79 | + /// `random()` function. |
| 80 | + /// |
| 81 | + /// DataFusion cannot evaluate such functions during planning or push |
| 82 | + /// predicates that use them into scans. In the query `select col1, random() from t1`, |
| 83 | + /// the `random()` function will be evaluated for each output row, resulting in |
| 84 | + /// a different random value for each row. |
67 | 85 | Volatile, |
68 | 86 | } |
69 | 87 |
|
70 | | -/// A function's type signature defines the types of arguments the function supports. |
| 88 | +/// The types of arguments for which a function has implementations. |
| 89 | +/// |
| 90 | +/// [`TypeSignature`] **DOES NOT** define the types that a user query could call the |
| 91 | +/// function with. DataFusion will automatically coerce (cast) argument types to |
| 92 | +/// one of the supported function signatures, if possible. |
71 | 93 | /// |
72 | | -/// Functions typically support only a few different types of arguments compared to the |
73 | | -/// different datatypes in Arrow. To make functions easy to use, when possible DataFusion |
74 | | -/// automatically coerces (add casts to) function arguments so they match the type signature. |
| 94 | +/// # Overview |
| 95 | +/// Functions typically provide implementations for a small number of different |
| 96 | +/// argument [`DataType`]s, rather than all possible combinations. If a user |
| 97 | +/// calls a function with arguments that do not match any of the declared types, |
| 98 | +/// DataFusion will attempt to automatically coerce (add casts to) function |
| 99 | +/// arguments so they match the [`TypeSignature`]. See the [`type_coercion`] module |
| 100 | +/// for more details. |
75 | 101 | /// |
76 | | -/// For example, a function like `cos` may only be implemented for `Float64` arguments. To support a query |
77 | | -/// that calls `cos` with a different argument type, such as `cos(int_column)`, type coercion automatically |
78 | | -/// adds a cast such as `cos(CAST int_column AS DOUBLE)` during planning. |
| 102 | +/// # Example: Numeric Functions |
| 103 | +/// For example, a function like `cos` may only provide an implementation for |
| 104 | +/// [`DataType::Float64`]. When users call `cos` with a different argument type, |
| 105 | +/// such as `cos(int_column)`, type coercion automatically adds a cast such |
| 106 | +/// as `cos(CAST(int_column AS DOUBLE))` during planning. |
79 | 107 | /// |
80 | | -/// # Data Types |
| 108 | +/// [`type_coercion`]: crate::type_coercion |
81 | 109 | /// |
82 | | -/// ## Timestamps |
| 110 | +/// # Example: Strings |
| 111 | +/// |
| 112 | +/// There are several different string types in Arrow, such as |
| 113 | +/// [`DataType::Utf8`], [`DataType::LargeUtf8`], and [`DataType::Utf8View`]. |
| 114 | +/// |
| 115 | +/// Some functions may have specialized implementations for these types, while others |
| 116 | +/// may be able to handle only one of them. For example, a function that |
| 117 | +/// only works with [`DataType::Utf8View`] would have the following signature: |
| 118 | +/// |
| 119 | +/// ``` |
| 120 | +/// # use arrow::datatypes::DataType; |
| 121 | +/// # use datafusion_expr_common::signature::{TypeSignature}; |
| 122 | +/// // Declares the function must be invoked with a single argument of type `Utf8View`. |
| 123 | +/// // If a user calls the function with `Utf8` or `LargeUtf8`, DataFusion will |
| 124 | +/// // automatically add a cast to `Utf8View` during planning. |
| 125 | +/// let type_signature = TypeSignature::Exact(vec![DataType::Utf8View]); |
| 126 | +/// |
| 127 | +/// ``` |
| 128 | +/// |
| 129 | +/// # Example: Timestamps |
83 | 130 | /// |
84 | 131 | /// Types to match are represented using Arrow's [`DataType`]. [`DataType::Timestamp`] has an optional variable |
85 | 132 | /// timezone specification. To specify a function can handle a timestamp with *ANY* timezone, use |
@@ -130,8 +177,9 @@ pub enum TypeSignature { |
130 | 177 | Exact(Vec<DataType>), |
131 | 178 | /// One or more arguments belonging to the [`TypeSignatureClass`], in order. |
132 | 179 | /// |
133 | | - /// [`Coercion`] contains not only the desired type but also the allowed casts. |
134 | | - /// For example, if you expect a function has string type, but you also allow it to be casted from binary type. |
| 180 | + /// [`Coercion`] contains not only the desired type but also the allowed |
| 181 | + /// casts. For example, a function may expect a string argument but also |
| 182 | + /// allow that argument to be cast from a binary type. |
135 | 183 | /// |
136 | 184 | /// For functions that take no arguments (e.g. `random()`) see [`TypeSignature::Nullary`]. |
137 | 185 | Coercible(Vec<Coercion>), |
@@ -206,7 +254,7 @@ impl TypeSignature { |
206 | 254 | /// just listing specific DataTypes. For example, TypeSignatureClass::Timestamp matches any timestamp |
207 | 255 | /// type regardless of timezone or precision. |
208 | 256 | /// |
209 | | -/// Used primarily with TypeSignature::Coercible to define function signatures that can accept |
| 257 | +/// Used primarily with [`TypeSignature::Coercible`] to define function signatures that can accept |
210 | 258 | /// arguments that can be coerced to a particular class of types. |
211 | 259 | #[derive(Debug, Clone, Eq, PartialEq, PartialOrd, Hash)] |
212 | 260 | pub enum TypeSignatureClass { |
@@ -736,10 +784,12 @@ impl Hash for ImplicitCoercion { |
736 | 784 | } |
737 | 785 | } |
738 | 786 |
|
739 | | -/// Defines the supported argument types ([`TypeSignature`]) and [`Volatility`] for a function. |
| 787 | +/// Provides information necessary for calling a function. |
| 788 | +/// |
| 789 | +/// - [`TypeSignature`] defines the argument types that a function has implementations |
| 790 | +/// for. |
740 | 791 | /// |
741 | | -/// DataFusion will automatically coerce (cast) argument types to one of the supported |
742 | | -/// function signatures, if possible. |
| 792 | +/// - [`Volatility`] defines how the output of the function changes with the input. |
743 | 793 | #[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Hash)] |
744 | 794 | pub struct Signature { |
745 | 795 | /// The data types that the function accepts. See [TypeSignature] for more information. |
|
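The inlining rules described in the new `Volatility` docs can be summarized as a small decision table. The sketch below is a toy model only, not DataFusion's implementation: the local `Volatility` enum merely mirrors the variant names from the diff, and `PlanContext` and `can_inline` are hypothetical names introduced for illustration.

```rust
// Toy model of the documented behavior. All names here are local,
// hypothetical stand-ins and are NOT DataFusion APIs.

/// Mirrors the three variants documented in the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Volatility {
    Immutable,
    Stable,
    Volatile,
}

/// Hypothetical description of where an expression is being planned.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PlanContext {
    /// A query that is planned for execution.
    Execution,
    /// A view definition or prepared statement that may run later.
    Deferred,
}

/// Returns true if a call with constant arguments may be replaced by its
/// value during planning, following the rules stated in the doc comments.
fn can_inline(volatility: Volatility, context: PlanContext) -> bool {
    match (volatility, context) {
        // Same input always gives the same output, so folding is always safe.
        (Volatility::Immutable, _) => true,
        // Constant within one query only, so fold only when planning for execution.
        (Volatility::Stable, PlanContext::Execution) => true,
        (Volatility::Stable, PlanContext::Deferred) => false,
        // May differ on every evaluation, so never fold.
        (Volatility::Volatile, _) => false,
    }
}
```

Under this model, `abs(-1)` (immutable) is foldable in either context, `now()` (stable) only when the query is planned for execution, and `random()` (volatile) never.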
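The distinction the new `TypeSignature` docs draw, between the types a function implements and the types a query may pass, can be illustrated with a simplified coercion lookup. This is a sketch under stated assumptions: `ToyType`, `can_cast`, and `coerce` are hypothetical names, and the cast table is invented for the example. DataFusion's actual coercion rules live in its `type_coercion` module and are considerably more elaborate.

```rust
// Simplified illustration of signature-driven coercion. `ToyType`,
// `can_cast`, and `coerce` are hypothetical and are NOT Arrow or
// DataFusion APIs.

/// Stand-in for a handful of Arrow data types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToyType {
    Int32,
    Int64,
    Float64,
    Utf8,
    Utf8View,
}

/// Assumed implicit-cast table for this example only.
fn can_cast(from: ToyType, to: ToyType) -> bool {
    use ToyType::*;
    from == to
        || matches!(
            (from, to),
            (Int32, Int64) | (Int32, Float64) | (Int64, Float64) | (Utf8, Utf8View)
        )
}

/// Picks the implemented type an argument should be cast to.
///
/// An exact match wins; otherwise the first implemented type reachable by
/// an allowed cast is chosen. Returns `None` when no implementation applies.
fn coerce(arg: ToyType, implemented: &[ToyType]) -> Option<ToyType> {
    if implemented.contains(&arg) {
        return Some(arg);
    }
    implemented
        .iter()
        .copied()
        .find(|&target| can_cast(arg, target))
}
```

With this table, a `cos`-like function implemented only for `Float64` accepts an `Int32` argument via a cast to `Float64`, and a function implemented only for `Utf8View` accepts `Utf8`, while passing `Utf8` to the `Float64`-only function finds no match.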