experimental.config.utils.schema: schema-aware hierarchical data processing #4279

Closed
2 of 9 tasks
Tracked by #4505
Totktonada opened this issue Jun 7, 2024 · 2 comments · Fixed by #4469
Assignees
Labels
3.2 reference [location] Tarantool manual, Reference part server [area] Task relates to Tarantool's server (core) functionality

Comments

@Totktonada
Member

Totktonada commented Jun 7, 2024

Main related dev. issue:

Other related dev. issues/commits:

Product: Tarantool
Since: 3.2
Audience/target: module and application developers
Root document:

SME: @Totktonada

Details

Introduction

Tarantool has offered declarative configuration since 3.0.0. The configuration has a strictly defined schema, but some sections accept arbitrary values: app.cfg.* and roles_cfg.*.

An author of an application or a role defines how these values are validated and processed.

Tarantool now offers a tool for validating and processing data using a declarative schema: the experimental.config.utils.schema module.

As the module name suggests, it has experimental status: the API may change in a backward-incompatible way. However, Tarantool developers are conservative about such changes in experimental modules.

Application and role developers are encouraged to use the schema module and provide feedback.

Brief API description

(This API description mostly repeats the one from tarantool/tarantool#8725 with some updates and enhancements. It is copied here primarily to ease reading and to have all the related documentation in one place.)

Schema node constructors:

  • Define a scalar.

    schema.scalar({
        type = 'string', -- number, integer, boolean, any
        <..annotations..>
    })
  • Define a record (an object with certain field names and field value types).

    schema.record({
        foo = <schema node>,
    }, {
        <..annotations..>
    })
  • Define a map (an object with arbitrary key names, but strict about keys and values types).

    schema.map({
        key = <schema node>,
        value = <schema node>,
        <..annotations..>
    })
  • Define an array.

    schema.array({
      items = <..schema node..>,
      <..annotations..>
    })

Two supplementary schema node constructors are defined: schema.enum() and schema.set().

  • -- Accepts 'foo', 'bar' or 'baz'.
    schema.enum({
      'foo',
      'bar',
      'baz',
    }, {
        <..annotations..>
    })
  • -- Accepts ['foo'], ['foo', 'bar'] and so on.
    schema.set({
      'foo',
      'bar',
      'baz',
    }, {
        <..annotations..>
    })

The schema object constructor wraps a schema node and adds schema name and user-provided methods.

schema.new('myschema', <schema node>[, {methods = <...>}])

The schema object has the following methods:

  • Traversing the schema.

    <schema object>:pairs() -> luafun iterator
  • Validate data against the schema.

    <schema object>:validate(data)
  • Filter data based on the schema annotations.

    <schema object>:filter(data, f) -> luafun iterator
  • Map data based on the schema annotations.

    <schema object>:map(data, f, f_ctx) -> new_data
  • Apply default values.

    <schema object>:apply_default(data) -> new_data
  • Get/set a nested value.

    <schema object>:get(data, path) -> requested field
    <schema object>:set(data, path, value)
  • Merge two values.

    <schema object>:merge(a, b) -> new_data

The following annotations are interpreted by the module itself:

  • type (string)
  • validate (function)
  • allowed_values (table)
  • default (any)
  • apply_default_if (function)

Other annotations are ignored by the module and may be used for arbitrary purposes.

The module supports parsing data from environment variables with conversion to the appropriate data types (for example, converting MYVAR=3301 to a Lua number), parsing comma-separated array items and key-value pairs (for example, TT_REPLICATION_PEERS=localhost:3301,localhost:3302,localhost:3303), and decoding JSON values.

schema.fromenv('MYVAR', os.getenv('MYVAR'), <schema>)

Detailed API description

Type system

There are scalar and composite types.

A scalar type accepts a primitive value (except a special scalar type any). The scalar types are listed in the next section.

A composite type accepts a collection of values. There are the following composite types.

  • record describes an object with certain fields of certain types.

    It is an analogue of struct in C, record in Avro Schema and message in Protocol Buffers.

    A record has the following constraints:

    • The only accepted Lua type is table.
    • All the keys of the table are strings.
    • Only listed keys are accepted.
    • The fields have certain types (different ones in the general case).

    All the fields are optional if there are no additional constraints.

  • map describes an object with arbitrary named keys (of the same type) with values of the same type.

    It is an analogue of unordered_map in C++, map in Avro Schema, map in Protocol Buffers.

    A map has the following constraints:

    • The only accepted Lua type is table.
    • All keys of the table have the same certain type.
    • All values of the table have the same certain type.
  • array describes an ordered collection of elements with the same type.

    It is an analogue of an array in C, array in Avro Schema, a repeated field in Protocol Buffers.

    An array has the following constraints:

    • The only accepted Lua type is table.
    • All the keys of the table are numeric, without a fractional part.
    • The lowest key is 1.
    • The highest key is equal to the number of items.
    • All items of the table have the same certain type.

Scalars

Scalar type    | Lua type         | Constraints
string         | string           |
number         | number           |
integer        | number           | x - math.floor(x) == 0
boolean        | boolean          |
string, number | string or number |
number, string | string or number |
any            | <..arbitrary..>  |

any accepts an arbitrary Lua type, including table. A scalar of the any type may be used to declare an arbitrary value that doesn't need any validation.
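The scalar rules above can be checked with a short script. This is a minimal sketch, assuming Tarantool 3.2 or newer; the schema names ('port', 'id', 'anything') and the is_valid helper are hypothetical and exist only for illustration.

```lua
local schema = require('experimental.config.utils.schema')

-- Hypothetical helper: true if :validate() raises no error.
local function is_valid(s, data)
    return (pcall(function() s:validate(data) end))
end

-- 'integer' accepts a Lua number without a fractional part.
local port = schema.new('port', schema.scalar({type = 'integer'}))
local int_ok = is_valid(port, 3301)
local frac_ok = is_valid(port, 3301.5)
local str_ok = is_valid(port, '3301')

-- 'string, number' accepts either of the two Lua types.
local id = schema.new('id', schema.scalar({type = 'string, number'}))
local id_str_ok = is_valid(id, 'abc')
local id_num_ok = is_valid(id, 42)
local id_bool_ok = is_valid(id, true)

-- 'any' accepts an arbitrary value, including a table.
local anything = schema.new('anything', schema.scalar({type = 'any'}))
local any_ok = is_valid(anything, {nested = {1, 2, 3}})
```

Only the fractional number, the string passed as an integer, and the boolean passed as 'string, number' are expected to be rejected here.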

Schema node constructors: scalar, record, map, array

The module provides the following functions to create schema node objects.

-- Create a scalar.
--
-- Returns a table of the following shape.
--
-- {
--     type = <...>,
--     <..annotations..>
-- }
schema.scalar({
    type = one-of(
        'string',
        'number',
        'integer',
        'boolean',
        'any',
        'string, number',
        'number, string',
    ),
    <..annotations..>
})

-- Example.
schema.scalar({
    type = 'string',
    description = 'lorem ipsum',
})
-- =>
{
    type = 'string',
    description = 'lorem ipsum',
}
-- Create a record.
--
-- Returns a table of the following shape.
--
-- {
--     type = 'record',
--     fields = <...>,
--     <..annotations..>
-- }
schema.record({
    [<field name>] = <schema node>,
    <...>,
}, {
    <..annotations..>
})

-- Example.
schema.record({
    foo = schema.scalar({type = 'string'}),
    bar = schema.scalar({type = 'integer'}),
}, {
    description = 'lorem ipsum',
})
-- =>
{
    type = 'record',
    fields = {
        foo = {type = 'string'},
        bar = {type = 'integer'},
    },
    description = 'lorem ipsum',
}
-- Create a map.
--
-- Returns a table of the following shape.
--
-- {
--     type = 'map',
--     key = <...>,
--     value = <...>,
--     <..annotations..>
-- }
schema.map({
    key = <schema node>,
    value = <schema node>,
    <..annotations..>
})

-- Example.
schema.map({
    key = schema.scalar({type = 'string'}),
    value = schema.scalar({type = 'string'}),
    description = 'lorem ipsum',
})
-- =>
{
    type = 'map',
    key = {type = 'string'},
    value = {type = 'string'},
    description = 'lorem ipsum',
}
-- Create an array.
--
-- Returns a table of the following shape.
--
-- {
--     type = 'array',
--     items = <...>,
--     <..annotations..>
-- }
schema.array({
    items = <schema node>,
    <..annotations..>
})

-- Example.
schema.array({
    items = schema.scalar({type = 'string'}),
    description = 'lorem ipsum',
})
-- =>
{
    type = 'array',
    items = {type = 'string'},
    description = 'lorem ipsum',
}

A general structure of a schema node is the following.

{
    -- One of scalar types, 'record', 'map' or 'array'.
    type = <string>,
    -- For a record.
    fields = <table>,
    -- For a map.
    key = <table>,
    value = <table>,
    -- For an array.
    items = <table>,
    -- Arbitrary user specified annotations.
    <..annotations..>
}
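The constructors compose, so a realistic role configuration is usually a record that nests maps and arrays. A minimal sketch, assuming Tarantool 3.2 or newer; the 'my_role' schema, its field names, and the is_valid helper are hypothetical.

```lua
local schema = require('experimental.config.utils.schema')

-- Hypothetical helper: true if :validate() raises no error.
local function is_valid(s, data)
    return (pcall(function() s:validate(data) end))
end

local str = schema.scalar({type = 'string'})

local role_cfg = schema.new('my_role', schema.record({
    listen = schema.scalar({type = 'string'}),
    workers = schema.scalar({type = 'integer'}),
    -- Arbitrary string keys, string values.
    labels = schema.map({key = str, value = str}),
    -- Ordered list of strings.
    peers = schema.array({items = str}),
}))

local good_ok = is_valid(role_cfg, {
    listen = 'localhost:3301',
    workers = 4,
    labels = {env = 'dev'},
    peers = {'localhost:3302', 'localhost:3303'},
})

-- A record accepts only the listed keys.
local unknown_field_ok = is_valid(role_cfg, {unknown = 1})

-- Map values must have the declared type.
local bad_label_ok = is_valid(role_cfg, {labels = {env = 42}})
```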

Derived schema node type constructors: enum, set

schema.enum accepts a string from the given set of allowed string values. It uses an allowed_values annotation, see <schema object>:validate() for details about this annotation.

-- Shortcut for a string scalar with the given allowed values.
--
-- Returns a table of the following shape.
--
-- {
--     type = 'string',
--     -- As given in the first argument.
--     allowed_values = <...>,
--     -- As given in the second argument.
--     <..annotations..>
-- }
schema.enum(allowed_values, annotations)

-- Example.
schema.enum({'foo', 'bar'}, {description = 'lorem ipsum'})
-- =>
{
    type = 'string',
    allowed_values = {'foo', 'bar'},
    description = 'lorem ipsum',
}

schema.set accepts an array of unique string values, where each of the strings is from the given set of allowed string values.

It uses allowed_values and validate annotations, see <schema object>:validate() for details about these annotations.

-- Shortcut for array of unique string values from the given list
-- of allowed values.
--
-- Returns a table of the following shape.
--
-- {
--     type = 'array',
--     items = {
--         type = 'string',
--         -- As given in the first argument.
--         allowed_values = <...>,
--     },
--     -- This validate function checks that the incoming
--     -- array has no repeated values.
--     validate = <function>,
--     -- As given in the second argument.
--     <..annotations..>
-- }
schema.set(allowed_values, annotations)

-- Example.
schema.set({'foo', 'bar'}, {description = 'lorem ipsum'})
-- =>
{
    type = 'array',
    items = {
        type = 'string',
        allowed_values = {'foo', 'bar'},
    },
    validate = <function>,
    description = 'lorem ipsum',
}
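Both shortcuts can be exercised through :validate(). A minimal sketch, assuming Tarantool 3.2 or newer; the schema names, the allowed values, and the is_valid helper are hypothetical.

```lua
local schema = require('experimental.config.utils.schema')

-- Hypothetical helper: true if :validate() raises no error.
local function is_valid(s, data)
    return (pcall(function() s:validate(data) end))
end

-- A single string from the allowed list.
local log_level = schema.new('log_level',
    schema.enum({'info', 'warn', 'error'}))
local enum_ok = is_valid(log_level, 'warn')
local enum_unknown_ok = is_valid(log_level, 'debug')

-- An array of unique strings from the allowed list.
local features = schema.new('features',
    schema.set({'metrics', 'tracing', 'audit'}))
local set_ok = is_valid(features, {'metrics', 'audit'})
local set_repeated_ok = is_valid(features, {'metrics', 'metrics'})
local set_unknown_ok = is_valid(features, {'profiling'})
```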

Schema object constructor: new

A schema node can be transformed to a schema object. Unlike a schema node, the schema object has a name, has methods described below and may have user-provided methods.

-- Create a schema object.
--
-- name: string, schema object name
-- schema: table, schema node
-- opts:
--   methods: table, user-provided methods
--
-- Returns a table of the following shape.
--
-- {
--     name = <string>,
--     schema = <schema node>,
--     methods = <table>,
-- }
--
-- It has a metatable with methods described in this document.
schema.new(name, schema, opts)

-- Example.
schema.new('foo', schema.scalar({type = 'string'}))
-- =>
-- {
--     name = 'foo',
--     schema = {
--         type = 'string',
--         computed = {
--             annotations = {},
--         },
--     }
-- }

The given schema node is recursively copied, and computed values are added to each schema node in its computed field. See the 'Computed annotations' section for details.

Indexing a schema object is performed as follows:

  • Look up a user-provided method.
  • Look up a module provided method.
  • Look up a field in the schema object table.
  • Return nil.

An example of a user-provided method:

local point_schema = schema.new('point', schema.record({
    x = schema.scalar({type = 'number'}),
    y = schema.scalar({type = 'number'}),
}), {
    methods = {
        distance = function(_self, a, b)
            return math.sqrt((a.x - b.x)^2 + (a.y - b.y)^2)
        end,
    },
})

point_schema:distance({x = 0, y = 0}, {x = 3, y = 4})
-- => 5

<schema object>:validate()

Validate the given data against the given schema.

-- Raise an error if the given data doesn't adhere to the given
-- schema.
<schema object>:validate(data)

The method performs recursive type checking. See the 'Type system' and 'Scalars' sections for details on how exactly it is performed for each of the given types.

Aside from the type checking, the method performs validation based on user-provided annotations. It is described below in this section.

Nuances:

  • schema.new('<...>', schema.scalar(<...>)) doesn't accept nil and box.NULL. However, all fields in a record are optional: they accept nil and box.NULL.
  • There is no record/map/array autoguessing: the given data is validated against the given schema node by the rules defined by this schema node. Also, mt.__serialize marks in the data are not taken into account.
  • An array shouldn't have any holes (nil values in the middle).

Annotations taken into account:

  • allowed_values (table) -- whitelist of values
  • validate (function) -- schema node specific validator

The validate annotation is a user-provided function that accepts the following arguments.

{
    <...>
    validate = function(data, w)
        -- w.schema -- current schema node
        -- w.path -- path to the node
        -- w.error -- function that prepends a caller provided
        --            error message with context information;
        --            use it for nice error messages
    end,
}

The user-provided validate function is called after all the type validation is done, including nested nodes, and, also, after the allowed_values check.

The contract is that the validate annotation raises an error to fail the check. There is a convenient w.error() helper that raises an error formatted with schema name and path to the current schema node.

Example:

#!/usr/bin/env tarantool

local schema = require('experimental.config.utils.schema')

local function validate_email(email, w)
    if email:find('@') == nil then
        w.error('An email must contain @ symbol, got %q', email)
    end
end

local personal_info_schema = schema.new('personal_info', schema.record({
    email = schema.scalar({
        type = 'string',
        validate = validate_email,
    }),
}))

personal_info_schema:validate({email = 'foo'})
-- error: [personal_info] email: An email must contain @ symbol, got "foo"

The user-provided validate function may use computed annotations. See an example in the 'Computed annotations' section below.

Beware: the user-provided validate function is not called for a record field that has a nil/box.NULL value. To raise an error on a missing field, put the validate annotation on the outer level (on the record, not on the field itself).
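A required field can therefore be expressed with a record-level validator. A minimal sketch, assuming Tarantool 3.2 or newer; the error text and the is_valid helper are hypothetical.

```lua
local schema = require('experimental.config.utils.schema')

-- Hypothetical helper: true if :validate() raises no error.
local function is_valid(s, data)
    return (pcall(function() s:validate(data) end))
end

local personal_info = schema.new('personal_info', schema.record({
    email = schema.scalar({type = 'string'}),
}, {
    -- Runs on the record itself, so it also sees a missing field.
    validate = function(data, w)
        if data.email == nil then
            w.error('The email field is mandatory')
        end
    end,
}))

local missing_ok = is_valid(personal_info, {})
local present_ok = is_valid(personal_info, {email = 'foo@example.org'})
```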

<schema object>:get()

Get nested data that is pointed by the given path.

<schema object>:get(data, path)

Important: the data is assumed to be already validated against the given schema.

The indexing is performed in the optional chaining manner ('foo.bar' works like foo?.bar in TypeScript).

The method checks the path against the schema: it doesn't allow accessing a non-existing field or indexing a scalar value.

The path is either an array-like table or a string in dot notation.

local myschema = schema.new(<...>)

local data = {foo = {bar = 'x'}}
myschema:get(data, 'foo.bar') -- => 'x'
myschema:get(data, {'foo', 'bar'}) -- => 'x'

local data = {}
myschema:get(data, 'foo.bar') -- => nil

Nuances

  • nil, '' or {} as the path argument means the root node: in other words, the method returns the data argument as is.
  • Array indexing is not supported yet.
  • A scalar of the any type can be indexed if it is a table or nil/box.NULL. In this case a tail of the path that is inside the any type is not checked against a schema. Indexing nil/box.NULL always returns nil.
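The path checks and the any-type nuance can be seen on a concrete schema. A minimal sketch, assuming Tarantool 3.2 or newer; the 'cfg' schema and its field names are hypothetical.

```lua
local schema = require('experimental.config.utils.schema')

local cfg = schema.new('cfg', schema.record({
    foo = schema.record({
        bar = schema.scalar({type = 'string'}),
    }),
    extra = schema.scalar({type = 'any'}),
}))

local data = {foo = {bar = 'x'}, extra = {a = {b = 1}}}

local by_string = cfg:get(data, 'foo.bar')
local by_table = cfg:get(data, {'foo', 'bar'})

-- Optional chaining: a missing intermediate table gives nil.
local missing = cfg:get({}, 'foo.bar')

-- The tail of the path inside 'any' is not checked against a schema.
local inside_any = cfg:get(data, 'extra.a.b')

-- A field that is absent from the schema is an error, not nil.
local unknown_field_ok = pcall(function()
    return cfg:get(data, 'foo.baz')
end)

-- Indexing a scalar is an error as well.
local index_scalar_ok = pcall(function()
    return cfg:get(data, 'foo.bar.baz')
end)
```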

<schema object>:set()

Set the given rhs value at the given path in the data.

<schema object>:set(data, path, rhs)

Important: data is assumed to be already validated against the given schema, but rhs is validated by the method before the assignment.

The method checks the path against the schema: it doesn't allow accessing a non-existing field or indexing a scalar value.

The path is either an array-like table or a string in dot notation.

local myschema = schema.new(<...>)

local data = {}
myschema:set(data, 'foo.bar', 42)
print(data.foo.bar) -- 42

local data = {}
myschema:set(data, {'foo', 'bar'}, 42)
print(data.foo.bar) -- 42

Nuances

  • A root node (pointed by the empty path) can't be set using this method.
  • Array indexing is not supported yet.
  • A scalar of the 'any' type can't be indexed, even when it is a table. It is OK to set the whole value of the 'any' type. (This restriction will be relaxed in 3.2.0 in the scope of tarantool#10204, "config/schema: :set() can't assign/delete a field inside 'any' type".)
  • Assignment of a non-nil rhs value creates intermediate tables over the given path instead of nil or box.NULL values.

Field deletion

If rhs is nil, the field pointed to by the path is deleted.

How it works (in examples):

Intermediate tables are not created:

local myschema = <...>
local data = {}
myschema:set(data, 'foo.bar', nil)
-- data is {}, not {foo = {}}

Existing tables on the path are not removed:

local myschema = <...>
local data = {
    instances = {
        foo = {x = 1},
        bar = {},
    },
}
myschema:set(data, 'instances.foo.x', nil)
-- data is {
--     instances = {
--         foo = {},
--         bar = {},
--     },
-- }
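The assignment, rhs validation and deletion rules can be combined in one script. A minimal sketch, assuming Tarantool 3.2 or newer; the 'cfg' schema and its field names are hypothetical.

```lua
local schema = require('experimental.config.utils.schema')

local cfg = schema.new('cfg', schema.record({
    foo = schema.record({
        bar = schema.scalar({type = 'integer'}),
    }),
}))

local data = {}

-- Intermediate tables are created for a non-nil rhs.
cfg:set(data, 'foo.bar', 42)
local after_set = data.foo.bar

-- rhs is validated against the schema node at the path.
local bad_rhs_ok = pcall(function()
    cfg:set(data, 'foo.bar', 'not an integer')
end)

-- The path itself is checked against the schema.
local bad_path_ok = pcall(function()
    cfg:set(data, 'foo.baz', 1)
end)

-- A nil rhs deletes the field; the enclosing table stays.
cfg:set(data, 'foo.bar', nil)
local after_delete = data.foo.bar
local parent_type = type(data.foo)
```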

<schema object>:filter()

Filter data based on the schema annotations.

<schema object>:filter(data, f) -> luafun iterator

Important: the data is assumed to be already validated against the given schema. (A fast type check is performed on composite types, but it is not recommended to rely on it.)

The user-provided filter function f receives the following table as the argument:

w = {
    path = <array-like table>,
    schema = <schema node>,
    data = <data at the given path>,
}

The filter function returns a boolean value that is interpreted as 'accepted' or 'not accepted'.

The user-provided function f is called for each schema node, including ones that have box.NULL value (but not nil). A node of a composite type (record/map/array) is not traversed down if it has nil or box.NULL value.

The :filter() function returns a luafun iterator over all w values accepted by the f function.

A composite node that is not accepted is still traversed down.

Examples:

-- Do something for each piece of data that is marked by the
-- given annotation.
s:filter(data, function(w)
    return w.schema.my_annotation ~= nil
end):each(function(w)
    do_something(w.data)
end)
-- Group data by a value of an annotation.
local group_by_my_annotation = s:filter(data, function(w)
    return w.schema.my_annotation ~= nil
end):map(function(w)
    return w.schema.my_annotation, w.data
end):tomap()

Nuances

  • box.NULL is assumed as an existing value, so the user-provided filter function f is called for it. However, it is not called for nil values. See details below.
  • While it is technically possible to pass information about a field name for record/map field values and about an item index for an array item value, it is not implemented for simplicity.
  • w.path is the same for a map key and a map value. It seems we should introduce some syntax to point to a key in a map, but it is not implemented yet.

nil/box.NULL nuances explanation

Let's assume that a record defines three scalar fields: 'foo', 'bar' and 'baz'. Let's name a schema object that wraps the record as s.

  • s:filter(nil, f) calls f only for the record itself.
  • s:filter(box.NULL, f) works in the same way.
  • s:filter({foo = box.NULL, bar = nil}, f) calls f two times: for the record and for the 'foo' field.

This behavior makes it possible to handle box.NULL values in the data. It mirrors the pairs() behavior on a usual table, so it looks quite natural.
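A typical use of :filter() is collecting values marked by a custom annotation. A minimal sketch, assuming Tarantool 3.2 or newer; the 'sensitive' annotation, the 'cfg' schema and its field names are hypothetical.

```lua
local schema = require('experimental.config.utils.schema')

local cfg = schema.new('cfg', schema.record({
    user = schema.scalar({type = 'string'}),
    -- 'sensitive' is a user-defined annotation ignored by the module.
    password = schema.scalar({type = 'string', sensitive = true}),
    token = schema.scalar({type = 'string', sensitive = true}),
}))

-- 'token' is absent (nil), so f is not called for it.
local data = {user = 'admin', password = 'secret'}

local sensitive = cfg:filter(data, function(w)
    return w.schema.sensitive == true
end):map(function(w)
    -- w.path is an array-like table; join it into dot notation.
    return table.concat(w.path, '.'), w.data
end):tomap()
```

Here sensitive is expected to contain only the password entry.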

<schema object>:map()

Transform data by the given function.

<schema object>:map(data, f, f_ctx) -> new_data

Leave the shape of the data unchanged.

Important: the data is assumed to be already validated against the given schema. (A fast type check is performed on composite types, but it is not recommended to rely on it.)

The user-provided transformation function receives the following three arguments in the given order:

  • data -- value at the given path
  • w -- walkthrough node, described below
  • ctx -- user-provided context for the transformation function

The walkthrough node w has the following fields:

  • w.schema -- schema node at the given path
  • w.path -- path to the schema node
  • w.error -- function that prepends a caller provided error message with context information; use it for nice error messages

An example of the mapping function:

local function f(data, w, ctx)
    if w.schema.type == 'string' and data ~= nil then
        -- The parentheses drop the second return value of gsub()
        -- (the number of substitutions).
        return (data:gsub('{{ *foo *}}', ctx.foo))
    end
    return data
end

The :map() method is recursive with certain rules:

  • All record fields are traversed unconditionally, including ones with nil/box.NULL values. Even if the record itself is nil/box.NULL, its fields are traversed down (assuming their values as nil).

    It is important when the original data should be extended using some information from the schema: say, default values.

  • It is not the case for a map and an array: nil/box.NULL fields and items are preserved as is, they're not traversed down. If the map/the array itself is nil/box.NULL, it is preserved as well.

    A map has no list of fields in the schema, so it is not possible to traverse it down. Similarly, an array has no items count in the schema.

The method attempts to preserve the original shape of values of a composite type:

  • nil/box.NULL record is traversed down, but if all the new field values are nil, the return value is the original one (nil/box.NULL), not an empty table.
  • nil/box.NULL values for a map and an array are preserved.

Nuances

  • The user-provided transformation function is called only for scalars.
  • nil/box.NULL handling for composite types. Described above.
  • w.path is the same for a map key and a map value. It seems we should introduce some syntax to point to a key in a map, but it is not implemented yet.
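The traversal rules above can be tried with a template substitution function. A minimal sketch, assuming Tarantool 3.2 or newer; the 'cfg' schema, the field names and the {{ name }} template syntax are hypothetical.

```lua
local schema = require('experimental.config.utils.schema')

local cfg = schema.new('cfg', schema.record({
    greeting = schema.scalar({type = 'string'}),
    retries = schema.scalar({type = 'integer'}),
}))

-- Called for every scalar; missing record fields arrive as nil.
local function expand(data, w, ctx)
    if w.schema.type == 'string' and data ~= nil then
        -- Parentheses keep only the resulting string.
        return (data:gsub('{{ *name *}}', ctx.name))
    end
    return data
end

local res = cfg:map({
    greeting = 'hello, {{ name }}',
    retries = 3,
}, expand, {name = 'world'})
```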

<schema object>:apply_default()

Apply default values from the schema.

<schema object>:apply_default(data) -> new_data

Important: the data is assumed to be already validated against the given schema. (A fast type check is performed on composite types, but it is not recommended to rely on it.)

Annotations taken into account:

  • default -- the value to be placed instead of a missing one

  • apply_default_if (function) -- whether to apply the default

    {
        default = <...>,
        apply_default_if = function(data, w)
           -- w.schema -- current schema node
           -- w.path -- path to the node
           -- w.error -- for nice error messages
        end,
    }

    If there is no apply_default_if annotation, the default is always applied.

Nuances:

  • Defaults are taken into account only for scalars.
  • The method works for static defaults, but it doesn't work for dynamic default values or for defaults that depend on the data somehow. Use :map() for such scenarios.
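The interaction of default and apply_default_if can be illustrated on a flat record. A minimal sketch, assuming Tarantool 3.2 or newer; the 'cfg' schema, the field names and the default values are hypothetical.

```lua
local schema = require('experimental.config.utils.schema')

local cfg = schema.new('cfg', schema.record({
    host = schema.scalar({type = 'string', default = 'localhost'}),
    port = schema.scalar({type = 'integer', default = 3301}),
    verbose = schema.scalar({
        type = 'boolean',
        default = true,
        -- The default is skipped when this function returns false.
        apply_default_if = function(_data, _w)
            return false
        end,
    }),
}))

-- 'port' is given explicitly, 'host' and 'verbose' are missing.
local res = cfg:apply_default({port = 8080})
```

The explicit port is kept, host receives its default, and verbose stays unset because apply_default_if declines it.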

<schema object>:merge()

Merge two hierarchical values (prefer the latter).

<schema object>:merge(a, b) -> new_data

Important: the data is assumed to be already validated against the given schema. (A fast type check is performed on composite types, but it is not recommended to rely on it.)

box.NULL is preferred over nil, any X where X ~= nil is preferred over nil/box.NULL.

Records and maps are deeply merged. Scalars and arrays are all-or-nothing: the right hand one is chosen if both are not nil/box.NULL.

The formal rules are below.

Let's define the merge result for nil and box.NULL values:

  1. merge(nil, nil) -> nil
  2. merge(nil, box.NULL) -> box.NULL
  3. merge(box.NULL, nil) -> box.NULL
  4. merge(box.NULL, box.NULL) -> box.NULL

Let's define X as a value that is not nil and is not box.NULL.

  1. merge(X, nil) -> X
  2. merge(X, box.NULL) -> X
  3. merge(nil, X) -> X
  4. merge(box.NULL, X) -> X

If none of the above conditions is met, the following type-specific rules are in effect.

  1. merge(<scalar A>, <scalar B>) -> <scalar B>
  2. merge(<array A>, <array B>) -> <array B>
  3. merge(<record A>, <record B>) -> deep-merge(A, B)
  4. merge(<map A>, <map B>) -> deep-merge(A, B)

For each key K that is present in A or in B: deep-merge(A, B)[K] is merge(A[K], B[K]).

Nuances

  • A scalar of the any type is NOT deeply merged even if it is a table, although that may be useful. We'll consider adding support for such behavior in some backward-compatible way in the future.

  • Arrays are not concatenated (the right hand one wins), although that may be useful too. The original idea is that we don't know whether ordinals in the array are important, but practice shows that arrays in configuration data are always just sets with an order: ordinals do not matter.

    Also, the given behavior (the right hand one wins) allows discarding items from the left hand side array, which may be useful too. A concatenation behavior wouldn't allow it.

    Anyway, the concatenation behavior may be implemented in some backward-compatible way in the future.
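The rules above can be seen on a record that holds a nested record, an array and a map. A minimal sketch, assuming Tarantool 3.2 or newer; the 'cfg' schema and its field names are hypothetical.

```lua
local schema = require('experimental.config.utils.schema')

local str = schema.scalar({type = 'string'})

local cfg = schema.new('cfg', schema.record({
    db = schema.record({
        host = schema.scalar({type = 'string'}),
        port = schema.scalar({type = 'integer'}),
    }),
    peers = schema.array({items = str}),
    labels = schema.map({key = str, value = str}),
}))

local a = {
    db = {host = 'a.example.org', port = 3301},
    peers = {'x', 'y'},
    labels = {env = 'dev'},
}
local b = {
    db = {port = 3302},
    peers = {'z'},
    labels = {tier = 'web'},
}

local res = cfg:merge(a, b)
-- Records and maps are merged deeply; the array from b replaces
-- the array from a as a whole.
```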

<schema object>:pairs()

Walk over the schema and return scalar, array and map schema nodes (all nodes except records).

<schema object>:pairs() -> luafun iterator

Usage example:

-- myschema is a schema object created by schema.new().
for _, w in myschema:pairs() do
    local path = w.path
    local schema_node = w.schema
    <...>
end
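For instance, the iterator can be used to list all the leaf option names of a schema. A minimal sketch, assuming Tarantool 3.2 or newer; the 'cfg' schema and its field names are hypothetical.

```lua
local schema = require('experimental.config.utils.schema')

local cfg = schema.new('cfg', schema.record({
    name = schema.scalar({type = 'string'}),
    db = schema.record({
        host = schema.scalar({type = 'string'}),
        port = schema.scalar({type = 'integer'}),
    }),
}))

-- Collect the paths of all non-record nodes in dot notation.
local paths = {}
for _, w in cfg:pairs() do
    table.insert(paths, table.concat(w.path, '.'))
end

-- The traversal order is not relied upon here, so sort the result.
table.sort(paths)
```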

Parse an environment variable

Parse data from an environment variable as a value of the given type.

schema.fromenv(env_var_name, raw_value, schema) -> data

Important: the result is not necessarily valid against the given schema node. It should be validated using the <schema object>:validate() method before further processing.

env_var_name is used for error messages.

raw_value is expected to be obtained using os.getenv() or os.environ().

schema is a schema node, not a schema object.

How the raw value is parsed depends on the schema node type.

Scalars

  • string -- return the raw value as is
  • number -- attempt to parse a number using tonumber(), fail if unsuccessful
  • string, number (and its alias number, string) -- attempt to parse a number using tonumber(); if it is unsuccessful return the raw value as is
  • integer -- attempt to parse an integral number using tonumber64(), fail if unsuccessful
  • boolean -- accept 0/1 and true/false (case insensitively), fail on other values
  • any -- parse as JSON, fail if the decoding fails

Record

Not supported.

It is technically possible to implement parsing of records similarly to how it is done for maps, but it is not implemented yet.

Map

Accepts two formats:

  1. JSON format (if the data starts with {).
  2. Simple foo=bar,baz=fiz object format (otherwise).

The simple format is applicable for a map with string keys and scalar values of all types except any.

In the simple format the field values are parsed according to their type in the schema (see the rules for scalars above).

Array

Accepts two formats:

  1. JSON format (if the data starts with [).
  2. Simple foo,bar,baz array format (otherwise).

The simple format is applicable for an array with scalar item values of all types except any.

In the simple format the item values are parsed according to their type in the schema (see the rules for scalars above).
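The parsing rules can be tried without touching the real environment by passing literal raw values. A minimal sketch, assuming Tarantool 3.2 or newer; the MYAPP_* variable names are hypothetical, and the string literals stand in for what os.getenv() would return.

```lua
local schema = require('experimental.config.utils.schema')

local str = schema.scalar({type = 'string'})

-- Scalars.
local port = schema.fromenv('MYAPP_PORT', '3301',
    schema.scalar({type = 'number'}))
local verbose = schema.fromenv('MYAPP_VERBOSE', 'TRUE',
    schema.scalar({type = 'boolean'}))

-- Array in the simple comma-separated format.
local peers = schema.fromenv('MYAPP_PEERS',
    'localhost:3301,localhost:3302', schema.array({items = str}))

-- Map in the simple and in the JSON format.
local labels_node = schema.map({key = str, value = str})
local labels = schema.fromenv('MYAPP_LABELS', 'env=dev,tier=web',
    labels_node)
local labels_json = schema.fromenv('MYAPP_LABELS', '{"env": "dev"}',
    labels_node)

-- A value that can't be parsed as a number is an error.
local bad_number_ok = pcall(schema.fromenv, 'MYAPP_PORT', 'abc',
    schema.scalar({type = 'number'}))
```

As noted above, the parsed values still need a :validate() call on a schema object before further processing.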

Computed annotations

The idea is to have information from the ancestor nodes accessible from the given schema node.

Example:

local schema = require('experimental.config.utils.schema')

local abilities = schema.record({
    walking = schema.scalar({type = 'boolean'}),
    swimming = schema.scalar({type = 'boolean'}),
    flying = schema.scalar({type = 'boolean'}),
}, {
    validate = function(data, w)
        local kind = w.schema.computed.annotations.kind
        if kind == 'penguin' and data.flying then
            w.error('A penguin is unable to fly')
        end
    end,
})

local duck_schema = schema.new('duck', schema.record({
    name = schema.scalar({type = 'string'}),
    abilities = abilities,
}, {
    kind = 'duck',
}))

local penguin_schema = schema.new('penguin', schema.record({
    name = schema.scalar({type = 'string'}),
    abilities = abilities,
}, {
    kind = 'penguin',
}))

local bird_data = {
    name = 'Gurr',
    abilities = {
        walking = true,
        swimming = true,
        flying = true,
    },
}

duck_schema:validate(bird_data)
penguin_schema:validate(bird_data)
-- error: [penguin] abilities: A penguin is unable to fly

The example demonstrates how the kind annotation from the outermost schema node is used in a validate function of a nested schema node.

The schema.new call prepares each schema node in such a way that the computed.annotations field contains all the annotations merged from the root schema node down to the given one. If the same annotation is present in an ancestor node and in a descendant node, the latter is preferred.

There are two classes of schema node table fields that are not considered as annotations to merge into the computed.annotations field:

  • keys that are part of the schema node tree structure: type, fields, key, value, items
  • known annotations that barely make sense in the context of descendant schema nodes: allowed_values, validate, default, apply_default_if
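The merge order can also be inspected directly on a schema object. A minimal sketch, assuming Tarantool 3.2 or newer and assuming that the node layout shown in the 'Schema object constructor: new' section (the schema field with nested fields and computed.annotations) is reachable this way; the 'scope' and 'owner' annotations and all the names are hypothetical.

```lua
local schema = require('experimental.config.utils.schema')

local app = schema.new('app', schema.record({
    name = schema.scalar({type = 'string'}),
    storage = schema.record({
        path = schema.scalar({type = 'string'}),
    }, {
        -- Overrides the ancestor's value for this subtree.
        scope = 'storage',
    }),
}, {
    scope = 'global',
    owner = 'team-a',
}))

local fields = app.schema.fields
local name_ann = fields.name.computed.annotations
local path_ann = fields.storage.fields.path.computed.annotations
-- name_ann inherits both annotations from the root record, while
-- path_ann takes 'scope' from the nearest ancestor that defines it.
```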

Definition of done

This is a relatively large topic, and it seems logical to split the 'document everything' goal into subtasks.

  • Schema node constructors are documented
  • Schema constructor is documented
  • Schema object methods are documented
  • schema.fromenv is documented
  • A list of annotations that influence the behavior of the module's functions/methods is documented

Planning checklist

  • Pick product label: server.
  • Pick type label: reference.
  • Estimate complexity in storypoints in the title
  • Add to Documentation board → Backlog
@Totktonada Totktonada added reference [location] Tarantool manual, Reference part server [area] Task relates to Tarantool's server (core) functionality labels Jun 7, 2024
@sergos
sergos commented Jun 14, 2024

Please, transfer the ticket to the doc team after the draft finalization.

Totktonada added a commit to Totktonada/tarantool that referenced this issue Jun 26, 2024
The module is renamed from `internal.config.utils.schema` to
`experimental.config.utils.schema` without changes.

It is useful for validation of configuration data in roles and
applications.

Also, it provides a couple of methods that aim to simplify usual tasks
around processing of hierarchical configuration data. For example,

* get/set a nested value
* apply defaults from the schema
* filter data based on annotations from the schema
* transform a hierarchical data using a function
* merge two hierarchical values
* parse environment variable according to its type in the schema

See tarantool/doc#4279 for an in-depth
description.

Fixes tarantool#10117

NO_DOC=tarantool/doc#4279
Totktonada added a commit to tarantool/tarantool that referenced this issue Jul 3, 2024
The module is renamed from `internal.config.utils.schema` to
`experimental.config.utils.schema` without changes.

It is useful for validation of configuration data in roles and
applications.

Also, it provides several methods that simplify common tasks
around processing hierarchical configuration data. For example:

* get/set a nested value
* apply defaults from the schema
* filter data based on annotations from the schema
* transform hierarchical data using a function
* merge two hierarchical values
* parse an environment variable according to its type in the schema

See tarantool/doc#4279 for an in-depth
description.

Fixes #10117

NO_DOC=tarantool/doc#4279
Totktonada added a commit to Totktonada/tarantool that referenced this issue Jul 5, 2024
This commit implements the `<schema object>:set()` algorithm in a more
accurate way and it solves several drawbacks of the previous
implementation.

* It was impossible to set a field nested in a record or a map
  that has the box.NULL value (tarantool#10190).
* It was impossible to set a field to the box.NULL value (tarantool#10193).
* It was impossible to delete a field; now a `nil` RHS value means
  deletion (tarantool#10194).

Fixes tarantool#10190
Fixes tarantool#10193
Fixes tarantool#10194

NO_DOC=Included into tarantool/doc#4279
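
The three cases fixed by the commit above can be sketched as follows. This is an illustration only, assuming Tarantool 3.2 or later with these fixes applied; the schema name `example` and the fields `name`, `limits` and `max` are hypothetical.

```lua
local schema = require('experimental.config.utils.schema')

-- Hypothetical schema with a nested record.
local s = schema.new('example', schema.record({
    name = schema.scalar({type = 'string'}),
    limits = schema.record({
        max = schema.scalar({type = 'integer'}),
    }),
}))

-- The parent record holds box.NULL. Setting a nested field is
-- expected to create the record (tarantool#10190).
local data = {name = 'a', limits = box.NULL}
s:set(data, 'limits.max', 10)
local max_after_set = s:get(data, 'limits.max')

-- Assign box.NULL explicitly (tarantool#10193). The key stays in
-- the table and holds a cdata NULL, which compares equal to nil.
s:set(data, 'name', box.NULL)

-- A nil right-hand side deletes the field (tarantool#10194).
s:set(data, 'limits.max', nil)
local max_after_delete = s:get(data, 'limits.max')
```

The sketch modifies `data` in place and reads the results back through `<schema object>:get()`, so it does not depend on what `set()` returns.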
Totktonada added a commit to Totktonada/tarantool that referenced this issue Jul 5, 2024
`<schema object>:get()` can now access a field inside an `any` type if
it is a `table` or `nil`/`box.NULL`.

`config:get()` can now access fields inside `app.cfg.<key>` and
`roles_cfg.<key>`.

Fixes tarantool#10205

NO_DOC=The `<schema object>:get()` update is included into
       tarantool/doc#4279.
       The `config:get()` reference on the website doesn't mention the
       constraint, so it doesn't need an update.
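
The `<schema object>:get()` change described above can be sketched as follows, assuming Tarantool 3.2 or later. The schema name `example` and the field names `extra`, `http` and `port` are hypothetical.

```lua
local schema = require('experimental.config.utils.schema')

-- Hypothetical schema: the `extra` field accepts arbitrary data.
local s = schema.new('example', schema.record({
    extra = schema.scalar({type = 'any'}),
}))

local data = {extra = {http = {port = 8080}}}

-- The path continues into the table stored in the `any` field.
local port = s:get(data, 'extra.http.port')
```

The same kind of path is what lets `config:get()` reach values under `app.cfg.<key>` and `roles_cfg.<key>`, since those sections accept arbitrary values.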
@Totktonada Totktonada changed the title [DRAFT] experimental.config.utils.schema: schema-aware hierarchical data processing experimental.config.utils.schema: schema-aware hierarchical data processing Jul 5, 2024
Totktonada (Member, Author) commented

> Please, transfer the ticket to the doc team after the draft finalization.

@sergos I've removed the DRAFT mark from the title and removed the lango team project from the issue.

@tarantool/doc The documentation request is ready to work on.

@andreyaksenov andreyaksenov self-assigned this Jul 5, 2024
Totktonada added a commit to tarantool/tarantool that referenced this issue Jul 22, 2024
`<schema object>:get()` can now access a field inside the `any` type if
it is a `table` or `nil`/`box.NULL`.

`config:get()` can now access fields inside `app.cfg.<key>` and
`roles_cfg.<key>`.

Fixes #10205

NO_DOC=The `<schema object>:get()` update is included into
       tarantool/doc#4279.
       The `config:get()` reference on the website doesn't mention the
       constraint, so it doesn't need an update.
Totktonada added a commit to tarantool/tarantool that referenced this issue Jul 22, 2024
This commit implements the `<schema object>:set()` algorithm in a more
accurate way and it solves several drawbacks of the previous
implementation.

* It was impossible to set a field nested in a record or a map
  that has the box.NULL value (#10190).
* It was impossible to set a field to the box.NULL value (#10193).
* It was impossible to delete a field; now a `nil` RHS value means
  deletion (#10194).

Fixes #10190
Fixes #10193
Fixes #10194

NO_DOC=Included into tarantool/doc#4279
@andreyaksenov andreyaksenov linked a pull request Aug 26, 2024 that will close this issue
@p7nov p7nov self-assigned this Sep 16, 2024
p7nov added a commit that referenced this issue Oct 22, 2024
Resolves #4279.

Co-authored-by: Pavel Semyonov
Co-authored-by: Elena Shebunyaeva