- 
                Notifications
    You must be signed in to change notification settings 
- Fork 0
SQL: DDL
#Problem statement
SQLite utilizes sqlite_master TABLE to maintain a persistent list of tables, indices, views and triggers.
In our fork, SQL tables and indices are backed by Tarantool spaces and indices, respectively. Since Tarantool is already capable of tracking its own spaces and indices, sqlite_master looks largely redundant.
Since sqlite_master is to be removed, it's necessary to redesign the schema handling.
In the following sections, we provide information on Tarantool spaces and indices. Then we document the original SQLite schema design. We conclude with the new design proposal.
Tarantool spaces are listed in _space system space. Indices are listed in _index.
CREATE TABLE _space (
  id INT PRIMARY KEY,
  owner INT,
  name TEXT,
  engine TEXT,
  field_count INT,
  opts,
  format
)- 
id— spaces are identified by the numeric id;
- 
owner— owner id;
- 
name— space name;
- 
engine—vynilormemtx;
- 
field_count— number of fields in a tuple, enforced, 0 — variable length tuples;
- 
opts— ex:{"temporary": true};
- 
format— description of tuple fields, not enforced, ex:[{"name": "column1", "type": "number"}, ...].
CREATE TABLE _index (
  id INT,
  iid INT,
  name TEXT,
  type TEXT,
  opts,
  parts,
  PRIMARY KEY (id, iid)
)- 
id— owning space id;
- 
iid— numeric id, indices are identified by(id, iid)tuple;
- 
name— index name;
- 
type— ex:tree;
- 
opts— ex:{"unique": true};
- 
parts— index key description: column numbers and types, ex:[[0, "scalar"], [1, "scalar"]].
A space may lack indexes. In order for the space to be capable of storing data at least one index is required. An index with iid 0 is the primary index (the name doesn't matter). Creating secondary index before the primary is not permitted. Names and iid-s scope is the owning space. Unlike SQL, two indexes with the same name may coexist provided that they belong to different spaces.
Tarantool takes holistic approach to data management — _space and _index are regular spaces. They are persisted and replicated in exactly the same way regular spaces are. Every space has a list of triggers attached. Triggers fire when a tuple is inserted, removed or updated. When an entry in _space or _index is modified, triggers fire. Consider for instance the following tuple inserted in _space:
[1000, 1, "test", "memtx", 0, {}, []].
The system trigger creates a new space object and initialises it with the provided options. The object is added to internal hash table. Now it is possible to attempt storing and retrieving data from space 1000 (though these attempts will ultimately fail since the space lacks indices yet).
There's one burning issue: _space and _index don't support multi-statement transactions. It's only possible to modify these spaces in auto-commit mode (which is a severe limitation we hope to be removed ASAP :)
Schema is persisted in sqlite_master table. It's a regular table. It's stored at the fixed location in the database file making it possible to access without the schema.
CREATE TABLE sqlite_master(
  type TEXT,
  name TEXT,
  tbl_name TEXT,
  rootpage INTEGER,
  sql TEXT
)- 
type—table,index,viewortrigger;
- 
name— entity name;
- 
tbl_name— the name of the table the entity belongs to;
- 
rootpage— the location of the corresponding BTree in the database file (for tables and indices);
- 
sql— SQL statement that created the entity.
Note the sql field — instead of designing a serialisation format to persist in-memory schema objects, SQLite designer decided to simply store SQL statement instead and re-parse it at schema load time. Due to this decision, the parser has two modes, determined by db->init.busy field.
When initialisation is happening, the parser only builds Table-s and Index-es and similar objects. Otherwise, in addition to building objects, a VDBE program is generated to perform the requested operation. Typically, multiple routines are invoked in the course of parsing SQL statement. The db->init.busy checks are interspersed throughout the parser routines.
Consider the stages in processing CREATE TABLE xxx statement.
- The text is parsed and VDBE program is generated. Temporary Tableand 0+Indexobjects are created and used internally by the parser routines. Those temporary objects are ultimately destroyed when the parser completes it's job.
- The in-memory schema isn't updated yet, neither is sqlite_mastertable. Storage for the new table isn't allocated yet.
- VDBE program execution starts.
- BTree-s are allocated.
- 
sqlite_masteris updated.
- 
SELECT name, rootpage, sql FROM sqlite_master WHERE tbl_name=='xxx' and type != 'trigger'fetches back the rows describing the new table.
- The same loader that fetched the schema from the database file processes the results and updates the schema. Internally, the parser is invoked again for each row.
- The in-memory schema is now updated.
- VDBE execution ends.
Multiple entries in sqlite_master may be created by a single CREATE TABLE statement. Consider
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT UNIQUE)Not only does the statement create a table, but two indices as well: the first one to implement PRIMARY KEY and the second one to implement UNIQUE constraint. The two implicit indices are named sqlite_autoindex_person_1 and sqlite_autoindex_person_2, respectively. The records describing implicit indices in sqlite_master have an empty sql field.
If the table is WITHOUT ROWID, the data is actually stored in sqlite_autoindex_person_1 index and a BTree for the table isn't created. In the later case, the entry for sqlite_autoindex_person_1 is omitted from sqlite_master since the root page number is stored in the table entry instead.
There's a schema cookie that increments after every schema change. The intention is to notice changes made by other processes (the database file is unlocked when there's no active transactions). When a schema change is detected is is reloaded completely.
- 
We expect to integrate SQL into Tarantool more tightly in the future. But for now a certain level of duplication is to be expected: Tarantool in-memory schema will coexist with the SQLite's one. 
- 
The only sanctioned method of creating SQLite in-memory schema objects is through the parser. This may change in the future but in the meantime we are being cautious. 
- 
We need to store SQL statements in the way reminiscent of the sqlite_mastertable. Conveniently, both_spaceand_indexhave extendableoptsdictionary. We addsqlthere.
- 
In order to populate in-memory SQLite schema, we process _spaceand_indexand synthesise rows insqlite_masterformat. We reuse SQLite original schema loader.
- 
For spaces that weren't created by SQL and hence lacks sqlstatement, we synthesise one.
- 
Tarantool own in-memory schema doesn't need sqlstrings — extracted and discarded at schema load time.
- 
_Spaceand_indexare added to SQLite in-memory schema using generic Tarantool/SQLite schema sync facility which is to be defined later.
- 
CREATE TABLEcreates an entry in_spaceusing generic table modification opcodes.
- 
CREATE INDEXdoes the same in_index.
- 
CREATE VIEWcreates a space without indices (views and tables share the namespace).
- 
CREATE TRIGGERneeds a new dedicated space.
- 
After a space or an index is actually created, it must be added to SQLite in-memory schema. Vanilla SQLite performed SELECTfrom thesqlite_masterto fetch rows defining the new entity and invoked the schema loader on them.
- 
There's no sqlite_masteranymore. We have two options — either rely on the generic schema sync facility, or generate the row data programmatically. The later is possible since the VDBE program had inserted these rows in the first place. We're going to evaluate both approaches.
Tarantool already has the mechanism similar to SQLite's schema version cookie. When the schema changes, purge in-memory schema and perform a full reload.
Once a space is updated propagate changes into SQLite.
This is similar to B but delays propagation until a transaction commit.
A is not particularly efficient since the in-memory schema is often purged and reloaded from scratch.
B is efficient since only the pointed changes are made to the in-memory schema. Additionally, B allows for a simpler VDBE program, since in-memory schema syns automatically once a new space or index is created (see item 13 above).
However, B is notoriously more complex — not only do we need to take action when a new record appears or the old one disappears, we should be capable of executing arbitrary changes and guessing the intent. Ex: ALTER TABLE changes the sql statement. But only limited changes are supported in ALTER TABLE. When sql statement changes we may only drop and reload portions in the in-memory schema that describe the table being changed.
Since transactions are broken C doesn't make sense at all.
Architecture Specifications
- Server architecture
- Feature specifications
- What's in a good specification
- Functional indexes
- Space _index structure
- R tree index quick start and usage
- LuaJIT
- Vinyl
- SQL
- Testing
- Performance
How To ...?
- ... add new fuzzers
- ... build RPM or Deb package using packpack
- ... calculate memory size
- ... debug core dump of stripped tarantool
- ... debug core from different OS
- ... debug Lua state with GDB
- ... generate new bootstrap snapshot
- ... use Address Sanitizer
- ... collect a coredump
Lua modules
Useful links