Commit de3d2b3

release 1.6.0 with refactoring & adding new tokens, adding Athena
1 parent 847be25 commit de3d2b3

27 files changed: +63822 -647 lines changed

CHANGELOG.txt

Lines changed: 29 additions & 0 deletions
@@ -1,3 +1,32 @@
+**v1.6.0**
+### IMPORTANT:
+This version contains some output changes & fixes that can break your code.
+1. All arguments inside brackets are now parsed as separate strings in a list.
+For example:
+`file_format = (TYPE=JSON NULL_IF=('field'))` was previously parsed as 'NULL_IF': "('field')",
+now it is parsed as: 'NULL_IF': ["'field'"],
+
+2. Added separate tokens for EQ `=` and IN (previously they were also parsed as IDs). This is internal information for contributors.
+
+3. Some CHECK statements in columns are now parsed correctly, and IN statements are parsed as normal lists.
+So this statement: include_exclude_ind CHAR(1) NOT NULL CONSTRAINT chk_metalistcombo_logicalopr
+CHECK (include_exclude_ind IN ('I', 'E')),
+
+
+will produce this output:
+
+{'check': {'constraint_name': 'chk_metalistcombo_logicalopr',
+           'statement': {'in_statement': {'in': ["'I'", "'E'"],
+                                          'name': 'include_exclude_ind'}}},
+
+
+### Fixes
+1. The DEFAULT keyword no longer ends up in the value of the 'default' key (it did previously in some cases).
+
+### New Features
+1. Added Athena output mode and initial support - https://github.com/datacontract/datacontract-cli/issues/332
+
+
 **v1.5.4**
 ### Improvements
 #### Snowflake :
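
Code that consumed `NULL_IF` (or any other bracketed argument) as a single string may need to accept both shapes while upgrading to v1.6.0. The following is a minimal sketch based only on the two shapes quoted in the changelog entry above; `normalize_bracket_args` is a hypothetical helper and not part of simple-ddl-parser.

```python
from typing import List, Union


def normalize_bracket_args(value: Union[str, List[str]]) -> List[str]:
    """Return bracketed arguments as a list of strings (the v1.6.0 shape).

    Hypothetical helper: accepts the pre-1.6.0 string "('a', 'b')" or the
    v1.6.0 list ["'a'", "'b'"] and always returns the list form.
    """
    if isinstance(value, list):
        # Already in the new shape; return a copy so callers can mutate it.
        return list(value)
    text = value.strip()
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    # Naive split: assumes the arguments themselves contain no commas.
    return [part.strip() for part in text.split(",") if part.strip()]
```

The quotes around each value are kept, matching the `["'field'"]` output shown in the changelog.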

README.md

Lines changed: 39 additions & 1 deletion
@@ -489,17 +489,55 @@ for help with debugging & testing support for BigQuery dialect DDLs:
 * https://github.com/kalyan939

 ## Changelog
+**v1.6.0**
+### IMPORTANT:
+This version contains some output changes & fixes that can break your code.
+1. All arguments inside brackets are now parsed as separate strings in a list.
+For example:
+`file_format = (TYPE=JSON NULL_IF=('field'))` was previously parsed as 'NULL_IF': "('field')",
+now it is parsed as: 'NULL_IF': ["'field'"],
+
+2. Added separate tokens for EQ `=` and IN (previously they were also parsed as IDs). This is internal information for contributors.
+
+3. Some CHECK statements in columns are now parsed correctly, and IN statements are parsed as normal lists.
+So this statement: include_exclude_ind CHAR(1) NOT NULL CONSTRAINT chk_metalistcombo_logicalopr
+CHECK (include_exclude_ind IN ('I', 'E')),
+
+
+will produce this output:
+
+{'check': {'constraint_name': 'chk_metalistcombo_logicalopr',
+           'statement': {'in_statement': {'in': ["'I'", "'E'"],
+                                          'name': 'include_exclude_ind'}}},
+
+
+### Fixes
+1. The DEFAULT keyword no longer ends up in the value of the 'default' key (it did previously in some cases).
+
+### New Features
+1. Added Athena output mode and initial support - https://github.com/datacontract/datacontract-cli/issues/332
+
+
+**v1.5.4**
+### Improvements
+#### Snowflake :
+1. In Snowflake, added the `pattern` token for external table statements and improved location rendering
+
+
 **v1.5.3**
 ### Fixes

-1. In Snowflake Fix unexpected behaviour when file_format name given - https://github.com/xnuinside/simple-ddl-parser/issues/273
+1. In Snowflake, fixed an unexpected error when the STRIP_OUTER_ARRAY property is used in a file_format statement - https://github.com/xnuinside/simple-ddl-parser/issues/276
 2.

 **v1.5.2**
 ### Improvements
 #### MySQL
 1. Added support for COLLATE - https://github.com/xnuinside/simple-ddl-parser/pull/266/files

+### Fixes
+
+1. In Snowflake, fixed unexpected behaviour when a file_format name is given - https://github.com/xnuinside/simple-ddl-parser/issues/273

 **v1.5.1**
 ### Improvements
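
The `in_statement` structure described in the v1.6.0 notes can be read with ordinary dictionary access. The sketch below operates on the sample output quoted in the changelog; `allowed_values` is a hypothetical helper, and the handling of CHECK statements without an `IN` list is an assumption.

```python
from typing import Any, Dict, List, Optional


def allowed_values(column: Dict[str, Any]) -> Optional[List[str]]:
    """Return the unquoted values of a column's CHECK ... IN (...) list, if any."""
    statement = (column.get("check") or {}).get("statement")
    if not isinstance(statement, dict):
        # Assumption: a CHECK without IN may carry a non-dict statement.
        return None
    in_statement = statement.get("in_statement")
    if not in_statement:
        return None
    # Values keep their SQL quotes in the output, e.g. "'I'"; strip them here.
    return [value.strip("'") for value in in_statement["in"]]


# Sample taken from the changelog entry above.
column = {
    "check": {
        "constraint_name": "chk_metalistcombo_logicalopr",
        "statement": {
            "in_statement": {"in": ["'I'", "'E'"], "name": "include_exclude_ind"}
        },
    }
}
print(allowed_values(column))  # ['I', 'E']
```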

docs/README.rst

Lines changed: 58 additions & 1 deletion
@@ -555,13 +555,64 @@ for help with debugging & testing support for BigQuery dialect DDLs:
 Changelog
 ---------

+**v1.6.0**
+
+IMPORTANT:
+^^^^^^^^^^
+
+This version contains some output changes & fixes that can break your code.
+
+
+#.
+   All arguments inside brackets are now parsed as separate strings in a list.
+   For example:
+   ``file_format = (TYPE=JSON NULL_IF=('field'))`` was previously parsed as 'NULL_IF': "('field')",
+   now it is parsed as: 'NULL_IF': ["'field'"],
+
+#.
+   Added separate tokens for EQ ``=`` and IN (previously they were also parsed as IDs). This is internal information for contributors.
+
+#.
+   Some CHECK statements in columns are now parsed correctly, and IN statements are parsed as normal lists.
+   So this statement: include_exclude_ind CHAR(1) NOT NULL CONSTRAINT chk_metalistcombo_logicalopr
+   CHECK (include_exclude_ind IN ('I', 'E')),
+
+will produce this output:
+
+{'check': {'constraint_name': 'chk_metalistcombo_logicalopr',
+           'statement': {'in_statement': {'in': ["'I'", "'E'"],
+                                          'name': 'include_exclude_ind'}}},
+
+Fixes
+^^^^^
+
+
+#. The DEFAULT keyword no longer ends up in the value of the 'default' key (it did previously in some cases).
+
+New Features
+^^^^^^^^^^^^
+
+
+#. Added Athena output mode and initial support - https://github.com/datacontract/datacontract-cli/issues/332
+
+**v1.5.4**
+
+Improvements
+^^^^^^^^^^^^
+
+Snowflake :
+~~~~~~~~~~~
+
+
+#. In Snowflake, added the ``pattern`` token for external table statements and improved location rendering
+
 **v1.5.3**

 Fixes
 ^^^^^


-#. In Snowflake Fix unexpected behaviour when file_format name given - https://github.com/xnuinside/simple-ddl-parser/issues/273
+#. In Snowflake, fixed an unexpected error when the STRIP_OUTER_ARRAY property is used in a file_format statement - https://github.com/xnuinside/simple-ddl-parser/issues/276
 2.

 **v1.5.2**
@@ -575,6 +626,12 @@ MySQL

 #. Added support for COLLATE - https://github.com/xnuinside/simple-ddl-parser/pull/266/files

+Fixes
+^^^^^
+
+
+#. In Snowflake, fixed unexpected behaviour when a file_format name is given - https://github.com/xnuinside/simple-ddl-parser/issues/273
+
 **v1.5.1**

 Improvements

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "simple-ddl-parser"
-version = "1.5.3"
+version = "1.6.0"
 description = "Simple DDL Parser to parse SQL & dialects like HQL, TSQL (MSSQL), Oracle, AWS Redshift, Snowflake, MySQL, PostgreSQL, etc ddl files to json/python dict with full information about columns: types, defaults, primary keys, etc.; sequences, alters, custom types & other entities from ddl."
 authors = ["Iuliia Volkova <[email protected]>"]
 license = "MIT"

simple_ddl_parser/ddl_parser.py

Lines changed: 10 additions & 4 deletions
@@ -7,6 +7,7 @@
     HQL,
     MSSQL,
     PSQL,
+    Athena,
     BaseSQL,
     BigQuery,
     IBMDb2,
@@ -37,6 +38,7 @@ class Dialects(
     BigQuery,
     IBMDb2,
     PSQL,
+    Athena,
 ):
     pass

@@ -159,14 +161,18 @@ def is_creation_name(self, t: LexToken) -> bool:
             "TYPE",
             "DOMAIN",
             "TABLESPACE",
-            "INDEX",
             "CONSTRAINT",
             "EXISTS",
         ]
         return (
             t.value not in skip_id_tokens
             and t.value.upper() not in ["IF"]
-            and self.lexer.last_token in exceptional_keys
+            and (
+                self.lexer.last_token in exceptional_keys
+                or (
+                    self.lexer.last_token == "INDEX" and self.lexer.is_table is not True
+                )
+            )
             and not self.exceptional_cases(t.value.upper())
         )

@@ -193,13 +199,14 @@ def t_AUTOINCREMENT(self, t: LexToken):

     def t_ID(self, t: LexToken):
         r"([0-9]+[.][0-9]*([e][+-]?[0-9]+)?|[0-9]\.[0-9])\w|([a-zA-Z_,0-9:><\/\\\=\-\+\~\%$@#\|&?;*\()!{}\[\]\`\[\]]+)"
+        if len(t.value) > 1 and t.value.endswith(","):
+            t.value = t.value[:-1]
         t.type = tok.symbol_tokens.get(t.value, "ID")

         if t.type == "LP":
             self.lexer.lp_open += 1
             self.lexer.columns_def = True
             self.lexer.last_token = "LP"
-            print(t.type, t.value)
             return t
         elif self.is_token_column_name(t) or self.lexer.last_token == "DOT":
             t.type = "ID"
@@ -249,7 +256,6 @@ def set_lexx_tags(self, t: LexToken):

     def set_last_token(self, t: LexToken):
         self.lexer.last_token = t.type
-        print(t.value, t.type)
         return t

     def p_id(self, p):
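
Two of the lexer changes in this file are small enough to restate as standalone functions. This is a sketch for illustration only: both function names are hypothetical, and `VISIBLE_KEYS` contains only the entries visible in the hunk above, not the full `exceptional_keys` list.

```python
from typing import Optional, Sequence


def strip_trailing_comma(value: str) -> str:
    """Mirror the new t_ID step: drop one trailing comma from multi-character tokens."""
    if len(value) > 1 and value.endswith(","):
        return value[:-1]
    return value


def last_token_allows_creation_name(
    last_token: str, is_table: Optional[bool], exceptional_keys: Sequence[str]
) -> bool:
    """Mirror the new last-token condition in is_creation_name.

    "INDEX" was removed from exceptional_keys and now only qualifies
    when the lexer is not inside a table definition.
    """
    return last_token in exceptional_keys or (
        last_token == "INDEX" and is_table is not True
    )


# Only the keys visible in the diff hunk; the real list is longer.
VISIBLE_KEYS = ["TYPE", "DOMAIN", "TABLESPACE", "CONSTRAINT", "EXISTS"]
```

A lone `,` is left untouched by the length check, so it can still be looked up in the symbol table as its own token.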

simple_ddl_parser/dialects/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,4 @@
+from simple_ddl_parser.dialects.athena import Athena
 from simple_ddl_parser.dialects.bigquery import BigQuery
 from simple_ddl_parser.dialects.hql import HQL
 from simple_ddl_parser.dialects.ibm import IBMDb2
@@ -22,4 +23,5 @@
     "IBMDb2",
     "BaseSQL",
     "PSQL",
+    "Athena",
 ]

simple_ddl_parser/dialects/athena.py

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+from typing import List
+
+
+class Athena:
+    def p_escaped_by(self, p: List) -> None:
+        """expr : expr ESCAPED BY STRING_BASE"""
+        p[0] = p[1]
+        p_list = list(p)
+        if "\\\\" in p_list[-1]:
+            p_list[-1] = "\\"
+        p[0]["escaped_by"] = p_list[-1]
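
The escape-character handling in `p_escaped_by` can be isolated as a plain function to make the behaviour easier to see. This is a sketch with a hypothetical function name; it reproduces only the string logic from the rule above, not the parser wiring.

```python
def normalize_escaped_by(token: str) -> str:
    """Collapse a token containing a doubled backslash to a single backslash.

    Any other token (including its surrounding quotes) is returned unchanged,
    which is what the grammar rule above does with p_list[-1].
    """
    if "\\\\" in token:  # two consecutive backslash characters
        return "\\"      # one backslash character
    return token
```

Note that when the doubled backslash is found, the whole token is replaced, so the surrounding quotes are dropped in that case only.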

simple_ddl_parser/dialects/bigquery.py

Lines changed: 3 additions & 2 deletions
@@ -15,10 +15,11 @@ def p_multiple_options(self, p):
         p[0] = p[1]

     def p_options(self, p):
-        """options : OPTIONS LP id_equals RP"""
+        """options : OPTIONS LP multi_id_equals RP"""
         p_list = list(p)
         if not isinstance(p[1], dict):
-            p[0] = {"options": p[3]}
+            options = [{key: value} for key, value in p[3].items()]
+            p[0] = {"options": options}
         else:
             p[0] = p[1]
             if len(p) == 4:
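
The new `p_options` body turns one dictionary of options into a list of single-key dictionaries. A minimal sketch of that transformation on its own, with a hypothetical function name and made-up option values:

```python
from typing import Any, Dict, List


def options_as_list(parsed: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Split a dict of options into a list of one-key dicts, preserving order."""
    return [{key: value} for key, value in parsed.items()]


# Hypothetical input standing in for the value of p[3] in the grammar rule.
sample = {"description": '"demo table"', "location": '"US"'}
result = options_as_list(sample)
```

Because Python dictionaries preserve insertion order, the list keeps the options in the order they were parsed.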

simple_ddl_parser/dialects/hql.py

Lines changed: 14 additions & 5 deletions
@@ -8,10 +8,19 @@ def p_expression_location(self, p: List) -> None:
         """expr : expr LOCATION EQ STRING
         | expr LOCATION EQ DQ_STRING
         | expr LOCATION EQ multi_id_or_string
+        | expr LOCATION DQ_STRING
+        | expr LOCATION STRING
+        | expr LOCATION multi_id_or_string
+        | expr LOCATION EQ ID EQ ID EQ ID
         """
+        # last expr for sample like location=@ADL_Azure_Storage_Account_Container_Name/year=2023/month=08/
         p[0] = p[1]
         p_list = list(p)
-        p[0]["location"] = p_list[-1]
+        if len(p_list) == 9:
+            location = "".join(p_list[4:])
+        else:
+            location = p_list[-1]
+        p[0]["location"] = location

     def p_expression_clustered(self, p: List) -> None:
         """expr : expr ID ON LP pid RP
@@ -73,10 +82,10 @@ def p_multi_assignments(self, p: List) -> None:
             p[0].update(p_list[-1])

     def p_assignment(self, p: List) -> None:
-        """assignment : id id id
-        | STRING id STRING
-        | id id STRING
-        | STRING id id
+        """assignment : id EQ id
+        | STRING EQ STRING
+        | id EQ STRING
+        | STRING EQ id
         | STRING id"""
         p_list = remove_par(list(p))
         if "state" in self.lexer.__dict__:
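
The location handling added to `p_expression_location` depends on the length of the production: the `EQ ID EQ ID EQ ID` alternative yields nine elements, whose tail is joined back into one path. The sketch below reproduces that branch as a standalone function; the function name is hypothetical, and the way the sample path is split into tokens is an assumption made for illustration.

```python
from typing import Any, List


def resolve_location(p_list: List[Any]) -> str:
    """Rebuild the location value from a parser production list.

    Index 0 is the result slot, 1 the expr, 2 LOCATION, 3 the first EQ;
    everything from index 4 on belongs to the location itself.
    """
    if len(p_list) == 9:
        # Partitioned path such as .../year=2023/month=08/ split on "=".
        return "".join(p_list[4:])
    return p_list[-1]


# Assumed token split for location=@container/year=2023/month=08/
partitioned = [None, {}, "LOCATION", "=", "@container/year", "=", "2023/month", "=", "08/"]
simple = [None, {}, "LOCATION", "=", "'s3://bucket/path'"]
```

Only paths with exactly two embedded `=` signs match the nine-element alternative; other shapes fall through to the last element.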

simple_ddl_parser/dialects/ibm.py

Lines changed: 2 additions & 3 deletions
@@ -3,8 +3,7 @@


 class IBMDb2:
     def p_expr_index_in(self, p: List) -> None:
-        """expr : expr INDEX id id"""
+        """expr : expr INDEX IN id"""
         p_list = list(p)
-        if p_list[-2].upper() == "IN":
-            p[1].update({"index_in": p_list[-1]})
+        p[1].update({"index_in": p_list[-1]})
         p[0] = p[1]
