-
Notifications
You must be signed in to change notification settings - Fork 423
IBM DB2 support #25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IBM DB2 support #25
Conversation
Unfortunately I didn't find any other solution for this...
@@ -57,11 +63,11 @@ export default class Tokenizer { | |||
// 5. national character quoted string using N'' or N\' to escape | |||
createStringPattern(stringTypes) { | |||
const patterns = { | |||
"``": "((`[^`]*($|`))+)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reordered them to match documentation order.
src/languages/Db2Formatter.js
Outdated
]; | ||
|
||
const reservedToplevelWords = [ | ||
"ADD", "AFTER", "ALTER COLUMN", "ALTER TABLE", "DELETE FROM", "EXCEPT", "FETCH FIRST", "FROM", "GROUP BY", "GO", "HAVING", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only thing that was added is "FETCH FIRST"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So to this group has "FETCH FIRST" compared to StandardSqlFormatter.
Might be worth to document it. Another thing: would be nice if the lines aligned with StandardSqlFormatter, so I could see differences when I just run diff
for these files.
Of course a best diff would be one word per line. Perhaps it's actually worth doing that, although it creates pretty long files, it should be easier to maintain. Like currently I see that there are lots of differences in the reservedWords
list, but it's really hard to tell what are the exact differences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried making toplevel and newline words easier to compare because these are the most important ones.
I think one word per line is overkill. If you really want to compare differenct dialects config then you can use online tools or write a quick script with Lodash (that's what I did).
src/core/Tokenizer.js
Outdated
*/ | ||
constructor(cfg) { | ||
this.WORD_REGEX = /^(\w+)/; | ||
this.WORD_REGEX = /^([\w|#|@]+)/; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried playing around with word and operators tokenizing order (match operators before words) but that was no-go. Currently I went with the easiest solution that shouldn't break anything...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also make this WORD_REGEX configurable. We could have a config option like specialWordChars: ["@", "#"]
.
Other SQL dialects don't allow #
and @
inside identifiers, which currently results in a scenario where formatting the following query with StandardSqlFormatter:
SELECT a#comment, here
results in:
SELECT
a#comment,
here
while correct formatting would be:
SELECT
a #comment, here
" -- This is a comment\n" + | ||
" MyTable;\n" | ||
); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also test the formatting of @
and #
inside identifiers. This test partially covers the latter, but it would be better to have a separate test with clear description, explaining the @
and #
are part of identifiers (e.g. not treated as operators).
src/core/Tokenizer.js
Outdated
@@ -49,6 +50,10 @@ export default class Tokenizer { | |||
return new RegExp(`^(${reservedWordsPattern})\\b`, "i"); | |||
} | |||
|
|||
createWordRegex(specialChars = []) { | |||
return new RegExp(`^([\\w|${specialChars.join("|")}]+)`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will result in regex like: ^([\w|@|#]+)
As the |
characters are part of character class [ ... ]
this will match stuff like foo|bar
as a single word, as |
has no special meaning inside square brackets.
test/Db2FormatterTest.js
Outdated
@@ -38,7 +38,8 @@ describe("Db2Formatter", function() { | |||
|
|||
it("recognizes @ and # as identifiers", function() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be more correct to say: "as part of identifiers"
This got added in the original PR for DB2 support: #25 Don't know how Uku interpreted the DB2 manual, but I for one don't see these characters being allowed in identifiers. Unfortunately I didn't check the manual myself when I originally accepted this PR.
I used ftp://ftp.software.ibm.com/ps/products/db2/info/vr105/pdf/en_US/DB2SQLRefVol1-db2s1e1051.pdf for learning and setting up DB2 configuration.