You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: javascript/ql/src/RegExp/DuplicateCharacterInCharacterClass.qhelp
+39-10Lines changed: 39 additions & 10 deletions
Original file line number
Diff line number
Diff line change
@@ -5,26 +5,42 @@
5
5
6
6
<overview>
7
7
<p>
8
-
Character classes in regular expressions represent sets of characters, so there is no need to specify
9
-
the same character twice in one character class. Duplicate characters in character classes are at best
10
-
useless, and may even indicate a latent bug.
8
+
Character classes in regular expressions (denoted by square brackets <code>[]</code>) represent sets of characters where the pattern matches any single character from that set. Since character classes are sets, specifying the same character multiple times is redundant and often indicates a programming error.
11
9
</p>
12
10
11
+
<p>
12
+
Common mistakes include:
13
+
</p>
14
+
<ul>
15
+
<li>Using square brackets <code>[]</code> instead of parentheses <code>()</code> for grouping alternatives</li>
16
+
<li>Misunderstanding that special regex characters like <code>|</code>, <code>*</code>, <code>+</code>, <code>()</code>, and <code>-</code> work differently when appearing inside a character class</li>
17
+
<li>Accidentally duplicating characters or escape sequences that represent the same character</li>
18
+
</ul>
19
+
13
20
</overview>
14
21
<recommendation>
15
22
16
-
<p>If the character was accidentally duplicated, remove it. If the character class was meant to be a
17
-
group, replace the brackets with parentheses.</p>
23
+
<p>
24
+
Examine each duplicate character to determine the intended behavior:
25
+
</p>
26
+
<ul>
27
+
<li>If you see <code>|</code> inside square brackets (e.g., <code>[a|b|c]</code>): This is usually a mistake. The author likely intended alternation. Replace the character class with a group: <code>(a|b|c)</code></li>
28
+
<li>If trying to match alternative strings, use parentheses <code>()</code> for grouping instead of square brackets</li>
29
+
<li>If the duplicate was truly accidental, remove the redundant characters</li>
30
+
<li>If trying to use special regex operators inside square brackets, note that most operators (like <code>|</code>) are treated as literal characters</li>
31
+
</ul>
18
32
33
+
<p>
34
+
Note that simply removing <code>|</code> characters from character classes is rarely the correct fix. Instead, analyze the pattern to understand what the author intended to match.
35
+
</p>
19
36
20
37
</recommendation>
21
38
<example>
22
39
<p>
23
-
In the following example, the character class <code>[password|pwd]</code> contains two instances each
24
-
of the characters <code>d</code>, <code>p</code>, <code>s</code>, and <code>w</code>. The programmer
25
-
most likely meant to write <code>(password|pwd)</code> (a pattern that matches either the string
26
-
<code>"password"</code> or the string <code>"pwd"</code>), and accidentally mistyped the enclosing
27
-
brackets.
40
+
<strong>Example 1: Confusing character classes with groups</strong>
41
+
</p>
42
+
<p>
43
+
The pattern <code>[password|pwd]</code> does not match "password" or "pwd" as intended. Instead, it matches any single character from the set <code>{p, a, s, w, o, r, d, |}</code>. Note that <code>|</code> has no special meaning inside character classes.
To fix this problem, the regular expression should be rewritten to <code>/(password|pwd) =/</code>.
34
50
</p>
35
51
52
+
<p>
53
+
<strong>Example 2: CSS unit matching</strong>
54
+
</p>
55
+
<p>
56
+
The pattern <code>r?e[m|x]</code> appears to be trying to match "rem" or "rex", but actually matches "re" followed by any of the characters <code>{m, |, x}</code>. The correct pattern should be <code>r?e(m|x)</code> or <code>r?e[mx]</code>.
57
+
</p>
58
+
59
+
<p>
60
+
Similarly, <code>v[h|w|min|max]</code> should be <code>v(h|w|min|max)</code> to properly match "vh", "vw", "vmin", or "vmax".
<li>MDN: <ahref="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Character_Classes">Character Classes</a> - Details on how character classes work.</li>
68
+
<li>MDN: <ahref="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Groups_and_Ranges">Groups and Ranges</a> - Proper use of grouping with parentheses.</li>
Copy file name to clipboardExpand all lines: javascript/ql/test/query-tests/RegExp/DuplicateCharacterInCharacterClass/DuplicateCharacterInCharacterClass.expected
+36Lines changed: 36 additions & 0 deletions
Original file line number
Diff line number
Diff line change
@@ -7,3 +7,39 @@
7
7
| tst.js:8:3:8:3 | a | Character 'a' is $@. | tst.js:8:5:8:5 | a | repeated in the same character class |
8
8
| tst.js:9:3:9:6 | \\x0a | Character '\\x0a' is $@. | tst.js:9:7:9:10 | \\x0a | repeated in the same character class |
9
9
| tst.js:10:3:10:8 | \\u000a | Character '\\u000a' is $@. | tst.js:10:9:10:10 | \\n | repeated in the same character class |
10
+
| tst.js:15:4:15:4 | \| | Character '\|' is $@. | tst.js:15:6:15:6 | \| | repeated in the same character class |
11
+
| tst.js:16:3:16:3 | : | Character ':' is $@. | tst.js:16:9:16:9 | : | repeated in the same character class |
12
+
| tst.js:17:4:17:4 | ^ | Character '^' is $@. | tst.js:17:11:17:11 | ^ | repeated in the same character class |
13
+
| tst.js:17:5:17:5 | s | Character 's' is $@. | tst.js:17:12:17:12 | s | repeated in the same character class |
14
+
| tst.js:17:6:17:6 | t | Character 't' is $@. | tst.js:17:13:17:13 | t | repeated in the same character class |
15
+
| tst.js:17:6:17:6 | t | Character 't' is $@. | tst.js:17:15:17:15 | t | repeated in the same character class |
16
+
| tst.js:17:6:17:6 | t | Character 't' is $@. | tst.js:17:19:17:19 | t | repeated in the same character class |
17
+
| tst.js:17:7:17:7 | y | Character 'y' is $@. | tst.js:17:20:17:20 | y | repeated in the same character class |
18
+
| tst.js:17:8:17:8 | l | Character 'l' is $@. | tst.js:17:21:17:21 | l | repeated in the same character class |
19
+
| tst.js:17:9:17:9 | e | Character 'e' is $@. | tst.js:17:22:17:22 | e | repeated in the same character class |
20
+
| tst.js:18:3:18:3 | . | Character '.' is $@. | tst.js:18:5:18:5 | . | repeated in the same character class |
21
+
| tst.js:19:6:19:6 | \u0645 | Character '\u0645' is $@. | tst.js:19:8:19:8 | \u0645 | repeated in the same character class |
22
+
| tst.js:22:3:22:4 | \\p | Character '\\p' is $@. | tst.js:22:15:22:16 | \\p | repeated in the same character class |
23
+
| tst.js:22:5:22:5 | { | Character '{' is $@. | tst.js:22:17:22:17 | { | repeated in the same character class |
24
+
| tst.js:22:7:22:7 | e | Character 'e' is $@. | tst.js:22:10:22:10 | e | repeated in the same character class |
25
+
| tst.js:22:8:22:8 | t | Character 't' is $@. | tst.js:22:9:22:9 | t | repeated in the same character class |
26
+
| tst.js:22:12:22:12 | } | Character '}' is $@. | tst.js:22:23:22:23 | } | repeated in the same character class |
27
+
| tst.js:22:13:22:13 | & | Character '&' is $@. | tst.js:22:14:22:14 | & | repeated in the same character class |
28
+
| tst.js:22:21:22:21 | I | Character 'I' is $@. | tst.js:22:22:22:22 | I | repeated in the same character class |
29
+
| tst.js:26:3:26:4 | \\p | Character '\\p' is $@. | tst.js:26:13:26:14 | \\p | repeated in the same character class |
30
+
| tst.js:26:5:26:5 | { | Character '{' is $@. | tst.js:26:15:26:15 | { | repeated in the same character class |
31
+
| tst.js:26:7:26:7 | e | Character 'e' is $@. | tst.js:26:10:26:10 | e | repeated in the same character class |
32
+
| tst.js:26:7:26:7 | e | Character 'e' is $@. | tst.js:26:17:26:17 | e | repeated in the same character class |
33
+
| tst.js:26:7:26:7 | e | Character 'e' is $@. | tst.js:26:28:26:28 | e | repeated in the same character class |
34
+
| tst.js:26:8:26:8 | t | Character 't' is $@. | tst.js:26:9:26:9 | t | repeated in the same character class |
35
+
| tst.js:26:11:26:11 | r | Character 'r' is $@. | tst.js:26:29:26:29 | r | repeated in the same character class |
36
+
| tst.js:26:12:26:12 | } | Character '}' is $@. | tst.js:26:30:26:30 | } | repeated in the same character class |
37
+
| tst.js:26:20:26:20 | m | Character 'm' is $@. | tst.js:26:26:26:26 | m | repeated in the same character class |
38
+
| tst.js:28:3:28:3 | / | Character '/' is $@. | tst.js:28:5:28:5 | / | repeated in the same character class |
39
+
| tst.js:30:4:30:4 | ^ | Character '^' is $@. | tst.js:30:5:30:5 | ^ | repeated in the same character class |
40
+
| tst.js:31:4:31:4 | * | Character '*' is $@. | tst.js:31:5:31:5 | * | repeated in the same character class |
41
+
| tst.js:33:5:33:5 | \| | Character '\|' is $@. | tst.js:33:6:33:7 | \\\| | repeated in the same character class |
42
+
| tst_replace.js:3:26:3:26 | n | Character 'n' is $@. | tst_replace.js:3:28:3:28 | n | repeated in the same character class |
43
+
| tst_replace.js:11:18:11:18 | n | Character 'n' is $@. | tst_replace.js:11:20:11:20 | n | repeated in the same character class |
44
+
| tst_replace.js:25:18:25:18 | n | Character 'n' is $@. | tst_replace.js:25:20:25:20 | n | repeated in the same character class |
45
+
| tst_replace.js:42:18:42:18 | n | Character 'n' is $@. | tst_replace.js:42:20:42:20 | n | repeated in the same character class |
/[a|b|c]/;// $ Alert -- Repeated | character in character class, which has no special meaning in this context
16
+
/[:alnum:]/;// $ Alert -- JavaScript does not support POSIX character classes like `[:alnum:]` in regular expressions, thus characters in the class are treated as literals
17
+
/[(^style|^staticStyle)]/;// $ Alert
18
+
/[.x.]/i;// $ Alert -- Repeated . character in character class
19
+
/^[يفمأمسند]/i;// $ Alert -- م duplicate
20
+
/[\u{1F600}-\u{1F64F}]/u;
21
+
/[\p{Letter}&&\p{ASCII}]/v;// && is an intersection operator while /v flag is present
22
+
/[\p{Letter}&&\p{ASCII}]/;// $ Alert -- without /v flag, && is not a valid operator and treated as member of character class thus duplicate
23
+
/[\p{Decimal_Number}&&[0-9A-F]]/v;
24
+
/[\p{Letter}--[aeiouAEIOU]]/v;
25
+
/[\p{Letter}\p{Decimal_Number}]/v;// Union operation between two character classes only with /v flag
26
+
/[\p{Letter}\p{Decimal_Number}]/;// $ Alert -- without /v flag, this is not a valid operation and treated as member of character class thus duplicate
27
+
/[\[\]]/;
28
+
/[/[/]]/;// $ Alert
29
+
/[^^abc]/;// First `^` is a negation operator, second treated as literal `^` is a member of character class
30
+
/[^^^abc]/;// $ Alert -- Second and third `^` are treated as literals thus duplicates
31
+
/[^**]/;// $ Alert
32
+
/[-a-z]/;// Matches `-` and range `a-z` no duplicate
33
+
/^[:|\|]/// $ Alert -- `|` is treated as a literal character in the character class, thus duplicate even with escape character
0 commit comments