Skip to content

Commit 6811cad

Browse files
authored
Merge pull request #19711 from Napalys/js/quality/promote_duplicate_char_class
JS: Promote `js/regex/duplicate-in-character-class` to quality
2 parents 3fed7f5 + d968dd0 commit 6811cad

File tree

6 files changed

+143
-13
lines changed

6 files changed

+143
-13
lines changed

javascript/ql/integration-tests/query-suite/javascript-code-quality.qls.expected

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,4 +3,5 @@ ql/javascript/ql/src/Expressions/ExprHasNoEffect.ql
33
ql/javascript/ql/src/Expressions/MissingAwait.ql
44
ql/javascript/ql/src/LanguageFeatures/SpuriousArguments.ql
55
ql/javascript/ql/src/Quality/UnhandledErrorInStreamPipeline.ql
6+
ql/javascript/ql/src/RegExp/DuplicateCharacterInCharacterClass.ql
67
ql/javascript/ql/src/RegExp/RegExpAlwaysMatches.ql

javascript/ql/src/RegExp/DuplicateCharacterInCharacterClass.qhelp

Lines changed: 39 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -5,26 +5,42 @@
55

66
<overview>
77
<p>
8-
Character classes in regular expressions represent sets of characters, so there is no need to specify
9-
the same character twice in one character class. Duplicate characters in character classes are at best
10-
useless, and may even indicate a latent bug.
8+
Character classes in regular expressions (denoted by square brackets <code>[]</code>) represent sets of characters where the pattern matches any single character from that set. Since character classes are sets, specifying the same character multiple times is redundant and often indicates a programming error.
119
</p>
1210

11+
<p>
12+
Common mistakes include:
13+
</p>
14+
<ul>
15+
<li>Using square brackets <code>[]</code> instead of parentheses <code>()</code> for grouping alternatives</li>
16+
<li>Misunderstanding that special regex characters like <code>|</code>, <code>*</code>, <code>+</code>, <code>()</code>, and <code>-</code> work differently when appearing inside a character class</li>
17+
<li>Accidentally duplicating characters or escape sequences that represent the same character</li>
18+
</ul>
19+
1320
</overview>
1421
<recommendation>
1522

16-
<p>If the character was accidentally duplicated, remove it. If the character class was meant to be a
17-
group, replace the brackets with parentheses.</p>
23+
<p>
24+
Examine each duplicate character to determine the intended behavior:
25+
</p>
26+
<ul>
27+
<li>If you see <code>|</code> inside square brackets (e.g., <code>[a|b|c]</code>): This is usually a mistake. The author likely intended alternation. Replace the character class with a group: <code>(a|b|c)</code></li>
28+
<li>If trying to match alternative strings, use parentheses <code>()</code> for grouping instead of square brackets</li>
29+
<li>If the duplicate was truly accidental, remove the redundant characters</li>
30+
<li>If trying to use special regex operators inside square brackets, note that most operators (like <code>|</code>) are treated as literal characters</li>
31+
</ul>
1832

33+
<p>
34+
Note that simply removing <code>|</code> characters from character classes is rarely the correct fix. Instead, analyze the pattern to understand what the author intended to match.
35+
</p>
1936

2037
</recommendation>
2138
<example>
2239
<p>
23-
In the following example, the character class <code>[password|pwd]</code> contains two instances each
24-
of the characters <code>d</code>, <code>p</code>, <code>s</code>, and <code>w</code>. The programmer
25-
most likely meant to write <code>(password|pwd)</code> (a pattern that matches either the string
26-
<code>"password"</code> or the string <code>"pwd"</code>), and accidentally mistyped the enclosing
27-
brackets.
40+
<strong>Example 1: Confusing character classes with groups</strong>
41+
</p>
42+
<p>
43+
The pattern <code>[password|pwd]</code> does not match "password" or "pwd" as intended. Instead, it matches any single character from the set <code>{p, a, s, w, o, r, d, |}</code>. Note that <code>|</code> has no special meaning inside character classes.
2844
</p>
2945

3046
<sample src="examples/DuplicateCharacterInCharacterClass.js" />
@@ -33,10 +49,23 @@ brackets.
3349
To fix this problem, the regular expression should be rewritten to <code>/(password|pwd) =/</code>.
3450
</p>
3551

52+
<p>
53+
<strong>Example 2: CSS unit matching</strong>
54+
</p>
55+
<p>
56+
The pattern <code>r?e[m|x]</code> appears to be trying to match "rem" or "rex", but actually matches "re" followed by any of the characters <code>{m, |, x}</code>. The correct pattern should be <code>r?e(m|x)</code> or <code>r?e[mx]</code>.
57+
</p>
58+
59+
<p>
60+
Similarly, <code>v[h|w|min|max]</code> should be <code>v(h|w|min|max)</code> to properly match "vh", "vw", "vmin", or "vmax".
61+
</p>
62+
3663
</example>
3764
<references>
3865

3966
<li>Mozilla Developer Network: <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions">JavaScript Regular Expressions</a>.</li>
67+
<li>MDN: <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Character_Classes">Character Classes</a> - Details on how character classes work.</li>
68+
<li>MDN: <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Groups_and_Ranges">Groups and Ranges</a> - Proper use of grouping with parentheses.</li>
4069

4170
</references>
4271
</qhelp>

javascript/ql/src/RegExp/DuplicateCharacterInCharacterClass.ql

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,8 @@
55
* @kind problem
66
* @problem.severity warning
77
* @id js/regex/duplicate-in-character-class
8-
* @tags reliability
8+
* @tags quality
9+
* reliability
910
* correctness
1011
* regular-expressions
1112
* @precision very-high

javascript/ql/test/query-tests/RegExp/DuplicateCharacterInCharacterClass/DuplicateCharacterInCharacterClass.expected

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,3 +7,39 @@
77
| tst.js:8:3:8:3 | a | Character 'a' is $@. | tst.js:8:5:8:5 | a | repeated in the same character class |
88
| tst.js:9:3:9:6 | \\x0a | Character '\\x0a' is $@. | tst.js:9:7:9:10 | \\x0a | repeated in the same character class |
99
| tst.js:10:3:10:8 | \\u000a | Character '\\u000a' is $@. | tst.js:10:9:10:10 | \\n | repeated in the same character class |
10+
| tst.js:15:4:15:4 | \| | Character '\|' is $@. | tst.js:15:6:15:6 | \| | repeated in the same character class |
11+
| tst.js:16:3:16:3 | : | Character ':' is $@. | tst.js:16:9:16:9 | : | repeated in the same character class |
12+
| tst.js:17:4:17:4 | ^ | Character '^' is $@. | tst.js:17:11:17:11 | ^ | repeated in the same character class |
13+
| tst.js:17:5:17:5 | s | Character 's' is $@. | tst.js:17:12:17:12 | s | repeated in the same character class |
14+
| tst.js:17:6:17:6 | t | Character 't' is $@. | tst.js:17:13:17:13 | t | repeated in the same character class |
15+
| tst.js:17:6:17:6 | t | Character 't' is $@. | tst.js:17:15:17:15 | t | repeated in the same character class |
16+
| tst.js:17:6:17:6 | t | Character 't' is $@. | tst.js:17:19:17:19 | t | repeated in the same character class |
17+
| tst.js:17:7:17:7 | y | Character 'y' is $@. | tst.js:17:20:17:20 | y | repeated in the same character class |
18+
| tst.js:17:8:17:8 | l | Character 'l' is $@. | tst.js:17:21:17:21 | l | repeated in the same character class |
19+
| tst.js:17:9:17:9 | e | Character 'e' is $@. | tst.js:17:22:17:22 | e | repeated in the same character class |
20+
| tst.js:18:3:18:3 | . | Character '.' is $@. | tst.js:18:5:18:5 | . | repeated in the same character class |
21+
| tst.js:19:6:19:6 | \u0645 | Character '\u0645' is $@. | tst.js:19:8:19:8 | \u0645 | repeated in the same character class |
22+
| tst.js:22:3:22:4 | \\p | Character '\\p' is $@. | tst.js:22:15:22:16 | \\p | repeated in the same character class |
23+
| tst.js:22:5:22:5 | { | Character '{' is $@. | tst.js:22:17:22:17 | { | repeated in the same character class |
24+
| tst.js:22:7:22:7 | e | Character 'e' is $@. | tst.js:22:10:22:10 | e | repeated in the same character class |
25+
| tst.js:22:8:22:8 | t | Character 't' is $@. | tst.js:22:9:22:9 | t | repeated in the same character class |
26+
| tst.js:22:12:22:12 | } | Character '}' is $@. | tst.js:22:23:22:23 | } | repeated in the same character class |
27+
| tst.js:22:13:22:13 | & | Character '&' is $@. | tst.js:22:14:22:14 | & | repeated in the same character class |
28+
| tst.js:22:21:22:21 | I | Character 'I' is $@. | tst.js:22:22:22:22 | I | repeated in the same character class |
29+
| tst.js:26:3:26:4 | \\p | Character '\\p' is $@. | tst.js:26:13:26:14 | \\p | repeated in the same character class |
30+
| tst.js:26:5:26:5 | { | Character '{' is $@. | tst.js:26:15:26:15 | { | repeated in the same character class |
31+
| tst.js:26:7:26:7 | e | Character 'e' is $@. | tst.js:26:10:26:10 | e | repeated in the same character class |
32+
| tst.js:26:7:26:7 | e | Character 'e' is $@. | tst.js:26:17:26:17 | e | repeated in the same character class |
33+
| tst.js:26:7:26:7 | e | Character 'e' is $@. | tst.js:26:28:26:28 | e | repeated in the same character class |
34+
| tst.js:26:8:26:8 | t | Character 't' is $@. | tst.js:26:9:26:9 | t | repeated in the same character class |
35+
| tst.js:26:11:26:11 | r | Character 'r' is $@. | tst.js:26:29:26:29 | r | repeated in the same character class |
36+
| tst.js:26:12:26:12 | } | Character '}' is $@. | tst.js:26:30:26:30 | } | repeated in the same character class |
37+
| tst.js:26:20:26:20 | m | Character 'm' is $@. | tst.js:26:26:26:26 | m | repeated in the same character class |
38+
| tst.js:28:3:28:3 | / | Character '/' is $@. | tst.js:28:5:28:5 | / | repeated in the same character class |
39+
| tst.js:30:4:30:4 | ^ | Character '^' is $@. | tst.js:30:5:30:5 | ^ | repeated in the same character class |
40+
| tst.js:31:4:31:4 | * | Character '*' is $@. | tst.js:31:5:31:5 | * | repeated in the same character class |
41+
| tst.js:33:5:33:5 | \| | Character '\|' is $@. | tst.js:33:6:33:7 | \\\| | repeated in the same character class |
42+
| tst_replace.js:3:26:3:26 | n | Character 'n' is $@. | tst_replace.js:3:28:3:28 | n | repeated in the same character class |
43+
| tst_replace.js:11:18:11:18 | n | Character 'n' is $@. | tst_replace.js:11:20:11:20 | n | repeated in the same character class |
44+
| tst_replace.js:25:18:25:18 | n | Character 'n' is $@. | tst_replace.js:25:20:25:20 | n | repeated in the same character class |
45+
| tst_replace.js:42:18:42:18 | n | Character 'n' is $@. | tst_replace.js:42:20:42:20 | n | repeated in the same character class |

javascript/ql/test/query-tests/RegExp/DuplicateCharacterInCharacterClass/tst.js

Lines changed: 21 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3,12 +3,31 @@
33
/[\uDC3A\uDC3C]/;
44
/[??]/; // $ Alert
55
/[\u003F\u003f]/; // $ Alert
6-
/[\u003F?]/; // $ Alert
6+
/[\u003F?]/; // $ Alert -- \u003F evaluates to ?, which is the same as ? in the character class
77
/[\x3f\u003f]/; // $ Alert
88
/[aaa]/; // $ Alert
99
/[\x0a\x0a]/; // $ Alert
10-
/[\u000a\n]/; // $ Alert
10+
/[\u000a\n]/; // $ Alert -- \u000a evaluates to \n, which is the same as \n in the character class
1111
/[\u{ff}]/;
1212
/[\u{12340}-\u{12345}]/u;
1313
new RegExp("[\u{12340}-\u{12345}]", "u");
1414
const regex = /\b(?:https?:\/\/|mailto:|www\.)(?:[\S--[\p{P}<>]]|\/|[\S--[\[\]]]+[\S--[\p{P}<>]])+|\b[\S--[@\p{Ps}\p{Pe}<>]]+@([\S--[\p{P}<>]]+(?:\.[\S--[\p{P}<>]]+)+)/gmv;
15+
/[a|b|c]/; // $ Alert -- Repeated | character in character class, which has no special meaning in this context
16+
/[:alnum:]/; // $ Alert -- JavaScript does not support POSIX character classes like `[:alnum:]` in regular expressions, thus characters in the class are treated as literals
17+
/[(^style|^staticStyle)]/; // $ Alert
18+
/[.x.]/i; // $ Alert -- Repeated . character in character class
19+
/^[يفمأمسند]/i; // $ Alert -- م duplicate
20+
/[\u{1F600}-\u{1F64F}]/u;
21+
/[\p{Letter}&&\p{ASCII}]/v; // && is an intersection operator while /v flag is present
22+
/[\p{Letter}&&\p{ASCII}]/; // $ Alert -- without /v flag, && is not a valid operator and treated as member of character class thus duplicate
23+
/[\p{Decimal_Number}&&[0-9A-F]]/v;
24+
/[\p{Letter}--[aeiouAEIOU]]/v;
25+
/[\p{Letter}\p{Decimal_Number}]/v; // Union operation between two character classes only with /v flag
26+
/[\p{Letter}\p{Decimal_Number}]/; // $ Alert -- without /v flag, this is not a valid operation and treated as member of character class thus duplicate
27+
/[\[\]]/;
28+
/[/[/]]/; // $ Alert
29+
/[^^abc]/; // First `^` is a negation operator, second treated as literal `^` is a member of character class
30+
/[^^^abc]/; // $ Alert -- Second and third `^` are treated as literals thus duplicates
31+
/[^**]/; // $ Alert
32+
/[-a-z]/; // Matches `-` and range `a-z` no duplicate
33+
/^[:|\|]/ // $ Alert -- `|` is treated as a literal character in the character class, thus duplicate even with escape character
Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,44 @@
1+
function reg(){
2+
const nonIdPattern = 'a-z';
3+
const basePattern = /[<nonId>]/.source; // $ SPURIOUS:Alert
4+
const finalPattern = basePattern.replace(/<nonId>/g, nonIdPattern);
5+
console.log(finalPattern);
6+
const regex2 = new RegExp(finalPattern);
7+
}
8+
9+
function reg1(){
10+
const nonIdPattern = 'a-z';
11+
const reg = /[<nonId>]/; // $ SPURIOUS:Alert
12+
const basePattern = reg.source;
13+
const finalPattern = basePattern.replace(/<nonId>/g, nonIdPattern);
14+
console.log(finalPattern);
15+
const regex2 = new RegExp(finalPattern);
16+
}
17+
18+
function replacer(reg1, reg2){
19+
const basePattern = reg1.source;
20+
const finalPattern = basePattern.replace(/<nonId>/g, reg2);
21+
return new RegExp(finalPattern);
22+
}
23+
function reg2(){
24+
const nonIdPattern = 'a-z';
25+
const reg = /[<nonId>]/; // $ SPURIOUS:Alert
26+
replacer(reg, nonIdPattern);
27+
}
28+
29+
30+
function replacer3(str, reg2){
31+
const finalPattern = str.replace(/<nonId>/g, reg2);
32+
return new RegExp(finalPattern);
33+
}
34+
35+
function replacer2(reg1, reg2){
36+
const basePattern = reg1.source;
37+
return replacer3(basePattern, reg2);
38+
}
39+
40+
function reg3(){
41+
const nonIdPattern = 'a-z';
42+
const reg = /[<nonId>]/; // $ SPURIOUS:Alert
43+
replacer2(reg, nonIdPattern);
44+
}

0 commit comments

Comments
 (0)