-
Notifications
You must be signed in to change notification settings - Fork 18
Add warnings for Invalid language escape sequences in text strings #689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds private validation of language escape sequences for BOM-prefixed text strings in Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/main/java/org/verapdf/cos/COSString.java (1)
260-260: Fix grammar in warning message.The warning message uses "have" but should use "has" for grammatical correctness.
🔎 Apply this diff:
- LOGGER.warning("Text string language escape sequence have invalid length"); + LOGGER.warning("Text string language escape sequence has invalid length");- LOGGER.warning("Text string language escape sequence have invalid length"); + LOGGER.warning("Text string language escape sequence has invalid length");Also applies to: 291-291
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/main/java/org/verapdf/cos/COSString.java(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/main/java/org/verapdf/cos/COSString.java (1)
src/main/java/org/verapdf/tools/PDFDocEncoding.java (1)
PDFDocEncoding(28-112)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (2)
src/main/java/org/verapdf/cos/COSString.java (2)
298-301: LGTM!The helper method correctly identifies ASCII letters (A-Z and a-z) with proper bit masking to handle Java's signed bytes.
237-265: No changes needed—the language code encoding is correct per PDF specification.The PDF specification defines the escape sequence as: the Unicode value U+001B (byte sequence 0 followed by 27), followed by a 2-character ISO 639 language code—for example, en for English or ja for Japanese. Language escape sequences in UTF-16BE strings use a 2-byte language tag with an optional 2-byte country code.
The current implementation is correct. Language codes in UTF-16BE strings are stored as raw ASCII bytes (not UTF-16BE encoded characters). For example, the language code "en" is represented as two single bytes (0x65, 0x6E), not as UTF-16BE encoded characters that would require four bytes (0x00 0x65, 0x00 0x6E). The method correctly validates this format by checking individual bytes as ASCII letters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/main/java/org/verapdf/cos/COSString.java (1)
222-236: Critical: Validation return values are ignored (issue not resolved from previous review).The
checkUTF16BEEscapeSequenceandcheckUTF8EscapeSequencemethods returnfalsewhen invalid escape sequences are detected, butisTextStringignores these return values and always returnstruefor BOM-prefixed strings. This completely defeats the purpose of the validation—strings with invalid language escape sequences are incorrectly classified as valid text strings.🔎 Apply this diff to fix the issue:
public boolean isTextString() { if (value.length > 2) { if ((value[0] & 0xFF) == 0xFE && (value[1] & 0xFF) == 0xFF) { - checkUTF16BEEscapeSequence(value); - return true; + return checkUTF16BEEscapeSequence(value); } } if (value.length > 3) { if ((value[0] & 0xFF) == 0xEF && (value[1] & 0xFF) == 0xBB && (value[2] & 0xFF) == 0xBF) { - checkUTF8EscapeSequence(value); - return true; + return checkUTF8EscapeSequence(value); } } return PDFDocEncoding.isPDFDocEncodingString(value);
🧹 Nitpick comments (3)
src/main/java/org/verapdf/as/warnings/StringWarnings.java (1)
3-7: Consider following utility class best practices.The class contains only static constants but can still be instantiated or subclassed. Consider making the class
finaland adding a private constructor to prevent instantiation, following standard utility class patterns.🔎 View suggested enhancement:
-public class StringWarnings { +public final class StringWarnings { + private StringWarnings() { + // Prevent instantiation + } + public static final String NOT_ASCII_LETTER = "Text string language escape sequence contains not ASCII letter"; public static final String INVALID_LANGUAGE_ESCAPE_SEQUENCE_LENGTH = "Text string language escape sequence has invalid length"; public static final String NOT_SUPPORTED_UTF16LE_ENCODING = "String object uses encoding UTF16-LE not supported by PDF"; }src/main/java/org/verapdf/cos/COSString.java (2)
261-261: Add space after comma.- LOGGER.log(Level.WARNING,StringWarnings.INVALID_LANGUAGE_ESCAPE_SEQUENCE_LENGTH); + LOGGER.log(Level.WARNING, StringWarnings.INVALID_LANGUAGE_ESCAPE_SEQUENCE_LENGTH);
292-292: Add space after comma.- LOGGER.log(Level.WARNING,StringWarnings.INVALID_LANGUAGE_ESCAPE_SEQUENCE_LENGTH); + LOGGER.log(Level.WARNING, StringWarnings.INVALID_LANGUAGE_ESCAPE_SEQUENCE_LENGTH);
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/main/java/org/verapdf/as/warnings/StringWarnings.java(1 hunks)src/main/java/org/verapdf/cos/COSString.java(3 hunks)
🔇 Additional comments (5)
src/main/java/org/verapdf/cos/COSString.java (5)
23-23: LGTM!Import correctly added to reference the new warning constants.
115-115: LGTM!Refactoring to use a centralized constant improves maintainability.
268-297: UTF-8 escape sequence validation logic looks correct.The validation properly handles UTF-8 encoding where ASCII characters are single bytes. The pattern matching for 2 or 4 letter sequences followed by ESC markers is appropriate.
299-302: LGTM!The helper method correctly identifies ASCII letters (A-Z, a-z) by checking the appropriate hex ranges.
238-266: Fix minor formatting issue: add space after comma in log statement (line 261)Line 261 is missing a space after the comma in the log statement:
LOGGER.log(Level.WARNING,StringWarnings.INVALID_LANGUAGE_ESCAPE_SEQUENCE_LENGTH);should beLOGGER.log(Level.WARNING, StringWarnings.INVALID_LANGUAGE_ESCAPE_SEQUENCE_LENGTH);
Summary by CodeRabbit
Bug Fixes
Chores
✏️ Tip: You can customize this high-level summary in your review settings.