-
Notifications
You must be signed in to change notification settings - Fork 277
String solver: avoid cubic formulas for string-to-int conversions #5226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String solver: avoid cubic formulas for string-to-int conversions #5226
Conversation
|
||
if(!digit_constraints.empty()) | ||
if(no_overflow.has_value()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not have the size - 1 >= max_string_length - 2 || radix_ul == 0
check here and build the no_overflow
conjunction inside a5
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT it wants to assert about the old sum, I'd guess that's why it was this way to begin with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix sounds reasonable but 1. I don't know the string solver internals that well and 2. this doesn't sound like a very good design and the "unknown radix" part has me concerned, so, YMMV.
{ | ||
const implies_exprt a5(premise, conjunction(digit_constraints)); | ||
const binary_predicate_exprt string_length_gte_size{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⛏️ why gte and not ge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just habit to call this "Greater-Than or Equal", but I see we use ge elsewhere
/*symbol_exprt result = | ||
fresh_symbol("string_to_int_no_overflow_check", no_overflow.type()); | ||
constraints.existential.push_back(equal_exprt(result, no_overflow)); | ||
digit_constraints.push_back(result);*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't leave commented code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooops will fix
const implies_exprt a5(premise, conjunction(digit_constraints)); | ||
const binary_predicate_exprt string_length_gte_size{ | ||
length_expr, ID_ge, from_integer(size, length_expr.type())}; | ||
const implies_exprt a5(string_length_gte_size, *no_overflow); | ||
constraints.existential.push_back(a5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the corresponding documentation needs to be updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
7651787
to
576b9f4
Compare
Previously when the string radix was unknown we generated a conjunction of no-overflow checks which was (in total) cubic in the number of digits in the string being converted to an integer. For instance, it would define a series of 'sum's, like: sum0 = digit0 sum1 = digit0 * radix + digit1 sum2 = (digit0 * radix + digit1) * radix + digit2 Where the definitions are truly inlined like this and don't use symbols to represent sum0...n, so |sumn| is O(n), and the sum of all sums is O(n^2) Then for each digit it would emit an overflow check: constraint2 = size = 2 => nooverflow(sum1) constraint3 = size = 3 => nooverflow(sum1) && nooverflow(sum2) constraint4 = size = 4 => nooverflow(sum1) && nooverflow(sum2) && nooverflow(sum3) However those sums are again expanded inline, meaning the sum of all constraints is O(n^3) The obvious fix is to define symbols for sum0... sumn and then constraint0... constraintn, but this slows the solver down considerably for small formulas (e.g. Byte.parseByte, with max size 3 digits), so I compromise here by replacing the constraints with: constraint2 = size >= 2 => nooverflow(sum1) constraint3 = size >= 3 => nooverflow(sum2) constraint4 = size >= 4 => nooverflow(sum3) Thus the formula size is quadratic not cubic but still fast for cases where the radix is known.
Previously we passed the input to parseInt/parseLong straight to the built-in string solver functions, meaning it was possible to crash jbmc by passing an out-of-range constant radix or for the solver to produce crazy results when it regarded the radix as a free choice. Now the models apply the constraint that it falls in a sensible range. I add a test for parseint with a free radix, but am unable to add one for parselong yet as it takes too long to solve.
576b9f4
to
b8e774c
Compare
Codecov Report
@@ Coverage Diff @@
## develop #5226 +/- ##
===========================================
- Coverage 67.43% 67.43% -0.01%
===========================================
Files 1157 1157
Lines 95354 95352 -2
===========================================
- Hits 64306 64302 -4
- Misses 31048 31050 +2
Continue to review full report at Codecov.
|
@@ -0,0 +1,18 @@ | |||
public class Test { | |||
|
|||
public static void testMe(String input, int radix) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird formatting
@@ -0,0 +1,18 @@ | |||
public class Test { | |||
|
|||
public static void testMe(String input, int radix) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird formatting
@@ -1574,6 +1574,12 @@ void java_string_library_preprocesst::initialize_conversion_table() | |||
std::placeholders::_2, | |||
std::placeholders::_3, | |||
std::placeholders::_4); | |||
cprover_equivalent_to_java_function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Previously when the string radix was unknown we generated a conjunction of no-overflow checks
which was (in total) cubic in the number of digits in the string being converted to an integer.
For instance, it would define a series of 'sum's, like:
Where the definitions are truly inlined like this and don't use symbols to represent sum0...n,
so |sumn| is O(n), and the sum of all sums is O(n^2)
Then for each digit it would emit an overflow check:
However those sums are again expanded inline, meaning the sum of all constraints is O(n^3)
The obvious fix is to define symbols for sum0... sumn and then constraint0... constraintn,
but this slows the solver down considerably for small formulas (e.g. Byte.parseByte, with
max size 3 digits), so I compromise here by replacing the constraints with:
Thus the formula size is quadratic not cubic but still fast for cases where the radix is known.