Skip to content

String solver: avoid cubic formulas for string-to-int conversions #5226

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

smowton
Copy link
Contributor

@smowton smowton commented Feb 6, 2020

Previously when the string radix was unknown we generated a conjunction of no-overflow checks
which was (in total) cubic in the number of digits in the string being converted to an integer.

For instance, it would define a series of 'sum's, like:

sum0 = digit0
sum1 = digit0 * radix + digit1
sum2 = (digit0 * radix + digit1) * radix + digit2

Where the definitions are truly inlined like this and don't use symbols to represent sum0...n,
so |sumn| is O(n), and the sum of all sums is O(n^2)

Then for each digit it would emit an overflow check:

constraint2 = size = 2 => nooverflow(sum1)
constraint3 = size = 3 => nooverflow(sum1) && nooverflow(sum2)
constraint4 = size = 4 => nooverflow(sum1) && nooverflow(sum2) && nooverflow(sum3)

However those sums are again expanded inline, meaning the sum of all constraints is O(n^3)

The obvious fix is to define symbols for sum0... sumn and then constraint0... constraintn,
but this slows the solver down considerably for small formulas (e.g. Byte.parseByte, with
max size 3 digits), so I compromise here by replacing the constraints with:

constraint2 = size >= 2 => nooverflow(sum1)
constraint3 = size >= 3 => nooverflow(sum2)
constraint4 = size >= 4 => nooverflow(sum3)

Thus the formula size is quadratic not cubic but still fast for cases where the radix is known.


if(!digit_constraints.empty())
if(no_overflow.has_value())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not have the size - 1 >= max_string_length - 2 || radix_ul == 0 check here and build the no_overflow conjunction inside a5?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT it wants to assert about the old sum, I'd guess that's why it was this way to begin with.

Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix sounds reasonable but 1. I don't know the string solver internals that well and 2. this doesn't sound like a very good design and the "unknown radix" part has me concerned, so, YMMV.

{
const implies_exprt a5(premise, conjunction(digit_constraints));
const binary_predicate_exprt string_length_gte_size{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ why gte and not ge?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just habit to call this "Greater-Than or Equal", but I see we use ge elsewhere

/*symbol_exprt result =
fresh_symbol("string_to_int_no_overflow_check", no_overflow.type());
constraints.existential.push_back(equal_exprt(result, no_overflow));
digit_constraints.push_back(result);*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't leave commented code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooops will fix

const implies_exprt a5(premise, conjunction(digit_constraints));
const binary_predicate_exprt string_length_gte_size{
length_expr, ID_ge, from_integer(size, length_expr.type())};
const implies_exprt a5(string_length_gte_size, *no_overflow);
constraints.existential.push_back(a5);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the corresponding documentation needs to be updated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

Previously when the string radix was unknown we generated a conjunction of no-overflow checks
which was (in total) cubic in the number of digits in the string being converted to an integer.

For instance, it would define a series of 'sum's, like:
sum0 = digit0
sum1 = digit0 * radix + digit1
sum2 = (digit0 * radix + digit1) * radix + digit2

Where the definitions are truly inlined like this and don't use symbols to represent sum0...n,
so |sumn| is O(n), and the sum of all sums is O(n^2)

Then for each digit it would emit an overflow check:

constraint2 = size = 2 => nooverflow(sum1)
constraint3 = size = 3 => nooverflow(sum1) && nooverflow(sum2)
constraint4 = size = 4 => nooverflow(sum1) && nooverflow(sum2) && nooverflow(sum3)

However those sums are again expanded inline, meaning the sum of all constraints is O(n^3)

The obvious fix is to define symbols for sum0... sumn and then constraint0... constraintn,
but this slows the solver down considerably for small formulas (e.g. Byte.parseByte, with
max size 3 digits), so I compromise here by replacing the constraints with:

constraint2 = size >= 2 => nooverflow(sum1)
constraint3 = size >= 3 => nooverflow(sum2)
constraint4 = size >= 4 => nooverflow(sum3)

Thus the formula size is quadratic not cubic but still fast for cases where the radix is known.
Previously we passed the input to parseInt/parseLong straight to the built-in string solver functions,
meaning it was possible to crash jbmc by passing an out-of-range constant radix or for the solver to
produce crazy results when it regarded the radix as a free choice. Now the models apply the constraint
that it falls in a sensible range.

I add a test for parseint with a free radix, but am unable to add one for parselong yet as it takes
too long to solve.
@smowton smowton force-pushed the smowton/feature/faster-string-to-int branch from 576b9f4 to b8e774c Compare February 6, 2020 16:59
@codecov-io
Copy link

codecov-io commented Feb 6, 2020

Codecov Report

Merging #5226 into develop will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #5226      +/-   ##
===========================================
- Coverage    67.43%   67.43%   -0.01%     
===========================================
  Files         1157     1157              
  Lines        95354    95352       -2     
===========================================
- Hits         64306    64302       -4     
- Misses       31048    31050       +2
Flag Coverage Δ
#cproversmt2 42.67% <0%> (ø) ⬆️
#regression 63.96% <100%> (-0.01%) ⬇️
#unit 31.95% <0%> (ø) ⬆️
Impacted Files Coverage Δ
...c/java_bytecode/java_string_library_preprocess.cpp 96.81% <100%> (-0.02%) ⬇️
...rs/strings/string_constraint_generator_valueof.cpp 100% <100%> (ø) ⬆️
src/solvers/strings/string_builtin_function.h 74.39% <0%> (-2.44%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cccdfd9...b8e774c. Read the comment docs.

@@ -0,0 +1,18 @@
public class Test {

public static void testMe(String input, int radix) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird formatting

@@ -0,0 +1,18 @@
public class Test {

public static void testMe(String input, int radix) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird formatting

@@ -1574,6 +1574,12 @@ void java_string_library_preprocesst::initialize_conversion_table()
std::placeholders::_2,
std::placeholders::_3,
std::placeholders::_4);
cprover_equivalent_to_java_function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@smowton smowton merged commit 778fb97 into diffblue:develop Feb 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants