String solver: avoid cubic formulas for string-to-int conversions #5226

smowton · 2020-02-06T15:03:17Z

Previously when the string radix was unknown we generated a conjunction of no-overflow checks
which was (in total) cubic in the number of digits in the string being converted to an integer.

For instance, it would define a series of 'sum's, like:

sum0 = digit0
sum1 = digit0 * radix + digit1
sum2 = (digit0 * radix + digit1) * radix + digit2

Where the definitions are truly inlined like this and don't use symbols to represent sum0...n,
so |sumn| is O(n), and the sum of all sums is O(n^2)

Then for each digit it would emit an overflow check:

constraint2 = size = 2 => nooverflow(sum1)
constraint3 = size = 3 => nooverflow(sum1) && nooverflow(sum2)
constraint4 = size = 4 => nooverflow(sum1) && nooverflow(sum2) && nooverflow(sum3)

However those sums are again expanded inline, meaning the sum of all constraints is O(n^3)

The obvious fix is to define symbols for sum0... sumn and then constraint0... constraintn,
but this slows the solver down considerably for small formulas (e.g. Byte.parseByte, with
max size 3 digits), so I compromise here by replacing the constraints with:

constraint2 = size >= 2 => nooverflow(sum1)
constraint3 = size >= 3 => nooverflow(sum2)
constraint4 = size >= 4 => nooverflow(sum3)

Thus the formula size is quadratic not cubic but still fast for cases where the radix is known.

xbauch · 2020-02-06T15:12:38Z

src/solvers/strings/string_constraint_generator_valueof.cpp


-    if(!digit_constraints.empty())
+    if(no_overflow.has_value())


Why not have the size - 1 >= max_string_length - 2 || radix_ul == 0 check here and build the no_overflow conjunction inside a5?

AFAICT it wants to assert about the old sum, I'd guess that's why it was this way to begin with.

martin-cs

Fix sounds reasonable but 1. I don't know the string solver internals that well and 2. this doesn't sound like a very good design and the "unknown radix" part has me concerned, so, YMMV.

romainbrenguier · 2020-02-06T15:53:14Z

src/solvers/strings/string_constraint_generator_valueof.cpp

    {
-      const implies_exprt a5(premise, conjunction(digit_constraints));
+      const binary_predicate_exprt string_length_gte_size{


⛏️ why gte and not ge?

Just habit to call this "Greater-Than or Equal", but I see we use ge elsewhere

romainbrenguier · 2020-02-06T15:53:33Z

src/solvers/strings/string_constraint_generator_valueof.cpp

+      /*symbol_exprt result =
+        fresh_symbol("string_to_int_no_overflow_check", no_overflow.type());
+      constraints.existential.push_back(equal_exprt(result, no_overflow));
+      digit_constraints.push_back(result);*/


don't leave commented code

Ooops will fix

romainbrenguier · 2020-02-06T15:56:32Z

src/solvers/strings/string_constraint_generator_valueof.cpp

-      const implies_exprt a5(premise, conjunction(digit_constraints));
+      const binary_predicate_exprt string_length_gte_size{
+        length_expr, ID_ge, from_integer(size, length_expr.type())};
+      const implies_exprt a5(string_length_gte_size, *no_overflow);
      constraints.existential.push_back(a5);


the corresponding documentation needs to be updated

Previously when the string radix was unknown we generated a conjunction of no-overflow checks which was (in total) cubic in the number of digits in the string being converted to an integer. For instance, it would define a series of 'sum's, like: sum0 = digit0 sum1 = digit0 * radix + digit1 sum2 = (digit0 * radix + digit1) * radix + digit2 Where the definitions are truly inlined like this and don't use symbols to represent sum0...n, so |sumn| is O(n), and the sum of all sums is O(n^2) Then for each digit it would emit an overflow check: constraint2 = size = 2 => nooverflow(sum1) constraint3 = size = 3 => nooverflow(sum1) && nooverflow(sum2) constraint4 = size = 4 => nooverflow(sum1) && nooverflow(sum2) && nooverflow(sum3) However those sums are again expanded inline, meaning the sum of all constraints is O(n^3) The obvious fix is to define symbols for sum0... sumn and then constraint0... constraintn, but this slows the solver down considerably for small formulas (e.g. Byte.parseByte, with max size 3 digits), so I compromise here by replacing the constraints with: constraint2 = size >= 2 => nooverflow(sum1) constraint3 = size >= 3 => nooverflow(sum2) constraint4 = size >= 4 => nooverflow(sum3) Thus the formula size is quadratic not cubic but still fast for cases where the radix is known.

Previously we passed the input to parseInt/parseLong straight to the built-in string solver functions, meaning it was possible to crash jbmc by passing an out-of-range constant radix or for the solver to produce crazy results when it regarded the radix as a free choice. Now the models apply the constraint that it falls in a sensible range. I add a test for parseint with a free radix, but am unable to add one for parselong yet as it takes too long to solve.

codecov-io · 2020-02-06T18:13:10Z

Codecov Report

Merging #5226 into develop will decrease coverage by <.01%.
The diff coverage is 100%.

@@             Coverage Diff             @@
##           develop    #5226      +/-   ##
===========================================
- Coverage    67.43%   67.43%   -0.01%     
===========================================
  Files         1157     1157              
  Lines        95354    95352       -2     
===========================================
- Hits         64306    64302       -4     
- Misses       31048    31050       +2

Flag	Coverage Δ
#cproversmt2	`42.67% <0%> (ø)`	⬆️
#regression	`63.96% <100%> (-0.01%)`	⬇️
#unit	`31.95% <0%> (ø)`	⬆️

Impacted Files	Coverage Δ
...c/java_bytecode/java_string_library_preprocess.cpp	`96.81% <100%> (-0.02%)`	⬇️
...rs/strings/string_constraint_generator_valueof.cpp	`100% <100%> (ø)`	⬆️
src/solvers/strings/string_builtin_function.h	`74.39% <0%> (-2.44%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cccdfd9...b8e774c. Read the comment docs.

peterschrammel · 2020-02-07T16:36:10Z

jbmc/regression/strings-smoke-tests/java_parseint_unknown_radix/Test.java

@@ -0,0 +1,18 @@
+public class Test {
+
+    public static void testMe(String input, int radix) {


weird formatting

peterschrammel · 2020-02-07T16:36:33Z

jbmc/regression/strings-smoke-tests/java_parselong_unknown_radix/Test.java

@@ -0,0 +1,18 @@
+public class Test {
+
+    public static void testMe(String input, int radix) {


weird formatting

peterschrammel · 2020-02-07T16:37:09Z

jbmc/src/java_bytecode/java_string_library_preprocess.cpp

@@ -1574,6 +1574,12 @@ void java_string_library_preprocesst::initialize_conversion_table()
        std::placeholders::_2,
        std::placeholders::_3,
        std::placeholders::_4);
+  cprover_equivalent_to_java_function


smowton requested review from martin-cs, peterschrammel and romainbrenguier as code owners February 6, 2020 15:03

smowton requested a review from allredj February 6, 2020 15:03

xbauch reviewed Feb 6, 2020

View reviewed changes

martin-cs approved these changes Feb 6, 2020

View reviewed changes

romainbrenguier approved these changes Feb 6, 2020

View reviewed changes

smowton requested review from cesaro, chrisr-diffblue, cristina-david, kroening, pkesseli, tautschnig and thk123 as code owners February 6, 2020 16:54

smowton force-pushed the smowton/feature/faster-string-to-int branch from 7651787 to 576b9f4 Compare February 6, 2020 16:56

smowton added 2 commits February 6, 2020 16:59

smowton force-pushed the smowton/feature/faster-string-to-int branch from 576b9f4 to b8e774c Compare February 6, 2020 16:59

peterschrammel approved these changes Feb 7, 2020

View reviewed changes

smowton merged commit 778fb97 into diffblue:develop Feb 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

String solver: avoid cubic formulas for string-to-int conversions #5226

String solver: avoid cubic formulas for string-to-int conversions #5226

Uh oh!

smowton commented Feb 6, 2020

Uh oh!

xbauch Feb 6, 2020

Uh oh!

smowton Feb 6, 2020

Uh oh!

martin-cs left a comment

Uh oh!

romainbrenguier Feb 6, 2020

Uh oh!

smowton Feb 6, 2020

Uh oh!

romainbrenguier Feb 6, 2020

Uh oh!

smowton Feb 6, 2020

Uh oh!

romainbrenguier Feb 6, 2020

Uh oh!

smowton Feb 6, 2020

Uh oh!

codecov-io commented Feb 6, 2020 •

edited

Loading

Uh oh!

peterschrammel Feb 7, 2020

Uh oh!

peterschrammel Feb 7, 2020

Uh oh!

peterschrammel Feb 7, 2020

Uh oh!

Uh oh!

		@@ -0,0 +1,18 @@
		public class Test {

		public static void testMe(String input, int radix) {

String solver: avoid cubic formulas for string-to-int conversions #5226

String solver: avoid cubic formulas for string-to-int conversions #5226

Uh oh!

Conversation

smowton commented Feb 6, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

martin-cs left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codecov-io commented Feb 6, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov-io commented Feb 6, 2020 •

edited

Loading