Description
It appears the logic in value_set_dereferencet::build_reference_to
can produce incorrect pointer_offset
expressions, leading to wrong offsets in array accesses.
Example:
struct java_style_array {
int length;
void** data;
}
data
will sometimes be treated generically (as void**
) and sometimes with a specific type (e.g. struct java.lang.String**
.
Array accesses are currently generated using expressions like:
(java.lang.String*)*(array->data + 2)
(pretty-printed as array->data[2]
)
Suppose the symex phase identifies the actual allocation stored in data
as dynamic_1_array
and the struct as a whole dynamic_object_2
. Then in response to this access it will notice (a) the actual-allocation and reference types don't match, so it must use the byte-extract path at value_set_dereference.cpp:585
, and (b) the actual referent is dynamic_1_array
.
So far so good: we'd like to see it emit something like (java.lang.String*)dynamic_1_array[2]
or less optimistically byte_extract(java.lang.String*, 2 * sizeof(java.lang.String*), dynamic_1_array)
.
Instead of that offset however we get pointer_offset(dynamic_object_2.data + 2)
. This is incorrect, as it evaluates to pointer_offset(dynamic_object_2.data) + (2 * sizeof(void*))
= 24, producing an off-by-one error indexing the array.
In its most general terms, I think the problem is that the value-set analysis can walk across object boundaries, thus mapping array.data[2]
to the underlying allocation for data
not array
, but the pointer_offset
expression built at value_set_dereference.cpp:590
does not take this into account. In this case it needed to follow the recursive walk over the expression performed by value_sett::get_value_set_rec
et al and note that since we had:
array -> dynamic_object_2
array->data -> dynamic_1_array
array->data[2] -> dynamic_1_array
...then we need to output dynamic_1_array + 2
rather than what amounts to dynamic_1_array + (&array->data - array) + 2
.
My test cases for this are based on my work in progress on Java arrays, so can't be run against master, but I will follow up with a C test-case if I can reproduce the same behaviour there.