Add analysis cache #3204

smowton · 2018-10-18T15:03:42Z

This adds an analysis cache, which is styled after the LLVM pass manager, and is intended to allow passes (such as goto-convert, remove-returns, etc) to create and share analyses they use on demand.

The cache is owned by goto_modelt (and borrowed by goto_model_functiont for function-level passes). Intended use: a cache that uses analysis my_analysist might call goto_model.get_analysis_cache().get_or_create<my_analysist>(constructor_args...). This creates the analysis on first usage, or returns an existing one if present. Analyses can be destroyed if a pass invalidates one using object_cachet::erase.

Analyses are currently assumed to be module-global, but we could extend this with per-function analyses in future. Only one analysis can be stored per type in order to avoid brittle string-typed keys or similar.

The real work here is done by @JohnDumbell, I've just cleaned it up a little to get it in front of reviewers while he's on holiday.

Each commit message has a non-empty body, explaining why the change was made.
Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
~~My commit message includes data points confirming performance improvements (if claimed).~~
My PR is restricted to a single feature or bugfix.
White-space or formatting changes outside the feature-related changed lines are in commits of their own.

object_cachet is a cache for heavy-to-initalize data structures, which we use to share such objects across passes. The class_hierarchy is one instance that benefits from being created once then updating if necessary.

And also the move constructors/operators which didn't have the default implementations.

No functional changes here, just fixing a missing end-of-file newline.

By using the analysis cache we can avoid recomputing it unnecessarily, which can be very expensive for some symbol tables.

…xceptions and remove_instanceof.

NathanJPhillips · 2018-10-18T15:18:09Z

jbmc/src/java_bytecode/remove_instanceof.cpp

-  rem.lower_instanceof(function.body);
-}
+  symbol_table_baset &symbol_table = goto_model_function.get_symbol_table();
+  const class_hierarchyt &class_hierarchy(


Unusual to initialise a reference using brackets rather than = (repeated below)

NathanJPhillips · 2018-10-18T15:22:14Z

src/goto-programs/goto_model.h

+    const goto_functionst &goto_functions)
+    : symbol_table(symbol_table),
+      goto_functions(goto_functions),
+      analysis_cache()


Unnecessary change - suggest you leave the default construction implicit.

NathanJPhillips · 2018-10-18T15:22:47Z

src/goto-programs/goto_model.h

-  goto_modelt(goto_modelt &&other):
-    symbol_table(std::move(other.symbol_table)),
-    goto_functions(std::move(other.goto_functions))
+  goto_modelt(goto_modelt &&other)


Perhaps just replace with = default (same for move assignment)

NathanJPhillips · 2018-10-18T15:25:11Z

src/util/object_cache.h

+  template <typename cached_type, typename... arguments>
+  cached_type &get_or_create(arguments &&... values)
+  {
+    // We're key'd off the current templates type.


"The key for this object is the current template's type"

NathanJPhillips · 2018-10-18T15:26:00Z

src/util/object_cache.h

+  cached_type &get_or_create(arguments &&... values)
+  {
+    // We're key'd off the current templates type.
+    const class_idt current_type_id = &cached_type_idt<cached_type>::id;


Nit: class_id would match type name better than current_type_id (also in erase below).

tautschnig · 2018-10-18T15:42:43Z

I have read through the code and I don't think there is enough information for me to judge whether what is being done here is the most appropriate approach, mainly because I don't know what problem is being solved. I'll thus leave this to others who may have more context.

smowton · 2018-10-19T07:55:38Z

@tautschnig with a large symbol table (specifically lots of class types) it can be expensive to repeatedly recreate class_hierarchyt. We could just parameterise remove-exceptions and remove-instanceof to take one as an argument and have the driver program create it, but we thought it would be generally commonplace to want to share a particular analysis across passes, or across different functions, hence introducing a place to put such analyses. We figured it should live on goto_modelt, though it could live alongside it if you prefer.

tautschnig · 2018-10-19T08:17:51Z

@smowton Many thanks for the additional info. I would like to leave it to others to make the call on this, but from my very limited point of view the proposed changes create a maintenance problem: there is no data or regression test included, and the design decision to create this very general store seems somewhat arbitrary. If anyone is to revisit this bit of code in future, how would they know they aren't breaking anything? Specifically, to me the new member of goto_modelt is a bit of a zombie (for the lack of a better word), because I wouldn't dare touching it in any way as I neither understand why it needs to be there nor what implications changes to it might have. It was certainly the case that while reviewing I asked myself several times why the class_hierarchyt wasn't just being passed around, which would be a change that is somewhat local and therefore easy to understand even for people like myself.

smowton · 2018-10-19T08:33:09Z

I don't mind if it's a member of goto-model or an additional parameter to passes, but my reason for a general store is to avoid having to alter every driver program in order to share an analysis. For example, compare:

driver {
  call_grapht cg(goto_model.goto_functions);
  control_flow_grapht cfg(goto_model.goto_functions);
  class_hierarchyt ch(goto_model.symbol_table);

  pass1(cg, cfg, goto_model);
  pass2(cfg, goto_model);
  pass3(cg, ch, goto_model);
}

Whereas with a generic store there's no need to alter the driver program:

driver {
  pass1(goto_model); // or goto_model, analysis_cache
  pass2(goto_model);
  pass3(goto_model);
}

pass1(goto_modelt &goto_model) {
  const call_grapht &cg = goto_model.get_analysis_cache().get_or_create<call_grapht>(...);
  const class_hierarchyt &ch = goto_model.get_analysis_cache().get_or_create<class_hierarchyt>(...);

  do_pass1(goto_model, cg, ch);
}

In short it avoids introducing spaghetti, duplicated per driver program, into the process_goto_function routine. It also gives the (global) analysis a place to live in between calls to process_goto_function (singular).

smowton · 2018-10-19T08:42:28Z

(NB. obviously fully developed this concept would include analyses that use other analyses, like reaching-definitions + dependency-graph for example)

tautschnig · 2018-10-19T09:54:12Z

Whereas with a generic store there's no need to alter the driver program: [...]

We might be touching upon matters of taste here when I say that I find precisely defined interfaces a valuable source of information? Taking it to the extreme, I'd feel tempted to rename goto_modelt to worldt and make it the everything-store? I am of course exaggerating here, but I find it much easier to debug a problem when I know very precisely which parts of the codebase may or may not touch a particular part of the state. Passing in the everything-store of course reduces friction at the interface, but makes it hard to tell when and where an object is being modified.

It also gives the (global) analysis a place to live [...]

It surely does, but then I might question whether that's the right place? I can see that specifically the class hierarchy would be something that could be maintained/cached together with the other parts that make up a goto_modelt as it is syntactic information about the input program. But I wouldn't want to say the same for various other analysis, let alone anything deduced from doing symbolic execution.

Having said the above, a more serious concern arises: consistency between the members of a goto_modelt is strictly required, and several recent PRs starting from #3123 try to sanity check this. Adding more elements to goto_modelt will require additional sanity checks, and the proposed model does not make it easy to actually accomplish this (because the cache is very much anonymous). Consider passes that transform either the symbol table or goto functions, or both of them. Such include lazy loading. What do any of those transformations mean for the analysis cache? Is it invalidated? Is all of it invalidated? Who takes care of doing so?

smowton · 2018-10-19T10:05:23Z

So the LLVM approach is for transform passes to nominate analyses that they promise they don't perturb -- if I remember rightly the syntax is something like passManger.preserves<SomeAnalysis>(). Of course this requires regularisation of transform passes to make sure anything not preserved is invalidated. With class_hierarchyt we're in the fortunate position that nothing messes with the class graph after the initial load, but this is not true of e.g. call_grapht.

smowton · 2018-10-19T10:07:21Z

In any case it sounds like you're in favour of explicit dependencies at the pass interface level, so shall we just do that? Unless @kroening or @peterschrammel want to give an opinion about how you'd like to see analysis sharing done?

tautschnig · 2018-10-19T10:26:12Z

Yes, I am in favour of explicit dependencies. But that could well just be me.

martin-cs · 2018-10-19T15:54:18Z

@tautschnig it's not just you. I very much see (and like) the use case; it makes no sense to keep recomputing these things. One of the things that I'm deliberately doing in ASVAT, is creating analysis result objects (with const goto_modelt &'s) to avoid this.

However, I think I'm with Michael w.r.t. where this should go, we might as well s/goto_modelt/worldt/ which is really the same as making everything global. I think explicit results objects and keeping things const-clean is the way forward. Given that we don't yet have a good handle on "normal forms" of goto-programs and structural invariants I worry that explict annotations of which transforms preserve which analysis is over-ambitious.

I see that there is a bit of a difference between us and LLVM here. While they are very much a program transformation pipeline and the whole point is to change, modify and transform the program, our tools tend to follow a "transform out the hard stuff, instrument, then have the program very fixed while we do the actual analysis" pattern.

smowton · 2018-10-22T11:31:01Z

@tautschnig @martin-cs @NathanJPhillips please see #3216 for a simpler alternative to this PR.

smowton · 2018-10-24T09:42:56Z

Replaced by #3216

JohnDumbell added 6 commits October 18, 2018 16:01

Adding object_cachet

7f40646

object_cachet is a cache for heavy-to-initalize data structures, which we use to share such objects across passes. The class_hierarchy is one instance that benefits from being created once then updating if necessary.

Adding new constructor to class_hierarchy

bbc2701

And also the move constructors/operators which didn't have the default implementations.

Fix whitespace error

aca4040

No functional changes here, just fixing a missing end-of-file newline.

remove_instanceof: compute the class hierarchy on demand

8148adc

By using the analysis cache we can avoid recomputing it unnecessarily, which can be very expensive for some symbol tables.

remove_exceptions: compute the class hierarchy on demand

bf6fdec

By using the analysis cache we can avoid recomputing it unnecessarily, which can be very expensive for some symbol tables.

Update test to use goto_modelt and goto_model_functiont with remove_e…

71ea34d

…xceptions and remove_instanceof.

smowton requested review from NathanJPhillips, cesaro, cristina-david, kroening, peterschrammel, pkesseli, tautschnig and thk123 as code owners October 18, 2018 15:03

NathanJPhillips approved these changes Oct 18, 2018

View reviewed changes

smowton mentioned this pull request Oct 22, 2018

Share a common class_hierarchyt instance across multiple users #3216

Merged

smowton closed this Oct 24, 2018

Add analysis cache #3204

Add analysis cache #3204

Uh oh!

Conversation

smowton commented Oct 18, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NathanJPhillips Oct 18, 2018

Choose a reason for hiding this comment

Uh oh!

NathanJPhillips Oct 18, 2018

Choose a reason for hiding this comment

Uh oh!

NathanJPhillips Oct 18, 2018

Choose a reason for hiding this comment

Uh oh!

NathanJPhillips Oct 18, 2018

Choose a reason for hiding this comment

Uh oh!

NathanJPhillips Oct 18, 2018

Choose a reason for hiding this comment

Uh oh!

tautschnig commented Oct 18, 2018

Uh oh!

smowton commented Oct 19, 2018

Uh oh!

tautschnig commented Oct 19, 2018

Uh oh!

smowton commented Oct 19, 2018

Uh oh!

smowton commented Oct 19, 2018

Uh oh!

tautschnig commented Oct 19, 2018

Uh oh!

smowton commented Oct 19, 2018

Uh oh!

smowton commented Oct 19, 2018

Uh oh!

tautschnig commented Oct 19, 2018

Uh oh!

martin-cs commented Oct 19, 2018

Uh oh!

smowton commented Oct 22, 2018

Uh oh!

smowton commented Oct 24, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

smowton commented Oct 18, 2018 •

edited

Loading