Skip to content

Conversation

@smowton
Copy link
Contributor

@smowton smowton commented Oct 18, 2018

This adds an analysis cache, which is styled after the LLVM pass manager, and is intended to allow passes (such as goto-convert, remove-returns, etc) to create and share analyses they use on demand.

The cache is owned by goto_modelt (and borrowed by goto_model_functiont for function-level passes). Intended use: a cache that uses analysis my_analysist might call goto_model.get_analysis_cache().get_or_create<my_analysist>(constructor_args...). This creates the analysis on first usage, or returns an existing one if present. Analyses can be destroyed if a pass invalidates one using object_cachet::erase.

Analyses are currently assumed to be module-global, but we could extend this with per-function analyses in future. Only one analysis can be stored per type in order to avoid brittle string-typed keys or similar.

The real work here is done by @JohnDumbell, I've just cleaned it up a little to get it in front of reviewers while he's on holiday.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

object_cachet is a cache for heavy-to-initalize data structures, which we use to share such objects across passes. The class_hierarchy is one instance that benefits from being created once then updating if necessary.
And also the move constructors/operators which didn't have the default implementations.
No functional changes here, just fixing a missing end-of-file newline.
By using the analysis cache we can avoid recomputing it unnecessarily, which can be very expensive for some symbol tables.
By using the analysis cache we can avoid recomputing it unnecessarily, which can be very expensive for some symbol tables.
rem.lower_instanceof(function.body);
}
symbol_table_baset &symbol_table = goto_model_function.get_symbol_table();
const class_hierarchyt &class_hierarchy(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unusual to initialise a reference using brackets rather than = (repeated below)

const goto_functionst &goto_functions)
: symbol_table(symbol_table),
goto_functions(goto_functions),
analysis_cache()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary change - suggest you leave the default construction implicit.

goto_modelt(goto_modelt &&other):
symbol_table(std::move(other.symbol_table)),
goto_functions(std::move(other.goto_functions))
goto_modelt(goto_modelt &&other)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps just replace with = default (same for move assignment)

template <typename cached_type, typename... arguments>
cached_type &get_or_create(arguments &&... values)
{
// We're key'd off the current templates type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The key for this object is the current template's type"

cached_type &get_or_create(arguments &&... values)
{
// We're key'd off the current templates type.
const class_idt current_type_id = &cached_type_idt<cached_type>::id;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: class_id would match type name better than current_type_id (also in erase below).

@tautschnig
Copy link
Collaborator

I have read through the code and I don't think there is enough information for me to judge whether what is being done here is the most appropriate approach, mainly because I don't know what problem is being solved. I'll thus leave this to others who may have more context.

@smowton
Copy link
Contributor Author

smowton commented Oct 19, 2018

@tautschnig with a large symbol table (specifically lots of class types) it can be expensive to repeatedly recreate class_hierarchyt. We could just parameterise remove-exceptions and remove-instanceof to take one as an argument and have the driver program create it, but we thought it would be generally commonplace to want to share a particular analysis across passes, or across different functions, hence introducing a place to put such analyses. We figured it should live on goto_modelt, though it could live alongside it if you prefer.

@tautschnig
Copy link
Collaborator

@smowton Many thanks for the additional info. I would like to leave it to others to make the call on this, but from my very limited point of view the proposed changes create a maintenance problem: there is no data or regression test included, and the design decision to create this very general store seems somewhat arbitrary. If anyone is to revisit this bit of code in future, how would they know they aren't breaking anything? Specifically, to me the new member of goto_modelt is a bit of a zombie (for the lack of a better word), because I wouldn't dare touching it in any way as I neither understand why it needs to be there nor what implications changes to it might have. It was certainly the case that while reviewing I asked myself several times why the class_hierarchyt wasn't just being passed around, which would be a change that is somewhat local and therefore easy to understand even for people like myself.

@smowton
Copy link
Contributor Author

smowton commented Oct 19, 2018

I don't mind if it's a member of goto-model or an additional parameter to passes, but my reason for a general store is to avoid having to alter every driver program in order to share an analysis. For example, compare:

driver {
  call_grapht cg(goto_model.goto_functions);
  control_flow_grapht cfg(goto_model.goto_functions);
  class_hierarchyt ch(goto_model.symbol_table);

  pass1(cg, cfg, goto_model);
  pass2(cfg, goto_model);
  pass3(cg, ch, goto_model);
}

Whereas with a generic store there's no need to alter the driver program:

driver {
  pass1(goto_model); // or goto_model, analysis_cache
  pass2(goto_model);
  pass3(goto_model);
}

pass1(goto_modelt &goto_model) {
  const call_grapht &cg = goto_model.get_analysis_cache().get_or_create<call_grapht>(...);
  const class_hierarchyt &ch = goto_model.get_analysis_cache().get_or_create<class_hierarchyt>(...);

  do_pass1(goto_model, cg, ch);
}

In short it avoids introducing spaghetti, duplicated per driver program, into the process_goto_function routine. It also gives the (global) analysis a place to live in between calls to process_goto_function (singular).

@smowton
Copy link
Contributor Author

smowton commented Oct 19, 2018

(NB. obviously fully developed this concept would include analyses that use other analyses, like reaching-definitions + dependency-graph for example)

@tautschnig
Copy link
Collaborator

Whereas with a generic store there's no need to alter the driver program: [...]

We might be touching upon matters of taste here when I say that I find precisely defined interfaces a valuable source of information? Taking it to the extreme, I'd feel tempted to rename goto_modelt to worldt and make it the everything-store? I am of course exaggerating here, but I find it much easier to debug a problem when I know very precisely which parts of the codebase may or may not touch a particular part of the state. Passing in the everything-store of course reduces friction at the interface, but makes it hard to tell when and where an object is being modified.

It also gives the (global) analysis a place to live [...]

It surely does, but then I might question whether that's the right place? I can see that specifically the class hierarchy would be something that could be maintained/cached together with the other parts that make up a goto_modelt as it is syntactic information about the input program. But I wouldn't want to say the same for various other analysis, let alone anything deduced from doing symbolic execution.

Having said the above, a more serious concern arises: consistency between the members of a goto_modelt is strictly required, and several recent PRs starting from #3123 try to sanity check this. Adding more elements to goto_modelt will require additional sanity checks, and the proposed model does not make it easy to actually accomplish this (because the cache is very much anonymous). Consider passes that transform either the symbol table or goto functions, or both of them. Such include lazy loading. What do any of those transformations mean for the analysis cache? Is it invalidated? Is all of it invalidated? Who takes care of doing so?

@smowton
Copy link
Contributor Author

smowton commented Oct 19, 2018

So the LLVM approach is for transform passes to nominate analyses that they promise they don't perturb -- if I remember rightly the syntax is something like passManger.preserves<SomeAnalysis>(). Of course this requires regularisation of transform passes to make sure anything not preserved is invalidated. With class_hierarchyt we're in the fortunate position that nothing messes with the class graph after the initial load, but this is not true of e.g. call_grapht.

@smowton
Copy link
Contributor Author

smowton commented Oct 19, 2018

In any case it sounds like you're in favour of explicit dependencies at the pass interface level, so shall we just do that? Unless @kroening or @peterschrammel want to give an opinion about how you'd like to see analysis sharing done?

@tautschnig
Copy link
Collaborator

Yes, I am in favour of explicit dependencies. But that could well just be me.

@martin-cs
Copy link
Collaborator

@tautschnig it's not just you. I very much see (and like) the use case; it makes no sense to keep recomputing these things. One of the things that I'm deliberately doing in ASVAT, is creating analysis result objects (with const goto_modelt &'s) to avoid this.

However, I think I'm with Michael w.r.t. where this should go, we might as well s/goto_modelt/worldt/ which is really the same as making everything global. I think explicit results objects and keeping things const-clean is the way forward. Given that we don't yet have a good handle on "normal forms" of goto-programs and structural invariants I worry that explict annotations of which transforms preserve which analysis is over-ambitious.

I see that there is a bit of a difference between us and LLVM here. While they are very much a program transformation pipeline and the whole point is to change, modify and transform the program, our tools tend to follow a "transform out the hard stuff, instrument, then have the program very fixed while we do the actual analysis" pattern.

@smowton
Copy link
Contributor Author

smowton commented Oct 22, 2018

@tautschnig @martin-cs @NathanJPhillips please see #3216 for a simpler alternative to this PR.

@smowton
Copy link
Contributor Author

smowton commented Oct 24, 2018

Replaced by #3216

@smowton smowton closed this Oct 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants