@@ -329,3 +329,181 @@ This stage concludes the *analysis-independent* program transformations.
329329\subsubsection goto-program-example-1-section Unsigned mult (unsigned a, unsigned b) { int acc, i; for (i = 0; i < b; i++) acc += a; return acc; }
330330
331331To be documented.
332+
333+
334+ \section section-goto-binary Binary Represenration
335+
336+ An instance of ` goto_modelt ` can be serialised to a binary stream (which is
337+ typically a file on the disk), and later deserialised from that stream back to
338+ an equivalent ` goto_modelt ` instance.
339+
340+ \subsection subsection-goto-binary-serialisation Serialisation
341+
342+ The serialisation is implemented in C++ modules:
343+ - ` write_goto_binary.h `
344+ - ` write_goto_binary.cpp `
345+
346+ To serialise a ` goto_modelt ` instance ` gm ` to a stream ` ostr ` call the function ` write_goto_binary ` , e.g. ` write_goto_binary(ostr, gm) ` .
347+
348+ The content of the written stream will have this strucutre:
349+ - The header:
350+ - A magic number: byte ` 0x7f ` followed by 3 characters ` GBF ` .
351+ - A version number written in the 7-bit encoding. Currently, only version ` 4 ` is supported.
352+ - The symbol table:
353+ - The number of symbols in the table in the 7-bit encoding.
354+ - The array of individual symbols in the table. Each written symbol ` s ` has this structure:
355+ - The ` irept ` instance ` s.type ` .
356+ - The ` irept ` instance ` s.value ` .
357+ - The ` irept ` instance ` s.location ` .
358+ - The string ` s.name ` .
359+ - The string ` s.module ` .
360+ - The string ` s.base_name ` .
361+ - The string ` s.mode ` .
362+ - The string ` s.pretty_name ` .
363+ - The number ` 0 ` in the 7-bit encoding.
364+ - The flags word in the 7-bit encoding. The bits in the flags word correspond to the following ` bool ` fields (from the most significant bit):
365+ - ` s.is_weak `
366+ - ` s.is_type `
367+ - ` s.is_property `
368+ - ` s.is_macro `
369+ - ` s.is_exported `
370+ - ` s.is_input `
371+ - ` s.is_output `
372+ - ` s.is_state_var `
373+ - ` s.is_parameter `
374+ - ` s.is_auxiliary `
375+ - ` 0 ` (corresponding to ` s.binding ` , i.e. we always clear this info)
376+ - ` s.is_lvalue `
377+ - ` s.is_static_lifetime `
378+ - ` s.is_thread_local `
379+ - ` s.is_file_local `
380+ - ` s.is_extern `
381+ - ` s.is_volatile `
382+ - The functions with bodies, i.e. those missing a body are skipped.
383+ - The number of functions with bodies in the 7-bit encoding.
384+ - The array of individual function with bodies. Each written function has this structure:
385+ - The string with the name of the function.
386+ - The number of instructions in the body of the function in the 7-bit encoding.
387+ - The array of individual instruction in function's body. Each written instruction ` I ` has this structure:
388+ - The ` irept ` instance ` I.code ` , i.e. data of the instruction, like arguments.
389+ - The string ` I.function ` , i.e. the name of the function this instruction belongs to.
390+ - The ` irept ` instance ` I.source_location ` , i.e. the reference to the original source code (file, line).
391+ - The word in the 7-bit encoding ` I.type ` , i.e. the op-code of the instruction.
392+ - The ` irept ` instance ` I.guard ` .
393+ - The empty string (representing former ` I.event ` ).
394+ - The word in the 7-bit encoding ` I.target_number ` , i.e. the jump target to this instruction from other instructions.
395+ - The word in the 7-bit encoding ` I.targets.size() ` , i.e. the count of jump targets from this instruction.
396+ - The array of individual jump targets from this instruction, each written as a word in the 7-bit encoding.
397+ - The word in the 7-bit encoding ` I.labels.size() ` .
398+ - The array of individual labels, each written as a word in the 7-bit encoding.
399+
400+ An important propery of the serialisation is that each serialised ` irept `
401+ instance occurs in the stream exactly once. Namely, in the position of
402+ its first serialisation query. All other such queries save only hash
403+ code (i.e. reference) of the ` irept ` instance.
404+
405+ The similar strategy is used for serialisation of string constants
406+ shared amongst ` irept ` instances. Such a string is fully saved only in
407+ the first serialisation query of an ` irept ` instance it appears in and
408+ all other queries only save its integer has code.
409+
410+ Details about serialisation of ` irept ` instances, strings, and words in
411+ 7-bit encoding can be found [ here] (\ref irep-serialization).
412+
413+ \subsection subsection-goto-binary-deserialisation Deserialisation
414+
415+ The deserialisation is implemented in C++ modules:
416+ - ` read_goto_binary.h `
417+ - ` read_goto_binary.cpp `
418+ - ` read_bin_goto_object.h `
419+ - ` read_bin_goto_object.cpp `
420+
421+ First two modules are responsible for location of the stream with the
422+ serialised data within a passed file. And the remaining two modules
423+ perform the actual deserialisation of ` goto_modelt ` instance from
424+ the located stream.
425+
426+ To deserialise a ` goto_modelt ` instance ` gm ` from a file ` /some/path/name.gbf `
427+ call the function ` read_goto_binary ` , e.g.
428+ ` read_goto_binary("/some/path/name.gbf", gm, message_handler) ` , where
429+ ` message_handler ` must be an instance of ` message_handlert ` and serves
430+ for reporting issues during the process.
431+
432+ The passed binary file is assumed to have the same structure as described in
433+ the [ previous subsection] (\ref subsection-goto-binary-serialisation).
434+ The process of the deserialisation does not involve any seeking in the file.
435+ The content is read linearly from the beginning to the end. ` irept ` instances
436+ and their string constats are deserialised into the memory only once at their
437+ first occurreces in the stream. All subsequent deserialisation queries are
438+ resolved as in-memory references to the ` irept ` instances and/or strings.
439+
440+ NOTE: The first deserialisation is detected so that the loaded hash code
441+ is new. That implies that the full definition follows right after the hash.
442+
443+ Details about serialisation of ` irept ` instances, strings, and words in
444+ 7-bit encoding can be found [ here] (\ref irep-serialization).
445+
446+ \subsubsection subsection-goto-binary-deserialisation-from-elf Deserialisation from ELF image
447+
448+ One can decide to store the serialised stream as a separate section, named
449+ ` goto-cc ` , into an ELF image. Then the deserialisation has a support of
450+ automatic detection of that section in an ELF file and the deserialisation
451+ will be automatically started from that section.
452+
453+ For reading the ELF images there is used an instance of ` elf_readert `
454+ implemented in the C++ module:
455+ - ` elf_reader.h `
456+ - ` elf_reader.cpp `
457+
458+ \subsubsection subsection-goto-binary-deserialisation-from-mac-o-fat-image Deserialisation from Mac-O fat image
459+
460+ One can decide to store the serialised stream into Mac-O fat image as a
461+ separate non-empty section with flags ` CPU_TYPE_HPPA ` and
462+ ` CPU_SUBTYPE_HPPA_7100LC ` . Then the deserialisation has a support of
463+ automatic detection of that section in a Mac-O fat image, extraction
464+ of the section from the emage into a temporary file (this is done by
465+ calling ` lipo ` utility with ` -thin hppa7100LC ` option), and the
466+ deserialisation will be automatically started from that temporary
467+ file.
468+
469+ For reading the Mac-O fat images there is used an instance of
470+ ` osx_fat_readert ` implemented in the C++ module:
471+ - ` osx_fat_reader.h `
472+ - ` osx_fat_reader.cpp `
473+
474+ NOTE: This functionality is available only when the modules are built
475+ on a MacOS machine.
476+
477+ \subsubsection subsection-goto-binary-is-binary-file Checking file type
478+
479+ You can use function ` is_goto_binary ` to check whether a passed file contains
480+ a sederialised ` goto_modelt ` instance or not. This is done by checking the
481+ magic number of the stream (see subsection
482+ [ Serialisation] (\ref subsection-goto-binary-serialisation)). However, in the
483+ case when the deserialised data were stored into ELF or Mac-O fat image, then
484+ only the check for presence of the concrete section in the image is performed.
485+
486+ \subsubsection subsection-goto-binary-deserialisation-linking Linking Goto Models
487+
488+ Similar to linking of object files together by C/C++ linker, the module
489+ provides linking of a dereserialised ` goto_modelt ` instance into a given
490+ (i.e. previously deserialised or otherwise created) ` goto_modelt ` instance.
491+
492+ This is implemented in function ` read_object_and_link ` . The function first
493+ deserialises the passed file into a temporary ` goto_modelt ` instance, and
494+ then it performs 'linking' of the temporary into a passed destination
495+ ` goto_modelt ` instance.
496+
497+ Details about linking of ` goto_modelt ` instances can be found
498+ [ here] (\ref section-linking-goto-models).
499+
500+
501+ \section section-linking-goto-models Linking Goto Models
502+
503+ C++ modules:
504+ - ` link_goto_model.h `
505+ - ` link_goto_model.cpp `
506+
507+ Dependencies:
508+ - [ linking folder] (\ref linking).
509+ - [ typecheck] (\ref section-goto-typecheck).
0 commit comments