@@ -377,3 +377,181 @@ This stage concludes the *analysis-independent* program transformations.
377377\subsubsection goto-program-example-1-section Unsigned mult (unsigned a, unsigned b) { int acc, i; for (i = 0; i < b; i++) acc += a; return acc; }
378378
379379To be documented.
380+
381+
382+ \section section-goto-binary Binary Represenration
383+
384+ An instance of ` goto_modelt ` can be serialised to a binary stream (which is
385+ typically a file on the disk), and later deserialised from that stream back to
386+ an equivalent ` goto_modelt ` instance.
387+
388+ \subsection subsection-goto-binary-serialisation Serialisation
389+
390+ The serialisation is implemented in C++ modules:
391+ - ` write_goto_binary.h `
392+ - ` write_goto_binary.cpp `
393+
394+ To serialise a ` goto_modelt ` instance ` gm ` to a stream ` ostr ` call the function ` write_goto_binary ` , e.g. ` write_goto_binary(ostr, gm) ` .
395+
396+ The content of the written stream will have this strucutre:
397+ - The header:
398+ - A magic number: byte ` 0x7f ` followed by 3 characters ` GBF ` .
399+ - A version number written in the 7-bit encoding. Currently, only version ` 4 ` is supported.
400+ - The symbol table:
401+ - The number of symbols in the table in the 7-bit encoding.
402+ - The array of individual symbols in the table. Each written symbol ` s ` has this structure:
403+ - The ` irept ` instance ` s.type ` .
404+ - The ` irept ` instance ` s.value ` .
405+ - The ` irept ` instance ` s.location ` .
406+ - The string ` s.name ` .
407+ - The string ` s.module ` .
408+ - The string ` s.base_name ` .
409+ - The string ` s.mode ` .
410+ - The string ` s.pretty_name ` .
411+ - The number ` 0 ` in the 7-bit encoding.
412+ - The flags word in the 7-bit encoding. The bits in the flags word correspond to the following ` bool ` fields (from the most significant bit):
413+ - ` s.is_weak `
414+ - ` s.is_type `
415+ - ` s.is_property `
416+ - ` s.is_macro `
417+ - ` s.is_exported `
418+ - ` s.is_input `
419+ - ` s.is_output `
420+ - ` s.is_state_var `
421+ - ` s.is_parameter `
422+ - ` s.is_auxiliary `
423+ - ` 0 ` (corresponding to ` s.binding ` , i.e. we always clear this info)
424+ - ` s.is_lvalue `
425+ - ` s.is_static_lifetime `
426+ - ` s.is_thread_local `
427+ - ` s.is_file_local `
428+ - ` s.is_extern `
429+ - ` s.is_volatile `
430+ - The functions with bodies, i.e. those missing a body are skipped.
431+ - The number of functions with bodies in the 7-bit encoding.
432+ - The array of individual function with bodies. Each written function has this structure:
433+ - The string with the name of the function.
434+ - The number of instructions in the body of the function in the 7-bit encoding.
435+ - The array of individual instruction in function's body. Each written instruction ` I ` has this structure:
436+ - The ` irept ` instance ` I.code ` , i.e. data of the instruction, like arguments.
437+ - The string ` I.function ` , i.e. the name of the function this instruction belongs to.
438+ - The ` irept ` instance ` I.source_location ` , i.e. the reference to the original source code (file, line).
439+ - The word in the 7-bit encoding ` I.type ` , i.e. the op-code of the instruction.
440+ - The ` irept ` instance ` I.guard ` .
441+ - The empty string (representing former ` I.event ` ).
442+ - The word in the 7-bit encoding ` I.target_number ` , i.e. the jump target to this instruction from other instructions.
443+ - The word in the 7-bit encoding ` I.targets.size() ` , i.e. the count of jump targets from this instruction.
444+ - The array of individual jump targets from this instruction, each written as a word in the 7-bit encoding.
445+ - The word in the 7-bit encoding ` I.labels.size() ` .
446+ - The array of individual labels, each written as a word in the 7-bit encoding.
447+
448+ An important propery of the serialisation is that each serialised ` irept `
449+ instance occurs in the stream exactly once. Namely, in the position of
450+ its first serialisation query. All other such queries save only hash
451+ code (i.e. reference) of the ` irept ` instance.
452+
453+ The similar strategy is used for serialisation of string constants
454+ shared amongst ` irept ` instances. Such a string is fully saved only in
455+ the first serialisation query of an ` irept ` instance it appears in and
456+ all other queries only save its integer has code.
457+
458+ Details about serialisation of ` irept ` instances, strings, and words in
459+ 7-bit encoding can be found [ here] (\ref irep-serialization).
460+
461+ \subsection subsection-goto-binary-deserialisation Deserialisation
462+
463+ The deserialisation is implemented in C++ modules:
464+ - ` read_goto_binary.h `
465+ - ` read_goto_binary.cpp `
466+ - ` read_bin_goto_object.h `
467+ - ` read_bin_goto_object.cpp `
468+
469+ First two modules are responsible for location of the stream with the
470+ serialised data within a passed file. And the remaining two modules
471+ perform the actual deserialisation of ` goto_modelt ` instance from
472+ the located stream.
473+
474+ To deserialise a ` goto_modelt ` instance ` gm ` from a file ` /some/path/name.gbf `
475+ call the function ` read_goto_binary ` , e.g.
476+ ` read_goto_binary("/some/path/name.gbf", gm, message_handler) ` , where
477+ ` message_handler ` must be an instance of ` message_handlert ` and serves
478+ for reporting issues during the process.
479+
480+ The passed binary file is assumed to have the same structure as described in
481+ the [ previous subsection] (\ref subsection-goto-binary-serialisation).
482+ The process of the deserialisation does not involve any seeking in the file.
483+ The content is read linearly from the beginning to the end. ` irept ` instances
484+ and their string constats are deserialised into the memory only once at their
485+ first occurreces in the stream. All subsequent deserialisation queries are
486+ resolved as in-memory references to the ` irept ` instances and/or strings.
487+
488+ NOTE: The first deserialisation is detected so that the loaded hash code
489+ is new. That implies that the full definition follows right after the hash.
490+
491+ Details about serialisation of ` irept ` instances, strings, and words in
492+ 7-bit encoding can be found [ here] (\ref irep-serialization).
493+
494+ \subsubsection subsection-goto-binary-deserialisation-from-elf Deserialisation from ELF image
495+
496+ One can decide to store the serialised stream as a separate section, named
497+ ` goto-cc ` , into an ELF image. Then the deserialisation has a support of
498+ automatic detection of that section in an ELF file and the deserialisation
499+ will be automatically started from that section.
500+
501+ For reading the ELF images there is used an instance of ` elf_readert `
502+ implemented in the C++ module:
503+ - ` elf_reader.h `
504+ - ` elf_reader.cpp `
505+
506+ \subsubsection subsection-goto-binary-deserialisation-from-mac-o-fat-image Deserialisation from Mac-O fat image
507+
508+ One can decide to store the serialised stream into Mac-O fat image as a
509+ separate non-empty section with flags ` CPU_TYPE_HPPA ` and
510+ ` CPU_SUBTYPE_HPPA_7100LC ` . Then the deserialisation has a support of
511+ automatic detection of that section in a Mac-O fat image, extraction
512+ of the section from the emage into a temporary file (this is done by
513+ calling ` lipo ` utility with ` -thin hppa7100LC ` option), and the
514+ deserialisation will be automatically started from that temporary
515+ file.
516+
517+ For reading the Mac-O fat images there is used an instance of
518+ ` osx_fat_readert ` implemented in the C++ module:
519+ - ` osx_fat_reader.h `
520+ - ` osx_fat_reader.cpp `
521+
522+ NOTE: This functionality is available only when the modules are built
523+ on a MacOS machine.
524+
525+ \subsubsection subsection-goto-binary-is-binary-file Checking file type
526+
527+ You can use function ` is_goto_binary ` to check whether a passed file contains
528+ a sederialised ` goto_modelt ` instance or not. This is done by checking the
529+ magic number of the stream (see subsection
530+ [ Serialisation] (\ref subsection-goto-binary-serialisation)). However, in the
531+ case when the deserialised data were stored into ELF or Mac-O fat image, then
532+ only the check for presence of the concrete section in the image is performed.
533+
534+ \subsubsection subsection-goto-binary-deserialisation-linking Linking Goto Models
535+
536+ Similar to linking of object files together by C/C++ linker, the module
537+ provides linking of a dereserialised ` goto_modelt ` instance into a given
538+ (i.e. previously deserialised or otherwise created) ` goto_modelt ` instance.
539+
540+ This is implemented in function ` read_object_and_link ` . The function first
541+ deserialises the passed file into a temporary ` goto_modelt ` instance, and
542+ then it performs 'linking' of the temporary into a passed destination
543+ ` goto_modelt ` instance.
544+
545+ Details about linking of ` goto_modelt ` instances can be found
546+ [ here] (\ref section-linking-goto-models).
547+
548+
549+ \section section-linking-goto-models Linking Goto Models
550+
551+ C++ modules:
552+ - ` link_goto_model.h `
553+ - ` link_goto_model.cpp `
554+
555+ Dependencies:
556+ - [ linking folder] (\ref linking).
557+ - [ typecheck] (\ref section-goto-typecheck).
0 commit comments