|
| 1 | +<!-- markdown-toc start - Don't edit this section. Run M-x markdown-toc-refresh-toc --> |
| 2 | +**Table of Contents** |
| 3 | + |
| 4 | +- [Assembly Blobs format and purpose](#assembly-blobs-format-and-purpose) |
| 5 | + - [Rationale](#rationale) |
| 6 | +- [Blob kinds and locations](#blob-kinds-and-locations) |
| 7 | +- [Blob format](#blob-format) |
| 8 | + - [Common header](#common-header) |
| 9 | + - [Assembly descriptor table](#assembly-descriptor-table) |
| 10 | + - [Index blob](#index-blob) |
| 11 | + - [Hash table format](#hash-table-format) |
| 12 | + |
| 13 | +<!-- markdown-toc end --> |
| 14 | + |
| 15 | +# Assembly Blobs format and purpose |
| 16 | + |
| 17 | +Assembly blobs are binary files which contain inside them the managed |
| 18 | +assemblies, their debug data (optionally) and the associated config |
| 19 | +file (optionally). They are placed inside the Android APK/AAB |
| 20 | +archives, replacing individual assemblies/pdb/config files. |
| 21 | + |
| 22 | +Blobs are an optional form of assembly storage in the archive. They |
| 23 | +can be used in all build configurations **except** when Fast |
| 24 | +Deployment is in effect (in which case assemblies aren't placed in the |
| 25 | +archives at all; they are instead synchronized from the host to the |
| 26 | +device/emulator filesystem). |
| 27 | + |
| 28 | +## Rationale |
| 29 | + |
| 30 | +During native startup, the Xamarin.Android runtime looks inside the |
| 31 | +application APK file for the managed assemblies (and their associated |
| 32 | +pdb and config files, if applicable) in order to map them (using the |
| 33 | +`mmap(2)` call) into memory so that they can be given to the Mono |
| 34 | +runtime when it requests that a given assembly be loaded. The reason for |
| 35 | +the memory mapping is that, as far as Android is concerned, managed |
| 36 | +assembly files are just data/resources and, thus, aren't extracted to |
| 37 | +the filesystem. As a result, Mono wouldn't be able to find the |
| 38 | +assemblies by scanning the filesystem - the host application |
| 39 | +(Xamarin.Android) must give it a hand in finding them. |
| 40 | + |
| 41 | +Applications can contain hundreds of assemblies (for instance a Hello |
| 42 | +World MAUI application currently contains over 120 assemblies) and |
| 43 | +each of them would have to be mmapped at startup, together with its |
| 44 | +pdb and config files, if found. This not only costs time (each `mmap` |
| 45 | +invocation is a system call) but it also makes the assembly discovery |
| 46 | +an O(n) algorithm, which takes more time as more assemblies are added |
| 47 | +to the APK/AAB archive. |
| 48 | + |
| 49 | +An assembly blob, however, needs to be mapped only once and any |
| 50 | +further operations are merely pointer arithmetic, making the process |
| 51 | +not only faster but also reducing the algorithm complexity to O(1). |
| 52 | + |
| 53 | +# Blob kinds and locations |
| 54 | + |
| 55 | +Each application will contain at least a single blob, with assemblies |
| 56 | +that are architecture-agnostic, and any number of |
| 57 | +architecture-specific blobs. dotnet ships with a handful of |
| 58 | +assemblies that **are** architecture-specific - those assemblies are |
| 59 | +placed in an architecture-specific blob, one per architecture |
| 60 | +supported by and enabled for the application. At execution time, |
| 61 | +the Xamarin.Android runtime will always map the architecture-agnostic |
| 62 | +blob and one, and **only** one, of the architecture-specific blobs. |
| 63 | + |
| 64 | +Blobs are placed in the same location in the APK/AAB archive where the |
| 65 | +individual assemblies traditionally live, the `assemblies/` (for APK) |
| 66 | +and `base/root/assemblies/` (for AAB) folders. |
| 67 | + |
| 68 | +The architecture-agnostic blob is always named `assemblies.blob` while |
| 69 | +the architecture-specific one is called `assemblies.[ARCH].blob`. |
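The naming convention above can be sketched as a small helper. This is only an illustration: `blob_file_name` is a hypothetical function, and the architecture string passed to it (such as `arm64` in the usage note) is a placeholder, since this document does not list the actual `[ARCH]` values.

```cpp
#include <string>

// Hypothetical helper: builds the blob file name from an architecture tag.
// An empty tag selects the architecture-agnostic blob.
inline std::string blob_file_name (const std::string &arch = "")
{
    if (arch.empty ()) {
        return "assemblies.blob";
    }
    return "assemblies." + arch + ".blob";
}
```

With these assumptions, `blob_file_name ()` yields `assemblies.blob` and `blob_file_name ("arm64")` yields `assemblies.arm64.blob`.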
| 70 | + |
| 71 | +Currently, Xamarin.Android applications will produce only one set of |
| 72 | +blobs but when Xamarin.Android adds support for Android Features, each |
| 73 | +feature APK will contain its own set of blobs. All of the APKs will |
| 74 | +follow the location, format and naming conventions described above. |
| 75 | + |
| 76 | +# Blob format |
| 77 | + |
| 78 | +Each blob is a structured binary file, using little-endian byte order |
| 79 | +and aligned to a byte boundary. Each blob consists of a header, an |
| 80 | +assembly descriptor table and, optionally (see below), two tables with |
| 81 | +assembly name hashes. All the blobs are assigned a unique ID, with |
| 82 | +the blob having ID equal to `0` being the [Index blob](#index-blob). |
| 83 | + |
| 84 | +Assemblies are stored as adjacent byte streams: |
| 85 | + |
| 86 | + - **Image data** |
| 87 | + Required to be present for all assemblies, contains the actual |
| 88 | + assembly PE image. |
| 89 | + - **Debug data** |
| 90 | + Optional. Contains the assembly's PDB or MDB debug data. |
| 91 | + - **Config data** |
| 92 | + Optional. Contains the assembly's .config file. Config data |
| 93 | + **must** be terminated with a `NUL` character (`0`); this |
| 94 | + makes the runtime code slightly more efficient. |
| 95 | + |
| 96 | +All the structures described here are defined in the |
| 97 | +[`xamarin-app.hh`](../../src/monodroid/jni/xamarin-app.hh) file. |
| 98 | +Should there be any difference between this document and the |
| 99 | +structures in the header file, the header file is the one that |
| 100 | +should be trusted. |
| 101 | + |
| 102 | +## Common header |
| 103 | + |
| 104 | +All kinds of blobs share the following header format: |
| 105 | + |
| 106 | + struct BundledAssemblyBlobHeader |
| 107 | + { |
| 108 | + uint32_t magic; |
| 109 | + uint32_t version; |
| 110 | + uint32_t local_entry_count; |
| 111 | + uint32_t global_entry_count; |
| 112 | + uint32_t blob_id; |
| 113 | + }; |
| 114 | + |
| 115 | +Individual fields have the following meanings: |
| 116 | + |
| 117 | + - `magic`: has the value of 0x41424158 (`XABA`) |
| 118 | + - `version`: a value increased every time the blob format changes. |
| 119 | + - `local_entry_count`: number of assemblies stored in this blob (also |
| 120 | + the number of entries in the assembly descriptor table, see below) |
| 121 | + - `global_entry_count`: number of entries in the index blob's (see |
| 122 | + below) hash tables; all the other blobs store `0` in this field |
| 123 | + - `blob_id`: a unique ID of this blob. |
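As a rough sketch of how a reader might check the header described above: `read_blob_header` and `BLOB_MAGIC` are hypothetical names, the expected version is passed in by the caller because this document does not state its current value, and the code assumes a little-endian host. The definitions in `xamarin-app.hh` remain authoritative.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Mirrors the header layout described in this document.
struct BundledAssemblyBlobHeader
{
    uint32_t magic;
    uint32_t version;
    uint32_t local_entry_count;
    uint32_t global_entry_count;
    uint32_t blob_id;
};

// Stored little-endian, these four bytes read as the characters 'X' 'A' 'B' 'A'.
constexpr uint32_t BLOB_MAGIC = 0x41424158;

// Hypothetical helper: copies the header out of a mapped blob and checks the
// magic and version fields. Returns false if the buffer is too small or the
// fields don't match.
inline bool read_blob_header (const void *data, size_t size, uint32_t expected_version,
                              BundledAssemblyBlobHeader &out)
{
    if (data == nullptr || size < sizeof (BundledAssemblyBlobHeader)) {
        return false;
    }
    // memcpy avoids unaligned reads; assumes host byte order matches the file.
    std::memcpy (&out, data, sizeof (out));
    return out.magic == BLOB_MAGIC && out.version == expected_version;
}
```

Copying the header out rather than casting the mapped pointer keeps the sketch independent of how the blob happens to be aligned inside the archive.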
| 124 | + |
| 125 | +## Assembly descriptor table |
| 126 | + |
| 127 | +Each blob header is followed by a table of |
| 128 | +`BundledAssemblyBlobHeader.local_entry_count` entries, each entry |
| 129 | +defined by the following structure: |
| 130 | + |
| 131 | + struct BlobBundledAssembly |
| 132 | + { |
| 133 | + uint32_t data_offset; |
| 134 | + uint32_t data_size; |
| 135 | + uint32_t debug_data_offset; |
| 136 | + uint32_t debug_data_size; |
| 137 | + uint32_t config_data_offset; |
| 138 | + uint32_t config_data_size; |
| 139 | + }; |
| 140 | + |
| 141 | +Only the `data_offset` and `data_size` fields must have a non-zero |
| 142 | +value; the other fields describe optional data and can be set to `0`. |
| 143 | + |
| 144 | +Individual fields have the following meanings: |
| 145 | + |
| 146 | + - `data_offset`: offset of the assembly image data from the |
| 147 | + beginning of the blob file |
| 148 | + - `data_size`: number of bytes of the image data |
| 149 | + - `debug_data_offset`: offset of the assembly's debug data from the |
| 150 | + beginning of the blob file. A value of `0` indicates there's no |
| 151 | + debug data for this assembly. |
| 152 | + - `debug_data_size`: number of bytes of debug data. Can be `0` only |
| 153 | + if `debug_data_offset` is `0` |
| 154 | + - `config_data_offset`: offset of the assembly's config file data |
| 155 | + from the beginning of the blob file. A value of `0` indicates |
| 156 | + there's no config file data for this assembly. |
| 157 | + - `config_data_size`: number of bytes of config file data. Can be |
| 158 | + `0` only if `config_data_offset` is `0` |
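The field rules above amount to simple pointer arithmetic from the start of the mapped blob. The following is a minimal sketch under that reading; `AssemblyView`, `in_bounds` and `resolve_assembly` are hypothetical names introduced for illustration, and the bounds checks are an added precaution rather than something this document prescribes.

```cpp
#include <cstddef>
#include <cstdint>

// Mirrors the descriptor layout described in this document.
struct BlobBundledAssembly
{
    uint32_t data_offset;
    uint32_t data_size;
    uint32_t debug_data_offset;
    uint32_t debug_data_size;
    uint32_t config_data_offset;
    uint32_t config_data_size;
};

// Hypothetical result type: pointers into the mapped blob (nullptr if absent).
struct AssemblyView
{
    const uint8_t *image;
    uint32_t       image_size;
    const uint8_t *debug;
    uint32_t       debug_size;
    const uint8_t *config;
    uint32_t       config_size;
};

// True if [offset, offset + size) lies inside a blob of blob_size bytes.
inline bool in_bounds (uint32_t offset, uint32_t size, size_t blob_size)
{
    return offset <= blob_size && size <= blob_size - offset;
}

// Hypothetical helper: turns one descriptor into pointers relative to blob_start.
inline bool resolve_assembly (const uint8_t *blob_start, size_t blob_size,
                              const BlobBundledAssembly &d, AssemblyView &out)
{
    // Image data is mandatory: both fields must be non-zero.
    if (d.data_offset == 0 || d.data_size == 0 || !in_bounds (d.data_offset, d.data_size, blob_size)) {
        return false;
    }
    out = AssemblyView {};
    out.image = blob_start + d.data_offset;
    out.image_size = d.data_size;

    // An offset of 0 means the optional stream is absent.
    if (d.debug_data_offset != 0) {
        if (!in_bounds (d.debug_data_offset, d.debug_data_size, blob_size)) return false;
        out.debug = blob_start + d.debug_data_offset;
        out.debug_size = d.debug_data_size;
    }
    if (d.config_data_offset != 0) {
        if (!in_bounds (d.config_data_offset, d.config_data_size, blob_size)) return false;
        out.config = blob_start + d.config_data_offset;
        out.config_size = d.config_data_size;
    }
    return true;
}
```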
| 159 | + |
| 160 | +## Index blob |
| 161 | + |
| 162 | +Each application will contain exactly one blob with a global index - |
| 163 | +two tables with assembly name hashes. All the other blobs **do not** |
| 164 | +contain these tables. Two hash tables are necessary because hashes |
| 165 | +for 32-bit and 64-bit devices are different. |
| 166 | + |
| 167 | +The hash tables follow the [Assembly descriptor |
| 168 | +table](#assembly-descriptor-table) and precede the individual assembly |
| 169 | +streams. |
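Given that layout, the start of the hash tables in the index blob could be computed as sketched below. This assumes the header and descriptor entries are packed back to back with no padding (five and six `uint32_t` fields respectively, as shown in the structures in this document); `hash_tables_offset` is a hypothetical helper, and the order of the 32-bit and 64-bit tables is not stated here, so only the start of the region is computed.

```cpp
#include <cstddef>
#include <cstdint>

// Sizes derived from the structures shown in this document (assumed unpadded).
constexpr size_t HEADER_SIZE     = 5 * sizeof (uint32_t); // BundledAssemblyBlobHeader
constexpr size_t DESCRIPTOR_SIZE = 6 * sizeof (uint32_t); // BlobBundledAssembly

// Hypothetical helper: byte offset, from the beginning of the index blob, at
// which the hash tables begin (right after the assembly descriptor table).
constexpr size_t hash_tables_offset (uint32_t local_entry_count)
{
    return HEADER_SIZE + static_cast<size_t> (local_entry_count) * DESCRIPTOR_SIZE;
}
```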
| 170 | + |
| 171 | +Placing the hash tables in a single index blob, while "wasting" a |
| 172 | +certain amount of memory (since 32-bit devices won't use the 64-bit |
| 173 | +table and vice versa), makes for a simpler and faster runtime |
| 174 | +implementation, and the amount of memory wasted isn't big (1000 |
| 175 | +entries produce two tables which are 8kb long each, this being the |
| 176 | +amount of memory wasted). |
| 177 | + |
| 178 | +### Hash table format |
| 179 | + |
| 180 | +Both tables share the same format, despite the hashes themselves being |
| 181 | +of different sizes. This is done to make handling of the tables |
| 182 | +easier on the runtime. |
| 183 | + |
| 184 | +Each entry contains, among other fields, the assembly name hash. The |
| 185 | +hash value is obtained using the |
| 186 | +[xxHash](https://cyan4973.github.io/xxHash/) algorithm and is |
| 187 | +calculated **without** including the `.dll` extension. This is done |
| 188 | +for runtime efficiency, as the vast majority of Mono requests to load |
| 189 | +an assembly do not include the `.dll` suffix, which saves the time of |
| 190 | +appending it in order to generate the hash for index lookup. |
| 191 | + |
| 192 | +Each entry is represented by the following structure: |
| 193 | + |
| 194 | + struct BlobHashEntry |
| 195 | + { |
| 196 | + union { |
| 197 | + uint64_t hash64; |
| 198 | + uint32_t hash32; |
| 199 | + }; |
| 200 | + uint32_t mapping_index; |
| 201 | + uint32_t local_blob_index; |
| 202 | + uint32_t blob_id; |
| 203 | + }; |
| 204 | + |
| 205 | +Individual fields have the following meanings: |
| 206 | + |
| 207 | + - `hash64`/`hash32`: the 32-bit or 64-bit hash of the assembly's name |
| 208 | + **without** the `.dll` suffix |
| 209 | + - `mapping_index`: index into a compile-time generated array of |
| 210 | + assembly data pointers. This is a global index, unique across |
| 211 | + **all** the APK files comprising the application. |
| 212 | + - `local_blob_index`: index into the blob's [Assembly descriptor table](#assembly-descriptor-table) |
| 213 | + entry describing the assembly. |
| 214 | + - `blob_id`: ID of the blob containing the assembly |
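A lookup in the 64-bit table might look like the sketch below. It rests on one assumption that this document does not state: that the entries are sorted by hash in ascending order, which is what makes a binary search valid. `find_entry64` is a hypothetical name, and the caller is expected to supply the xxHash of the assembly name (without the `.dll` suffix), computed elsewhere.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Mirrors the entry layout described in this document.
struct BlobHashEntry
{
    union {
        uint64_t hash64;
        uint32_t hash32;
    };
    uint32_t mapping_index;
    uint32_t local_blob_index;
    uint32_t blob_id;
};

// Hypothetical helper: binary search over a table ASSUMED to be sorted by
// hash64 in ascending order. Returns nullptr when the hash is not present.
inline const BlobHashEntry* find_entry64 (const BlobHashEntry *table, size_t count, uint64_t name_hash)
{
    const BlobHashEntry *end = table + count;
    const BlobHashEntry *it = std::lower_bound (
        table, end, name_hash,
        [] (const BlobHashEntry &entry, uint64_t hash) { return entry.hash64 < hash; }
    );
    return (it != end && it->hash64 == name_hash) ? it : nullptr;
}
```

The returned entry's `blob_id` identifies the blob to consult, and `local_blob_index` selects the descriptor within that blob. If the tables turn out not to be sorted, a linear scan over the same structure would be the fallback.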