Commit 15e80bd

[WIP] Add blob format documentation
1 parent 1d8f1b1 commit 15e80bd

2 files changed

+215
-0
lines changed

Documentation/README.md

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@

 * [Submitting Bugs, Feature Requests, and Pull Requests][bugs]
 * [Directory Structure](project-docs/ExploringSources.md)
+* [Assembly blob format](project-docs/AssemblyBlobs.md)

 [bugs]: https://github.com/xamarin/xamarin-android/wiki/Submitting-Bugs,-Feature-Requests,-and-Pull-Requests

Lines changed: 214 additions & 0 deletions
@@ -0,0 +1,214 @@
<!-- markdown-toc start - Don't edit this section. Run M-x markdown-toc-refresh-toc -->
**Table of Contents**

- [Assembly Blobs format and purpose](#assembly-blobs-format-and-purpose)
    - [Rationale](#rationale)
- [Blob kinds and locations](#blob-kinds-and-locations)
- [Blob format](#blob-format)
    - [Common header](#common-header)
    - [Assembly descriptor table](#assembly-descriptor-table)
    - [Index blob](#index-blob)
        - [Hash table format](#hash-table-format)

<!-- markdown-toc end -->

# Assembly Blobs format and purpose

Assembly blobs are binary files that contain the managed assemblies
and, optionally, their debug data and the associated config files.
They are placed inside the Android APK/AAB archives, replacing the
individual assembly, pdb and config files.

Blobs are an optional form of assembly storage in the archive. They
can be used in all build configurations **except** when Fast
Deployment is in effect, in which case assemblies aren't placed in the
archives at all and are instead synchronized from the host to the
device/emulator filesystem.

## Rationale

During native startup, the Xamarin.Android runtime looks inside the
application APK file for the managed assemblies (and their associated
pdb and config files, if applicable) in order to map them into memory
(using the `mmap(2)` call), so that they can be given to the Mono
runtime when it requests that a given assembly be loaded. The reason
for the memory mapping is that, as far as Android is concerned, managed
assembly files are just data/resources and, thus, aren't extracted to
the filesystem. As a result, Mono wouldn't be able to find the
assemblies by scanning the filesystem, so the host application
(Xamarin.Android) must give it a hand in finding them.

Applications can contain hundreds of assemblies (for instance, a Hello
World MAUI application currently contains over 120 assemblies) and
each of them would have to be mmapped at startup, together with its
pdb and config files, if found. This not only costs time (each `mmap`
invocation is a system call) but also makes assembly discovery an
O(n) algorithm, which takes more time as more assemblies are added
to the APK/AAB archive.

An assembly blob, however, needs to be mapped only once and any
further operations are merely pointer arithmetic, making the process
not only faster but also reducing the algorithm complexity to O(1).

# Blob kinds and locations

Each application will contain at least one blob, holding the
assemblies that are architecture-agnostic, and any number of
architecture-specific blobs. dotnet ships with a handful of
assemblies that **are** architecture-specific. Those assemblies are
placed in an architecture-specific blob, one per architecture
supported by and enabled for the application. At run time, the
Xamarin.Android runtime will always map the architecture-agnostic
blob and one, and **only** one, of the architecture-specific blobs.

Blobs are placed in the same location in the APK/AAB archive where the
individual assemblies traditionally live: the `assemblies/` (for APK)
and `base/root/assemblies/` (for AAB) folders.

The architecture-agnostic blob is always named `assemblies.blob` while
the architecture-specific one is called `assemblies.[ARCH].blob`.

Currently, Xamarin.Android applications will produce only one set of
blobs, but when Xamarin.Android adds support for Android Features, each
feature APK will contain its own set of blobs. All of the APKs will
follow the location, format and naming conventions described above.

# Blob format

Each blob is a structured binary file, using little-endian byte order
and aligned to a byte boundary. Each blob consists of a header, an
assembly descriptor table and, optionally (see below), two tables with
assembly name hashes. Every blob is assigned a unique ID, with
the blob whose ID is `0` being the [Index blob](#index-blob).

Assemblies are stored as adjacent byte streams:

- **Image data**
  Required for all assemblies. Contains the actual assembly PE
  image.
- **Debug data**
  Optional. Contains the assembly's PDB or MDB debug data.
- **Config data**
  Optional. Contains the assembly's .config file. Config data
  **must** be terminated with a `NUL` character (`0`), which makes
  the runtime code slightly more efficient.

All the structures described here are defined in the
[`xamarin-app.hh`](../../src/monodroid/jni/xamarin-app.hh) file.
Should there be any difference between this document and the
structures in the header file, the header file is the one that
should be trusted.

## Common header

All kinds of blobs share the following header format:

    struct BundledAssemblyBlobHeader
    {
        uint32_t magic;
        uint32_t version;
        uint32_t local_entry_count;
        uint32_t global_entry_count;
        uint32_t blob_id;
    };

Individual fields have the following meanings:

- `magic`: has the value of 0x41424158 (`XABA`)
- `version`: a value increased every time the blob format changes.
- `local_entry_count`: number of assemblies stored in this blob (also
  the number of entries in the assembly descriptor table, see below)
- `global_entry_count`: number of entries in the hash tables of the
  index blob (see below). All the other blobs store `0` in this field.
- `blob_id`: a unique ID of this blob.
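
As an illustration of the header fields described above, the following is a minimal sketch of how a reader might validate a mapped blob. The `read_blob_header` helper and the `BLOB_MAGIC` constant are hypothetical names, not taken from `xamarin-app.hh`, and the code assumes a little-endian host so that the on-disk layout matches the struct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct BundledAssemblyBlobHeader
{
    uint32_t magic;
    uint32_t version;
    uint32_t local_entry_count;
    uint32_t global_entry_count;
    uint32_t blob_id;
};

// Stored little-endian, so the first four bytes on disk read 'X' 'A' 'B' 'A'.
constexpr uint32_t BLOB_MAGIC = 0x41424158;

// Hypothetical helper: copies the header out of the mapped bytes and applies
// the checks implied by the field descriptions. Assumes a little-endian host.
bool read_blob_header (const uint8_t *data, size_t size, BundledAssemblyBlobHeader &out)
{
    assert (data != nullptr);
    if (size < sizeof (BundledAssemblyBlobHeader))
        return false; // too short to hold a header

    std::memcpy (&out, data, sizeof (out));
    if (out.magic != BLOB_MAGIC)
        return false; // not an assembly blob

    // Only the index blob (ID 0) stores a non-zero global entry count.
    if (out.blob_id != 0 && out.global_entry_count != 0)
        return false;

    return true;
}
```

A real reader would presumably also compare `version` against the format version it was built for. That check is omitted here because this document does not state which values are valid.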

## Assembly descriptor table

Each blob header is followed by a table of
`BundledAssemblyBlobHeader.local_entry_count` entries, each entry
defined by the following structure:

    struct BlobBundledAssembly
    {
        uint32_t data_offset;
        uint32_t data_size;
        uint32_t debug_data_offset;
        uint32_t debug_data_size;
        uint32_t config_data_offset;
        uint32_t config_data_size;
    };

Only the `data_offset` and `data_size` fields must have a non-zero
value. The other fields describe optional data and can be set to `0`.

Individual fields have the following meanings:

- `data_offset`: offset of the assembly image data from the
  beginning of the blob file
- `data_size`: number of bytes of the image data
- `debug_data_offset`: offset of the assembly's debug data from the
  beginning of the blob file. A value of `0` indicates there's no
  debug data for this assembly.
- `debug_data_size`: number of bytes of debug data. Can be `0` only
  if `debug_data_offset` is `0`.
- `config_data_offset`: offset of the assembly's config file data
  from the beginning of the blob file. A value of `0` indicates
  there's no config file data for this assembly.
- `config_data_size`: number of bytes of config file data. Can be
  `0` only if `config_data_offset` is `0`.
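
The offsets above are what turn assembly lookup into plain pointer arithmetic once the blob is mapped. The following sketch shows one way a descriptor could be resolved into pointers. `AssemblyView`, `get_assembly` and `BLOB_HEADER_SIZE` are hypothetical names introduced for illustration, and the code assumes a little-endian host and a descriptor table that starts immediately after the 20-byte header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct BlobBundledAssembly
{
    uint32_t data_offset;
    uint32_t data_size;
    uint32_t debug_data_offset;
    uint32_t debug_data_size;
    uint32_t config_data_offset;
    uint32_t config_data_size;
};

// Size of BundledAssemblyBlobHeader: five uint32_t fields, no padding.
constexpr size_t BLOB_HEADER_SIZE = 5 * sizeof (uint32_t);

// Hypothetical result type: pointers into the mapped blob, nullptr when absent.
struct AssemblyView
{
    const uint8_t *image = nullptr;
    size_t image_size = 0;
    const uint8_t *debug = nullptr;
    size_t debug_size = 0;
    const char *config = nullptr;
    size_t config_size = 0;
};

// Hypothetical helper: resolves descriptor `index` (out of `entry_count`)
// into pointers inside the blob mapped at `blob`.
bool get_assembly (const uint8_t *blob, size_t blob_size, uint32_t entry_count,
                   uint32_t index, AssemblyView &out)
{
    assert (blob != nullptr);
    auto in_range = [blob_size] (uint32_t offset, uint32_t size) {
        return static_cast<uint64_t> (offset) + size <= blob_size;
    };

    size_t entry_pos = BLOB_HEADER_SIZE + static_cast<size_t> (index) * sizeof (BlobBundledAssembly);
    if (index >= entry_count || entry_pos + sizeof (BlobBundledAssembly) > blob_size)
        return false;

    BlobBundledAssembly e;
    std::memcpy (&e, blob + entry_pos, sizeof (e));

    // Image data is mandatory: a zero offset or size means a malformed entry.
    if (e.data_offset == 0 || e.data_size == 0 || !in_range (e.data_offset, e.data_size))
        return false;

    out = AssemblyView {};
    out.image = blob + e.data_offset;
    out.image_size = e.data_size;

    // Debug and config data are optional: offset 0 means "not present".
    if (e.debug_data_offset != 0) {
        if (!in_range (e.debug_data_offset, e.debug_data_size))
            return false;
        out.debug = blob + e.debug_data_offset;
        out.debug_size = e.debug_data_size;
    }
    if (e.config_data_offset != 0) {
        if (!in_range (e.config_data_offset, e.config_data_size))
            return false;
        out.config = reinterpret_cast<const char*> (blob + e.config_data_offset);
        out.config_size = e.config_data_size;
    }
    return true;
}
```

Because config data is `NUL`-terminated in the blob, `out.config` can be handed to code expecting a C string without copying.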

## Index blob

Each application will contain exactly one blob with a global index,
which consists of two tables with assembly name hashes. All the other
blobs **do not** contain these tables. Two hash tables are necessary
because hashes for 32-bit and 64-bit devices are different.

The hash tables follow the [Assembly descriptor
table](#assembly-descriptor-table) and precede the individual assembly
streams.

Placing the hash tables in a single index blob, while "wasting" a
certain amount of memory (since 32-bit devices won't use the 64-bit
table and vice versa), makes for a simpler and faster runtime
implementation, and the amount of memory wasted isn't big (for 1000
assemblies, the two tables are about 8kb long each, and one table's
worth is the amount of memory wasted).

### Hash table format

Both tables share the same format, despite the hashes themselves being
of different sizes. This is done to make handling of the tables
easier for the runtime.

Each entry contains, among other fields, the assembly name hash. The
hash value is obtained using the
[xxHash](https://cyan4973.github.io/xxHash/) algorithm and is
calculated **without** including the `.dll` extension. This is done
for runtime efficiency: the vast majority of Mono requests to load
an assembly do not include the `.dll` suffix, so this saves the time
of appending it in order to generate the hash for index lookup.

Each entry is represented by the following structure:

    struct BlobHashEntry
    {
        union {
            uint64_t hash64;
            uint32_t hash32;
        };
        uint32_t mapping_index;
        uint32_t local_blob_index;
        uint32_t blob_id;
    };

Individual fields have the following meanings:

- `hash64`/`hash32`: the 32-bit or 64-bit hash of the assembly's name
  **without** the `.dll` suffix
- `mapping_index`: index into a compile-time generated array of
  assembly data pointers. This is a global index, unique across
  **all** the APK files comprising the application.
- `local_blob_index`: index into the blob's [Assembly descriptor table](#assembly-descriptor-table),
  pointing at the entry that describes the assembly.
- `blob_id`: ID of the blob containing the assembly
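
To show how the entry fields fit together, here is a sketch of a lookup in the 64-bit table. It rests on assumptions this document does not confirm: that the entries are sorted by hash in ascending order (which is what makes a binary search possible), and that the name hash has already been computed with xxHash elsewhere. `strip_dll` and `find_entry` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

struct BlobHashEntry
{
    union {
        uint64_t hash64;
        uint32_t hash32;
    };
    uint32_t mapping_index;
    uint32_t local_blob_index;
    uint32_t blob_id;
};

// Removes a trailing ".dll" so the name matches what was hashed at build time.
std::string_view strip_dll (std::string_view name)
{
    constexpr std::string_view ext = ".dll";
    if (name.size () > ext.size () && name.substr (name.size () - ext.size ()) == ext)
        name.remove_suffix (ext.size ());
    return name;
}

// Hypothetical lookup in the 64-bit table.
// ASSUMPTION: entries are sorted by hash64 in ascending order.
// Returns nullptr when no entry carries the given hash.
const BlobHashEntry* find_entry (const BlobHashEntry *table, size_t count, uint64_t hash)
{
    assert (table != nullptr || count == 0);
    const BlobHashEntry *end = table + count;
    const BlobHashEntry *it = std::lower_bound (
        table, end, hash,
        [] (const BlobHashEntry &e, uint64_t h) { return e.hash64 < h; });
    return (it != end && it->hash64 == hash) ? it : nullptr;
}
```

Given a matching entry, `blob_id` selects the blob and `local_blob_index` selects the row of that blob's assembly descriptor table. A 32-bit variant would compare `hash32` instead.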
