internal/godoc/codec: improve documentation

jba · jba · commit 0709f9d1ccca · 2024-09-04T23:28:05.000Z
Change-Id: I7b2d5721ee5502897be4d49d8c69ed16ee33fa8d
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/610715
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Findley <rfindley@google.com>
kokoro-CI: kokoro <noreply+kokoro@google.com>
diff --git a/internal/godoc/codec/doc.go b/internal/godoc/codec/doc.go
@@ -12,14 +12,13 @@ encoding the structures of the go/ast package, which is its sole purpose.
 
 # Encoding Scheme
 
-Every encoded value begins with a single byte that describes what (if
-anything) follows. There is enough information to skip over the value, since
-the decoder must be able to do that if it encounters a struct field it
-doesn't know.
+Every encoded value begins with a single byte that describes what (if anything) follows.
+A value's encoding contains enough information to skip over the value, since the
+decoder must be able to do that if it encounters a struct field it doesn't know.
 
 Most of the values of that initial byte can be devoted to small unsigned
 integers. For example, the number 17 is represented by the single byte 17.
-Only a few byte values have special meaning.
+Only a few byte values have special meanings, whose descriptions follow.
 
 The nil code indicates that the value is nil. We don't absolutely need this:
 we could always represent the nil value for a type as something that couldn't
@@ -37,10 +36,16 @@ example, the string "hi" is represented as:
 
 Unsigned integers that can't fit into the initial byte are encoded as byte
 sequences of length 4 or 8, holding little-endian uint32 or uint64 values. We
-use uint32s where possible to save space. We could have saved more space by
-also considering 16-byte numbers, or using a variable-length encoding like
-varints or gob's representation, but it didn't seem worth the additional
-complexity.
+use uint32s where possible to save space. For example, 255 is encoded as
+
+	nBytes 4 255 0 0 0
+
+This representation is not as space-efficient as others, but improving its space
+usage didn't seem worth the additional complexity.
+
+Signed integers use gob's encoding.
+
+Floats are encoded by converting them to uints using math.Float64bits.
 
 The nValues code is for sequences of values whose size is known beforehand,
 like a Go slice or array. The slice []string{"hi", "bye"} is encoded as
@@ -51,6 +56,29 @@ The ref code is used to refer to an earlier encoded value. It is followed by
 a uint denoting the index data of the value to use.
 
 The start and end codes delimit a value whose length is unknown beforehand.
-It is used for structs.
+They are used for structs.
+
+A struct is encoded as a sequence of fields between a start code and an end code.
+Each exported field is encoded as an integer field number followed by the field's value.
+Fields with zero values are omitted.
+
+The field number is initially the position of the field in the struct declaration. This
+initial ordering is preserved even if fields are added or removed; new fields are
+numbered after the initial ones. This is accomplished by preserving the field ordering
+in a comment in the generated code. For example, say a struct S has fields A and B.
+The generated encoder would assign 0 to A and 1 to B, and include this comment in the code:
+
+	// Fields of S: A B
+
+If later a field X was added between A and B, the generator would read the comment
+and preserve the original assignments of A and B. It would assign 2 to X, and the
+generated code would contain the new comment
+
+	// Fields of S: A B X
+
+Values of type any are encoded as a pair of an assigned type number and the value.
+All types that can appear as values must be registered by calling Register.
+Type numbers are assigned to type names by the encoder in the order that types are encountered.
+The assignments are saved at the beginning of the encoded data.
 */
 package codec