
Commit 5a7c571

hash/maphash: revise API to be more idiomatic
This CL makes these changes to the hash/maphash API to make it fit a bit more into the standard library:

- Move some of the package doc onto type Hash, so that `go doc maphash.Hash` shows it.
- Instead of having identical AddBytes and Write methods, standardize on Write, the usual name for this function. Similarly, AddString -> WriteString, AddByte -> WriteByte.
- Instead of having identical Hash and Sum64 methods, standardize on Sum64 (for hash.Hash64). Dropping the "Hash" method also helps because Hash is usually reserved to mean the state of a hash function (hash.Hash etc.), not the hash value itself.
- Make an uninitialized maphash.Hash auto-seed with a random seed. It is critical that users not use the same seed for all hash functions in their program, at least not accidentally. So the Hash implementation must either panic if uninitialized or initialize itself. Initializing itself is less work for users and can be done lazily.
- Now that the zero maphash.Hash is useful, drop maphash.New in favor of new(maphash.Hash) or simply declaring a maphash.Hash.
- Add a [0]func()-typed field to the Hash so that Hashes cannot be compared. (I considered doing the same for Seed but comparing seeds seems OK.)
- Drop the integer argument from MakeSeed, to match the original design in golang.org/issue/28322. There is no point in giving users control over the specific seed bits, since we want the interpretation of those bits to be different in every different process. The only thing users need is to be able to create a new random seed at each call. (Fixes a TODO in MakeSeed's public doc comment.)

This API is new in Go 1.14, so these changes do not violate the compatibility promise.

Fixes #35060.
Fixes #35348.

Change-Id: Ie6fecc441f3f5ef66388c6ead92e875c0871f805
Reviewed-on: https://go-review.googlesource.com/c/go/+/205069
Run-TryBot: Russ Cox
Reviewed-by: Alan Donovan
Reviewed-by: Keith Randall
1 parent 03aca99 commit 5a7c571


3 files changed: +168 -125 lines changed


src/hash/maphash/maphash.go

+113-82
@@ -2,96 +2,128 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// Package hash/maphash provides hash functions on byte sequences. These
-// hash functions are intended to be used to implement hash tables or
+// Package maphash provides hash functions on byte sequences.
+// These hash functions are intended to be used to implement hash tables or
 // other data structures that need to map arbitrary strings or byte
-// sequences to a uniform distribution of integers. The hash functions
-// are collision-resistant but are not cryptographically secure (use
-// one of the hash functions in crypto/* if you need that).
+// sequences to a uniform distribution of integers.
 //
-// The produced hashes depend only on the sequence of bytes provided
-// to the Hash object, not on the way in which they are provided. For
-// example, the calls
-// h.AddString("foo")
-// h.AddBytes([]byte{'f','o','o'})
-// h.AddByte('f'); h.AddByte('o'); h.AddByte('o')
-// will all have the same effect.
-//
-// Two Hash instances in the same process using the same seed
-// behave identically.
-//
-// Two Hash instances with the same seed in different processes are
-// not guaranteed to behave identically, even if the processes share
-// the same binary.
-//
-// Hashes are intended to be collision-resistant, even for situations
-// where an adversary controls the byte sequences being hashed.
-// All bits of the Hash result are close to uniformly and
-// independently distributed, so can be safely restricted to a range
-// using bit masking, shifting, or modular arithmetic.
+// The hash functions are collision-resistant but not cryptographically secure.
+// (See crypto/sha256 and crypto/sha512 for cryptographic use.)
 package maphash
 
-import (
-	"unsafe"
-)
+import "unsafe"
 
-// A Seed controls the behavior of a Hash. Two Hash objects with the
-// same seed in the same process will behave identically. Two Hash
-// objects with different seeds will very likely behave differently.
+// A Seed is a random value that selects the specific hash function
+// computed by a Hash. If two Hashes use the same Seeds, they
+// will compute the same hash values for any given input.
+// If two Hashes use different Seeds, they are very likely to compute
+// distinct hash values for any given input.
+//
+// A Seed must be initialized by calling MakeSeed.
+// The zero seed is uninitialized and not valid for use with Hash's SetSeed method.
+//
+// Each Seed value is local to a single process and cannot be serialized
+// or otherwise recreated in a different process.
 type Seed struct {
 	s uint64
 }
 
-// A Hash object is used to compute the hash of a byte sequence.
+// A Hash computes a seeded hash of a byte sequence.
+//
+// The zero Hash is a valid Hash ready to use.
+// A zero Hash chooses a random seed for itself during
+// the first call to a Reset, Write, Seed, Sum64, or Seed method.
+// For control over the seed, use SetSeed.
+//
+// The computed hash values depend only on the initial seed and
+// the sequence of bytes provided to the Hash object, not on the way
+// in which the bytes are provided. For example, the three sequences
+//
+// h.Write([]byte{'f','o','o'})
+// h.WriteByte('f'); h.WriteByte('o'); h.WriteByte('o')
+// h.WriteString("foo")
+//
+// all have the same effect.
+//
+// Hashes are intended to be collision-resistant, even for situations
+// where an adversary controls the byte sequences being hashed.
+//
+// A Hash is not safe for concurrent use by multiple goroutines, but a Seed is.
+// If multiple goroutines must compute the same seeded hash,
+// each can declare its own Hash and call SetSeed with a common Seed.
 type Hash struct {
-	seed  Seed     // initial seed used for this hash
-	state Seed     // current hash of all flushed bytes
-	buf   [64]byte // unflushed byte buffer
-	n     int      // number of unflushed bytes
+	_     [0]func() // not comparable
+	seed  Seed      // initial seed used for this hash
+	state Seed      // current hash of all flushed bytes
+	buf   [64]byte  // unflushed byte buffer
+	n     int       // number of unflushed bytes
+}
+
+// initSeed seeds the hash if necessary.
+// initSeed is called lazily before any operation that actually uses h.seed/h.state.
+// Note that this does not include Write/WriteByte/WriteString in the case
+// where they only add to h.buf. (If they write too much, they call h.flush,
+// which does call h.initSeed.)
+func (h *Hash) initSeed() {
+	if h.seed.s == 0 {
+		h.SetSeed(MakeSeed())
+	}
 }
 
-// AddByte adds b to the sequence of bytes hashed by h.
-func (h *Hash) AddByte(b byte) {
+// WriteByte adds b to the sequence of bytes hashed by h.
+// It never fails; the error result is for implementing io.ByteWriter.
+func (h *Hash) WriteByte(b byte) error {
 	if h.n == len(h.buf) {
 		h.flush()
 	}
 	h.buf[h.n] = b
 	h.n++
+	return nil
 }
 
-// AddBytes adds b to the sequence of bytes hashed by h.
-func (h *Hash) AddBytes(b []byte) {
+// Write adds b to the sequence of bytes hashed by h.
+// It always writes all of b and never fails; the count and error result are for implementing io.Writer.
+func (h *Hash) Write(b []byte) (int, error) {
+	size := len(b)
 	for h.n+len(b) > len(h.buf) {
 		k := copy(h.buf[h.n:], b)
 		h.n = len(h.buf)
 		b = b[k:]
 		h.flush()
 	}
 	h.n += copy(h.buf[h.n:], b)
+	return size, nil
 }
 
-// AddString adds the bytes of s to the sequence of bytes hashed by h.
-func (h *Hash) AddString(s string) {
+// WriteString adds the bytes of s to the sequence of bytes hashed by h.
+// It always writes all of s and never fails; the count and error result are for implementing io.StringWriter.
+func (h *Hash) WriteString(s string) (int, error) {
+	size := len(s)
 	for h.n+len(s) > len(h.buf) {
 		k := copy(h.buf[h.n:], s)
 		h.n = len(h.buf)
 		s = s[k:]
 		h.flush()
 	}
 	h.n += copy(h.buf[h.n:], s)
+	return size, nil
 }
 
-// Seed returns the seed value specified in the most recent call to
-// SetSeed, or the initial seed if SetSeed was never called.
+// Seed returns h's seed value.
 func (h *Hash) Seed() Seed {
+	h.initSeed()
 	return h.seed
 }
 
-// SetSeed sets the seed used by h. Two Hash objects with the same
-// seed in the same process will behave identically. Two Hash objects
-// with different seeds will very likely behave differently. Any
-// bytes added to h previous to this call will be discarded.
+// SetSeed sets h to use seed, which must have been returned by MakeSeed
+// or by another Hash's Seed method.
+// Two Hash objects with the same seed behave identically.
+// Two Hash objects with different seeds will very likely behave differently.
+// Any bytes added to h before this call will be discarded.
 func (h *Hash) SetSeed(seed Seed) {
+	if seed.s == 0 {
+		panic("maphash: use of uninitialized Seed")
+	}
 	h.seed = seed
 	h.state = seed
 	h.n = 0
@@ -100,43 +132,46 @@ func (h *Hash) SetSeed(seed Seed) {
 // Reset discards all bytes added to h.
 // (The seed remains the same.)
 func (h *Hash) Reset() {
+	h.initSeed()
 	h.state = h.seed
 	h.n = 0
 }
 
 // precondition: buffer is full.
 func (h *Hash) flush() {
 	if h.n != len(h.buf) {
-		panic("flush of partially full buffer")
+		panic("maphash: flush of partially full buffer")
 	}
+	h.initSeed()
 	h.state.s = rthash(h.buf[:], h.state.s)
 	h.n = 0
 }
 
-// Hash returns a value which depends on h's seed and the sequence of
-// bytes added to h (since the last call to Reset or SetSeed).
-func (h *Hash) Hash() uint64 {
+// Sum64 returns h's current 64-bit value, which depends on
+// h's seed and the sequence of bytes added to h since the
+// last call to Reset or SetSeed.
+//
+// All bits of the Sum64 result are close to uniformly and
+// independently distributed, so it can be safely reduced
+// by using bit masking, shifting, or modular arithmetic.
+func (h *Hash) Sum64() uint64 {
+	h.initSeed()
 	return rthash(h.buf[:h.n], h.state.s)
 }
 
-// MakeSeed returns a Seed initialized using the bits in s.
-// Two seeds generated with the same s are guaranteed to be equal.
-// Two seeds generated with different s are very likely to be different.
-// TODO: disallow this? See Alan's comment in the issue.
-func MakeSeed(s uint64) Seed {
-	return Seed{s: s}
-}
-
-// New returns a new Hash object. Different hash objects allocated by
-// this function will very likely have different seeds.
-func New() *Hash {
-	s1 := uint64(runtime_fastrand())
-	s2 := uint64(runtime_fastrand())
-	seed := Seed{s: s1<<32 + s2}
-	return &Hash{
-		seed:  seed,
-		state: seed,
+// MakeSeed returns a new random seed.
+func MakeSeed() Seed {
+	var s1, s2 uint64
+	for {
+		s1 = uint64(runtime_fastrand())
+		s2 = uint64(runtime_fastrand())
+		// We use seed 0 to indicate an uninitialized seed/hash,
+		// so keep trying until we get a non-zero seed.
+		if s1|s2 != 0 {
+			break
+		}
 	}
+	return Seed{s: s1<<32 + s2}
 }
 
 //go:linkname runtime_fastrand runtime.fastrand
@@ -154,22 +189,17 @@ func rthash(b []byte, seed uint64) uint64 {
 	}
 	lo := runtime_memhash(unsafe.Pointer(&b[0]), uintptr(seed), uintptr(len(b)))
 	hi := runtime_memhash(unsafe.Pointer(&b[0]), uintptr(seed>>32), uintptr(len(b)))
-	// TODO: mix lo/hi? Get 64 bits some other way?
 	return uint64(hi)<<32 | uint64(lo)
 }
 
 //go:linkname runtime_memhash runtime.memhash
 func runtime_memhash(p unsafe.Pointer, seed, s uintptr) uintptr
 
-// Wrapper functions so that a hash/maphash.Hash implements
-// the hash.Hash and hash.Hash64 interfaces.
-
-func (h *Hash) Write(b []byte) (int, error) {
-	h.AddBytes(b)
-	return len(b), nil
-}
+// Sum appends the hash's current 64-bit value to b.
+// It exists for implementing hash.Hash.
+// For direct calls, it is more efficient to use Sum64.
 func (h *Hash) Sum(b []byte) []byte {
-	x := h.Hash()
+	x := h.Sum64()
 	return append(b,
 		byte(x>>0),
 		byte(x>>8),
@@ -180,8 +210,9 @@ func (h *Hash) Sum(b []byte) []byte {
 		byte(x>>48),
 		byte(x>>56))
 }
-func (h *Hash) Sum64() uint64 {
-	return h.Hash()
-}
-func (h *Hash) Size() int { return 8 }
+
+// Size returns h's hash value size, 8 bytes.
+func (h *Hash) Size() int { return 8 }
+
+// BlockSize returns h's block size.
 func (h *Hash) BlockSize() int { return len(h.buf) }

src/hash/maphash/maphash_test.go

+31-23
@@ -2,32 +2,31 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-package maphash_test
+package maphash
 
 import (
 	"hash"
-	"hash/maphash"
 	"testing"
 )
 
 func TestUnseededHash(t *testing.T) {
 	m := map[uint64]struct{}{}
 	for i := 0; i < 1000; i++ {
-		h := maphash.New()
-		m[h.Hash()] = struct{}{}
+		h := new(Hash)
+		m[h.Sum64()] = struct{}{}
 	}
 	if len(m) < 900 {
 		t.Errorf("empty hash not sufficiently random: got %d, want 1000", len(m))
 	}
 }
 
 func TestSeededHash(t *testing.T) {
-	s := maphash.MakeSeed(1234)
+	s := MakeSeed()
 	m := map[uint64]struct{}{}
 	for i := 0; i < 1000; i++ {
-		h := maphash.New()
+		h := new(Hash)
 		h.SetSeed(s)
-		m[h.Hash()] = struct{}{}
+		m[h.Sum64()] = struct{}{}
 	}
 	if len(m) != 1 {
 		t.Errorf("seeded hash is random: got %d, want 1", len(m))
@@ -36,28 +35,37 @@ func TestSeededHash(t *testing.T) {
 
 func TestHashGrouping(t *testing.T) {
 	b := []byte("foo")
-	h1 := maphash.New()
-	h2 := maphash.New()
+	h1 := new(Hash)
+	h2 := new(Hash)
 	h2.SetSeed(h1.Seed())
-	h1.AddBytes(b)
+	h1.Write(b)
 	for _, x := range b {
-		h2.AddByte(x)
+		err := h2.WriteByte(x)
+		if err != nil {
+			t.Fatalf("WriteByte: %v", err)
+		}
 	}
-	if h1.Hash() != h2.Hash() {
+	if h1.Sum64() != h2.Sum64() {
 		t.Errorf("hash of \"foo\" and \"f\",\"o\",\"o\" not identical")
 	}
 }
 
 func TestHashBytesVsString(t *testing.T) {
 	s := "foo"
 	b := []byte(s)
-	h1 := maphash.New()
-	h2 := maphash.New()
+	h1 := new(Hash)
+	h2 := new(Hash)
 	h2.SetSeed(h1.Seed())
-	h1.AddString(s)
-	h2.AddBytes(b)
-	if h1.Hash() != h2.Hash() {
-		t.Errorf("hash of string and byts not identical")
+	n1, err1 := h1.WriteString(s)
+	if n1 != len(s) || err1 != nil {
+		t.Fatalf("WriteString(s) = %d, %v, want %d, nil", n1, err1, len(s))
+	}
+	n2, err2 := h2.Write(b)
+	if n2 != len(b) || err2 != nil {
+		t.Fatalf("Write(b) = %d, %v, want %d, nil", n2, err2, len(b))
+	}
+	if h1.Sum64() != h2.Sum64() {
+		t.Errorf("hash of string and bytes not identical")
 	}
 }
 

@@ -66,15 +74,15 @@ func TestHashHighBytes(t *testing.T) {
 	const N = 10
 	m := map[uint64]struct{}{}
 	for i := 0; i < N; i++ {
-		h := maphash.New()
-		h.AddString("foo")
-		m[h.Hash()>>32] = struct{}{}
+		h := new(Hash)
+		h.WriteString("foo")
+		m[h.Sum64()>>32] = struct{}{}
 	}
 	if len(m) < N/2 {
 		t.Errorf("from %d seeds, wanted at least %d different hashes; got %d", N, N/2, len(m))
 	}
 }
 
 // Make sure a Hash implements the hash.Hash and hash.Hash64 interfaces.
-var _ hash.Hash = &maphash.Hash{}
-var _ hash.Hash64 = &maphash.Hash{}
+var _ hash.Hash = &Hash{}
+var _ hash.Hash64 = &Hash{}
