Add `differenceWithKey` #542

sjakobi · 2025-11-02T15:05:37Z

...and define `differenceWith` via `differenceWithKey`.

Closes #389, closes #364.
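As a point of reference, the intended semantics can be pinned down with a slow but simple model built only from the existing public `Data.HashMap.Strict` API. This is a specification sketch, not the PR's trie-walking implementation, and `differenceWithKeyRef` and `differenceWithRef` are hypothetical names. Keys of the first map that are absent from the second are kept; for shared keys, the combining function decides whether to keep, replace, or drop the entry.

```haskell
import Data.Hashable (Hashable)
import Data.HashMap.Strict (HashMap)
import qualified Data.HashMap.Strict as HM

-- Hypothetical reference model of differenceWithKey: walk the first map and
-- look each key up in the second map.
differenceWithKeyRef
  :: (Eq k, Hashable k)
  => (k -> v -> w -> Maybe v) -> HashMap k v -> HashMap k w -> HashMap k v
differenceWithKeyRef f a b = HM.mapMaybeWithKey step a
  where
    -- Key only in the first map: keep the entry unchanged.
    -- Key in both maps: f keeps (Just) or drops (Nothing) the entry.
    step k v = case HM.lookup k b of
      Nothing -> Just v
      Just w  -> f k v w

-- differenceWith ignores the key, mirroring `differenceWithKey (const f)`.
differenceWithRef
  :: (Eq k, Hashable k)
  => (v -> w -> Maybe v) -> HashMap k v -> HashMap k w -> HashMap k v
differenceWithRef f = differenceWithKeyRef (const f)
```

For example, with `a = fromList [(1,10),(2,20),(3,30)]` and `b = fromList [(2,5),(3,0)]`, a function that drops an entry when `w == 0` and otherwise subtracts yields `fromList [(1,10),(2,15)]`.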


treeowl · 2025-11-05T21:52:33Z

Data/HashMap/Internal.hs

+differenceWith f = differenceWithKey (const f)
+{-# INLINE differenceWith #-}
+
+-- | \(O(n \log m)\) Difference with a combining function. When two equal keys are


What are m and n here?

n is the size of the first map and m is the size of the second map. This package uses that convention for many functions; I suspect it was adopted from `containers`.

Okay, but I don't think that's necessarily what this implementation does; the bound was left unchanged from the old implementation.

It's not obviously wrong to me, at least. If the first map is small, the second is a relatively large superset, and `lookup[Cont]` takes log(m), we still do n lookups in the larger map. To be fair, we don't start these lookups at the root, so maybe O(n log(m/n)) would be more accurate?
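Under an idealized model (the very assumption the next comments question), the O(n log(m/n)) guess can be spelled out. Assume uniform hashing and 32-way branching (5 hash bits per level), so that a map with \(x\) entries is roughly \(\log_{32} x\) levels deep, and take \(m \ge n\) as in the superset case described. The recursion descends both maps in lockstep until it reaches a leaf of the first map at depth about \(\log_{32} n\), then finishes that lookup inside the matching subtree of the second map:

```latex
\underbrace{n}_{\text{leaves of the first map}}
\cdot \Bigl(1 + \underbrace{\log_{32} m - \log_{32} n}_{\text{remaining depth in the second map}}\Bigr)
= n\Bigl(1 + \log_{32}\frac{m}{n}\Bigr)
\in O\!\Bigl(n\bigl(1 + \log\tfrac{m}{n}\bigr)\Bigr)
```

The \(1\) is the amortized cost of walking the first map itself; when \(m \approx n\) the estimate collapses to \(O(n)\).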

IMHO these log(size) factors are not very useful anyway: on 64-bit systems the maximum tree height is 13, on 32-bit systems it is 8, and even a map with just two entries can reach full tree height.

The bounds assume sufficiently uniform hashing, but that's not at all the case for important instances like `Int`. It's ... a problem. I can't say whether n log(m/n) is accurate, but it should be something symmetrical!

I take that back. Maybe not symmetrical. But ... I dunno...



treeowl · 2025-11-05T21:57:45Z

Data/HashMap/Internal.hs

+    go_differenceWithKey !_s Empty _tB = Empty
+    go_differenceWithKey _s a Empty = a
+    go_differenceWithKey s a@(Leaf hA (L kA vA)) b
+      = lookupCont


Do we still need `lookupCont` for compatibility, or can we commit to unboxed sums and make life easier?

It seems that `lookupCont` is currently the only lookup variant that takes a `Shift` argument and can therefore be used at different levels of the tree.

#410 proposes getting rid of `lookupCont`, although I personally don't find it so bad.

In principle, a version of `lookupRecordCollision#` that takes a shift should be able to reduce code size compared to `lookupCont`, because it would only need to be compiled once per key type. This package has a tradition of aggressively inlining everything to ensure that specialization really happens, but I would love to imagine that with modern GHC we might not need to be quite so heavy-handed.
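To make the two lookup styles concrete, here is a toy comparison on association lists rather than the package's trie; `lookupCPS`, `lookupSum`, and `sumToMaybe` are hypothetical names. The continuation-passing version, in the spirit of `lookupCont`, takes the "absent" and "present" handlers as arguments and is usually only fast when inlined, so that the continuations become known code at each call site. The unboxed-sum version returns a `(# (# #) | v #)` that the caller scrutinizes, which avoids allocating a `Maybe` and does not depend on inlining the continuations.

```haskell
{-# LANGUAGE UnboxedSums, UnboxedTuples #-}

-- Continuation-passing lookup: the caller decides what "absent" and
-- "present" mean by passing both continuations.
lookupCPS :: Eq k => r -> (v -> r) -> k -> [(k, v)] -> r
lookupCPS absent _ _ [] = absent
lookupCPS absent present k ((k', v) : rest)
  | k == k'   = present v
  | otherwise = lookupCPS absent present k rest

-- Unboxed-sum lookup: returns either the empty unboxed tuple (absent)
-- or the value (present), without allocating a Maybe box.
lookupSum :: Eq k => k -> [(k, v)] -> (# (# #) | v #)
lookupSum _ [] = (# (# #) | #)
lookupSum k ((k', v) : rest)
  | k == k'   = (# | v #)
  | otherwise = lookupSum k rest

-- Callers pattern-match on the two alternatives of the sum.
sumToMaybe :: (# (# #) | v #) -> Maybe v
sumToMaybe (# (# #) | #) = Nothing
sumToMaybe (# | v #)     = Just v
```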

treeowl · 2025-11-05T22:07:01Z

Data/HashMap/Internal.hs

+              Just v | v `ptrEq` vA -> a
+                     | otherwise -> Leaf hA (L kA v))
+          hA kA s b
+    go_differenceWithKey _s a@(Collision hA aryA) (Leaf hB (L kB vB))
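The `ptrEq` guard in this hunk is a sharing optimization: if the combining function hands back the very same value object, the existing leaf `a` is returned instead of allocating a new one. Below is a standalone sketch of the idiom over a plain pair; `ptrEqSketch` and `updatePair` are hypothetical names. `reallyUnsafePtrEquality#` can report "not equal" for objects that are in fact identical, so it is only safe as a fast path whose `False` branch is still correct.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (isTrue#, reallyUnsafePtrEquality#)

-- True only if both arguments are the same heap object. False negatives are
-- possible, so callers must treat False as "maybe different".
ptrEqSketch :: a -> a -> Bool
ptrEqSketch x y = isTrue# (reallyUnsafePtrEquality# x y)

-- Apply an update to a key/value pair, reusing the original pair when the
-- update returns the identical value object.
updatePair :: (k -> v -> Maybe v) -> (k, v) -> Maybe (k, v)
updatePair f p@(k, v) = case f k v of
  Nothing -> Nothing
  Just v'
    | v' `ptrEqSketch` v -> Just p        -- unchanged: keep the old pair
    | otherwise          -> Just (k, v')  -- changed: allocate a new pair
```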


We generally assume that collisions are rare, and that multiple collisions are rarer still. It might make sense to defer to basic operations when one of the maps is a collision bucket, rather than trying to do something fancy. That might require lifting some `go` functions to the top level, and the same sort of lifting might also give a better way to talk about leaves here.

I guess I agree, but could you make a concrete suggestion on how to handle this case?

IMHO the use of `lookupInArrayCont` is already quite clear.

I guess we could use a new function like

`updateCollisionWithKey :: (k -> v -> Maybe v) -> Hash -> k -> Array (Leaf k v) -> HashMap k v -> HashMap k v`

But I'm not yet convinced that it would make the code much clearer.

In general, I feel that our collection of functions for operating on `Collision` arrays is pretty awkward, especially the names (e.g. `updateOrSnocWithKey`); #437 is related.
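One possible reading of the "defer to basic operations" suggestion, for the case where the second map is tiny (a single leaf or a collision bucket), is to phrase `differenceWithKey` as a fold of `update` over the entries of the second map. This sketch uses only the public `Data.HashMap.Strict` API; `differenceWithKeyViaUpdate` is a hypothetical name, and this is not how the PR handles the case.

```haskell
import Data.Hashable (Hashable)
import Data.HashMap.Strict (HashMap)
import qualified Data.HashMap.Strict as HM

-- For each entry (kB, vB) of the (small) second map, update the matching
-- entry of the first map: f decides whether to keep, replace, or delete it.
-- Keys of the second map that are absent from the first are ignored by update.
differenceWithKeyViaUpdate
  :: (Eq k, Hashable k)
  => (k -> v -> w -> Maybe v) -> HashMap k v -> HashMap k w -> HashMap k v
differenceWithKeyViaUpdate f a b = HM.foldlWithKey' step a b
  where
    step acc kB vB = HM.update (\vA -> f kB vA vB) kB acc
```

Each `update` is a separate root-to-leaf walk of the first map, so this only pays off when the second map has very few entries.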

Add differenceWithKey

0c0d1f0

...and define `differenceWith` via `differenceWithKey` Closes #389.

sjakobi mentioned this pull request on Nov 2, 2025 in "Speed up difference and differenceWith" #520 (closed).

differenceWith[Key]: Get the function argument to inline

261ba6d

This makes the overlapping case significantly faster.
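A common way to get a higher-order argument such as the combining function inlined is to make the exported function a small non-recursive wrapper marked `INLINE`, with the recursion in a local `go` worker that closes over the function instead of passing it along; once the wrapper is inlined at a call site with a known function, GHC can inline that function into the worker. The sketch below shows the pattern on lists (`mapMaybeSketch` is a hypothetical name). Whether this is exactly what the commit does is an assumption, although the local `go_differenceWithKey` worker in the diff does not take the function as a parameter.

```haskell
-- Non-recursive wrapper: small enough to inline at every call site.
mapMaybeSketch :: (a -> Maybe b) -> [a] -> [b]
mapMaybeSketch f = go
  where
    -- Recursive worker: refers to the free variable f instead of taking it
    -- as a parameter, so an inlined copy of go sees the caller's f directly.
    go [] = []
    go (x : xs) = case f x of
      Nothing -> go xs
      Just y  -> y : go xs
{-# INLINE mapMaybeSketch #-}
```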

sjakobi force-pushed the sjakobi/issue389-dWK branch from 5539624 to 261ba6d on November 5, 2025 21:17

sjakobi marked this pull request as ready for review November 5, 2025 21:25

treeowl reviewed Nov 5, 2025

sjakobi · Nov 6, 2025 (edited)

3 participants
