Implement optimized linear probing #990

czgdp1807 · 2022-08-19T07:20:17Z

Separate chaining in its raw form requires creating a linked-list node for each insertion into the dict, which implies an expensive malloc call per insertion. This becomes a major overhead as the number of insertions grows, as can be seen with the C++ benchmarks here. The advantage of separate chaining, however, is that when the linked list at a given index has length 1, we know for sure that no collision has happened yet, so we don't need to compare keys by value.

So, instead of creating a LinkedList for each new insertion, I have updated key_mask to signal whether linear probing is needed or not. This way we get the benefits of both separate chaining (no comparison of keys by value while reading) and linear probing (cache efficiency, and low overhead because malloc calls are made only when rehashing the table) at the same time.
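The idea in the paragraph above can be illustrated with a minimal Python sketch. This is not the code from this PR, which lives in LLVMDictOptimizedLinearProbing and emits LLVM IR. Only the name key_mask comes from the PR; the class name `MaskedProbingDict`, the three mask states `EMPTY`/`DIRECT`/`PROBE`, the `key_compares` counter, and the 0.6 load-factor threshold are assumptions made for illustration. The sketch also assumes there is no deletion and that the fast read path is used only for keys that are actually present in the dict.

```python
EMPTY, DIRECT, PROBE = 0, 1, 2  # hypothetical key_mask states, not the PR's encoding


class MaskedProbingDict:
    """Linear-probing table whose key_mask tells readers whether probing is needed."""

    def __init__(self, capacity=8):
        self.key_compares = 0  # counts by-value key comparisons, for illustration only
        self._alloc(capacity)

    def _alloc(self, capacity):
        # The only place where storage is allocated (construction and rehashing).
        self.capacity = capacity
        self.size = 0
        self.key_mask = [EMPTY] * capacity
        self.keys = [None] * capacity
        self.values = [None] * capacity

    def _same_key(self, a, b):
        self.key_compares += 1
        return a == b

    def _rehash(self):
        old = [(k, v) for m, k, v in zip(self.key_mask, self.keys, self.values)
               if m != EMPTY]
        self._alloc(self.capacity * 2)
        for k, v in old:
            self[k] = v

    def __setitem__(self, key, value):
        if (self.size + 1) * 5 > self.capacity * 3:  # keep load factor <= 0.6
            self._rehash()
        home = hash(key) % self.capacity
        pos, state = home, DIRECT
        if self.key_mask[home] != EMPTY:
            # Home slot is taken: probe until we find the key or an empty slot.
            while self.key_mask[pos] != EMPTY:
                if self._same_key(self.keys[pos], key):
                    self.values[pos] = value  # update of an existing key
                    return
                pos = (pos + 1) % self.capacity
            # A real collision: readers starting at `home` must now probe.
            self.key_mask[home] = PROBE
            state = PROBE  # the new key is stored away from its home slot
        self.key_mask[pos] = state
        self.keys[pos] = key
        self.values[pos] = value
        self.size += 1

    def __getitem__(self, key):
        pos = hash(key) % self.capacity
        if self.key_mask[pos] == DIRECT:
            # No collision ever happened at this slot, so the value is returned
            # without comparing keys. Assumes `key` is present in the dict.
            return self.values[pos]
        while self.key_mask[pos] != EMPTY:
            if self._same_key(self.keys[pos], key):
                return self.values[pos]
            pos = (pos + 1) % self.capacity
        raise KeyError(key)
```

In this sketch a slot marked `DIRECT` holds the only key that has ever hashed to it, so a read that starts there needs no comparison, while a slot marked `PROBE` either holds a displaced key or has seen a collision, so reads fall back to ordinary linear probing. The saving matters most when key comparison is expensive, for example for tuples or long strings.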

~~I am yet to decouple the logic into a separate child class of LLVMDict so that we can switch between the two collision resolution strategies easily.~~

Note - The benefit of this should become visible once we use derived data structures as keys (such as tuples or very long strings). For now the benchmarks don't show any slowdowns.

Changes made in the commits:

1. Create the pos_ptr iterator only if linear probing is done when writing to a dict.
2. Use key_mask to figure out whether probing is needed when reading a value from the dict.
3. Rename 'linear_probing_*' to 'resolve_collision_*'.
4. Implement 'resolve_collision_*' methods for LLVMDictOptimizedLinearProbing.

czgdp1807 · 2022-08-19T10:28:51Z

@certik This is ready for review. Please let me know if anything should be changed here.

certik

The changes seem fine to me.

certik · 2022-08-19T12:10:34Z

So the current benchmarks do not show a slowdown.

Is there a benchmark that would show a speedup with this PR?

czgdp1807 · 2022-08-19T13:00:12Z

> Is there a benchmark that would show a speedup with this PR?

> Note - The benefit of this should become visible once we use derived data structures as keys (such as tuples or very long strings). For now the benchmarks don't show any slowdowns.

Quoting from #990 (comment). The reason is quoted below:

> So, instead of creating a LinkedList for each new insertion, I have updated key_mask to signal whether linear probing is needed or not. This way we get the benefits of both separate chaining (no comparison of keys by value while reading) and linear probing (cache efficiency, and low overhead because malloc calls are made only when rehashing the table) at the same time.

certik · 2022-08-19T16:26:24Z

It might be a "premature optimization" if we can't measure the speedup, but it's fine with me to merge, since you believe this will help us in the future.

czgdp1807 · 2022-08-19T16:40:56Z

Well, the choice is between this and linked lists (a.k.a. separate chaining), since we want to avoid comparing two keys that do not have the same hash. Linked lists are not something I would go for, as they add a cost to each insertion. The only other option is what I implemented here.

Anyway, I will implement dict.delete and then benchmark on a bunch of hash functions for strings. If any issues come up, I will fix them right away.

certik · 2022-08-20T06:30:26Z

I agree to avoid linked lists.

I thought we already had a solution in master for when two keys have the same hash.

czgdp1807 · 2022-08-20T06:49:29Z

Yes, we already had collision resolution before this PR, but it always compared keys by value at least once. If we know beforehand that no collision has happened at a given key hash, we don't need to compare the original keys at all. So while fetching elements from the dict we get a performance benefit for keys that have not collided. For keys that have collided, we already have the algorithm implemented, and it works well.

certik · 2022-08-20T13:39:15Z

I see, I understand now. Thanks for implementing it.

czgdp1807 added the llvm LLVM related changes label Aug 19, 2022

czgdp1807 added 6 commits August 19, 2022 15:27

Verify length of final dict after all insertions

a9ab350

Following changes have been made,

429b08f

1. Create the pos_ptr iterator only if linear probing is done when writing to a dict

Following changes have been made,

a57a02b

Use key_mask to figure out whether probing is needed for reading a value from the dict.

Added LLVMDictOptimizedLinearProbing

323ecc7

Following changes have been made,

3f371f2

1. Rename 'linear_probing_*' to 'resolve_collision_*'
2. Implemented 'resolve_collision_*' methods for LLVMDictOptimizedLinearProbing

Use LLVMDictOptimizedLinearProbing by default

74bdd1d

czgdp1807 mentioned this pull request Aug 19, 2022

Dict implementation design #983

Open

5 tasks

Clear typecode2dicttype in LLVMDict destructor

f525230

czgdp1807 marked this pull request as ready for review August 19, 2022 10:10

czgdp1807 force-pushed the dict03 branch from 6071637 to f525230 Compare August 19, 2022 10:11

czgdp1807 requested a review from certik August 19, 2022 10:28

certik approved these changes Aug 19, 2022


czgdp1807 merged commit 39d976d into lcompilers:main Aug 19, 2022

czgdp1807 deleted the dict03 branch August 19, 2022 13:00

czgdp1807 restored the dict03 branch August 19, 2022 16:44

czgdp1807 changed the title ~~Implement separate chaining implicitly via linear probing~~ Implement optimized linear probing Aug 26, 2022
