Lockfree Skip List #65

sooraj-srini · 2023-03-10T09:11:53Z

Implementation of a lock-free skip list taken directly from the Java implementation given in Herlihy et al.'s "The Art of Multiprocessor Programming", 2nd edition, section 14.4. The implementation of [contains] differs from the textbook's because the textbook version contains typos.

To do in this PR

Make it polymorphic
Debug get_random_level: it returns a level of at least 1, why not 0? (It does not work with 0, but why?)
More documentation
Add benchmarks comparing against a naive lock-based implementation and against the lazy skiplist of "Lazy skiplist" #90.
Change get_random_level to remove the loop (as @polytypic suggested)

To do in a future PR

a version with hashed keys: it would enable optimizations on integers and avoid some bad cases related to sorted additions (not sure about that)
a version functorized over the key type and compare function: key comparison is currently done with = and <, so with some types it could be very costly or not work at all.

Sudha247

Thanks for the work @sooraj-srini! This is going in the right direction. I did a quick pass and left some comments below. Will try to do another pass on add and remove functions.

src/atomicskiplist.ml

polytypic · 2023-03-15T11:04:35Z

src/atomicskiplist.ml

+(** Get a random level from 1 till max_height (both included) *)
+let get_random_level () =
+  let rec count_level cur_level =
+    if cur_level == max_height || Random.float 1.0 <= 0.5 then cur_level


Why not use Random.bool ()?

BTW, instead of using a loop, one could roughly:

Compute max_height - 1 random bits: Random.bits () land ((1 lsl (max_height - 1)) - 1) lor (1 lsl (max_height - 1)).

Use the technique described in this paper to find the index of the lowest 1 bit.

Add 1.
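A rough sketch of those three steps, for illustration only. The paper linked above is not reproduced here, so the lowest-set-bit lookup below assumes a de Bruijn multiplication table, a common technique that may or may not be the one the paper describes. The names `debruijn_table` and `lowest_set_bit_index` are hypothetical, and the code assumes 63-bit OCaml integers and `max_height <= 31` (since `Random.bits` yields 30 random bits).

```ocaml
let max_height = 10

(* Lookup table for the 32-bit de Bruijn constant 0x077CB531. *)
let debruijn_table =
  [| 0; 1; 28; 2; 29; 14; 24; 3; 30; 22; 20; 15; 25; 17; 4; 8;
     31; 27; 13; 23; 21; 19; 16; 7; 26; 12; 18; 6; 11; 5; 10; 9 |]

(* Index of the lowest 1 bit of [v]; [v] must be non-zero and below 2^32.
   Hypothetical helper: assumes 63-bit ints so the product cannot overflow. *)
let lowest_set_bit_index v =
  let isolated = v land (-v) in
  debruijn_table.(((isolated * 0x077CB531) land 0xFFFFFFFF) lsr 27)

(* Random level from 1 to max_height (both included), without a loop. *)
let get_random_level () =
  (* The sentinel bit guarantees a 1 bit at index max_height - 1 at the latest. *)
  let sentinel = 1 lsl (max_height - 1) in
  let bits = Random.bits () land (sentinel - 1) lor sentinel in
  lowest_set_bit_index bits + 1
```

With this shape each level is half as likely as the one below it, and the sentinel bit caps the result at `max_height`.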

polytypic · 2023-03-15T11:32:21Z

src/atomicskiplist.ml

+(** get_mark_ref: Returns the node and the mark from an Atomic markablereference *)
+let get_mark_ref atomic_ref =
+  let ref = Atomic.get atomic_ref in
+  (ref.node, ref.marked)


It should be unnecessary to copy the values from the immutable record. This whole helper function could be just removed.

The Java implementation of AtomicMarkableReference includes functions like getReference and get which are analogous to get_ref and get_mark_ref in my implementation. In my aim to stick as close as possible to the implementation in "The Art of Multiprocessor Programming", I used similar functions.

I guess that is a reasonable approach to get a working reference implementation. However, Java and OCaml have many differences that make a direct transliteration from Java to OCaml undesirable. For example, using options to emulate null is inefficient, because in OCaml that adds a level of indirection. I would not recommend such an approach (i.e. direct transliteration) for a serious implementation.

polytypic · 2023-03-15T11:39:59Z

src/atomicskiplist.ml

+  let init level =
+    let prev = Some head in
+    let curr = get_ref (Option.get prev).next.(level) in
+    let succ, mark = get_mark_ref (Option.get curr).next.(level) in


This could be just let {node = succ; marked = mark} = Atomic.get (Option.get curr).next.(level).

Also, Option.get should generally be avoided.
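As a minimal sketch of both points, the following reads a successor by destructuring the record returned by `Atomic.get` and by matching on the option instead of calling `Option.get`. The types are a simplified guess at the ones in the PR, and `successor` is a hypothetical helper, not a function from the code under review.

```ocaml
(* Assumed, simplified shapes of the PR's types. *)
type node = { key : int; next : markable_ref Atomic.t array }
and markable_ref = { node : node option; marked : bool }

(* Successor of [curr] at [level], together with its mark.
   No get_ref, get_mark_ref, or Option.get is needed. *)
let successor level curr =
  match curr with
  | None -> None
  | Some curr ->
      let { node = succ; marked = mark } = Atomic.get curr.next.(level) in
      Some (succ, mark)
```

Matching on `None` makes the missing-node case explicit at the call site, where `Option.get` would raise `Invalid_argument` instead.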

kayceesrk · 2023-04-07T09:13:49Z

@sooraj-srini will finish his course project in 2 weeks. I wondered what needs to be done to get this across the line to a merge while we have his attention. @Sudha247 @polytypic?

lyrm

I did a first review (I did not dive deep into the algorithm).

It is a great start: the implementation seems to work. I have written an STM test (which I will add to #61), and it passes. The single dscheck test takes a very long time but does finish. I will add some more and run them overnight to see if any issues arise.

However, I think quite a few things need to be polished before merging: the code needs to be cleaned up, and it could make better use of OCaml features, as @polytypic mentioned.

In general:

formatting (dune build @fmt and then dune promote)
documentation in .mli

About the code itself, I wrote a few comments, but there seem to be a lot of avoidable copies and indirections. I will try to dive deeper into it at the beginning of next week to help move it forward quickly.

src/atomicskiplist.mli

src/atomicskiplist.ml

lyrm · 2023-04-14T15:39:07Z

src/atomicskiplist.ml

+
+let null_node = {key = Int.max_int; height = 0; next = [||]}
+
+let max_height = 10


This should be an optional parameter of the create function as it is related to how well the skip list performs compared to its size. It also allows dscheck to finish by setting it at a low value.
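A minimal sketch of what that could look like, assuming a simplified integer skip list without marked references. The record layout and the default of 10 are assumptions for illustration; only `null_node`, `max_height`, and `create` are names taken from the PR.

```ocaml
(* Simplified node type: marks are omitted to keep the sketch short. *)
type node = { key : int; height : int; next : node Atomic.t array }
type t = { head : node; max_height : int }

let null_node = { key = Int.max_int; height = 0; next = [||] }

(* max_height is chosen per skip list instead of being a global constant. *)
let create ?(max_height = 10) () =
  if max_height < 1 then invalid_arg "create: max_height must be at least 1";
  let next = Array.init max_height (fun _ -> Atomic.make null_node) in
  { head = { key = Int.min_int; height = max_height; next }; max_height }
```

A dscheck test can then call `create ~max_height:2 ()` to keep the number of interleavings manageable, while regular users keep the default.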

lyrm · 2023-04-14T17:08:30Z

test/atomicskiplist/atomic_skiplist_dscheck.ml

@@ -0,0 +1,28 @@
+(* This dscheck testcase is not terminating. *)


This dscheck test takes forever (probably because of too many possible interleavings) but it can actually finish.

It also displays a weird behavior (if find calls are replaced by add calls) which is due to randomness (as explained in this dscheck issue). It is easily avoided by adding Random.init 0 at the beginning of the test (right after Atomic.trace (fun () ->).

lyrm

A more complete review of the code. A few important things to do :

add max_height as a (possibly optional) parameter of create.
remove the get_mark_ref and get_ref functions
minor changes in the add function to avoid calling Atomic.get on a newly created and still unlinked node
rename find to mem
either correct the types (as it is an integer skip list) or add the necessary changes to make it a polymorphic one.

src/atomicskiplist.ml

lyrm · 2023-04-20T10:35:07Z

With the proper changes, the following dscheck tests finish and pass (with the source-set dscheck branch, which is most likely going to be merged soon).

open Atomicskiplist

let _two_find () =
  Atomic.trace (fun () ->
      Random.init 0;
      let sl = create ~max_height:2 () in
      let added1 = ref false in
      let found1 = ref false in
      let found2 = ref false in

      Atomic.spawn (fun () ->
          added1 := add sl 1;
          found1 := find sl 1);

      Atomic.spawn (fun () -> found2 := find sl 2);

      Atomic.final (fun () ->
          Atomic.check (fun () -> !added1 && !found1 && not !found2)))

let _two_add () =
  Atomic.trace (fun () ->
      Random.init 0;
      let sl = Atomicskiplist.create ~max_height:2 () in
      let added1 = ref false in
      let added2 = ref false in

      Atomic.spawn (fun () -> added1 := add sl 1);
      Atomic.spawn (fun () -> added2 := add sl 2);

      Atomic.final (fun () ->
          Atomic.check (fun () -> !added1 && !added2 && find sl 1 && find sl 2)))

let _two_add_same () =
  Atomic.trace (fun () ->
      Random.init 0;
      let sl = Atomicskiplist.create ~max_height:2 () in
      let added1 = ref false in
      let added2 = ref false in

      Atomic.spawn (fun () -> added1 := add sl 1);
      Atomic.spawn (fun () -> added2 := add sl 1);

      Atomic.final (fun () ->
          Atomic.check (fun () ->
              ((!added1 && not !added2) || ((not !added1) && !added2))
              && find sl 1)))

let _two_remove_same () =
  Atomic.trace (fun () ->
      Random.init 0;
      let sl = create ~max_height:1 () in
      let added1 = ref false in
      let removed1 = ref false in
      let removed2 = ref false in

      Atomic.spawn (fun () ->
          added1 := add sl 1;
          removed1 := remove sl 1);
      Atomic.spawn (fun () -> removed2 := remove sl 1);

      Atomic.final (fun () ->
          Atomic.check (fun () ->
              !added1
              && ((!removed1 && not !removed2) || ((not !removed1) && !removed2))
              && not (find sl 1))))

let _two_remove () =
  Atomic.trace (fun () ->
      Random.init 0;
      let sl = create ~max_height:1 () in
      let added1 = ref false in
      let removed1 = ref false in
      let removed2 = ref false in

      Atomic.spawn (fun () ->
          added1 := add sl 1;
          removed1 := remove sl 1);
      Atomic.spawn (fun () -> removed2 := remove sl 2);

      Atomic.final (fun () ->
          Atomic.check (fun () ->
              let found1 = find sl 1 in
              !added1 && !removed1 && not !removed2 && not found1)))

let () =
  let open Alcotest in
  run "atomic_skiplist_dscheck"
    [
      ( "basic",
        [
          test_case "2-find" `Slow _two_find;
          test_case "2-add-same" `Slow _two_add_same;
          test_case "2-add" `Slow _two_add;
          test_case "2-remove-same" `Slow _two_remove_same;
          test_case "2-remove" `Slow _two_remove;
        ] );
    ]

Sudha247 · 2023-06-19T14:59:28Z

Hi @sooraj-srini, I believe @lyrm has some updates to this PR before we merge. Is it ok to push updates directly? If so, could you give access to @lyrm to your fork please?

sooraj-srini · 2023-06-20T22:03:22Z

Sure, I have added @lyrm to my fork as a collaborator.
Additionally, I had written some extra benchmarks in this repository along with an implementation of a composable skip list using kcas. I had hoped to have the time to include these in my fork, but did not manage to do so.

lyrm · 2023-06-27T17:17:04Z

(Sorry, some hmmm git shenanigans)

lyrm · 2023-11-09T09:44:16Z

I have merged all the small changes, improvements, and bug fixes I made to this implementation. There is still some work that could be done, but nothing that should massively change the implementation. I will list the tasks to do (in this PR or in a future one) in the first comment.

@polytypic: Could you review this? At this point, I am mostly interested in optimizations and improvements that could be made to this algorithm. In particular, I think the implementation suffers a lot from false sharing, as the next field of a node is an array. This is particularly true for the head node, which every function call passes through. I am not sure how to improve that.

polytypic · 2023-11-09T13:42:59Z

bench/bench_skiplist.ml

+          if prob < add then Skiplist.add sl (Random.int 10000) |> ignore
+          else if prob >= add && prob < add +. remove then
+            Skiplist.remove sl (Random.int 10000) |> ignore
+          else Skiplist.mem sl elems.(i) |> ignore


Hmm... The use of elems array here and above is a bit strange. A huge array is initialized and then only the fraction for a single thread is used. Why not use a Random.int here as well?

polytypic · 2023-11-09T13:53:19Z

bench/bench_skiplist.ml

+    List.map (fun domain -> Domain.join domain) threads
+  in
+  let end_time = Unix.gettimeofday () in
+  let time_diff = end_time -. List.nth start_time_threads 0 in


Hmm... The timing collection seems to just use the start time of the first domain and the time after joining all of the domains.

How about:

Use a barrier to synchronize all the domains before their loops.

Individually time the loop inside each domain and return that from each domain.

Various measures could then be calculated from the collection of timings. E.g. compute average of the times from each domain to get roughly the same kind of measurement as here, but taking all domains into account rather than the start time of the first domain and the end time (+ some) of the domain that finished last.
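The suggestion above could be sketched roughly as follows. This is an illustration, not the benchmark code from the PR: `run_timed` and `average` are hypothetical names, the barrier is a simple spinning counter, and the clock is passed in as `now` so that the sketch does not depend on a particular library (the benchmark itself uses `Unix.gettimeofday`). It assumes OCaml 5 for `Domain`.

```ocaml
(* Spawn [num_domains] domains, release them together, and return the time
   each one spent in [work]. [now] is the clock, e.g. Unix.gettimeofday. *)
let run_timed ~now ~num_domains work =
  let barrier = Atomic.make num_domains in
  let domains =
    List.init num_domains (fun i ->
        Domain.spawn (fun () ->
            (* Barrier: announce arrival, then spin until everyone arrived. *)
            Atomic.decr barrier;
            while Atomic.get barrier > 0 do
              Domain.cpu_relax ()
            done;
            (* Time only this domain's own loop. *)
            let start = now () in
            work i;
            now () -. start))
  in
  List.map Domain.join domains

(* One possible summary: the mean of the per-domain timings. *)
let average times =
  List.fold_left ( +. ) 0.0 times /. float_of_int (List.length times)
```

Returning the raw list of timings leaves the choice of summary (average, maximum, spread) to the caller.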

polytypic · 2023-11-11T11:44:57Z

src_lockfree/skiplist.ml

+            Atomic.make mark_ref)
+          succs
+      in
+      let new_node = { key; height = top_level; next = new_node_next } in


The next array of the new node is initialized to contain as many elements as the maximum number of levels, but only a part (according to height) of those are actually used.
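A minimal sketch of sizing the array by the node's own height instead. The types are a simplified guess at the PR's polymorphic version, `make_node` is a hypothetical helper, and the sketch assumes that a node of height h uses levels 0 to h - 1, which may be off by one relative to the actual code.

```ocaml
(* Assumed, simplified shapes of the PR's types. *)
type 'a node = { key : 'a; height : int; next : 'a markable_ref Atomic.t array }
and 'a markable_ref = { node : 'a node; marked : bool }

(* Allocate only [top_level] forward references, one per level actually used,
   even though [succs] has one entry per possible level. *)
let make_node key top_level (succs : 'a node array) =
  let next =
    Array.init top_level (fun level ->
        Atomic.make { node = succs.(level); marked = false })
  in
  { key; height = top_level; next }
```

Since most nodes are short, this avoids allocating unused atomics for the majority of nodes.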

polytypic · 2023-11-12T11:22:32Z

src_lockfree/skiplist.ml

+          (fun element ->
+            let mark_ref = { node = element; marked = false } in
+            Atomic.make mark_ref)
+          succs


Hmm... Aside from the array created here being too large, I believe there might be a more subtle space leak issue.

What happens here is that a new node is being constructed, and as part of that an array of references to other nodes is created. A given reference in this array is not updated during add, nor is it subject to updates by other operations, until a reference to the new_node is added to a predecessor node at that specific level (except when the node being added might be removed after it has been linked on level 0).

Consider the following scenario:

A domain performing add is suspended after creating the new_node_next array.

Another domain removes one of the nodes to which the new_node_next array has a reference at a non-zero level.

The domain performing add is resumed and completes the operation.

What will happen then is that add will notice that a successor node was removed, at around line 118+, as the compare_and_set_mark_ref fails. add will then call find_in to update preds and succs. This will not, however, update the reference in the new_node_next array, which was created based on an earlier succs. That is because the reference is at a level on which the new_node is not yet attached to the skip list (because the compare_and_set_mark_ref failed).

This means that after add returns, the new_node has been added to the skip list while still containing a reference to a removed node, so the key contained in that removed node cannot be garbage collected. It will remain in memory until some call to find_in notices the removed node (due to marked references) and removes it. But there is no guarantee such a call will happen. It might never happen.

Am I missing something?

polytypic · 2023-11-13T09:35:51Z

I wrote a lock-free skiplist from scratch somewhat inspired by what I learned from reviewing the code in this PR. I used a number of techniques to optimize it and it is roughly 1.75 times faster (6.34 M op/s vs 3.55 M op/s) on the benchmark in this PR. You can find the code in this gist. The code in the gist has some comments on some of the optimizations. The main improvement likely comes from the internal representation that avoids a level of indirection and takes less memory.

kayceesrk · 2023-11-14T05:47:41Z

I don't see a reason to retain this PR if the new implementation is 1.75x faster. It may be best to close this PR and open a new one with the code from the gist.

lyrm · 2023-11-16T16:45:31Z

I'm closing this PR as an improved implementation of this skiplist algorithm is proposed in PR #99. Thanks to all contributors!

Sudha247 reviewed Mar 14, 2023

polytypic reviewed Mar 15, 2023

bartoszmodelski mentioned this pull request Apr 14, 2023

Randomness ocaml-multicore/dscheck#20

Open

lyrm reviewed Apr 14, 2023

lyrm reviewed Apr 20, 2023

sooraj-srini force-pushed the main branch from 2932528 to 2a82404 Compare April 20, 2023 13:42

lyrm closed this Jun 27, 2023

lyrm force-pushed the main branch from fe2aef7 to 2713111 Compare June 27, 2023 17:05

lyrm reopened this Jun 27, 2023

This was referenced Jul 27, 2023

added initial files for priority queue #80

Draft

Lock-based Skiplist implementation #87

Closed

Sudha247 added this to the 1.0 milestone Sep 20, 2023

lyrm added 4 commits October 24, 2023 14:54

Lockfree skiplist.

2e6ca9b

Add stm and more dscheck tests.

0efac79

Renaming atomicskiplist to skiplist

ad8782a

Merge with main.

1d49325

lyrm closed this Nov 3, 2023

lyrm force-pushed the main branch from 12458a9 to 49e4e59 Compare November 3, 2023 14:59

lyrm reopened this Nov 3, 2023

lyrm added 3 commits November 7, 2023 18:08

Make skiplist polymorphic.

dcf3df2

Debug level issues.

f733944

Cleanup and documentation.

6eb5303

lyrm force-pushed the main branch from 97b56e8 to 6eb5303 Compare November 8, 2023 17:53

lyrm added 2 commits November 9, 2023 10:32

Remove unnecessary Atomic.get

20355ce

Format.

1aeae33

polytypic reviewed Nov 9, 2023

polytypic reviewed Nov 11, 2023

polytypic reviewed Nov 12, 2023

lyrm closed this Nov 16, 2023
