Fix overlap identification in `position_dodge2()` #5939

teunbrand · 2024-06-07T14:07:11Z

This PR aims to fix #4327 and fix #5938.

Briefly, find_x_overlap() is completely rewritten.
The trick was to search wether xmin[i + 1] is larger than cummax(xmax)[i], then start a new range.

The reprex from #4327, note no misplacement of points:

devtools::load_all("~/packages/ggplot2")
#> ℹ Loading ggplot2

df <- data.frame(
  dimension = paste0("dim ", c(1, 1, 2, 2)),
  group = c("A", "B", "A", "B"),
  value = c(0.6, 0.7, 0.4, 0.6)
)

ggplot(df, aes(dimension, value, colour = group)) +
  geom_point(position = position_dodge2(width = 0))

Reprex from #5938, note range 3 is correctly identified as overlapping with 1 & 2.

f <- function(x, xend) find_x_overlaps(data.frame(xmin = x, xmax = xend))

df <- data.frame(
  xmin = c(1, 2, 3, 5),
  xmax = c(4, 3, 4, 6),
  group = factor(1:4)
)

ggplot(df, aes(x = xmin, xend = xmax, y = group)) +
  geom_segment(aes(colour = factor(f(xmin, xmax))))

Better dodging behaviour:

ggplot(df, aes(xmin = xmin, xmax = xmax)) +
  geom_rect(aes(ymin = 0, ymax = 1, fill = group),
            alpha = 0.75,
            position = position_dodge2())

Time considerations in details (tl;dr: it's faster)

new <- function(df) {
  
  start   <- df$xmin
  nonzero <- df$xmax != df$xmin
  missing <- is.na(df$xmin) | is.na(df$xmax)
  end <- cummax(c(df$xmax[1], df$xmax[-nrow(df)]))
  overlaps <- cumsum(start > end | (start == end & nonzero))
  overlaps[missing] <- seq_len(sum(missing)) + max(overlaps, na.rm = TRUE)
  match(overlaps, unique0(overlaps))
}

old <- function(df) {
  overlaps <- numeric(nrow(df))
  overlaps[1] <- counter <- 1
  for (i in seq_asc(2, nrow(df))) {
    if (is.na(df$xmin[i]) || is.na(df$xmax[i - 1]) || df$xmin[i] >= df$xmax[i - 1]) {
      counter <- counter + 1
    }
    overlaps[i] <- counter
  }
  overlaps
}

df <- data.frame(xmin = runif(1000, max = 100))
df$xmax <- df$xmin + runif(1000, max = 2)
df <- df[order(df$xmin, df$xmax), ]

bench::mark(
  new(df),
  old(df),
  check = FALSE
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new(df)      29.4µs   30.8µs    27374.    99.9KB   106.  
#> 2 old(df)      1.53ms   1.61ms      579.    95.5KB     8.42

Also tested for smaller nrow(df) and new is faster than old from nrow(df) == 5 onwards.

^{Created on 2024-06-07 with reprex v2.1.0}

thomasp85

LGTM - nice work

teunbrand and others added 4 commits June 7, 2024 15:48

fix and vectorise find_x_overlaps()

8bcdd5f

add news bullet

a5aa12e

add test

9ae8a55

Merge branch 'main' into fix_overlap

bd99abe

teunbrand changed the title ~~Fix overlap identifier~~ Fix overlap identification in position_dodge2() Jun 7, 2024

teunbrand added 2 commits June 7, 2024 16:15

deal with NAs more properly

4a98ed9

pass end correctly

e97e1bd

thomasp85 approved these changes Jul 11, 2024

View reviewed changes

Merge branch 'main' into fix_overlap

5023549

teunbrand merged commit 6298aec into tidyverse:main Jul 11, 2024

teunbrand deleted the fix_overlap branch July 11, 2024 19:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix overlap identification in `position_dodge2()` #5939

Fix overlap identification in `position_dodge2()` #5939

Uh oh!

teunbrand commented Jun 7, 2024 •

edited

Loading

Uh oh!

thomasp85 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix overlap identification in position_dodge2() #5939

Fix overlap identification in position_dodge2() #5939

Uh oh!

Conversation

teunbrand commented Jun 7, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

thomasp85 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix overlap identification in `position_dodge2()` #5939

Fix overlap identification in `position_dodge2()` #5939

teunbrand commented Jun 7, 2024 •

edited

Loading