Skip to content

Conversation

@teunbrand
Copy link
Collaborator

@teunbrand teunbrand commented Jun 7, 2024

This PR aims to fix #4327 and fix #5938.

Briefly, find_x_overlap() is completely rewritten.
The trick was to search wether xmin[i + 1] is larger than cummax(xmax)[i], then start a new range.

The reprex from #4327, note no misplacement of points:

devtools::load_all("~/packages/ggplot2")
#> ℹ Loading ggplot2

df <- data.frame(
  dimension = paste0("dim ", c(1, 1, 2, 2)),
  group = c("A", "B", "A", "B"),
  value = c(0.6, 0.7, 0.4, 0.6)
)

ggplot(df, aes(dimension, value, colour = group)) +
  geom_point(position = position_dodge2(width = 0))

Reprex from #5938, note range 3 is correctly identified as overlapping with 1 & 2.

f <- function(x, xend) find_x_overlaps(data.frame(xmin = x, xmax = xend))

df <- data.frame(
  xmin = c(1, 2, 3, 5),
  xmax = c(4, 3, 4, 6),
  group = factor(1:4)
)

ggplot(df, aes(x = xmin, xend = xmax, y = group)) +
  geom_segment(aes(colour = factor(f(xmin, xmax))))

Better dodging behaviour:

ggplot(df, aes(xmin = xmin, xmax = xmax)) +
  geom_rect(aes(ymin = 0, ymax = 1, fill = group),
            alpha = 0.75,
            position = position_dodge2())

Time considerations in details (tl;dr: it's faster)

new <- function(df) {
  
  start   <- df$xmin
  nonzero <- df$xmax != df$xmin
  missing <- is.na(df$xmin) | is.na(df$xmax)
  end <- cummax(c(df$xmax[1], df$xmax[-nrow(df)]))
  overlaps <- cumsum(start > end | (start == end & nonzero))
  overlaps[missing] <- seq_len(sum(missing)) + max(overlaps, na.rm = TRUE)
  match(overlaps, unique0(overlaps))
}

old <- function(df) {
  overlaps <- numeric(nrow(df))
  overlaps[1] <- counter <- 1
  for (i in seq_asc(2, nrow(df))) {
    if (is.na(df$xmin[i]) || is.na(df$xmax[i - 1]) || df$xmin[i] >= df$xmax[i - 1]) {
      counter <- counter + 1
    }
    overlaps[i] <- counter
  }
  overlaps
}

df <- data.frame(xmin = runif(1000, max = 100))
df$xmax <- df$xmin + runif(1000, max = 2)
df <- df[order(df$xmin, df$xmax), ]

bench::mark(
  new(df),
  old(df),
  check = FALSE
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new(df)      29.4µs   30.8µs    27374.    99.9KB   106.  
#> 2 old(df)      1.53ms   1.61ms      579.    95.5KB     8.42

Also tested for smaller nrow(df) and new is faster than old from nrow(df) == 5 onwards.

Created on 2024-06-07 with reprex v2.1.0

@teunbrand teunbrand changed the title Fix overlap identifier Fix overlap identification in position_dodge2() Jun 7, 2024
Copy link
Member

@thomasp85 thomasp85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - nice work

@teunbrand teunbrand merged commit 6298aec into tidyverse:main Jul 11, 2024
@teunbrand teunbrand deleted the fix_overlap branch July 11, 2024 19:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Internal function find_x_overlaps() has bug point is misplaced if width = 0 is used in position_dodge2 with geom_point

2 participants