Skip to content

Determine how lazy LazyList should be #11307

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
NthPortal opened this issue Dec 13, 2018 · 26 comments
Closed

Determine how lazy LazyList should be #11307

NthPortal opened this issue Dec 13, 2018 · 26 comments

Comments

@NthPortal
Copy link

NthPortal commented Dec 13, 2018

Background

(quoted from #11105)

Currently, LazyList has two forms of laziness:

  1. laziness in determining whether or not it is empty (and by extension, how many elements it has)
  2. independent laziness of elements - that is, evaluating a subsequent head or tail of the LazyList does not cause the evaluation of previous heads.

Form 1 is one of the primary reasons for the creation of LazyList, is generally maintained throughout operations on it, and I think is is generally agreed to be of very high value.

Form 2 is less obviously valuable. Historically, it exists because the original implementation of LazyList had a strict state (empty or cons), and lazy head and tail (like Stream, but with a lazy head as well); when the design of LazyList was changed to have a lazy state and strict tail (because lazy tail doesn't add anything once the state is lazy), the lazy head was still kept.

During various discussions on various issues, it has been noted that the fact that LazyList's supports form 2 of laziness

  • makes some implementations more complex (though not overly so)
  • often adds no value, because the LazyList is created in a fashion (e.g. from an Iterator) that does not have form 2 of laziness
  • is subverted by many common operations (such as filter, dropWhile) which by their nature must evaluate elements sequentially in order to determine the next element

This yields the question (quoted from @sjrd at scala/scala#6880 (comment)):

As added food for thought: I wonder whether it is still useful that the head be lazy. Are there use cases where one already knows that the lazy list is non-empty, but doesn't know yet what its head will be (and cannot synchronously compute it)? My experience from programming in Oz (where this lazier LazyList is basically the base data type used everywhere--and optimized by the VM) doesn't support any such use case. [...]

(see also #11089 (comment))

Issue and Investigation Areas

Should LazyList support form 2 of laziness by having a lazy head? There are a few points to think about in this regard:

  1. How significant is the implementation cost of supporting form 2 of laziness? Keeping in mind that the implementation supporting form 2 of laziness already exists, but must be maintained.
  2. How significant is the performance cost of having a lazy head?
  3. Are there use cases where, despite the fact that the creation of the LazyList does not support form 2 of laziness, the fact that some methods (such as map) support form 2 of laziness is valuable because it allows performing independently lazy expensive operations?
  4. Are there use cases for which form 2 of laziness is valuable in the creation of the LazyList (in contrast to what is discussed in the previous question)? I and several others have yet to come up with one.

I think that if we cannot come up any compelling use cases for form 2 of laziness, taking into account how frequently it is not present anyway, then LazyList should be changed to have a strict head and not support form 2 of laziness. I say this despite the fact that I have invested significant effort into writing comprehensive tests which ensure that all methods which can preserve/support form 2 of laziness do so. Once 2.13 is out, it will be difficult to change LazyList's behaviour even in subsequent releases of Scala because it would be a significant change in semantics (much like the way Stream cannot be a deprecated type alias for LazyList).

Consensus

Form 2 of laziness should be removed, and head made strict.


I don't know if this issue actually needs to be resolved by 2.13.0-RC1, but it definitely needs to be resolved by the final release of 2.13.0, and it would probably be better to resolve it at least a few release candidates beforehand.

@NthPortal NthPortal added this to the 2.13.0-RC1 milestone Dec 13, 2018
@NthPortal
Copy link
Author

Regarding point 1: I am happy to maintain LazyList for the foreseeable future, but that's obviously not a guarantee forever

@lrytz
Copy link
Member

lrytz commented Dec 13, 2018

Thank you very much for writing this up! The design definitely needs to be finalized before RC1, which is scheduled for end of January.

It sounds to me like going for a strict head is a better choice for the standard library. The goal of the standard library is to find the "right" balance between simplicity, productivity, performance. Since we're not aware of a good use case for a lazy head right now, we should go for the simpler solution. If the need shows up, people can implement that collection outside the standard library, and if that really takes off we'll reconsider.

@NthPortal

This comment has been minimized.

@NthPortal

This comment has been minimized.

@NthPortal

This comment has been minimized.

@sjrd
Copy link
Member

sjrd commented Dec 14, 2018

I believe my opinion on this question is already clear to anyone who's read my various comments in the various issues, but I'll state it unambiguously here.

I think LazyList should not have a lazy head.

A lazy head clearly has more overhead and a higher implementation complexity, and as was mentioned we don't have any use case.

@NthPortal
Copy link
Author

NthPortal commented Dec 14, 2018

I believe I have come up with a use case (point 3).

You have a text file, and you read the lines into a LazyList (which does not have form 2 of laziness). You then perform an expensive analysis on each line of the file (via map).

val lines: LazyList[String] = ???
val analysed = lines map doExpensiveAnalysis

However, the result of the analysis may tell you that the following n lines are of little value to you (for whatever reason), so you want to drop them. With a lazy head, you can avoid the expensive analysis for the lines you are skipping using drop.

val filtered = {
  def next(elems: LazyList[Elem]): LazyList[Elem] =
    if (elems.isEmpty) elems
    else {
      val head = elems.head
      val skip = head.skip
      head #:: next(elems.drop(skip + 1))
    }
  next(analysed)
}

Without a lazy head, it is more difficult to avoid the analysis (because drop evaluates elements), and makes the code less clear (and requires an impure function updating a var).

var skip = 0
val filtered = lines flatMap { line =>
  if (skip > 0) {
    skip -= 1
    Nil
  } else {
    val elem = doExpensiveAnalysis(line)
    skip = elem.skipCount
    elem :: Nil
  }
}

If doExpensiveAnalysis is not its own method, the code becomes even less clear as the filtering (done via flatMap) and analysis are now mingled together.

Or perhaps you would like to do something with the last analysed line in the file matching some predicate, and then proceed to perform other operations (perhaps like the one above) starting from the front of the LazyList.

// side note: why isn't there a `findLast`?
val lastMatchingIdx = analysed.lastIndexWhere(somePredicate)
val lastMatching = analysed(lastMatchingIdx)

Without a lazy head, doing that search would require analysing all lines, whereas with a lazy head, it might analyse only a single line (if the predicate is true for the last line). It still needs to read all the lines, but it doesn't need to perform the expensive analysis for all of them.

@lrytz
Copy link
Member

lrytz commented Dec 14, 2018

Thanks, good to have an example! An alternative is to keep a reference to the original non-mapped lines and call drop on that.

I think as long as a lazy head "only" provides some convenience (and that doesn't seem to come up too often), it would be better to opt for the simpler model. If there was a more serious limitation like the one we had with 2.12 Stream (#10696), that would be a much stronger argument.

@odersky
Copy link

odersky commented Dec 15, 2018

I also tend to make head strict. It seems to be simpler conceptually.

@shawjef3
Copy link

shawjef3 commented Dec 18, 2018

I've run into a scenario where I wanted head to be lazy. Sometimes I want to use Iterator as control structure rather than a collection, where I get to accumulate results if I like. I use Stream (<= Scala 2.12) to lazily run each iteration and get the result. In these cases, when I call Iterator#toStream, I don't want the Iterator to execute until I explicitly start walking the Stream.

Here's an example of what happens for Scala 2.12. I don't want side effects when calling toStream, but there are.

scala> :paste
// Entering paste mode (ctrl-D to finish)

val i = new Iterator[Int] {
  var x = 0
  override def next(): Int = {
    val currentX = x
    x += 1
    println(currentX)
    currentX
  }

  override def hasNext: Boolean = true
}

// Exiting paste mode, now interpreting.

i: Iterator[Int]{def x: Int; def x_=(x$1: Int): Unit} = <iterator>

scala> i.toStream
0
res7: scala.collection.immutable.Stream[Int] = Stream(0, ?)

@sjrd
Copy link
Member

sjrd commented Dec 18, 2018

@shawjef3 What you're describing is "form 1" of laziness as defined in the first post of this thread, and that is a given for LazyList. We're only considering dropping form 2 of laziness.

@joroKr21
Copy link
Member

Is there any use case of lazy head which can't be solved by type LazierList[A] = LazyList[Need[A]] where Need is a simple wrapper that encapsulates evaluation by need (it exists in scalaz)?

@NthPortal
Copy link
Author

NthPortal commented Dec 18, 2018

@joroKr21 no, but that introduces another layer of indirection (and boxing and dependent load); if it's a strong use case, we should have it not require extra boxing and indirection.

Edit: it would also make it a pain to use, as all method calls would need another indirection of mapping or retrieving the lazy-boxed value.

@NthPortal
Copy link
Author

Requesting input from others who have been involved with LazyList design and/or bugs and may have input on this question.

pinging @hrhino @julienrf @szeiger @SethTisue @paulp.

I am currently leaning the same way as the current rough consensus, to make head strict and remove "form 2" of laziness.

@OlegYch
Copy link

OlegYch commented Dec 24, 2018

i often hear people complaining about Stream head being strict, and then some people started pointing out that it's fixed in LazyList
does this suggest to revert to Stream behavior?

@sjrd
Copy link
Member

sjrd commented Dec 25, 2018

No it doesn't. When people complain about the head being strict in Stream, they are in fact referring to form 1 of laziness, and that one we keep.

@szeiger
Copy link

szeiger commented Jan 8, 2019

Given the high complexity of the new implementation I also agree with forcing elements when their existence has been determined. It would also resolve the problems with IterableFactory.fill (which doesn't really make sense for LazyList if the elements are not evaluated in order) and with Iterator not being lazy enough as a basic abstraction.

@szeiger szeiger added the blocker label Jan 8, 2019
@SethTisue
Copy link
Member

I'm fine with dropping form 2. It turned out to be a bigger can of worms that any of us anticipated.

@NthPortal NthPortal self-assigned this Jan 8, 2019
@SethTisue
Copy link
Member

@NthPortal also I think there's now sufficient consensus that you can go forward with this.

@Atry
Copy link

Atry commented Jul 23, 2019

LazyList[A] is essentially a () => LazyList.State[A], and Deferrer[A] is essentially a () => LazyList[A], therefore, Deferrer[A] is essentially a () => () => LazyList.State[A]. Does Deferrer really necessary?

@Jasper-M
Copy link

Jasper-M commented Jul 23, 2019

Looks like Deferrer was added as a workaround for a bug with right-associative operators: scala/collection-strawman#127 scala/collection-strawman#128
But that issue also seems to have been fixed in the compiler: scala/scala#5969

@NthPortal
Copy link
Author

you need Deferrer to allow defining recursive or circular LazyLists. this is what happens if #:: and #::: are defined on LazyList:

scala> lazy val ones: LazyList[Int] = 1 #:: ones
ones: LazyList[Int] = <lazy>

scala> ones.head
java.lang.StackOverflowError
  at .ones(<console>:1)
  at .ones$lzycompute(<console>:1)
  at .ones(<console>:1)
  at .ones$lzycompute(<console>:1)

scala> lazy val nats: LazyList[Int] = 1 #:: nats.map(_ + 1)
nats: LazyList[Int] = <lazy>

scala> nats.head
java.lang.StackOverflowError
  at .nats$lzycompute(<console>:1)
  at .nats(<console>:1)
  at .nats$lzycompute(<console>:1)
  at .nats(<console>:1)

@szeiger
Copy link

szeiger commented Aug 1, 2019

see scala/scala#7990 for a way that could potentially eliminate Deferrer

@NthPortal
Copy link
Author

@szeiger I'm not completely following how it can eliminate Deferrer. Deferrer is currently used/required to avoid initializing the base LazyList in recursively-defined instances; how can dropping the reference to this get around that?

@szeiger
Copy link

szeiger commented Aug 2, 2019

Oh, right, you need both operands to be lazy. I wonder if extension methods could be used. The current documentation doesn't mention it.

@NthPortal
Copy link
Author

if I understand correctly, Deferrer is essentially an extension method over a by-name value

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests