Skip to content
This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Add a SortedSeq collection type #132

Closed
julienrf opened this issue Jun 28, 2017 · 20 comments
Closed

Add a SortedSeq collection type #132

julienrf opened this issue Jun 28, 2017 · 20 comments
Milestone

Comments

@julienrf
Copy link
Contributor

We have sorted maps and sets but not seqs. This one might also be handy.

@NthPortal
Copy link
Contributor

NthPortal commented Jul 5, 2017

Maps and sets have some sort of put, insert, or + operation, which, in general, does not specify where in the map/set the element will be placed. If it is a sorted map/set, the operation specifies that the element will be placed in sorted order.

On the other hand, sequences have:

  • for immutable: prepend (+:) and append (:+)
  • for mutable (GrowableSeq): insert(index, elem), as well as add and +=
    operations, which specify where in the sequence the element gets placed (add and += seem to place the elements at the beginning, based on this comment).

Would a sorted sequence define a new operation which places elements in a location based on the ordering? How would the new method(s) for this operation be differentiated from the existing methods, which already define an insertion location? For a mutable sorted sequence, it feels like that would collide with the methods in Growable/GrowableSeq. For an immutable sorted sequence, though it seems less likely to collide with existing methods, it still seems like it could be confusing.

@NthPortal
Copy link
Contributor

Also, without sorted maps/sets, it would be difficult to have a map/set with elements in a sorted order, as regular maps/sets have no defined order. However, sequences have a well-defined order, and have a sorted method if you want to reorder them according to an Ordering. What does a SortedSeq add over calling Seq.sorted?

@Ichoran
Copy link
Contributor

Ichoran commented Jul 5, 2017

Good points/questions. Though for reference, += adds to the end. (E.g. all builders use += to append.)

@NthPortal
Copy link
Contributor

I had actually originally thought += should add to the end, but the commented-out code (apparently wrong) said otherwise.

@Ichoran
Copy link
Contributor

Ichoran commented Jul 5, 2017

@NthPortal - += is not +=:

@NthPortal
Copy link
Contributor

NthPortal commented Jul 5, 2017

@Ichoran ah, thanks! - I had misread it

@szeiger
Copy link
Contributor

szeiger commented Jul 11, 2017

Rather than adding an ordering to a Seq it may be more useful to add duplicate elements to a Set, i.e. a MultiSet and SortedMultiSet. This would have to be a different basic collection type, it is neither a Seq nor a Set.

@szeiger
Copy link
Contributor

szeiger commented Jul 11, 2017

I see we already have #133 for MultiSet (but no mention of a SortedMultiSet there)

@julienrf
Copy link
Contributor Author

Actually I’ve just noticed that we have a mutable.PriorityQueue, which is essentially a sorted (mutable) Seq. It’s implemented in the current collections but hasn’t been ported to the strawman. We could make an immutable version as well.

@NthPortal
Copy link
Contributor

Does it behave enough as a Seq? I know that Java's PriorityQueue is not guaranteed to maintain the insertion order of elements which compare as equal (link).

If multiple elements are tied for least value, the head is one of those elements -- ties are broken arbitrarily.

I agree that it should probably be ported over; I'm just not sure how well it fits the Seq trait.

@julienrf
Copy link
Contributor Author

julienrf commented Nov 2, 2017

Is that property really important? I think most of the value of a PriorityQueue is just that order of iteration is always sorted, whatever the order of insertion is, and without having to intermediately call sort.

@Ichoran
Copy link
Contributor

Ichoran commented Nov 3, 2017

I'm not sure sorted things are enough like a Seq at all, even if ties are broken in a consistent way. One of the core laws for seqs as we have them now is effectively

val z = x ++ y
assert(z.take(x.size) == x && z.drop(x.size) == y)

That is, ordering is preserved across merges and splits. Of course one can allow exceptions to this, but it can yield surprising behavior when all you know is you have Seq and you're not thinking that the elements will run around when you concat.

@julienrf
Copy link
Contributor Author

julienrf commented Nov 7, 2017

Indeed, that makes sense. So the solution is probably to use a sorted MultiSet (see #133).

@joshlemer
Copy link

joshlemer commented Nov 10, 2017

@Ichoran Maybe we could allow ++ to return some kind of not-necessarily-ordered Seq (List, Seq, Vector, depending on what kind of sorted thing it is) which would conform to your law, then add appropriate

merge[CC[+_](that: Sorted[CC, A]): Sorted[CC, A]

method which preserves ordering?
@julienrf MultiSet may not cover all use cases of a SortedVector or SortedList, because of course a SortedVector would have much faster random access, and a SortedList better .tail etc

@julienrf
Copy link
Contributor Author

@joshlemer I’s not just about ++, but :+ and +: as well.

I see the value of the law shown by @Ichoran but I think a more general kind of Seq, which wouldn’t satisfy this law, would still be useful as long as it still satisfies that one: (xs ++ ys).size == xs.size + ys.size. Unfortunately that would mean adding a new branch to the hierarchy:

image

LinkedSeq would preserve insertion order (and satisfy @Ichoran’s law), whereas SortedSeq wouldn’t. (I don’t like the Linked prefix, by the way, but I’m afraid Ordered would not contrast enough with Sorted, thus being confusing)

Now the question is whether we want to introduce a new branch or not…

@julienrf
Copy link
Contributor Author

julienrf commented Nov 13, 2017

MultiSet may not cover all use cases of a SortedVector or SortedList, because of course a SortedVector would have much faster random access, and a SortedList better .tail etc

I would argue that if you want a sorted sequence you primarily want efficient insertion and deletion, which you wouldn’t get with a SortedList or a SortedVector. Therefore, I would suggest using a dedicated data structure (e.g. Heap, in my diagram above), so maybe MultiSet would work well enough…

@NthPortal
Copy link
Contributor

As @Ichoran said, SortedSeq does not adhere to the current rules for Seq, and I think users would find its behavior quite unexpected. If it were added anywhere, I think it would be more appropriate to attach it to Iterable in a separate branch of the hierarchy from Seq.

@julienrf
Copy link
Contributor Author

@NthPortal We would push down the Seq laws that are not satisfied by SortedSeq to LinkedSeq

@odersky
Copy link
Contributor

odersky commented Nov 14, 2017

I think changing the Seq hierarchy would upset too much. We have so far:

              Iterable

  Seq         Map        Set

And each of the lower classes defines apply in a fundamentally different way. That's simple and intuitive. I do not want to give that up.

So, if we do want to add SortedSeq we should do it on the side, in an extension module. I, for one, never had a usecase except a concrete priority queue, but there was no need to abstract over the sortedness of that one.

@Ichoran
Copy link
Contributor

Ichoran commented Nov 14, 2017

I don't think Seq which is not LinkedSeq is a useful abstraction. Essentially all the interesting properties of Seq that are different from both Set and Map have to do with being able to retain order. An "unpredictably ordered thing that you can index into" can just be an Iterable with an apply method.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants