-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Specialized pure Order/OrderDescending Distinct #120131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
/// <summary>A type can be pure ordered when every single time equal elements is side by side: [ (someVal), (someVal), (otherVal), (otherVal) ]</summary> | ||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
internal static bool TypeCanBePureOrdered<T>() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A type can be pure ordered when the equality methods is pure, meaning for the same input always have the same output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a correct application of the term "pure". A struct with two fields implementing IEquatable that only projects to one field is also pure in that sense but it is invalid from the perspective of this optimization. The defining property you seem to be testing for is implicit stability, so this is merely an extension of the existing TypeIsImplicitlyStable
to more types.
Could !typeof(IEquatable<T>).IsAssignableFrom(typeof(T)) && !RuntimeHelpers.IsReferenceOrContainsReferences<T>()
be an acceptable proxy for widening this test to more types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could !typeof(IEquatable).IsAssignableFrom(typeof(T)) && !RuntimeHelpers.IsReferenceOrContainsReferences() be an acceptable proxy for widening this test to more types?
Answering my own question, probably not because T
could contain fields that themselves implement IEquatable<T>
in an unstable manner. I don't think we need this method, TypeIsImplicitlyStable
is probably good enough, although we could have conversation about how much we can extend its scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that "Pure" is not the best word to describe that, but that's the best name that I came up with
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could !typeof(IEquatable).IsAssignableFrom(typeof(T)) && !RuntimeHelpers.IsReferenceOrContainsReferences() be an acceptable proxy for widening this test to more types?
Answering my own question, probably not because
T
could contain fields that themselves implementIEquatable<T>
in an unstable manner. I don't think we need this method,TypeIsImplicitlyStable
is probably good enough, although we could have conversation about how much we can extend its scope.
The method TypeIsImplicitlyStable is a subset of what the optimization can handle
Example:
class PureOrderableType : IEquatable<PureOrderableType>
{
public int Value { get; set; }
public bool Equals(PureOrderableType? other)
{
return other?.Value == Value;
}
public override bool Equals(object? obj)
{
return Equals(obj as PureOrderableType);
}
public override int GetHashCode()
{
return Value;
}
}
var firstElement = new PureOrderableType { Value = 1 };
var secondElement = new PureOrderableType { Value = 1 };
List<PureOrderableType> list = [firstElement, secondElement];
Although
firstElement.Equals(secondElement) == true
it needs to be in that order (stable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is implicit stability important here? If you have a sorted enumerable and equality is compatible with comparison, it follows that equal elements should appear sequentially. By applying your algorithm where the first element from each group of equal values is being yielded, you should obtain behavior that is equivalent to
Distinct
even when applied toDateTimeOffset
.
The implicitly stability is not important,
Consider this code:
List<DateTimeOffset> dateTimeOffsets = new List<DateTimeOffset>
{
DateTimeOffset.Parse("2025-09-29T12:00:00+00:00"),
DateTimeOffset.Parse("2025-09-29T08:00:00-04:00"),
DateTimeOffset.Parse("2025-09-29T07:00:00-05:00")
}.Order().ToList();
dateTimeOffsets.Distinct()
Will return: DateTimeOffset.Parse("2025-09-29T12:00:00+00:00") (The first item)
if the sorting wasn't stable the Distinct() can return other item
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linq sorting is stable. I guess what I'm trying to hint at is that your change looks promising, but you need to better clarify the conditions for when the optimization kicks in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linq sorting is stable. I guess what I'm trying to hint at is that your change looks promising, but you need to better clarify the conditions for when the optimization kicks in.
I think that's the conditions for this optimization
If the type is not implicitly stable, it should use a stable sort, otherwise it can use an unstable (doesn't matter)
When looping the ordered collection the current element (when is a sequence of equal elements like [1,1,1] is threated like a single element [1]) it not equals to any other past element
To achive that, the methods GetHashCode
, CompareTo
, Equals
should be properly implemented
"properly implemented" means that the methods must agree with each other
Like this example:
class SomeType : IEquatable<SomeType>, IComparable<SomeType>
{
public int Value { get; set; }
public int CompareTo(SomeType? other)
{
return (Value % 2).CompareTo(other.Value % 2);
}
public override int GetHashCode()
{
return Value % 2;
}
public bool Equals(SomeType? other)
{
return (other?.Value % 2) == (Value % 2);
}
}
Other good example:
class OtherGoodType : IEquatable<OtherGoodType>, IComparable<OtherGoodType>
{
public int Value { get; set; }
public int CompareTo(OtherGoodType? other)
{
return (Value).CompareTo(other.Value);
}
public override int GetHashCode()
{
return Value;
}
public bool Equals(OtherGoodType? other)
{
return other?.Value == Value;
}
}
This would be invalid:
class InvalidType : IEquatable<InvalidType>, IComparable<InvalidType>
{
public int Value { get; set; }
public int CompareTo(InvalidType? other)
{
return (Value % 2).CompareTo(other.Value % 2);
}
public override int GetHashCode()
{
return Value % 2;
}
public bool Equals(InvalidType? other)
{
return other?.Value == Value; // Does not follow CompareTo
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're in agreement, but I would like you to update the name of your test method to better reflect what is being tested and potentially include even more types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're in agreement, but I would like you to update the name of your test method to better reflect what is being tested and potentially include even more types.
I added more tests, I tried to improve the names, it's better, but I agree that they aren't the best names
And I remembered other 2 types to add to this optimization: DateOnly
and TimeOnly
And about the failing tests in the ci, I believe that it's not related to this changes, all the logs says:
unable to pull image...
Tagging subscribers to this area: @dotnet/area-system-linq |
Type? nullableUnderlyingType = Nullable.GetUnderlyingType(t); | ||
if (nullableUnderlyingType != null) | ||
{ | ||
t = nullableUnderlyingType; | ||
} | ||
|
||
if (typeof(T).IsEnum) | ||
{ | ||
t = typeof(T).GetEnumUnderlyingType(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can have negative impact. It may break constant folding for typeof(T) == typeof(X)
. What does this path handle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The equality methods in nullable only add the null check, which are pure, if it is not null there is a fallback to the equality methods of the internal type
basically only the internal type of nullable matters
Assert.Equal(expected, source.Order().Distinct()); | ||
} | ||
|
||
[Fact] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I Added only the iterator part in the tests, because the other methods like ToArray
, ToList
, Count
doesn't add much logic and basically just consume the iterator, and they are in the tests, but only for int
I skipped the tests to some types such as long
, short
, Half
, etc. because, in the tests it already has a numeric type (int
), and floating point types (float
, double
, decimal
)
} | ||
} | ||
|
||
private sealed partial class PureOrderedIteratorImpl<TElement> : PureOrderedIterator<TElement> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I got too distracted by the naming (FWIW I still think "pure" is inappropriate terminology and should be renamed) and didn't take a closer look at the value prop of the optimization: why would somebody want to chain Order()
with Distinct()
, a.k.a. impose a Distinct()
on the source enumerable is going to be more much more efficient regardless of this optimization, and if sorted outputs are still a prerequesite Order()
should be called after Distinct()
so that handles a potentially smaller number of elements.
Optimizations like this don't come free: besides the added type tests that it imposes each generic type being added to Linq can substantially grow the static footprint or memory usage of an application.
So while the optimization is a clever one, the fact that it is being applied to a pattern that is inefficient by definition makes me think that we shouldn't take this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true, the path Order().Distinct() is not that common, my idea was using that with disconnected functions like
void SomeFunction(IEnumerable<SomeType> enumerable)
{
enumerable.Distinct().UseThisSomeHow() // if it's ordered, great, if it's not use the hashset
}
But probably with the limitations of the optimization would be better simply swapping the order of the operators
Order().Distinct() instead of using the operators in order, swapping to Distinct().Order(), which is faster, but in therms of memory it is not so good
public class Benchmarks
{
private static readonly Random _rnd = new Random(4);
private static readonly IEnumerable<int> _list = Enumerable.Range(0, 10000).Select(i => _rnd.Next(10000)).ToList();
[Benchmark]
public List<int> OrderDistinct()
{
return _list.Order().Distinct().ToList();
}
[Benchmark]
public List<int> DistinctOrder()
{
return _list.Distinct().Order().ToList();
}
}
| Method | Toolchain | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|-------------- |-------------------- |---------:|--------:|--------:|--------:|--------:|--------:|----------:|
| DistinctOrder | PureOrderedDistinct | 255.7 us | 0.74 us | 0.69 us | 38.0859 | 38.0859 | 38.0859 | 182.94 KB |
| DistinctOrder | Default | 271.0 us | 1.48 us | 1.23 us | 38.0859 | 38.0859 | 38.0859 | 182.94 KB |
| OrderDistinct | PureOrderedDistinct | 344.2 us | 3.32 us | 3.10 us | 3.9063 | - | - | 63.99 KB |
| OrderDistinct | Default | 426.6 us | 1.82 us | 1.70 us | 30.7617 | 30.7617 | 30.7617 | 316.22 KB |
imposes each generic type being added to Linq can substantially grow the static footprint or memory usage of an application
The creation of the new type can be useful for other (still really uncommon) optimizations like Order().Except(), etc.
I'm not saying that this would be a game changer but probably it would be used in more places, not just Order().Distinct()
I think I got too distracted by the naming (FWIW I still think "pure" is inappropriate terminology and should be renamed)
I agree with that, maybe AdjacentOrdered would be a better name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The creation of the new type can be useful for other (still really uncommon) optimizations like Order().Except(), etc.
I'm not saying that this would be a game changer but probably it would be used in more places, not just Order().Distinct()
The problem is that they all appear to be equally unlikely. As such, I expect this would be introducing marginal perf regressions in the 99% of uses of either Order
or Distinct
where this pattern does not hold.
Thanks for submitting this PR, it is legitimately clever and thought provoking even though unfortunately it doesn't meet the bar for inclusion into LINQ.
Fixes #120125
Changed to use specialized PureOrderedDistinct
Benchmark: