Optimize Collection unique() strict mode for string-only collections #57501
Not to beat a dead horse, but this provides further performance gains in strict mode for mixed-type or non-integer arrays by using a hashing method to avoid O(n²) complexity. Does anyone think it's acceptably BC?

    public function unique($key = null, $strict = false)
    {
        // Strategy 1: No key, loose comparison
        if ($key === null && $strict === false) {
            return new static(array_unique($this->items, SORT_REGULAR));
        }

        // Strategy 2: No key, STRICT comparison (optimized)
        if ($key === null && $strict === true) {
            $canUseFastPath = true;

            foreach ($this->items as $item) {
                if (!is_string($item)) {
                    $canUseFastPath = false;
                    break;
                }
            }

            if ($canUseFastPath) {
                return new static(array_unique($this->items, SORT_STRING));
            }

            $seen = [];
            $result = [];

            foreach ($this->items as $k => $item) {
                if (is_array($item)) {
                    $hash = serialize($item);
                } elseif (is_object($item)) {
                    $hash = spl_object_hash($item);
                } else {
                    $hash = gettype($item) . ':' . $item;
                }

                if (!isset($seen[$hash])) {
                    $seen[$hash] = true;
                    $result[$k] = $item;
                }
            }

            return new static($result);
        }

        // Strategy 3: WITH KEY and STRICT (optimized)
        if ($key !== null && $strict === true) {
            $callback = $this->valueRetriever($key);
            $seen = [];
            $result = [];

            foreach ($this->items as $k => $item) {
                $id = $callback($item, $k);

                if (is_array($id)) {
                    $hash = serialize($id);
                } elseif (is_object($id)) {
                    $hash = spl_object_hash($id);
                } else {
                    $hash = gettype($id) . ':' . $id;
                }

                if (!isset($seen[$hash])) {
                    $seen[$hash] = true;
                    $result[$k] = $item;
                }
            }

            return new static($result);
        }

        // Fallback: Loose mode or complex cases - use original in_array approach
        $callback = $this->valueRetriever($key);
        $exists = [];

        return $this->reject(function ($item, $key) use ($callback, $strict, &$exists) {
            if (in_array($id = $callback($item, $key), $exists, $strict)) {
                return true;
            }

            $exists[] = $id;
        });
    }

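On the BC question, one rough way to probe the hash keys is to run them side by side with the in_array(..., true) loop they would replace. The sketch below is standalone PHP, not framework code: strictHashKey(), uniqueByHash(), and uniqueByInArray() are hypothetical names that only mirror the logic of the snippet above. Under those assumptions, the two approaches agree on ordinary scalars but can disagree on edge cases such as 0.0 versus -0.0, NAN, and arrays that hold distinct objects with identical contents.

```php
<?php

// Hypothetical helper: same key-building rules as the proposed snippet.
function strictHashKey($value): string
{
    if (is_array($value)) {
        return serialize($value);          // compares nested objects by value
    }
    if (is_object($value)) {
        return spl_object_hash($value);    // compares objects by identity
    }
    return gettype($value) . ':' . $value; // type prefix keeps 1 and '1' apart
}

// Hash-based dedup: one isset() lookup per item.
function uniqueByHash(array $items): array
{
    $seen = [];
    $result = [];
    foreach ($items as $key => $item) {
        $hash = strictHashKey($item);
        if (!isset($seen[$hash])) {
            $seen[$hash] = true;
            $result[$key] = $item;
        }
    }
    return $result;
}

// Reference behavior: strict in_array() scan, quadratic in the worst case.
function uniqueByInArray(array $items): array
{
    $exists = [];
    $result = [];
    foreach ($items as $key => $item) {
        if (!in_array($item, $exists, true)) {
            $exists[] = $item;
            $result[$key] = $item;
        }
    }
    return $result;
}

// Inputs chosen to compare the two approaches.
$cases = [
    'mixed scalars'      => ['1', 1, '1', 1.0],
    'signed zeros'       => [0.0, -0.0],
    'NAN values'         => [NAN, NAN],
    'arrays of objects'  => [[new stdClass()], [new stdClass()]],
];

foreach ($cases as $label => $items) {
    printf(
        "%s: hash keeps %d, in_array keeps %d\n",
        $label,
        count(uniqueByHash($items)),
        count(uniqueByInArray($items))
    );
}
```

Any case where the two counts differ is a potential behavior change. Separately, serialize() throws for values it cannot serialize, such as closures nested in an array, which the in_array() path never has to handle.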
And here's what the same hash-based optimization on LazyCollection would look like:

    public function unique($key = null, $strict = false)
    {
        $callback = $this->valueRetriever($key);

        return new static(function () use ($callback, $strict) {
            $exists = [];

            foreach ($this as $itemKey => $item) {
                $id = $callback($item, $itemKey);

                // Optimize ALL strict mode cases (not just $key === null)
                if ($strict) {
                    // SAME hash logic as Collection for consistency
                    if (is_array($id)) {
                        $hash = serialize($id);
                    } elseif (is_object($id)) {
                        $hash = spl_object_hash($id);
                    } else {
                        $hash = gettype($id) . ':' . $id;
                    }

                    if (!isset($exists[$hash])) {
                        $exists[$hash] = true;
                        yield $itemKey => $item;
                    }
                } else {
                    // Loose mode: keep in_array (BC safe)
                    if (!in_array($id, $exists, $strict)) {
                        yield $itemKey => $item;
                        $exists[] = $id;
                    }
                }
            }
        });
    }

This PR does not contain these hash table dedups. But if someone can help me test the hash table more thoroughly with "real world" scenarios, it might be worth another PR.
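To make the lazy variant easier to experiment with outside the framework, here is a minimal generator-only sketch. lazyUniqueScalars() and cyclingSource() are hypothetical names, and the sketch assumes every item is an int or a string so that the type-prefixed key is unambiguous; it does not cover arrays, objects, or floats.

```php
<?php

// Hypothetical standalone generator; not part of LazyCollection.
// Assumption: items are ints or strings only.
function lazyUniqueScalars(iterable $items): Generator
{
    $seen = [];
    foreach ($items as $key => $item) {
        $hash = gettype($item) . ':' . $item;
        if (!isset($seen[$hash])) {
            $seen[$hash] = true;
            yield $key => $item; // emitted as soon as it is first seen
        }
    }
}

// An endless source that keeps repeating the same four values.
function cyclingSource(): Generator
{
    while (true) {
        foreach ([1, '1', 2, '2'] as $value) {
            yield $value;
        }
    }
}

// Pull only as many unique items as needed, then stop consuming.
$firstFour = [];
foreach (lazyUniqueScalars(cyclingSource()) as $key => $item) {
    $firstFour[$key] = $item;
    if (count($firstFour) === 4) {
        break;
    }
}

var_dump($firstFour);
```

Because items are yielded on first sight, the consumer can stop early even on an endless source. The $seen table still grows with the number of distinct items, so memory use is bounded by the unique values rather than by the stream length.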

@jmarble
Exactly my suggestion (#57480 (comment)) 👍🏻

Thanks for your pull request to Laravel! Unfortunately, I'm going to delay merging this code for now. To preserve our ability to adequately maintain the framework, we need to be very careful regarding the amount of code we include. If applicable, please consider releasing your code as a package so that the community can still take advantage of your contributions!

@taylorotwell sorry to have previously made a mountain out of a molehill here, but this new PR is targeted, 100% BC, and provides an exponential performance gain when This PR is critical, because 4 years ago the unpredictable
@macropay-solutions that's definitely a creative solution, but doesn't feel like the "Laravel way" of doing things. |

@jmarble we saw it first in Laravel :) It is not our original solution. It was previously used by someone else for a patch revision.

@macropay-solutions I have a

@macropay-solutions very true, but I think for now it seems @taylorotwell would like to see something elegant?

A possible backward-compatible solution: This may also affect emails that start with a digit. A macro can be registered on the Collection and used instead of ->unique() on collections
* Adapt to php/php-src#20262 and laravel/framework#57501 * Version * CR --------- Co-authored-by: Pantea Marius-ciclistu <>
How is having two methods more elegant than one? It seems to me, in the end, this is down to personal preference. I don't see @macropay-solutions's solution as antithetical to the framework's philosophy.

Summary
This PR adds a large performance improvement for unique(null, true) when the collection contains only strings, by avoiding the O(n²) in_array() scan.
Tested on:
Implementation
Added an optimization path for strict mode that detects string-only collections and uses native array_unique().
Backward Compatibility
100% Backward Compatible
Performance Impact
See benchmark results here: https://gist.github.com/jmarble/27faeeece7ac58edecbc2024e84f842d
Benchmark test code (thank you Claude!): https://gist.github.com/jmarble/adf92aef31009652c8616c5b8d2966e9
Real-world use cases that benefit:
Why String-Only?
SORT_STRING converts all values to strings before comparison. This works perfectly for strict comparison when: 1 and '1'
The optimization automatically falls back to the existing implementation for mixed-type collections.
No Breaking Changes
SORT_REGULAR despite the sometimes unpredictable results)
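
The string-only fast path described under Implementation can be sketched outside the framework roughly as follows. uniqueStrictSketch() is a hypothetical free function, not the method added by this PR; it assumes a plain PHP array as input and uses a strict in_array() loop as a stand-in for the existing implementation.

```php
<?php

// Hypothetical free function illustrating the string-only fast path.
function uniqueStrictSketch(array $items): array
{
    // Detect whether every item is a string; bail out on the first non-string.
    $allStrings = true;
    foreach ($items as $item) {
        if (!is_string($item)) {
            $allStrings = false;
            break;
        }
    }

    if ($allStrings) {
        // Strings compared as strings: no type coercion can occur,
        // so this matches strict comparison for string-only input.
        return array_unique($items, SORT_STRING);
    }

    // Stand-in for the existing path: strict in_array() scan.
    $exists = [];
    $result = [];
    foreach ($items as $key => $item) {
        if (!in_array($item, $exists, true)) {
            $exists[] = $item;
            $result[$key] = $item;
        }
    }
    return $result;
}

// String-only input takes the fast path; '1' and '01' stay distinct.
var_dump(uniqueStrictSketch(['a', 'b', 'a', '1', '01', '1']));

// Mixed input falls back, so int 1 and string '1' are both kept.
var_dump(uniqueStrictSketch([1, '1', 1, '1']));

// Why the fast path is limited to strings: SORT_STRING alone
// would merge int 1 and string '1'.
var_dump(array_unique([1, '1'], SORT_STRING));
```

The type check costs one extra pass over the items, which is the trade-off for being able to hand string-only input to the native function while keeping mixed-type results unchanged.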