Skip to content

Conversation

@jmarble
Copy link

@jmarble jmarble commented Oct 22, 2025

Summary

This PR adds an exponential performance improvement for unique(null, true) when the collection contains only strings.

Tested on:

  • PHP 8.4.13
  • Laravel 12.x

Implementation

Added an optimization path for strict mode that detects string-only collections and uses native array_unique().

Backward Compatibility

100% Backward Compatible

  • No changes to non-strict mode behavior
  • Strict mode behavior unchanged (still uses strict comparison)
  • Only adds performance optimization for string-only collections

Performance Impact

See benchmark results here: https://gist.github.com/jmarble/27faeeece7ac58edecbc2024e84f842d
Benchmark test code (thank you Claude!): https://gist.github.com/jmarble/adf92aef31009652c8616c5b8d2966e9

Real-world use cases that benefit:

// Unit numbers, apartment codes
collect(['5', '10', '5', '3A', '5'])->unique(null, true);

// SKUs, product codes
collect(['SKU-001', 'SKU-002', 'SKU-001'])->unique(null, true);

// Email addresses
collect(['[email protected]', '[email protected]', '[email protected]'])->unique(null, true);

// File paths
collect(['/path/to/file', '/path/to/other', '/path/to/file'])->unique(null, true);

Why String-Only?

SORT_STRING converts all values to strings before comparison. This works perfectly for strict comparison when:

  • All items are strings -> No type conversion needed
  • Mixed types (int/string/bool) -> Would incorrectly merge 1 and '1'

The optimization automatically falls back to the existing implementation for mixed-type collections.

No Breaking Changes

  • Non-strict mode: Unchanged (uses SORT_REGULAR despite the sometimes unpredictable results)
  • Strict mode with mixed types: Unchanged (uses reject + in_array)
  • Strict mode with strings: Optimized (same behavior, faster execution)

@jmarble
Copy link
Author

jmarble commented Oct 22, 2025

Not to beat a dead horse, but, this provides more performance gains in strict mode for mixed type or non-integer type arrays by using a hashing method to avoid O(n²) complexity. Anyone think it's acceptably BC?

    public function unique($key = null, $strict = false)
    {
        // Strategy 1: No key, loose comparison
        if ($key === null && $strict === false) {
            return new static(array_unique($this->items, SORT_REGULAR));
        }

        // Strategy 2: No key, STRICT comparison (optimized)
        if ($key === null && $strict === true) {
            $canUseFastPath = true;
            foreach ($this->items as $item) {
                if (!is_string($item)) {
                    $canUseFastPath = false;
                    break;
                }
            }

            if ($canUseFastPath) {
                return new static(array_unique($this->items, SORT_STRING));
            }
            
            $seen = [];
            $result = [];

            foreach ($this->items as $k => $item) {
                if (is_array($item)) {
                    $hash = serialize($item);
                } elseif (is_object($item)) {
                    $hash = spl_object_hash($item);
                } else {
                    $hash = gettype($item) . ':' . $item;
                }

                if (!isset($seen[$hash])) {
                    $seen[$hash] = true;
                    $result[$k] = $item;
                }
            }

            return new static($result);
        }

        // Strategy 3: WITH KEY and STRICT (optimized)
        if ($key !== null && $strict === true) {
            $callback = $this->valueRetriever($key);
            
            $seen = [];
            $result = [];

            foreach ($this->items as $k => $item) {
                $id = $callback($item, $k);
                
                if (is_array($id)) {
                    $hash = serialize($id);
                } elseif (is_object($id)) {
                    $hash = spl_object_hash($id);
                } else {
                    $hash = gettype($id) . ':' . $id;
                }

                if (!isset($seen[$hash])) {
                    $seen[$hash] = true;
                    $result[$k] = $item;
                }
            }

            return new static($result);
        }

        // Fallback: Loose mode or complex cases - use original in_array approach
        $callback = $this->valueRetriever($key);
        $exists = [];

        return $this->reject(function ($item, $key) use ($callback, $strict, &$exists) {
            if (in_array($id = $callback($item, $key), $exists, $strict)) {
                return true;
            }

            $exists[] = $id;
        });
    }

@jmarble
Copy link
Author

jmarble commented Oct 23, 2025

And here's what the same hash based optimization on LazyCollection would look like:

    public function unique($key = null, $strict = false)
    {
        $callback = $this->valueRetriever($key);

        return new static(function () use ($callback, $strict) {
            $exists = [];

            foreach ($this as $itemKey => $item) {
                $id = $callback($item, $itemKey);

                // Optimize ALL strict mode cases (not just $key === null)
                if ($strict) {
                    // SAME hash logic as Collection for consistency
                    if (is_array($id)) {
                        $hash = serialize($id);
                    } elseif (is_object($id)) {
                        $hash = spl_object_hash($id);
                    } else {
                        $hash = gettype($id) . ':' . $id;
                    }

                    if (!isset($exists[$hash])) {
                        $exists[$hash] = true;
                        yield $itemKey => $item;
                    }
                } else {
                    // Loose mode: keep in_array (BC safe)
                    if (!in_array($id, $exists, $strict)) {
                        yield $itemKey => $item;
                        $exists[] = $id;
                    }
                }
            }
        });
    }

This PR does not contain these hash table dedups. But if someone can help me more thoroughly test the hash table with "real world" scenarios, it might be worth another PR.

@macropay-solutions
Copy link

macropay-solutions commented Oct 23, 2025

@jmarble
What do you think of this faster non breaking change that leaves the decision to the developer in terms of the flag used?
image

@shaedrich
Copy link
Contributor

What do you think of this faster non breaking change that leaves the decision to the developer in terms of the flag used?

Exactly my suggestion (#57480 (comment)) 👍🏻

macropay-solutions pushed a commit to macropay-solutions/maravel-framework that referenced this pull request Oct 23, 2025
@taylorotwell
Copy link
Member

Thanks for your pull request to Laravel!

Unfortunately, I'm going to delay merging this code for now. To preserve our ability to adequately maintain the framework, we need to be very careful regarding the amount of code we include.

If applicable, please consider releasing your code as a package so that the community can still take advantage of your contributions!

@jmarble
Copy link
Author

jmarble commented Oct 23, 2025

@taylorotwell sorry to have previously made a mountain out of a molehill here, but this new PR is targeted, 100% BC, and provides an exponential performance gain when uniqueStrict() is required for a collection of strings.

This PR is critical, because 4 years ago the unpredictable SORT_REGULAR was introduced, which was not BC. This PR aims to provide some consolation and reliable path for a performant implementation of uniqueStrict().

@jmarble
Copy link
Author

jmarble commented Oct 23, 2025

What do you think of this faster non breaking change that leaves the decision to the developer in terms of the flag used?

@macropay-solutions that's definitely a creative solution, but doesn't feel like the "Laravel way" of doing things.

@macropay-solutions
Copy link

@jmarble we saw it first in laravel:) it is not our original solution. It was previously used by someone else for a patch revision.

@jmarble
Copy link
Author

jmarble commented Oct 23, 2025

@macropay-solutions I have a uniqueStrings() method which I think is the correct path forward here and it should be super performant -- much better than the current buggy unique().

@macropay-solutions
Copy link

macropay-solutions commented Oct 23, 2025

$collection->unique(null, false, SORT_STRING) would work with our proposed solution. Also if the Collection would be retrieved from DI instead of being instantiated directly, you could override the unique function.

@jmarble
Copy link
Author

jmarble commented Oct 23, 2025

@macropay-solutions very true, but I think for now it seems @taylorotwell would like to see something elegant? uniqueStrings() is my attempt to do that, in a surprisingly performant way haha it makes SORT_REGULAR look slow! See #57517

@macropay-solutions
Copy link

macropay-solutions commented Oct 24, 2025

A possible backward compatible solution:
php/php-src#20262 (comment)

This may impact also emails that start with a digit.

A macro in the Collection can be registered and used instead of ->unique() on collections
Or $collection->unique()->toArray() can be replaced with \array_unique($collection->all()); for strings.

macropay-solutions pushed a commit to macropay-solutions/maravel-framework that referenced this pull request Oct 24, 2025
macropay-solutions added a commit to macropay-solutions/maravel-framework that referenced this pull request Oct 24, 2025
* Adapt to php/php-src#20262 and laravel/framework#57501

* Version

* CR

---------

Co-authored-by: Pantea Marius-ciclistu <>
@shaedrich
Copy link
Contributor

@macropay-solutions very true, but I think for now it seems @taylorotwell would like to see something elegant?

How is having two methods more elegant than one? It seems to me, in the end, this is down to personal preference. I don't see @macropay-solutions's solution as antithetical to the framework's philosophy.

@macropay-solutions
Copy link

@shaedrich #57528 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants