Skip to content
This repository was archived by the owner on Dec 18, 2018. It is now read-only.

Seek with Vector is 5 times slower then 'normal' Seek #512

Closed
Anderman opened this issue Dec 20, 2015 · 10 comments
Closed

Seek with Vector is 5 times slower then 'normal' Seek #512

Anderman opened this issue Dec 20, 2015 · 10 comments

Comments

@Anderman
Copy link
Contributor

I was looking for performance gains. Because for small strings there is some overhead in setting up the vector and finding the first byte.

But, I found out that the difference is also true for a long string.

Probably I do something wrong. But all test still working after changing the code

normal search

public unsafe int Seek(byte search)
        {
            if (IsDefault)
            {
                return -1;
            }
            var block = _block;
            var index = _index;
            while (true)
            {
                while (block.End == index)
                {
                    if (block.Next == null)
                    {
                        _block = block;
                        _index = index;
                        return -1;
                    }
                    block = block.Next;
                    index = block.Start;
                }
                fixed (byte* ptr = block.Array)
                {
                    for(;index< block.End ;index++)
                    {
                        if (*(ptr + index) == serach)
                        {
                            _block = block;
                            _index = index;
                            return search;
                        }
                    }
                }
            }
@Anderman Anderman mentioned this issue Dec 20, 2015
@benaadams
Copy link
Contributor

Been doing some benchmarks around into this; a change needed for the current version is the vectors need to be passed by ref as they are a large memory copy otherwise; will do a compare an contrast with your solution. (Also make sure Vector.IsHardwareAcclerated == true and that you aren't debugging in anyway as that switches off the JitIntrinsics)

e.g. if you see any time spent in the Vector.ctor then isn't not using the fast path.

@benaadams
Copy link
Contributor

Also you need to be running x64 (+RyuJIT) for Vectors to mean hardware acceleration (and currently Windows)

@benaadams
Copy link
Contributor

You are right about the non-optimal shorter than vector path though.

@Anderman
Copy link
Contributor Author

I ' am running core clr release. Do I also need to install x64 (+RyuJIT)?? I will check that

I did some test for both seek functions with the program below.
Both has an processor load of 60%

Start normal seek...
normal seek 451 call/ms
normal seek 450 call/ms
normal seek 450 call/ms
normal seek 448 call/ms
 Start vector seek...
Vector seek 44 call/ms
Vector seek 44 call/ms
Vector seek 44 call/ms
Vector seek 44 call/ms

test program

    public class Program
    {
        private static readonly Vector<byte> _vectorSpaces = new Vector<byte>((byte)' ');
        private static readonly Vector<byte> _vectorQuestionMark = new Vector<byte>((byte)'?');
        public static void Main()
        {
            var myStr = @"
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890 ";
            var chars = myStr.ToCharArray().Select(c => (byte)c).ToArray();
            var mem = MemoryPoolBlock2.Create(new ArraySegment<byte>(chars), IntPtr.Zero, null, null);
            mem.End = chars.Length;
            var begin = mem.GetIterator();

            var maxRequest = 10000000;

            Console.WriteLine($"Start normal seek...");
            var scan = begin;
            for (var x = 0; x < 4; x++)
                new Task(() =>
                {
                    var sw = new Stopwatch();
                    sw.Start();
                    for (int i = 0; i < maxRequest; i++)
                    {
                        begin = scan;
                        begin.Seek(0x20, (byte)'?', (byte)'?');
                    }
                    sw.Stop();
                    Console.WriteLine($"normal seek {(int)(maxRequest / sw.Elapsed.TotalMilliseconds)} call/ms");
                }).Start();
            Console.ReadKey();
            Console.WriteLine($"Start vector seek...");
            for (var x = 0; x < 4; x++)
                new Task(() =>
                {
                    var sw2 = new Stopwatch();
                    sw2.Start();
                    for (int i = 0; i < maxRequest; i++)
                    {
                        begin = scan;
                        begin.Seek(_vectorSpaces, _vectorQuestionMark, _vectorQuestionMark);
                    }
                    sw2.Stop();
                    Console.WriteLine($"Vector seek {(int)(maxRequest / sw2.Elapsed.TotalMilliseconds)} call/ms");
                }).Start();
            Console.ReadKey();
        }

@benaadams
Copy link
Contributor

For coreclr you need to have the x64 runtime as the active one. Easiest way is to do:

dnvm upgrade -a x64 -r coreclr -u

Also add the line:

Console.WriteLine($"Vector.IsHardwareAcclerated == {Vector.IsHardwareAcclerated}");

To make sure its accelerated; after compiling in Release with optimizations, run your program at the command line and see the output.

Also try with:

https://github.com/benaadams/KestrelHttpServer/blob/memorypool.seek/src/Microsoft.AspNet.Server.Kestrel/Infrastructure/MemoryPoolIterator2.cs

@Anderman
Copy link
Contributor Author

Thx IsHardwareAcclerated== false on my system. will look at this later, Have to go now

I am now running on core clr 64

       1.0.0-rc2-16302   coreclr x86          win
  *    1.0.0-rc2-16319   coreclr x64          win             default
       1.0.0-rc2-16337   clr     x86          win
       1.0.0-rc2-16337   coreclr x86          win





C:\git\Other\KestrelHttpServer\src\test3>dnx test3
Vector.IsHardwareAcclerated == False
Start normal seek...
normal seek 39 call/ms
normal seek 39 call/ms
normal seek 39 call/ms
normal seek 38 call/ms

@Anderman
Copy link
Contributor Author

@benaadams After enabling factor the result are more with what I expected

Search for string with 16 chars is at the same speed

Vector.IsHardwareAcclerated == True

Start normal seek...
normal seek 7685 call/ms
normal seek 7511 call/ms
normal seek 7244 call/ms
normal seek 7130 call/ms
 Start vector seek...
Vector seek 7242 call/ms
Vector seek 7165 call/ms
Vector seek 7128 call/ms
Vector seek 7112 call/ms

@benaadams
Copy link
Contributor

@Anderman I'd reopen as it is slower than it needs to be when the length to be scanned it < Vector<byte>.Count and when Vector.IsHardwareAcclerated == false as would currently be the case on linux or running 32 bit

@Anderman
Copy link
Contributor Author

@benaadams, I can create a new PR for that? Or... did you already fix that?
I saw two other performance improvements

  • MemoryPoolIterator2Extensions is based on a struct and will do an copy of the struct when called.
  • A faster implementation of findfirstbyte. (only 10% performance increase)

Any thoughts about it

 public static int IndexOf(Vector<byte> byteEquals)
        {
            // Quasi-tree search
            if (!BitConverter.IsLittleEndian) return _IndexOf(byteEquals);

            var vector64 = Vector.AsVectorInt64(byteEquals);
            for (var i = 0; i < Vector<long>.Count; i++)
            {
                var longValue = vector64[i];
                if (longValue == 0) continue;
                {
                    return (i << 3) +
                        ((longValue & 0x00000000ffffffff) == 0
                        ? (longValue & 0x0000ffff00000000) == 0
                        ? (longValue & 0x00ff000000000000) == 0 ? 7 : 6
                        : (longValue & 0x000000ff00000000) == 0 ? 5 : 4
                        : (longValue & 0x000000000000ffff) == 0
                        ? (longValue & 0x0000000000ff0000) == 0 ? 3 : 2
                        : (longValue & 0x00000000000000ff) == 0 ? 1 : 0);
                }
            }
            return int.MaxValue;
        }

@benaadams
Copy link
Contributor

@Anderman should be fixed in #514

MemoryPoolIterator2Extensions is based on a struct

I have wondered in this, but currently its a ptr and an int; which would have to be passed anyway; so would increase the params required per function, but not sure if any gain. The this pointer shouldn't be copied so is only the end param, some behaviour relies the the copy behaviour (can modify and have an instant "undo" if needed) and with improvements to the methods in #498

A faster implementation of findfirstbyte

Worth investigating

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants