-
-
Notifications
You must be signed in to change notification settings - Fork 217
Fix a floatround 10 year old bug #376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
floatround would overflow for floats greater than 2^30 because internally it would double the number (therefore anything greater than 2^30 results in something greater than 2^31 which would cause overflow of course) floatround behaviour is left exactly identical otherwise (although I find it very weird and wrong to deliberately avoid banker's rounding, it would be a bad idea to change this behaviour due to compatibility) Remember to reassemble amxexecn and amxjitsn
|
Great catch, I'd be curious how you found this. The intent is round-to-nearest, with the half-up tiebreaking method. I have no idea why, but that's how it's implemented upstream: https://github.com/compuphase/pawn/blob/master/amx/amxfloat.c#L185 I have only very vague memories of why we implemented this by doubling the value. My guess is performance - changing the FPU control word is very slow. See [1] for where this algorithm came from. If Arkshine & friends don't care about the performance difference, this seems like a reasonable change. Otherwise, you could switch to 64-bit integers. This bug would be present in SourceMod too and that's how we'd want to fix it there.[2] [1] http://ldesoras.free.fr/doc/articles/rounding_en.pd page 7 or 8 |
Well, to be honest literally half of the possible values floatround can return were erroneous 😝, it's just that you don't usually deal with numbers this big. Being serious though, in my mod players sometimes have ridiculous amounts of HP, and I noticed that their HP started printing incorrectly for 2^30 instead of 2^31 and I narrowed it down to floatround. I looked at amxmodx source code, found there were two codepaths for floatround (with or without optimizations), so I turned off optimizations in core.ini and floatround started working correctly. Therefore the issue had to be in the hand assembled floatround.
In this commit I tried to change the least code possible (mostly because I was tired and lazy), but both the old bugged and new fixed functions could be optimized quite a bit. Regarding perf differences, although as I said it could be optimized further, my fixed version should be faster than the old bugged version. The old version already channged FCW every single time, and had extra operations (it has an additional add, an additional shift, and extra branching) which I removed. So I see no downsides to this change, as this should be actually faster. Since I'm no expert in x87 wizardry and I'm kind of busy right now, it would be best if someone more knowledgeable or with more time came with a more optimized version, although as I said, at least for amxmodx this commit should actually be faster than before. |
|
Assembly is out of my knowledge atm, therefore, I can't measure the performance impact on the old and new code. Assuming new code is correct, if the performance is equal or superior, and no one is willing is to switch to 64 bits integers, then it seems reasonable to leave as it is. But considering this native is used quite often and everywhere if we could have the right code from the start, this would be preferred. |
|
By the way, now that I look at that that SourcePawn code, there is an additional "bug". If SSE is enabled, it uses cvtss2si, which doesn't support in any MXCSR rounding mode this rounding to nearest with half-up tie breaking. So basically floatround will produce different results if SSE is enabled or not (with SSE enabled, floatround(0.5) == 0, with SSE disabled floatround(0.5) == 1). Also I find it odd that you are not used the optimized fastfloor for OP_RND_TO_FLOOR About 64 bit ints, considering everything in amxmodx is compiled for 32 bits, introducing a x86_64 dependency would be a bad idea (since some guy in china who still uses XP will complain, and 32 and 64bit builds or a dispatcher would be needed, and that would be way beyond the scope of this PR). @Arkshine I don't think this could be much further optimized without either introducing new instruction set dependencies (x86_64 basically) or keeping it broken for ints > 2^30, other than a few little microoptimizations (FCW could be calculated from a lookup table, and I think some stack operations could be avoided too) which I have no time to do now. |
According to Intel docs, "when the conversion is inexact, the result is rounded according to the rounding mode selected in the MXCSR register". So SSE shouldn't make a difference.
We only optimized the float functions that get really heavy use. But it's certainly game too.
I meant use FISTP with encoding DF /7 - which stores the result as a 64-bit integer at the given address. Then you would divide that by two, and return 0x80000000 if the result is out of bounds. I'll try to fix this in SourcePawn today and see which approach is faster there. |
|
@IgnacioFDM Sorry, I misunderstood what you were saying. CVT indeed does not implement half-up, it's half-to-even. Apparently I totally misunderstood that when making the change[1]. Given that change was three years ago, I'm going to remove the half-up rounding entirely from SourceMod. It's dead code anyway since like 99% of machines are SSE2. For AMX Mod X, I still leave it up to Arkshine whether to break or take a small perf hit - it wouldn't be hard to get performance numbers for this, but on the other hand it's going to be in the single or double digits of extra cycles, so not the worst thing in the world. |
I think you don't understand what I mean. This is from sourcepawn case OP_RND_TO_NEAREST:
{
if (MacroAssemblerX86::Features().sse) {
// Assume no one is touching MXCSR.
__ cvtss2si(pri, Operand(stk, 0));
} else {
static float kRoundToNearest = 0.5f;
// From http://wurstcaptures.untergrund.net/assembler_tricks.html#fastfloorf
__ fld32(Operand(stk, 0));
__ fadd32(st0, st0);
__ fadd32(Operand(ExternalAddress(&kRoundToNearest)));
__ subl(esp, 4);
__ fistp32(Operand(esp, 0));
__ pop(pri);
__ sarl(pri, 1);
}
__ addl(stk, 4);
break;
}If the assembler detects SSE it will use the (Quote myself)
Dvander
Yeah, I just found it very odd considering the quoted assembly tricks doc was about "fastfloor" that you optimized everything (including ceil) but floor 😝
Oh okay I see. That could work. I really have no idea how slow setting FCW is, so this should definitely be benchmarked (since we're trading setting FCW for a few additional ops) I'll adapt your sourcepawn version to this PR when you're done. |
|
alliedmodders/sourcepawn#66 for SP patch. |
|
Ok didn't see your second post in time. Anyway, since the old amxmodx was unoptimized (even though it was using the fast floor method, it still unnecessarily was setting FCW every single time) my change here should actually make things faster than before, so that wouldn't be an issue. It could of course be optimized further both by doing what you suggested, or by breaking compatibility and using the half-even approach with my code which would be very slightly faster (no fadd, no branching, and remove an or) but this second one alone is not worth it in my opinion. Anyway, it's up to Arkshine, but since I have no time to do a 64 bit int version, and this commit should already make things faster than before, unless someone else wants to improve it further we should go ahead with this change. |
|
From alliedmodders/sourcepawn#67 I did some performance testing on my Core 2.
So I think AMX Mod X could take the control word method without any problem. |
Update the compiled object files
I'm surprised nobody found this earlier, it's been here for 10 years.
Very simple bug
floatround would overflow for floats greater than 2^30 because internally
it would double the number (therefore anything greater than 2^30 results
in something greater than 2^31 which would cause overflow of course)
floatround behaviour is left exactly identical otherwise (although I find
it very weird and wrong to deliberately avoid banker's rounding, it would
be a bad idea to change this behaviour due to compatibility)
Remember to reassemble amxexecn and amxjitsn. I have not included the updated obj files because it's best if arkshine assembles them by himself for security reasons.