-
Notifications
You must be signed in to change notification settings - Fork 7.1k
#improve node sort performance by std::sort + long long (_localZOrder) for 64-bit #16126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Why switch to sort again? Does it have better performance? How much percent it improves? Thanks. |
The sort performance will improve 50%~100% |
cocos/2d/CCNode.cpp
Outdated
auto node = _children.at(i); | ||
|
||
if (node && node->_localZOrder < 0) | ||
if (node && node->getLocalZOrder() < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_loacalZOrder
is defined as unsigned long long
in this PR, then node->getLocalZOrder() < 0
will be false for ever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but it is a unsigned long long, why it will less than 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry, here is local z order, not s_globalOrderOfArrival.
Can you provide the test case? Thanks. |
test code:
|
@halx99 i think this PR will break compatibility, right? |
I don't think so, I already chagne stroage to long long, and use bit field for make _localZOrder storage & operation more clearly. |
Well, If developer call setOrderOfArrival/getOrderOfArrival manually, they will be affected. |
Does need provide setOrderOfArrival/getOrderOfArrival still for compatibility? |
I don't quite understand, you don't remove these two functions, why is there compatibility issue? |
I think no compatibility issue, because 3.12 already remove the two functions |
Ok, i will try your test case to check the result. It seems it is valuable if it can improve 50%-100%. @ricardoquesada what's your opinion? |
OK |
+1 for improving performance. But does this patch only improves the performance in 64-bit? if so, what will happen on 32-bit OSs (like Android)? I think we should run the test on Android 32-bit before merging it. |
{ | ||
if (_reorderChildDirty) | ||
{ | ||
std::stable_sort(std::begin(_children), std::end(_children), nodeComparisonLess); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you revert from stable sort? Did you refer to why stable was added in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah nevermind, I see it's all about going back to std::sort :D
I should've read the original post instead of just the commit.
looks good to me then. Thanks for the PR! |
cocos/2d/CCNode.h
Outdated
* | ||
* @param orderOfArrival The arrival order. | ||
*/ | ||
CC_DEPRECATED_ATTRIBUTE void _updateOrderOfArrival(void); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is better to put into protected
if it is used internally and its sub classes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better, but if mark as protected, The ProtectedNode chould access it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ProtectedNode inherits Node, i think there is not problem. And
CC_DEPRECATED_ATTRIBUTE
is not needed- _updateOrderOfArrival -> updateOrderOfArrival
- remove
void
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, but please
- remove
CC_DEPRECATED_ATTRIBUTE
since it is not deprecated, it is used - _updateOrderOfArrival -> updateOrderOfArrival
- remove
void
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I done this.
@halx99 your test case is just for |
There we can use node->_localZOrder->detail.z < 0 instead. |
Actual, I think the compilier will optimize the code for release mode normally. |
Yep, i think so. What i mean is that, have you tested it with cocos2d-x? I think we need to test it in cocos2d-x. |
It not easy to test it in cocos2d-x, because it need large numbers of entity for distinguish the test result. |
|
Your last test result is approach me, chould upload your last apk let me try on my device firstly. You can upload to baidu cloud and share to me. |
@halx99 i think it is better you build it yourself. It is easy, just fetch my branch and build it. |
OK, let me do it. |
The test result:
My environment: |
@minggo Hi, I move the my empty test project test code from HelloWorldScene::init to same as you( before AppDelegate *pAppDelegate = new AppDelegate();) and test again, and the result is same with yours
|
Let me test and test. |
I test many times: your project result is all same:
|
@halx99 so i think we should
|
I test many times for my empty project:
|
What's your empty project? Will you share it? |
My project is based on cocos2d-x-3.9 and use gcc to compile |
Does the cocos2d-x-3.12 replace gcc by clang? |
yep, because gcc is deprecated since ndk r11+, and will be removed in future |
There other 2 difference with your project:
|
change to use |
bool nodeComparisonLess(Node* n1, Node* n2) | ||
{ | ||
return(n1->_localZOrder.value < n2->_localZOrder.value); | ||
#if defined(_M_X64) || defined(_LP64) || defined(__x86_64) || defined(_WIN64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which platform will define these macros?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
windows 64: _WIN64, _M_X64
linux 64: __x86_64, __LP64__, __amd64
apple (mac or ios) 64: _LP64, __LP64__
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need it for Android 64 too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The android is based on linux, so I think linux 64 specific macros can handle it.
I just have checked the android arm64-v8a: the macro __LP64__
and _LP64
are worked well.
The check program:
proj.android-bits-detect.zip
You can just run jni/dobuild.bat and open the generated libs/arm64-v8a/libpseudo.so with UE or EmEditor to search the symbols:_LP64$defined and __LP64__$defined
My result is:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, have you tested the new codes, and what's the performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean change test structure to
union LocalZOrder {
...
};
struct SimpleNode{
LocalZOrder _localZOrder;
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new test codes could be:
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <algorithm>
#include <vector>
#include <time.h>
#if 0 // for android
#define LOG_TAG "cpp-empty-test"
#define LOGD(...) __android_log_print(ANDROID_LOG_DEBUG,LOG_TAG,__VA_ARGS__)
#define logfunc(format,...) __android_log_print(ANDROID_LOG_DEBUG,LOG_TAG,(format "\n"),##__VA_ARGS__)
#else
#define logfunc(format,...) printf((format "\n"),##__VA_ARGS__)
#endif
void runPerfromanceFunc()
{
union LocalZOrder {
LocalZOrder(int z, unsigned a)
{
value = 0;
detail.z = z;
detail.a = a;
}
struct {
int z; // The original localZOrder
unsigned int a; // Order Of Arrival, for avoid sort problem with unstable_sort algorithm.
} detail;
long long value; // The value to be used in sort
}; ///< Local order (relative to its siblings) used to sort the node
class SimpleNode
{
public:
SimpleNode(int z, unsigned a) : _localZOrder(z, a)
{
}
LocalZOrder _localZOrder;
};
std::vector<SimpleNode*> vec2;
logfunc("test program is %d bits!", sizeof(void*) << 3);
for (auto i = 0; i < 1000000; ++i)
vec2.push_back(new SimpleNode(2, i + 1));
clock_t start = 0;
for (int i = 0; i < 10; ++i) {
start = clock();
std::stable_sort(vec2.begin(), vec2.end(), [](const SimpleNode* l, const SimpleNode* r) {
return l->_localZOrder.detail.z < r->_localZOrder.detail.z;
});
logfunc("std::stable_sort, test round:%d, %lf seconds used.", i + 1, (clock() - start) / (double)CLOCKS_PER_SEC);
}
for (int i = 0; i < 10; ++i) {
start = clock();
std::sort(vec2.begin(), vec2.end(), [](const SimpleNode* l, const SimpleNode* r) {
return l->_localZOrder.detail.z < r->_localZOrder.detail.z;
});
logfunc("std::sort(order not stable), test round:%d, %lf seconds used.", i + 1, (clock() - start) / (double)CLOCKS_PER_SEC);
}
for (int i = 0; i < 10; ++i) {
start = clock();
std::sort(vec2.begin(), vec2.end(), [](const SimpleNode* l, const SimpleNode* r) {
return l->_localZOrder.detail.z < r->_localZOrder.detail.z || (l->_localZOrder.detail.z == r->_localZOrder.detail.z && l->_localZOrder.detail.a < r->_localZOrder.detail.a);
});
logfunc("std::sort, no optimize orderOfArrival storage, original implement, test round:%d, %lf seconds used.", i + 1, (clock() - start) / (double)CLOCKS_PER_SEC);
}
for (int i = 0; i < 10; ++i) {
start = clock();
std::sort(vec2.begin(), vec2.end(), [](const SimpleNode* l, const SimpleNode* r) {
return l->_localZOrder.value < r->_localZOrder.value;
});
logfunc("std::sort, optimize orderOfArrival storage, test round:%d, %lf seconds used.", i + 1, (clock() - start) / (double)CLOCKS_PER_SEC);
}
}
int main(int, char**)
{
runPerfromanceFunc();
#if defined(_WIN32)
getchar();
#endif
return 0;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean you should test the performance on iOS and Android again with adding the macro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why again: the test code already contains the different situations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To test if the macro takes effect. Especially for iOS and Android.
@minggo For andorid: you can use the simple project to detect it: proj.android-bits-detect.zip, my test result is |
Yep, it takes effect. |
std::uint32_t a; // Order Of Arrival, for avoid sort problem with unstable_sort algorithm. | ||
} detail; | ||
std::int64_t value; // The value to be used in sort | ||
} _localZOrder; ///< Local order (relative to its siblings) used to sort the node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The codes have problem. For example, let's assume there are two nodes:
node1: z = 0, a = 1, then value is 0x0000000100000000
node2: z = 1, a = 0, then value is 0x0000000000000001
node2.vlaue < node1.value
, then the render sequence is node2, node1
, it is wrong. The correct sequence should be node1, node2
because the node2.z
is bigger. You can test cpp-emptey-test
on Mac or iOS, you can see now the close item
is hidden by background.
If we change it like this,
union {
struct {
std::int32_t z; // The original localZOrder
std::uint32_t a; // Order Of Arrival, for avoid sort problem with unstable_sort algorithm.
} detail;
std::int64_t value; // The value to be used in sort
} _localZOrder;
then it may have issue of z
is negative, for example
node1: z = -1, a = 1, then value is -(2^32 + 1)
node2: z = -1, a = 2, then value is -(2^32 + 2)
node2.value < node1.value, so the render sequence is node2, node1
. It is wrong, because node2.z == node1.z
but node2.a > node2.a
.
The above analyzation only takes effect with little endian architecture(x86, arm). If the architecture is big endian, the result is different.
So we can not just use union here. If the union is the only effect for optimization, then we should revert the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right, That's why I storage as a unsigned long long and simulate a int32 storage as follow:
#include <assert.h>
#define CC_LOCALZORDER_ZERO 0x7fffffffULL
class SimpleNode
{
public:
SimpleNode(int z, unsigned int a)
{
_localZOrder = 0;
setLocalZOrder(z);
setOrderOfArrival(a);
}
int getLocalZOrder() const
{
return static_cast<int>(((_localZOrder >> 32) & 0xffffffff) - CC_LOCALZORDER_ZERO);
}
unsigned int getOrderOfArrival() const
{
return static_cast<unsigned int>((_localZOrder) & 0xffffffff);
}
void setLocalZOrder(int z)
{
_localZOrder = ((CC_LOCALZORDER_ZERO + z) << 32) | (_localZOrder & 0xffffffff);
}
void setOrderOfArrival(unsigned int orderOfArrival)
{
_localZOrder = (_localZOrder & 0xffffffff00000000) | orderOfArrival;
}
std::uint64_t _localZOrder;
};
// check value storage
SimpleNode n1(0, 1);
SimpleNode n2(1, 0);
SimpleNode n3(-3, 500);
assert(n3.getLocalZOrder() == -3);
assert(n3.getOrderOfArrival() == 500);
assert(n1._localZOrder < n2._localZOrder);
assert(n3._localZOrder < n1._localZOrder);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should revert this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I think your analyzation have some problem:
then it may have issue of z is negative, for example
node1: z = -1, a = 1, then value is -(2^32 + 1)
node2: z = -1, a = 2, then value is -(2^32 + 2)
You chould not simple use '+' operation to check the long long value,
Actually:
node1 value is: 0xffffffff00000001 = -4294967295
node2 value is: 0xffffffff00000002 = -4294967294
so, for negative value, except sign bit, other bits specific more large then, acutal value more less.
The root casue is I made a incorrect order, then the correct shoud be follow(only work for little endian arch):
struct
{
std::uint32_t a;
std::int32_t z;
};
And for big endian arch, the original code is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check it by follow code
void checkStorageLE()
{
union LocalZOrder {
LocalZOrder(int z, unsigned a)
{
value = 0;
detail.z = z;
detail.a = a;
}
struct {
unsigned int a; // Order Of Arrival, for avoid sort problem with unstable_sort algorithm.
int z; // The original localZOrder
} detail;
long long value; // The value to be used in sort
}; ///< Local order (relative to its siblings) used to sort the node
class SimpleNode
{
public:
SimpleNode(int z, unsigned a) : _localZOrder(z, a)
{
}
LocalZOrder _localZOrder;
};
SimpleNode n1(0, 1);
SimpleNode n2(1, 0);
SimpleNode nn1(-1, 1);
SimpleNode nn2(-1, 2);
bool ok = n1._localZOrder.value < n2._localZOrder.value;
assert(ok);
ok = nn1._localZOrder.value < nn2._localZOrder.value;
assert(ok);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But i have to revert it first. Could you please send a new PR after i merge the reverted PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@halx99 i have merged the reverted PR. So you can send a new PR now, thanks. And because we start to test v3.13, and the PR will modify the basic codes of cocos2d-x, so i may not include your PR in v3.13 but next version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will create v3.13 when start testing, and continue merging PR in v3. v3.13 only supports bug fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
Yes, this PR is causing problems to our game. |
i reverted in #16260. |
No description provided.