I analyzed the performance of a native module that was converted to N-API using a profiling tool, and compared the results to the original module that used V8 APIs. While overall the overhead of N-API is fairly minimal already, I did manage to identify 3 potential optimizations. Each of these can substantially reduce the impact of N-API in certain scenarios. I have tested an early version of these fixes already to confirm that, but I wanted to give a chance for discussion before I submit a PR.
1. Creating integer values
N-API currently offers only one way to create a number value:
napi_status napi_create_number(napi_env env, double value, napi_value* result);
V8 has an optimized representation for 32-bit integer values, but because the value provided to N-API is always a double, it always calls v8::Number::New() (never v8::Integer::New()), so it does not create the optimal integer representation. These integer values are therefore slower to create and slower to work with than they could be.
Instead of a single napi_create_number() API, there should probably be one for each of: int32_t, uint32_t, int64_t, double. Note there are already napi_get_value_*() functions for each of those 4 types, so having the same 4 napi_create_*() variants is more natural anyway.
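The four typed constructors could be declared along these lines (a sketch only; the exact names are open for discussion and simply mirror the existing napi_get_value_*() pattern):

```c
// Hypothetical replacements for the single napi_create_number().
// Internally each could map to the optimal V8 constructor, e.g.
// v8::Integer::New() / v8::Integer::NewFromUnsigned() for the 32-bit
// variants instead of always going through v8::Number::New().
napi_status napi_create_int32(napi_env env, int32_t value, napi_value* result);
napi_status napi_create_uint32(napi_env env, uint32_t value, napi_value* result);
napi_status napi_create_int64(napi_env env, int64_t value, napi_value* result);
napi_status napi_create_double(napi_env env, double value, napi_value* result);
```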
2. Getting integer values
The N-API functions that get integer values do some work to obtain a v8::Context that is never actually used. The profiler data showed that the call to v8::Isolate::GetCurrentContext() is actually somewhat expensive. (And it is apparently not optimized out by the compiler.)
The implementation of napi_get_value_int32() includes this code:
RETURN_STATUS_IF_FALSE(env, val->IsNumber(), napi_number_expected);
v8::Local<v8::Context> context = isolate->GetCurrentContext();
*result = val->Int32Value(context).FromJust();
But v8::Value::Int32Value() does not use the context argument when the value is a number type (a condition that was already checked above):
Maybe<int32_t> Value::Int32Value(Local<Context> context) const {
auto obj = Utils::OpenHandle(this);
if (obj->IsNumber()) return Just(NumberToInt32(*obj));
PREPARE_FOR_EXECUTION_PRIMITIVE(context, Object, Int32Value, int32_t);
i::Handle<i::Object> num;
has_pending_exception = !i::Object::ToInt32(isolate, obj).ToHandle(&num);
RETURN_ON_FAILED_EXECUTION_PRIMITIVE(int32_t);
return Just(num->IsSmi() ? i::Smi::cast(*num)->value()
: static_cast<int32_t>(num->Number()));
}
I can think of two ways to make this faster:
- Call the v8::Value::Int32Value() overload that does not take a context (and does not return a Maybe). The problem is that overload is marked as "to be deprecated soon".
- Pass an empty v8::Local<v8::Context> value to v8::Value::Int32Value(). This relies on the internal implementation detail that the context is not used when the value is a number type. But in practice it should be safe, and it would be easily caught by tests in the unlikely event V8 ever changes that behavior.
I also considered caching the v8::Context in the napi_env structure, but that probably isn't valid because APIs can be called from different context scopes.
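For illustration, the second option would reduce the hot path to something like this (a sketch against the current implementation, relying on the internal V8 behavior shown above):

```cpp
// Sketch: val->IsNumber() has already been verified, so Int32Value()
// will take the fast path that never touches the context. Passing an
// empty handle avoids the GetCurrentContext() call entirely.
RETURN_STATUS_IF_FALSE(env, val->IsNumber(), napi_number_expected);
*result = val->Int32Value(v8::Local<v8::Context>()).FromJust();
```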
3. Allocating handle scopes
V8 handle scopes are normally stack-allocated. But the current N-API implementation puts them on the heap, which means every entry/exit of a scope involves expensive new and delete operations.
I can think of two ways to make this faster:
- Change the design of all the N-API handle scope APIs so that the caller must pass in a pointer to a (presumably stack-allocated) handle scope structure to be initialized. The problem is that the size of that structure is VM-specific (and would have to be part of the ABI). While V8 is currently the only JS engine with an N-API implementation that uses handle scopes, defining a V8-specific structure would seem like a leak in the abstraction.
- Pre-allocate memory for some small fixed number of handle scopes (maybe only 1?), attached to the napi_env. Track which ones are used/freed, and allocate new handle scopes on the heap only if the pre-allocated ones are all in use.