Commit c93f924

sstrickl authored and Commit Queue committed
[vm/compiler] Further optimize setRange on TypedData receivers.
When setRange is called on a TypedData receiver and the source is also a
TypedData object with the same element size and clamping is not required,
the VM implementation now calls _boundsCheckAndMemcpyN for element size N.
The generated IL for these methods performs the copy using the MemoryCopy
instruction (mostly, see the note below).

Since the two TypedData objects might have the same underlying buffer, the
CL adds a can_overlap flag to the MemoryCopy instruction which checks for
overlapping regions. If can_overlap is set, then the copy is performed
backwards instead of forwards when needed to ensure that elements of the
source region are read before they are overwritten.

The existing uses of the MemoryCopy instruction are adjusted as follows:

* The IL generated for copyRangeFromUint8ListToOneByteString passes false
  for can_overlap, as all uses currently ensure that the OneByteString is
  non-external and thus cannot overlap.
* The IL generated for _memCopy, used by the FFI library, passes true for
  can_overlap, as there is no guarantee that the regions pointed at by the
  Pointer objects do not overlap.

The MemoryCopy instruction has also been adjusted so that all numeric
inputs (the two start offsets and the length) are either boxed or unboxed,
instead of just the length. This exposed an issue in the inliner, where
unboxed constants in the callee graph were replaced with boxed constants
when inlining into the caller graph, since withList calls setRange with
constant starting offsets of 0. Now the representation of constants in the
callee graph is preserved when inlining the callee graph into the caller
graph.

Fixes #51237 by using TMP and TMP2 for the LDP/STP calls in the 16-byte
element size case, so no temporaries need to be allocated for the
instruction.
On ARM, when not unrolling the memory copy loop, uses TMP and a single
additional temporary for LDM/STM calls in the 8-byte and 16-byte element
cases, with the latter just using two LDM/STM calls within the loop, a
different approach than the one described in #51229.

Note: Once the number of elements being copied reaches a certain threshold
(1048576 on X86, 256 otherwise), _boundsCheckAndMemcpyN instead calls
_nativeSetRange, which is a native call that uses memmove from the standard
C library for non-clamped inputs. It does this because the code currently
emitted for MemoryCopy performs poorly compared to the more optimized
memmove implementation when copying larger regions of memory.

Notable benchmark changes for dart-aot:

* X64
  * TypedDataDuplicate.*.fromList improvement from ~13%-~250%
  * Uf8Encode.*.10 improvement from ~50%-~75%
  * MapCopy.Map.*.of.Map.* improvement from ~13%-~65%
  * MemoryCopy.*.setRange.* improvement from ~13%-~500%
* ARM7
  * Uf8Encode.*.10 improvement from ~35%-~70%
  * MapCopy.Map.*.of.Map.* improvement from ~6%-~75%
  * MemoryCopy.*.setRange.{8,64} improvement from ~22%-~500%
  * Improvement of ~100%-~200% for MemoryCopy.512.setRange.*.Double
  * Regression of ~40% for MemoryCopy.512.setRange.*.Uint8
  * Regression of ~85% for MemoryCopy.4096.setRange.*.Uint8
* ARM8
  * Uf8Encode.*.10 improvement from ~35%-~70%
  * MapCopy.Map.*.of.Map.* improvement from ~7%-~75%
  * MemoryCopy.*.setRange.{8,64} improvement from ~22%-~500%
  * Improvement of ~75%-~160% for MemoryCopy.512.setRange.*.Double
  * Regression of ~40% for MemoryCopy.512.setRange.*.Uint8
  * Regression of ~85% for MemoryCopy.4096.setRange.*.Uint8

TEST=vm/cc/IRTest_Memory, co19{,_2}/LibTest/typed_data, lib{,_2}/typed_data, corelib{,_2}/list_test

Issue: #42072
Issue: b/294114694
Issue: b/259315681
Change-Id: Ic75521c5fe10b952b5b9ce5f2020c7e3f03672a9
Cq-Include-Trybots: luci.dart.try:vm-aot-linux-debug-simarm_x64-try,vm-aot-linux-debug-simriscv64-try,vm-aot-linux-debug-x64-try,vm-aot-linux-debug-x64c-try,vm-kernel-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-linux-debug-ia32-try,vm-linux-debug-simriscv64-try,vm-linux-debug-x64-try,vm-linux-debug-x64c-try,vm-mac-debug-arm64-try,vm-mac-debug-x64-try,vm-aot-linux-release-simarm64-try,vm-aot-linux-release-simarm_x64-try,vm-aot-linux-release-x64-try,vm-aot-mac-release-arm64-try,vm-aot-mac-release-x64-try,vm-ffi-qemu-linux-release-riscv64-try,vm-ffi-qemu-linux-release-arm-try,vm-aot-msan-linux-release-x64-try,vm-msan-linux-release-x64-try,vm-aot-tsan-linux-release-x64-try,vm-tsan-linux-release-x64-try,vm-linux-release-ia32-try,vm-linux-release-simarm-try,vm-linux-release-simarm64-try,vm-linux-release-x64-try,vm-mac-release-arm64-try,vm-mac-release-x64-try,vm-kernel-precomp-linux-release-x64-try,vm-aot-android-release-arm64c-try,vm-ffi-android-debug-arm64c-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/319521
Reviewed-by: Daco Harkes <[email protected]>
Reviewed-by: Alexander Markov <[email protected]>
Commit-Queue: Tess Strickland <[email protected]>
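The overlap handling described above can be illustrated with a small sketch (hypothetical helper, not VM code): when the destination overlaps the source at a higher address, copying backwards guarantees every source byte is read before it is overwritten, which is the same guarantee memmove provides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Sketch of the can_overlap strategy: copy forwards by default, but copy
// backwards when dst lies inside [src, src + n), so that every source byte
// is read before it can be overwritten.
void CopyPossiblyOverlapping(unsigned char* dst, const unsigned char* src,
                             size_t n) {
  if (dst > src && dst < src + n) {
    for (size_t i = n; i > 0; i--) {  // Backward copy for overlap.
      dst[i - 1] = src[i - 1];
    }
  } else {
    for (size_t i = 0; i < n; i++) {  // Forward copy otherwise.
      dst[i] = src[i];
    }
  }
}
```

A forward copy in the overlapping case would smear the first two bytes over the whole region; the backward copy yields the intended right shift.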
1 parent b745fa8 commit c93f924

29 files changed (+1768, -983 lines)

runtime/lib/typed_data.cc

Lines changed: 74 additions & 71 deletions
@@ -66,90 +66,93 @@ DEFINE_NATIVE_ENTRY(TypedDataView_typedData, 0, 1) {
   return TypedDataView::Cast(instance).typed_data();
 }
 
-static BoolPtr CopyData(const TypedDataBase& dst_array,
-                        const TypedDataBase& src_array,
-                        const Smi& dst_start,
-                        const Smi& src_start,
-                        const Smi& length,
-                        bool clamped) {
-  const intptr_t dst_offset_in_bytes = dst_start.Value();
-  const intptr_t src_offset_in_bytes = src_start.Value();
-  const intptr_t length_in_bytes = length.Value();
-  ASSERT(Utils::RangeCheck(src_offset_in_bytes, length_in_bytes,
-                           src_array.LengthInBytes()));
-  ASSERT(Utils::RangeCheck(dst_offset_in_bytes, length_in_bytes,
-                           dst_array.LengthInBytes()));
-  if (length_in_bytes > 0) {
-    NoSafepointScope no_safepoint;
-    if (clamped) {
-      uint8_t* dst_data =
-          reinterpret_cast<uint8_t*>(dst_array.DataAddr(dst_offset_in_bytes));
-      int8_t* src_data =
-          reinterpret_cast<int8_t*>(src_array.DataAddr(src_offset_in_bytes));
-      for (intptr_t ix = 0; ix < length_in_bytes; ix++) {
-        int8_t v = *src_data;
-        if (v < 0) v = 0;
-        *dst_data = v;
-        src_data++;
-        dst_data++;
-      }
-    } else {
-      memmove(dst_array.DataAddr(dst_offset_in_bytes),
-              src_array.DataAddr(src_offset_in_bytes), length_in_bytes);
-    }
-  }
-  return Bool::True().ptr();
-}
-
 static bool IsClamped(intptr_t cid) {
-  switch (cid) {
-    case kTypedDataUint8ClampedArrayCid:
-    case kExternalTypedDataUint8ClampedArrayCid:
-    case kTypedDataUint8ClampedArrayViewCid:
-    case kUnmodifiableTypedDataUint8ClampedArrayViewCid:
-      return true;
-    default:
-      return false;
-  }
+  COMPILE_ASSERT((kTypedDataUint8ClampedArrayCid + 1 ==
+                  kTypedDataUint8ClampedArrayViewCid) &&
+                 (kTypedDataUint8ClampedArrayCid + 2 ==
+                  kExternalTypedDataUint8ClampedArrayCid) &&
+                 (kTypedDataUint8ClampedArrayCid + 3 ==
+                  kUnmodifiableTypedDataUint8ClampedArrayViewCid));
+  return cid >= kTypedDataUint8ClampedArrayCid &&
+         cid <= kUnmodifiableTypedDataUint8ClampedArrayViewCid;
 }
 
 static bool IsUint8(intptr_t cid) {
-  switch (cid) {
-    case kTypedDataUint8ClampedArrayCid:
-    case kExternalTypedDataUint8ClampedArrayCid:
-    case kTypedDataUint8ClampedArrayViewCid:
-    case kUnmodifiableTypedDataUint8ClampedArrayViewCid:
-    case kTypedDataUint8ArrayCid:
-    case kExternalTypedDataUint8ArrayCid:
-    case kTypedDataUint8ArrayViewCid:
-    case kUnmodifiableTypedDataUint8ArrayViewCid:
-      return true;
-    default:
-      return false;
-  }
+  COMPILE_ASSERT(
+      (kTypedDataUint8ArrayCid + 1 == kTypedDataUint8ArrayViewCid) &&
+      (kTypedDataUint8ArrayCid + 2 == kExternalTypedDataUint8ArrayCid) &&
+      (kTypedDataUint8ArrayCid + 3 ==
+       kUnmodifiableTypedDataUint8ArrayViewCid) &&
+      (kTypedDataUint8ArrayCid + 4 == kTypedDataUint8ClampedArrayCid));
+  return cid >= kTypedDataUint8ArrayCid &&
+         cid <= kUnmodifiableTypedDataUint8ClampedArrayViewCid;
 }
 
-DEFINE_NATIVE_ENTRY(TypedDataBase_setRange, 0, 7) {
+DEFINE_NATIVE_ENTRY(TypedDataBase_setRange, 0, 5) {
   const TypedDataBase& dst =
       TypedDataBase::CheckedHandle(zone, arguments->NativeArgAt(0));
-  const Smi& dst_start = Smi::CheckedHandle(zone, arguments->NativeArgAt(1));
-  const Smi& length = Smi::CheckedHandle(zone, arguments->NativeArgAt(2));
+  const Smi& dst_start_smi =
+      Smi::CheckedHandle(zone, arguments->NativeArgAt(1));
+  const Smi& dst_end_smi = Smi::CheckedHandle(zone, arguments->NativeArgAt(2));
   const TypedDataBase& src =
       TypedDataBase::CheckedHandle(zone, arguments->NativeArgAt(3));
-  const Smi& src_start = Smi::CheckedHandle(zone, arguments->NativeArgAt(4));
-  const Smi& to_cid_smi = Smi::CheckedHandle(zone, arguments->NativeArgAt(5));
-  const Smi& from_cid_smi = Smi::CheckedHandle(zone, arguments->NativeArgAt(6));
+  const Smi& src_start_smi =
+      Smi::CheckedHandle(zone, arguments->NativeArgAt(4));
 
-  if (length.Value() < 0) {
-    const String& error = String::Handle(String::NewFormatted(
-        "length (%" Pd ") must be non-negative", length.Value()));
-    Exceptions::ThrowArgumentError(error);
+  const intptr_t element_size_in_bytes = dst.ElementSizeInBytes();
+  ASSERT_EQUAL(src.ElementSizeInBytes(), element_size_in_bytes);
+
+  const intptr_t dst_start_in_bytes =
+      dst_start_smi.Value() * element_size_in_bytes;
+  const intptr_t dst_end_in_bytes = dst_end_smi.Value() * element_size_in_bytes;
+  const intptr_t src_start_in_bytes =
+      src_start_smi.Value() * element_size_in_bytes;
+
+  const intptr_t length_in_bytes = dst_end_in_bytes - dst_start_in_bytes;
+
+  if (!IsClamped(dst.ptr()->GetClassId()) || IsUint8(src.ptr()->GetClassId())) {
+    // We've already performed range checking in _boundsCheckAndMemcpyN prior
+    // to the call to _nativeSetRange, so just perform the memmove.
+    //
+    // TODO(dartbug.com/42072): We do this when the copy length gets large
+    // enough that a native call to invoke memmove is faster than the generated
+    // code from MemoryCopy. Replace the static call to _nativeSetRange with
+    // a CCall() to a memmove leaf runtime entry and remove the possibility of
+    // calling _nativeSetRange except in the clamping case.
+    NoSafepointScope no_safepoint;
+    memmove(dst.DataAddr(dst_start_in_bytes), src.DataAddr(src_start_in_bytes),
+            length_in_bytes);
+    return Object::null();
+  }
+
+  // This is called on the fast path prior to bounds checking, so perform
+  // the bounds check even if the length is 0.
+  const intptr_t dst_length_in_bytes = dst.LengthInBytes();
+  RangeCheck(dst_start_in_bytes, length_in_bytes, dst_length_in_bytes,
+             element_size_in_bytes);
+
+  const intptr_t src_length_in_bytes = src.LengthInBytes();
+  RangeCheck(src_start_in_bytes, length_in_bytes, src_length_in_bytes,
+             element_size_in_bytes);
+
+  ASSERT_EQUAL(element_size_in_bytes, 1);
+
+  if (length_in_bytes > 0) {
+    NoSafepointScope no_safepoint;
+    uint8_t* dst_data =
+        reinterpret_cast<uint8_t*>(dst.DataAddr(dst_start_in_bytes));
+    int8_t* src_data =
+        reinterpret_cast<int8_t*>(src.DataAddr(src_start_in_bytes));
+    for (intptr_t ix = 0; ix < length_in_bytes; ix++) {
+      int8_t v = *src_data;
+      if (v < 0) v = 0;
+      *dst_data = v;
+      src_data++;
+      dst_data++;
+    }
   }
-  const intptr_t to_cid = to_cid_smi.Value();
-  const intptr_t from_cid = from_cid_smi.Value();
 
-  const bool needs_clamping = IsClamped(to_cid) && !IsUint8(from_cid);
-  return CopyData(dst, src, dst_start, src_start, length, needs_clamping);
+  return Object::null();
 }
 
 // Native methods for typed data allocation are recognized and implemented
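The rewrite of IsClamped and IsUint8 in this file trades a switch over class ids for a range check, which is valid only because the COMPILE_ASSERTs pin the relevant ids to consecutive values. A minimal sketch with hypothetical id values (only their relative order matters; the real values come from the VM's class id enumeration):

```cpp
#include <cassert>

// Hypothetical stand-ins for the VM's class ids. The clamped variants sit
// in one contiguous block, and the plain uint8 block immediately precedes
// it, so both predicates reduce to range checks.
enum ClassId {
  kUint8ArrayCid = 100,
  kUint8ArrayViewCid,                     // +1
  kExternalUint8ArrayCid,                 // +2
  kUnmodifiableUint8ArrayViewCid,         // +3
  kUint8ClampedArrayCid,                  // +4
  kUint8ClampedArrayViewCid,              // +5
  kExternalUint8ClampedArrayCid,          // +6
  kUnmodifiableUint8ClampedArrayViewCid,  // +7
  kInt16ArrayCid,                         // First id outside the uint8 family.
};

bool IsClampedSketch(int cid) {
  return cid >= kUint8ClampedArrayCid &&
         cid <= kUnmodifiableUint8ClampedArrayViewCid;
}

bool IsUint8Sketch(int cid) {
  return cid >= kUint8ArrayCid &&
         cid <= kUnmodifiableUint8ClampedArrayViewCid;
}
```

The compile-time asserts in the diff are what keep this safe: if the enumeration is ever reordered, the build fails rather than the range check silently misclassifying ids.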

runtime/vm/bootstrap_natives.h

Lines changed: 1 addition & 1 deletion
@@ -173,7 +173,7 @@ namespace dart {
   V(TypedData_Int32x4Array_new, 2) \
   V(TypedData_Float64x2Array_new, 2) \
   V(TypedDataBase_length, 1) \
-  V(TypedDataBase_setRange, 7) \
+  V(TypedDataBase_setRange, 5) \
   V(TypedData_GetInt8, 2) \
   V(TypedData_SetInt8, 3) \
   V(TypedData_GetUint8, 2) \

runtime/vm/compiler/assembler/assembler_base.h

Lines changed: 17 additions & 0 deletions
@@ -629,6 +629,23 @@ class AssemblerBase : public StackResource {
 
   virtual void SmiTag(Register r) = 0;
 
+  // If Smis are compressed and the Smi value in dst is non-negative, ensures
+  // the upper bits are cleared. If Smis are not compressed, is a no-op.
+  //
+  // Since this operation only affects the unused upper bits when Smis are
+  // compressed, it can be used on registers not allocated as writable.
+  //
+  // The behavior on the upper bits of signed compressed Smis is undefined.
+#if defined(DART_COMPRESSED_POINTERS)
+  virtual void ExtendNonNegativeSmi(Register dst) {
+    // Default to sign extension and allow architecture-specific assemblers
+    // where an alternative like zero-extension is preferred to override this.
+    ExtendValue(dst, dst, kObjectBytes);
+  }
+#else
+  void ExtendNonNegativeSmi(Register dst) {}
+#endif
+
   // Extends a value of size sz in src to a value of size kWordBytes in dst.
   // That is, bits in the source register that are not part of the sz-sized
   // value are ignored, and if sz is signed, then the value is sign extended.

runtime/vm/compiler/assembler/assembler_ia32.cc

Lines changed: 10 additions & 40 deletions
@@ -1776,6 +1776,16 @@ void Assembler::cmpxchgl(const Address& address, Register reg) {
   EmitOperand(reg, address);
 }
 
+void Assembler::cld() {
+  AssemblerBuffer::EnsureCapacity ensured(&buffer_);
+  EmitUint8(0xFC);
+}
+
+void Assembler::std() {
+  AssemblerBuffer::EnsureCapacity ensured(&buffer_);
+  EmitUint8(0xFD);
+}
+
 void Assembler::cpuid() {
   AssemblerBuffer::EnsureCapacity ensured(&buffer_);
   EmitUint8(0x0F);
@@ -3126,46 +3136,6 @@ Address Assembler::ElementAddressForIntIndex(bool is_external,
   }
 }
 
-static ScaleFactor ToScaleFactor(intptr_t index_scale, bool index_unboxed) {
-  if (index_unboxed) {
-    switch (index_scale) {
-      case 1:
-        return TIMES_1;
-      case 2:
-        return TIMES_2;
-      case 4:
-        return TIMES_4;
-      case 8:
-        return TIMES_8;
-      case 16:
-        return TIMES_16;
-      default:
-        UNREACHABLE();
-        return TIMES_1;
-    }
-  } else {
-    // Note that index is expected smi-tagged, (i.e, times 2) for all arrays
-    // with index scale factor > 1. E.g., for Uint8Array and OneByteString the
-    // index is expected to be untagged before accessing.
-    ASSERT(kSmiTagShift == 1);
-    switch (index_scale) {
-      case 1:
-        return TIMES_1;
-      case 2:
-        return TIMES_1;
-      case 4:
-        return TIMES_2;
-      case 8:
-        return TIMES_4;
-      case 16:
-        return TIMES_8;
-      default:
-        UNREACHABLE();
-        return TIMES_1;
-    }
-  }
-}
-
 Address Assembler::ElementAddressForRegIndex(bool is_external,
                                              intptr_t cid,
                                              intptr_t index_scale,
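The deleted ToScaleFactor helper encoded one detail worth keeping in mind: a boxed Smi index arrives pre-multiplied by two (kSmiTagShift == 1), so for element sizes above one the address scale is halved, while size-1 accesses untag the index first and keep scale 1. A sketch of that mapping (hypothetical helper, with the TIMES_n scale factors represented as plain integers):

```cpp
#include <cassert>

// Returns the effective address scale for an element access. An unboxed
// index scales by the element size directly. A boxed Smi index is the
// logical index shifted left by one (kSmiTagShift == 1), so the scale is
// halved for element sizes > 1; size-1 elements keep scale 1 because the
// index is untagged before the access.
int EffectiveScale(int element_size, bool index_unboxed) {
  if (index_unboxed) return element_size;
  return element_size == 1 ? 1 : element_size / 2;
}
```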

runtime/vm/compiler/assembler/assembler_ia32.h

Lines changed: 3 additions & 0 deletions
@@ -572,6 +572,9 @@ class Assembler : public AssemblerBase {
   void lock();
   void cmpxchgl(const Address& address, Register reg);
 
+  void cld();
+  void std();
+
   void cpuid();
 
   /*
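The new cld and std encodings above (0xFC and 0xFD) clear and set the x86 direction flag, which controls whether string instructions such as rep movs advance or retreat their pointers after each element; setting the flag enables a backward copy over overlapping regions. A C++ emulation of that behavior, purely illustrative:

```cpp
#include <cassert>
#include <cstddef>

// Emulates `rep movsb`: with df == false (after cld) the pointers advance;
// with df == true (after std) they retreat, which performs a backward copy
// when di and si initially point at the last byte of each region.
void RepMovsb(unsigned char* di, const unsigned char* si, size_t cx, bool df) {
  const int step = df ? -1 : 1;
  while (cx-- > 0) {
    *di = *si;
    di += step;
    si += step;
  }
}
```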

runtime/vm/compiler/assembler/assembler_x64.cc

Lines changed: 0 additions & 40 deletions
@@ -2683,46 +2683,6 @@ Address Assembler::ElementAddressForIntIndex(bool is_external,
   }
 }
 
-static ScaleFactor ToScaleFactor(intptr_t index_scale, bool index_unboxed) {
-  if (index_unboxed) {
-    switch (index_scale) {
-      case 1:
-        return TIMES_1;
-      case 2:
-        return TIMES_2;
-      case 4:
-        return TIMES_4;
-      case 8:
-        return TIMES_8;
-      case 16:
-        return TIMES_16;
-      default:
-        UNREACHABLE();
-        return TIMES_1;
-    }
-  } else {
-    // Note that index is expected smi-tagged, (i.e, times 2) for all arrays
-    // with index scale factor > 1. E.g., for Uint8Array and OneByteString the
-    // index is expected to be untagged before accessing.
-    ASSERT(kSmiTagShift == 1);
-    switch (index_scale) {
-      case 1:
-        return TIMES_1;
-      case 2:
-        return TIMES_1;
-      case 4:
-        return TIMES_2;
-      case 8:
-        return TIMES_4;
-      case 16:
-        return TIMES_8;
-      default:
-        UNREACHABLE();
-        return TIMES_1;
-    }
-  }
-}
-
 Address Assembler::ElementAddressForRegIndex(bool is_external,
                                              intptr_t cid,
                                              intptr_t index_scale,

runtime/vm/compiler/assembler/assembler_x64.h

Lines changed: 8 additions & 0 deletions
@@ -1024,6 +1024,14 @@ class Assembler : public AssemblerBase {
                        Register scratch,
                        bool can_be_null = false) override;
 
+#if defined(DART_COMPRESSED_POINTERS)
+  void ExtendNonNegativeSmi(Register dst) override {
+    // Zero-extends and is a smaller instruction to output than sign
+    // extension (movsxd).
+    orl(dst, dst);
+  }
+#endif
+
   // CheckClassIs fused with optimistic SmiUntag.
   // Value in the register object is untagged optimistically.
   void SmiUntagOrCheckClass(Register object, intptr_t class_id, Label* smi);
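The x64 override above works because a 32-bit x64 instruction such as `orl dst, dst` zero-extends its result into the upper half of the 64-bit register, and for non-negative values zero-extension and sign-extension coincide; they diverge only when bit 31 is set, which a non-negative compressed Smi never has. The equivalence can be checked directly:

```cpp
#include <cassert>
#include <cstdint>

// For a non-negative 32-bit value, zero-extension (the effect a 32-bit
// instruction like `orl dst, dst` has on the full 64-bit register) and
// sign-extension (`movsxd`) produce the same 64-bit result.
bool ExtensionsAgree(int32_t v) {
  const uint64_t zero_extended =
      static_cast<uint64_t>(static_cast<uint32_t>(v));
  const uint64_t sign_extended =
      static_cast<uint64_t>(static_cast<int64_t>(v));
  return zero_extended == sign_extended;
}
```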
