Skip to content

Conversation

jasnell
Copy link
Member

@jasnell jasnell commented Oct 11, 2016

Checklist
  • make -j8 test (UNIX), or vcbuild test nosign (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

buffer

Description of change

Add buffer.transcode(source, from, to) method. Primarily uses ICU to transcode a buffer's content from one of Node.js' supported encodings to another.

Originally part of a proposal to add a new unicode module. Decided to refactor the approach towrds individual PRs without a new module.

Refs: #8075
/cc @trevnorris @addaleax

@jasnell jasnell added the buffer Issues and PRs related to the buffer subsystem. label Oct 11, 2016
@nodejs-github-bot nodejs-github-bot added build Issues and PRs related to build files or the CI. c++ Issues and PRs that require attention from people who are familiar with C++. tools Issues and PRs related to the tools directory. labels Oct 11, 2016
@jasnell
Copy link
Member Author

jasnell commented Oct 11, 2016

@jasnell jasnell added the semver-minor PRs that contain new features and should be released in the next minor version. label Oct 11, 2016
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about possibly placing this in lib/internal/buffer-transcode.js and conditionally require()'ing it. purely cosmetic though, to prevent an extra level of indent. or you could just return early. :)

if (!process.binding('config').hasIntl)
  return;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We going to have a complaint about not supporting SharedArrayBuffer for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, perhaps. Not too worried about that for now.

@jasnell
Copy link
Member Author

jasnell commented Oct 12, 2016

Updated

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe give an example of transcoding is not possible?

src/node_i18n.cc Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm – this could be a ThrowICUError function, right?

src/node_i18n.cc Outdated
Copy link
Member

@addaleax addaleax Oct 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine but at some point this might become a member of the MaybeStackBuffer class? I realize that would conflict a bit with the MaybeStackBuffer<UChar> overload, maybe leave a TODO here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that final to residue from editing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, slight brain malfunction there I think ;-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhhh this returns a string or a Buffer depending on the target encoding? I don’t think binary-to-text encodings should be allowed here, .toString() is the right method for them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right. I'll pull these back out.

@jasnell
Copy link
Member Author

jasnell commented Oct 12, 2016

@addaleax ... updated! PTAL

Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice!

src/node_i18n.cc Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this use SwapBytes16?

src/node_i18n.cc Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(ditto)

src/node_i18n.cc Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 1024 seem kind of magic here, although I realize that is largely my fault. 😄 (Not sure if there’s anything to do about that)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed now!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: s/from_enc/fromEncoding/ and s/to_enc/toEncoding/. Ditto for cnv_from and cnv_to.

src/node_i18n.cc Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize you adapted this code from elsewhere but using snprintf() to format the error message will be much more efficient.

src/node_i18n.cc Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a if (e.Empty()) return Local<Value>();.

src/node_i18n.cc Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be if (!ret.Empty()) buf->Release(); - it's leaking memory now when the buffer can't be created.

src/node_i18n.cc Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

src/node_i18n.cc Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: strict aliasing violation and prone to crashing.

src/node_i18n.cc Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there is ample opportunity to share code between Ucs2FromUtf8 and Utf8FromUcs2, they are 80% identical.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps. For now I'm more inclined to keep these separate as it makes finding and tweaking bugs a bit easier. I'll take another pass in a separate PR to condense things down.

src/util.h Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this also reset length_?

src/util.h Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you move this into a common header, you might want to give it a slightly less generic name; e.g. SPREAD_BUFFER_ARG.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just remove the 'defines' block instead of commenting it out.

@jasnell jasnell force-pushed the buffer-transcode branch 3 times, most recently from 9e0464b to 2da087e Compare October 13, 2016 15:11
@jasnell
Copy link
Member Author

jasnell commented Oct 13, 2016

@jasnell
Copy link
Member Author

jasnell commented Oct 13, 2016

Another CI run after cleanups: https://ci.nodejs.org/job/node-test-pull-request/4507/ ... that last run was less than successful....
Trying another: https://ci.nodejs.org/job/node-test-pull-request/4508/

@jasnell
Copy link
Member Author

jasnell commented Oct 13, 2016

CI looks good. @bnoordhuis PTAL... LGTY?

@jasnell
Copy link
Member Author

jasnell commented Oct 17, 2016

ping @bnoordhuis

src/node_i18n.cc Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're returning empty handles as an indicator it's probably be more "V8-ish" have the return signature as a MaybeLocal<Value> instead. been trying to do that in other locations myself.

src/node_i18n.cc Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

creating Local<Value>'s but there's no HandleScope. if this callback is expected to always be called within an existing HandleScope (like MakeCallback), mind putting a comment at the top. also like MakeCallback (see src/node.h).

src/node_i18n.cc Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we know this is a v8::Object then can use e.As<Object>(). that's also more explicit that no extra handle is being created.

src/node_i18n.cc Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new v8::Maybe<T> API for v8::Object::Set() is annoying and ugly, but if if we're going to use some of the new API might as well use all of it.

src/node_i18n.cc Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're manipulating the original memory, why bother take a copy?

src/node_i18n.cc Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this operation fails, do we want to abort or throw?

src/node_i18n.cc Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for future performance enhancement, detect the alignment of the pointer and perform as many swaps that can be done in a single go.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm a little confused by this whole object, but right here if we're converting ascii to latin1 shouldn't we be passing 'latin1' as the encoding argument to Buffer.from()?

@jasnell jasnell force-pushed the buffer-transcode branch 3 times, most recently from 6f70820 to 65144bb Compare October 18, 2016 21:43
@jasnell
Copy link
Member Author

jasnell commented Oct 18, 2016

@bnoordhuis ... ok, reworked the implementation with an eye towards simplification and reducing duplication. PTAL
@trevnorris and @addaleax ... if I could trouble each of you to take another look also, I'd appreciate it.

@jasnell
Copy link
Member Author

jasnell commented Oct 21, 2016

ping @bnoordhuis @trevnorris @addaleax

@jasnell
Copy link
Member Author

jasnell commented Oct 24, 2016

Thank you for the follow up review @addaleax. Will get this landed tomorrow if there are no further objections.

Add buffer.transcode(source, from, to) method. Primarily uses ICU
to transcode a buffer's content from one of Node.js' supported
encodings to another.

Originally part of a proposal to add a new unicode module. Decided
to refactor the approach towrds individual PRs without a new module.

Refs: nodejs#8075
@jasnell
Copy link
Member Author

jasnell commented Oct 25, 2016

New CI run after squashing: https://ci.nodejs.org/job/node-test-pull-request/4665/

@jasnell
Copy link
Member Author

jasnell commented Oct 25, 2016

green except for unrelated failures. landing

@jasnell jasnell dismissed bnoordhuis’s stale review October 25, 2016 17:11

PR updated after view

jasnell added a commit that referenced this pull request Oct 25, 2016
Add buffer.transcode(source, from, to) method. Primarily uses ICU
to transcode a buffer's content from one of Node.js' supported
encodings to another.

Originally part of a proposal to add a new unicode module. Decided
to refactor the approach towrds individual PRs without a new module.

Refs: #8075
PR-URL: #9038
Reviewed-By: Anna Henningsen <[email protected]>
@jasnell
Copy link
Member Author

jasnell commented Oct 25, 2016

Landed in e8eaaa7

@jasnell jasnell closed this Oct 25, 2016
@srl295 srl295 mentioned this pull request Oct 25, 2016
4 tasks
evanlucas pushed a commit that referenced this pull request Nov 3, 2016
Add buffer.transcode(source, from, to) method. Primarily uses ICU
to transcode a buffer's content from one of Node.js' supported
encodings to another.

Originally part of a proposal to add a new unicode module. Decided
to refactor the approach towrds individual PRs without a new module.

Refs: #8075
PR-URL: #9038
Reviewed-By: Anna Henningsen <[email protected]>
@addaleax
Copy link
Member

If this is backported to any of the other release lines, it needs to come with #9838

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

buffer Issues and PRs related to the buffer subsystem. build Issues and PRs related to build files or the CI. c++ Issues and PRs that require attention from people who are familiar with C++. semver-minor PRs that contain new features and should be released in the next minor version. tools Issues and PRs related to the tools directory.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants