Centralise code to map between UTF-8 and UTF-16 on Windows. #62605
@swift-ci Please smoke test
Since @grynspan and I both had essentially the same code in #62462 and #62577 respectively, it made sense to pull it out into a separate PR (before anyone else starts adding another copy to their PR :-D). While @compnerd suggested using the LLVM |
(Worth noting that these functions currently generate a fatal error when they fail. I'm not sure that's desirable everywhere; maybe they shouldn't.)
A fatal error makes sense when malloc fails. It doesn't seem right when the conversion fails, though.
Does this really centralise it? It seems that we should simultaneously update the symbolication codepath.
@grynspan is doing that in a separate PR. The point of this is to provide the functions in one place, in one PR.
@swift-ci Please smoke test
Fixed it to return |
I think we should return |
8ef7c84 to 18df4db
@swift-ci Please smoke test
@swift-ci Please smoke test Windows platform
18df4db to 7f823b5
The runtime needs to call the Windows API in various places. Because Swift uses UTF-8 for its string representation, we can't call the ANSI API functions: the code page they use varies with the system locale setting. Instead, we need to use the wide-character APIs, which means converting between UTF-8 and wide-character (UTF-16) strings in various places in the runtime. rdar://103397975
Instead of triggering a fatal error on failure, return `nullptr` and let the caller decide what to do about it. The CommandLine code should trigger a fatal error, of course. rdar://103397975
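To make the "return `nullptr` and let the caller decide" contract concrete, here is a minimal, portable sketch of a UTF-8 to UTF-16 conversion. It is not the code from this PR: the names `sketch_decodeOne` and `sketch_copyUTF16FromUTF8` are hypothetical, and the decoding is hand-rolled only so the example runs on any platform. On Windows this work would more typically be handed to a system routine such as `MultiByteToWideChar` with `CP_UTF8`; the thread does not show which approach the PR takes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: decode one code point starting at p (p < end).
// Returns the number of bytes consumed, or 0 if the sequence is invalid.
static size_t sketch_decodeOne(const unsigned char *p,
                               const unsigned char *end, uint32_t *out) {
  unsigned char b0 = p[0];
  size_t len;
  uint32_t cp, min;
  if (b0 < 0x80) { *out = b0; return 1; }
  else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
  else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
  else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
  else return 0;                                   // stray continuation byte
  if (static_cast<size_t>(end - p) < len) return 0; // truncated sequence
  for (size_t i = 1; i < len; ++i) {
    if ((p[i] & 0xC0) != 0x80) return 0;            // not a continuation byte
    cp = (cp << 6) | (p[i] & 0x3F);
  }
  // Reject overlong forms, surrogates, and values beyond U+10FFFF.
  if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
  *out = cp;
  return len;
}

// Hypothetical name. Returns a malloc'd, NUL-terminated UTF-16 string, or
// nullptr if the input is not valid UTF-8 or allocation fails. The caller
// owns the result and decides whether a failure is fatal.
char16_t *sketch_copyUTF16FromUTF8(const char *utf8) {
  const unsigned char *p = reinterpret_cast<const unsigned char *>(utf8);
  const unsigned char *end = p + std::strlen(utf8);
  // Each input byte yields at most one UTF-16 unit; add one for the NUL.
  size_t capacity = static_cast<size_t>(end - p) + 1;
  char16_t *result =
      static_cast<char16_t *>(std::malloc(capacity * sizeof(char16_t)));
  if (!result) return nullptr;
  size_t n = 0;
  while (p < end) {
    uint32_t cp;
    size_t used = sketch_decodeOne(p, end, &cp);
    if (used == 0) { std::free(result); return nullptr; }
    if (cp >= 0x10000) {                            // encode a surrogate pair
      cp -= 0x10000;
      result[n++] = static_cast<char16_t>(0xD800 + (cp >> 10));
      result[n++] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
    } else {
      result[n++] = static_cast<char16_t>(cp);
    }
    p += used;
  }
  result[n] = 0;
  return result;
}
```

A caller for which failure is unrecoverable (the commit message names the CommandLine code) would check the result for `nullptr` and abort there, while other callers can fall back or report an error instead.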
`SWIFT_RUNTIME_STDLIB_INTERNAL` does `extern "C"`, so we can't put these in a namespace and have to use a C-style prefix instead. Also slightly rearrange the code in `CommandLine.cpp`. rdar://103397975
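One way to read the namespace remark: a function with C language linkage is identified by its bare name, so a C++ namespace gives it no separation at the symbol level. The sketch below illustrates that rule in plain C++. All names are hypothetical, and the sketch writes `extern "C"` directly rather than using the runtime's macro.

```cpp
// Two declarations of an extern "C" function in different namespaces
// refer to the same function, because C linkage ignores namespaces.
namespace sketch_a { extern "C" int sketch_win32_answer(void); }
namespace sketch_b { extern "C" int sketch_win32_answer(void); }

// A single definition satisfies both declarations above.
extern "C" int sketch_win32_answer(void) { return 42; }

// Returns true: both qualified names denote the one function.
bool sketch_sameFunction() {
  return &sketch_a::sketch_win32_answer == &sketch_b::sketch_win32_answer;
}
```

Since the namespace cannot keep such names apart, a C-style prefix on the function name does that job instead.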
It makes sense to move this function into the new Win32 header. rdar://103397975
7f823b5 to e73de53
@swift-ci Please smoke test
They're useful functions to have; I don't think we want them to be API, but having them as SPI is conceivably useful for other purposes, and avoids everyone rolling their own copy. rdar://103397975
@swift-ci Please smoke test
How would you feel about exposing

Edit: If we wanted something that was platform-agnostic, we could instead expose cross-platform API for interacting with wide C strings the way a developer would interact with "narrow" C strings:

```swift
// MARK: - wchar_t
extension String {
  public init<S>(wideCharacters: S) where S: Sequence, S.Element == wchar_t
  public init(wideCString: UnsafePointer<wchar_t>)
  public init<C>(wideCString: C) where C: Collection, C.Element == wchar_t
}

// MARK: - CWideChar
extension String {
  public init<S>(wideCharacters: S) where S: Sequence, S.Element == CWideChar
  public init(wideCString: UnsafePointer<CWideChar>)
  public init<C>(wideCString: C) where C: Collection, C.Element == CWideChar
}
```

Edit 2: As I thought about it more, I realized that was just a long way of spelling |
That's more a question for the standard library folks, I think. I don't think that's necessarily the right thing anyway. IMO it'd be better to allow the Windows wide character APIs to be imported in a way such that we can pass |
@swift-ci Please smoke test
Since |