Commit 603c928

rjmholt Sean Wheeler authored and Sean Wheeler committed
style/editoral changes and added to TOC
1 parent c97a14a commit 603c928

3 files changed: +338 -5 lines changed

Lines changed: 314 additions & 0 deletions
@@ -0,0 +1,314 @@
# Understanding file encoding in VSCode and PowerShell

When using VS Code to create and edit PowerShell scripts, it is important that your files are saved
using the correct character encoding format.

## What is file encoding and why is it important?

VSCode manages the interface between a human entering strings of characters into a buffer and
reading/writing blocks of bytes to the filesystem. When VSCode saves a file, it uses a text encoding
to convert the characters in the buffer into bytes.

Similarly, when PowerShell runs a script, it must convert the bytes in the file back into characters
to reconstruct the PowerShell program. Since VSCode writes the file and PowerShell reads it, they
need to use the same encoding. The process of parsing a PowerShell script goes: *bytes* ->
*characters* -> *tokens* -> *abstract syntax tree* -> *execution*.

Both VSCode and PowerShell are installed with a sensible default encoding configuration. However,
the default encoding used by PowerShell has changed with the release of PowerShell Core (v6.x). To
ensure you have no problems using PowerShell or the PowerShell extension in VSCode, you need to
configure your VSCode and PowerShell settings properly.

## Common causes of encoding issues

Encoding problems occur when the encoding of VSCode or your script file does not match the expected
encoding of PowerShell. There is no reliable way for PowerShell to automatically determine a file's
encoding.

You're more likely to have encoding problems when you're using characters not in the
[7-bit ASCII character set](https://ascii.cl/), such as accented Latin characters (for example,
`É`, `ü`), or non-Latin characters like Cyrillic (`Д`, `Ц`) or Han Chinese characters.

Common reasons for encoding issues are:

- The encodings of VSCode and PowerShell have not been changed from their defaults. For PowerShell
  5.1 and below, the default encoding is different from VSCode's.
- Another editor has opened and overwritten the file in a new encoding. This often happens with the
  ISE.
- The file is checked into source control (like git) in a different encoding than what VSCode or
  PowerShell expects. This can happen when collaborators edit files with an editor that has a
  different encoding configuration.

### How to tell when you have encoding issues

Often encoding errors present themselves as parse errors in scripts. If you find strange character
sequences in your script, this could be the cause. In the example below, an en-dash (`–`) appears
as the characters `â€“`:

```Output
Send-MailMessage : A positional parameter cannot be found that accepts argument 'Testing FuseMail SMTP...'.
At C:\Users\<User>\<OneDrive>\Development\PowerShell\Scripts\Send-EmailUsingSmtpRelay.ps1:6 char:1
+ Send-MailMessage â€“From $from â€“To $recipient1 â€“Subject $subject ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [Send-MailMessage], ParameterBindingException
+ FullyQualifiedErrorId : PositionalParameterNotFound,Microsoft.PowerShell.Commands.SendMailMessage
```

This problem occurs because VSCode encodes the character `–` in UTF-8 as the bytes `0xE2 0x80 0x93`.
When these bytes are decoded as Windows-1252, they are interpreted as the characters `â€“`.
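You can reproduce this misreading in a PowerShell session. The sketch below encodes an en-dash as
UTF-8 and then decodes the same bytes as Windows-1252. It builds characters from their code points
so the example doesn't depend on how this page itself is encoded. It assumes code page 1252 is
available through `[System.Text.Encoding]::GetEncoding(1252)`: that holds in Windows PowerShell,
and for PowerShell 6+ it assumes the .NET code-pages provider is registered in your session.

```powershell
# Build the en-dash (U+2013) from its code point and encode it as UTF-8,
# the way VSCode saves it by default.
$enDash = [string][char]0x2013
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($enDash)

# Show the bytes in hex: E2 80 93
$hex = ($utf8Bytes | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
$hex

# Decode the same bytes as Windows-1252, as Windows PowerShell 5.1 typically does
# for a BOM-less script. Assumption: code page 1252 is available to GetEncoding.
$windows1252 = [System.Text.Encoding]::GetEncoding(1252)
$misread = $windows1252.GetString($utf8Bytes)

# Print the code points of the result: U+00E2 U+20AC U+201C (that is, â € “)
($misread.ToCharArray() | ForEach-Object { 'U+{0:X4}' -f [int]$_ }) -join ' '
```

One byte sequence has become three unrelated characters, which is why the parser no longer
recognizes `–From` as a parameter name.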

Some strange character sequences that you might see include:

- `â€“` instead of `–`
- `â€”` instead of `—`
- `Ã„` instead of `Ä`
- `Â` instead of ` ` (a non-breaking space)
- `Ã©` instead of `é`

This handy [reference](https://www.i18nqa.com/debug/utf8-debug.html) lists the common patterns that
indicate a UTF-8/Windows-1252 encoding problem.
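When you suspect this kind of damage, one way to confirm the diagnosis is to reverse it: re-encode
the garbled text as Windows-1252 and decode the resulting bytes as UTF-8. `Repair-Mojibake` below is
a hypothetical helper written for illustration, not a built-in cmdlet. It assumes the text was
garbled exactly once by a UTF-8 to Windows-1252 misreading, and that code page 1252 is available.

```powershell
# Hypothetical helper (not a built-in cmdlet): undoes one UTF-8 -> Windows-1252 misreading.
function Repair-Mojibake {
    param(
        [Parameter(Mandatory)]
        [string] $Text
    )

    # Turn the garbled characters back into the bytes they came from...
    $windows1252 = [System.Text.Encoding]::GetEncoding(1252)
    $bytes = $windows1252.GetBytes($Text)

    # ...and decode those bytes the way they were meant to be read.
    [System.Text.Encoding]::UTF8.GetString($bytes)
}

# 'Ã©' (U+00C3 U+00A9) is what 'é' looks like after the bad round trip.
$garbled = [string][char]0x00C3 + [char]0x00A9
Repair-Mojibake -Text $garbled
```

If the input wasn't actually garbled this way, the output is likely to contain the Unicode
replacement character (U+FFFD), so treat the result as a diagnostic hint rather than a fix.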

## How the PowerShell extension in VSCode interacts with encodings

The PowerShell extension interacts with scripts in a number of ways:

1. When scripts are edited in VSCode, the contents are sent by VSCode to the extension. The
   [Language Server Protocol][] mandates that this content is transferred in UTF-8. Therefore, it is
   not possible for the extension to get the wrong encoding.
2. When scripts are executed directly in the Integrated Console, they are read from the filesystem
   by PowerShell directly. This means that if PowerShell's encoding differs from VSCode's, the
   script can be decoded incorrectly.
3. When a script that is open in VSCode references another script that is not open in VSCode, the
   extension falls back to loading that script's content from the file system. The PowerShell
   extension defaults to UTF-8 encoding, but uses [byte-order mark][], or BOM, detection to select
   the correct encoding.

The problem occurs when assuming the encoding of BOM-less formats (like [UTF-8] with no BOM and
[Windows-1252]). In these cases, the extension defaults to UTF-8 rather than using more complex
logic. The PowerShell extension cannot change VSCode's encoding settings. For more information, see
[issue #824](https://github.com/Microsoft/vscode/issues/824).
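The fallback described above can be sketched in a few lines. `Get-ScriptEncodingGuess` is a
hypothetical function that checks a file for a UTF-8 or UTF-16 BOM and otherwise assumes UTF-8; it
illustrates the logic and is not the extension's actual implementation. For brevity it ignores
UTF-32 BOMs, so a UTF-32 LE file would be reported as UTF-16 LE.

```powershell
# Hypothetical helper: guesses a file's encoding from its BOM, falling back to UTF-8.
# A sketch of the logic described above, not the PowerShell extension's code.
function Get-ScriptEncodingGuess {
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )

    # .NET resolves relative paths against the process directory, so resolve the path first
    $fullPath = Convert-Path -LiteralPath $Path
    # Reads the whole file for simplicity; only the first few bytes are inspected
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    if ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF) {
        return 'UTF-8 with BOM'
    }
    if ($bytes.Length -ge 2 -and $bytes[0] -eq 0xFF -and $bytes[1] -eq 0xFE) {
        return 'UTF-16 LE'
    }
    if ($bytes.Length -ge 2 -and $bytes[0] -eq 0xFE -and $bytes[1] -eq 0xFF) {
        return 'UTF-16 BE'
    }

    # No BOM: the bytes alone don't say which encoding was used, so assume UTF-8
    return 'UTF-8 (assumed, no BOM)'
}
```

For example, `Get-ScriptEncodingGuess -Path .\MyScript.ps1` (a placeholder file name) returns one of
the four labels. The last label is exactly the case where a guess can be wrong.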

## Choosing the right encoding

Choosing an encoding depends on the platforms and applications you use to read and write your
PowerShell files.

On Windows, many applications have long used [Windows-1252]. Many .NET applications use [UTF-16].
In Windows, UTF-16 is often called "Unicode", but Unicode is a term that now refers to a broader
[standard](https://en.wikipedia.org/wiki/Unicode).

In the Linux world, on the web, and in .NET Standard, UTF-8 is now the dominant encoding.

Unicode encodings also have the concept of a byte-order mark (BOM). BOMs occur at the beginning of
text to tell a decoder which encoding the text is using. In the case of multi-byte encodings, the
BOM also indicates the [endianness](https://en.wikipedia.org/wiki/Endianness) of the encoding. BOMs
are designed to be bytes that rarely occur in non-Unicode text, allowing a reasonable guess that
text is Unicode when a BOM is present.
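.NET exposes the BOM that each encoding writes through `Encoding.GetPreamble()`, which makes the
byte sequences easy to inspect. This short sketch prints the BOM for UTF-8 (with and without BOM)
and for both UTF-16 byte orders; the labels are illustrative, and `::new` requires PowerShell 5 or
later.

```powershell
# Encoding objects configured with and without a BOM.
# UnicodeEncoding(bigEndian, byteOrderMark) is .NET's UTF-16 encoding.
$encodings = [ordered]@{
    'UTF-8 with BOM'    = [System.Text.UTF8Encoding]::new($true)
    'UTF-8 without BOM' = [System.Text.UTF8Encoding]::new($false)
    'UTF-16 LE'         = [System.Text.UnicodeEncoding]::new($false, $true)
    'UTF-16 BE'         = [System.Text.UnicodeEncoding]::new($true, $true)
}

foreach ($name in $encodings.Keys) {
    # GetPreamble() returns the BOM bytes, or an empty array if no BOM is written
    $bom = $encodings[$name].GetPreamble()
    $hex = ($bom | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
    if (-not $hex) { $hex = '(none)' }
    '{0,-18} {1}' -f $name, $hex
}
```

The output shows `EF BB BF` for UTF-8 with BOM, nothing for UTF-8 without BOM, `FF FE` for
little-endian UTF-16, and `FE FF` for big-endian UTF-16.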

BOMs are optional, and they have not caught on in the Linux world, where the dependable convention
is that UTF-8 is used everywhere. This means that most Linux applications presume that text input
is encoded in UTF-8. While many Linux applications recognize and correctly handle a BOM, a number
do not, leading to artifacts in text manipulated with those applications.

**Therefore**:

- If you work primarily with Windows applications and Windows PowerShell, you should prefer an
  encoding like UTF-8 with BOM or UTF-16.
- If you work across platforms, you should prefer UTF-8 with BOM.
- If you work mainly in Linux-associated contexts, you should prefer UTF-8 without BOM.
- Windows-1252 and latin-1 are essentially legacy encodings that you should avoid if possible.
  However, some older Windows applications may depend on them.
- It's also worth noting that script signing is
  [encoding-dependent](https://github.com/PowerShell/PowerShell/issues/3466), meaning a change of
  encoding on a signed script requires re-signing the script.
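If you need to produce files in these encodings from a script, calling .NET's `File.WriteAllText`
with an explicit encoding object avoids depending on the defaults of a particular PowerShell
version. The sketch below writes the same one-line script three ways into the temp folder; the file
names are arbitrary examples.

```powershell
# The same script text, with 'é' built from its code point so this file's own encoding doesn't matter
$text = "Write-Output 'caf$([char]0x00E9)'"
$dir = [System.IO.Path]::GetTempPath()

# File name -> encoding object to write it with
$targets = @{
    'utf8bom.ps1' = [System.Text.UTF8Encoding]::new($true)             # UTF-8 with BOM
    'utf8.ps1'    = [System.Text.UTF8Encoding]::new($false)            # UTF-8 without BOM
    'utf16le.ps1' = [System.Text.UnicodeEncoding]::new($false, $true)  # UTF-16 LE with BOM
}

foreach ($file in $targets.Keys) {
    $path = Join-Path $dir $file
    # WriteAllText writes the encoding's preamble (BOM), if it has one, before the text
    [System.IO.File]::WriteAllText($path, $text, $targets[$file])
    '{0}: {1} bytes' -f $file, (Get-Item -LiteralPath $path).Length
}
```

For this 19-character text, the UTF-8 file without a BOM is 20 bytes (`é` takes two bytes), the UTF-8
file with a BOM is 23 bytes, and the UTF-16 LE file is 40 bytes (two bytes per character plus the
two-byte BOM).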

## Configuring VSCode

VSCode's default encoding is UTF-8 without BOM.

To set [VSCode's encoding](https://code.visualstudio.com/docs/editor/codebasics#_file-encoding-support),
go to the VSCode settings (Ctrl+,) and set the `"files.encoding"` setting:

```json
"files.encoding": "utf8bom"
```

Some possible values are:

- `utf8`: [UTF-8] without BOM
- `utf8bom`: [UTF-8] with BOM
- `utf16le`: Little endian [UTF-16]
- `utf16be`: Big endian [UTF-16]
- `windows1252`: [Windows-1252]

You should get a dropdown for this in the GUI view, or completions for it in the JSON view.

You can also add the following to auto-detect encoding when possible:

```json
"files.autoGuessEncoding": true
```

If you don't want these settings to affect all file types, VSCode also allows per-language
configurations. Create a language-specific setting by putting settings in a `[<language-name>]`
field. For example:

```json
"[powershell]": {
    "files.encoding": "utf8bom",
    "files.autoGuessEncoding": true
}
```

## Configuring PowerShell

PowerShell's default encoding varies depending on version:

- In PowerShell 6+, the default encoding is [UTF-8] without BOM on all platforms.
- In Windows PowerShell, the default encoding is usually [Windows-1252], an extension of [latin-1],
  also known as ISO 8859-1.

In PowerShell 5+, you can find your default encoding with this:

```powershell
[psobject].Assembly.GetTypes() | Where-Object { $_.Name -eq 'ClrFacade' } |
    ForEach-Object {
        $_.GetMethod('GetDefaultEncoding', [System.Reflection.BindingFlags]'nonpublic,static').Invoke($null, @())
    }
```

The following [script](https://gist.github.com/rjmholt/3d8dd4849f718c914132ce3c5b278e0e) can be
used to determine what encoding your PowerShell session infers for a script without a BOM.

```powershell
# 0xC3 0x80 is the UTF-8 encoding of 'À'; read as Windows-1252, the same bytes are 'Ã€'
$badBytes = [byte[]]@(0xC3, 0x80)
$utf8Str = [System.Text.Encoding]::UTF8.GetString($badBytes)

# Build a BOM-less script whose content is: Write-Output "<the two bytes above>"
$bytes = [System.Text.Encoding]::ASCII.GetBytes('Write-Output "') + $badBytes + [byte[]]@(0x22)
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'encodingtest.ps1'

try
{
    [System.IO.File]::WriteAllBytes($path, $bytes)

    # If PowerShell decoded the script as UTF-8, its output matches $utf8Str
    switch (& $path)
    {
        $utf8Str
        {
            return 'UTF-8'
        }

        default
        {
            return 'Windows-1252'
        }
    }
}
finally
{
    Remove-Item $path
}
```

If you want to configure PowerShell to use a given encoding more generally, you can configure some
aspects of it with profile settings. See:

- [@mklement0]'s [answer about PowerShell encoding on StackOverflow](https://stackoverflow.com/a/40098904).
- [@rkeithhill]'s [blog post about dealing with BOM-less UTF-8 input in PowerShell](https://rkeithhill.wordpress.com/2010/05/26/handling-native-exe-output-encoding-in-utf8-with-no-bom/).

It's not possible to force PowerShell to use a specific input encoding. PowerShell 5.1 and below
default to Windows-1252 encoding when there is no BOM. For interoperability reasons, it's best to
save scripts in a Unicode format with a BOM.
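To find the scripts most at risk before converting anything, you can look for files that have no
BOM but do contain bytes outside 7-bit ASCII: those are the files that Windows PowerShell 5.1 and a
UTF-8-defaulting editor may read differently. `Find-AmbiguousScript` is a hypothetical helper
sketched for this purpose; it only recognizes UTF-8 and UTF-16 BOMs.

```powershell
# Hypothetical helper: lists .ps1 files under $Path that have no UTF-8 or UTF-16 BOM
# but contain at least one byte above 0x7F (that is, non-ASCII content).
function Find-AmbiguousScript {
    param(
        [string] $Path = '.'
    )

    Get-ChildItem -Path $Path -Filter *.ps1 -Recurse -File | Where-Object {
        $bytes = [System.IO.File]::ReadAllBytes($_.FullName)

        $utf8Bom = $bytes.Length -ge 3 -and
            $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
        $utf16Bom = $bytes.Length -ge 2 -and
            (($bytes[0] -eq 0xFF -and $bytes[1] -eq 0xFE) -or ($bytes[0] -eq 0xFE -and $bytes[1] -eq 0xFF))

        # Stop at the first non-ASCII byte; nothing found means the file is pure ASCII
        $nonAscii = $null -ne ($bytes | Where-Object { $_ -gt 0x7F } | Select-Object -First 1)

        (-not ($utf8Bom -or $utf16Bom)) -and $nonAscii
    } | Select-Object -ExpandProperty FullName
}
```

For example, `Find-AmbiguousScript -Path .` lists the at-risk scripts under the current directory.
Pure-ASCII files are skipped because they decode identically in UTF-8 and Windows-1252.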

> [!IMPORTANT]
> Any other tools you have that touch PowerShell scripts may be affected by your
> encoding choices or re-encode your scripts to another encoding.

### Scripts

Scripts already on the file system may need to be re-encoded to your new chosen encoding. In the
bottom bar of VSCode, you'll see a label showing the file's current encoding, such as UTF-8. Click
it to open the action bar and select **Save with encoding**. You can now pick a new encoding for
that file.

If you need to re-encode multiple files, you can use the following script:

```powershell
# -Encoding UTF8 writes UTF-8 *with* a BOM in Windows PowerShell 5.1.
# In PowerShell 6+, UTF8 means no BOM; use -Encoding utf8BOM there if you want a BOM.
Get-ChildItem *.ps1 -Recurse | ForEach-Object {
    # Use the full path: in Windows PowerShell, $_ can stringify to just the file name
    $content = Get-Content -LiteralPath $_.FullName
    Set-Content -LiteralPath $_.FullName -Value $content -Encoding UTF8 -PassThru -Force
}
```

### The PowerShell Integrated Scripting Environment (ISE)

If you also edit scripts using the PowerShell ISE, you need to synchronize your encoding settings
there.

The ISE should honor a BOM, but it is also possible to use reflection to
[set the encoding](https://bensonxion.wordpress.com/2012/04/25/powershell-ise-default-saveas-encoding/).
Note that this setting isn't persisted between sessions.

### Source control software

Some source control tools, such as git, ignore encodings; git just tracks the bytes. Others, like
TFS or Mercurial, may not. Even some git-based tools rely on decoding text.

When this is the case, make sure you:

- Configure the text encoding in your source control to match your VSCode configuration.
- Ensure all your files are checked into source control in the relevant encoding.
- Be wary of changes to the encoding received through source control. A key sign of this is a diff
  indicating changes where nothing seems to have changed (because the bytes have changed but the
  characters have not).
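To check whether such a diff is only an encoding change, you can compare two copies of a file (for
example, the version before and after the suspicious change) both as bytes and as decoded text.
`Test-EncodingOnlyChange` is a hypothetical helper. It relies on `File.ReadAllText`, which honors a
BOM and otherwise decodes as UTF-8, so it catches changes such as adding a BOM or switching between
UTF-8 and UTF-16, but not reliably a switch to or from a BOM-less Windows-1252 file.

```powershell
# Hypothetical helper: $true when two files decode to the same text but differ as bytes,
# which suggests the only change between them is the encoding.
function Test-EncodingOnlyChange {
    param(
        [Parameter(Mandatory)]
        [string] $OldPath,

        [Parameter(Mandatory)]
        [string] $NewPath
    )

    # .NET resolves relative paths against the process directory, so resolve them first
    $OldPath = Convert-Path -LiteralPath $OldPath
    $NewPath = Convert-Path -LiteralPath $NewPath

    # Byte-level comparison (Base64 strings compared case-sensitively)
    $oldBytes = [System.Convert]::ToBase64String([System.IO.File]::ReadAllBytes($OldPath))
    $newBytes = [System.Convert]::ToBase64String([System.IO.File]::ReadAllBytes($NewPath))
    $sameBytes = $oldBytes -ceq $newBytes

    # Text-level comparison: ReadAllText detects a BOM and otherwise assumes UTF-8
    $sameText = [System.IO.File]::ReadAllText($OldPath) -ceq [System.IO.File]::ReadAllText($NewPath)

    return ($sameText -and -not $sameBytes)
}
```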

### Collaborators' environments

On top of configuring source control, ensure that your collaborators on any files you share don't
have settings that override your encoding by re-encoding PowerShell files.

### Other programs

Any other program that reads or writes a PowerShell script may re-encode it.

Some examples are:

- Using the clipboard to copy and paste a script. This is common in scenarios like:
  - Copying a script into a VM
  - Copying a script out of an email or webpage
  - Copying a script into or out of a Microsoft Word or PowerPoint document
- Other text editors, such as:
  - Notepad
  - vim
  - Any other PowerShell script editor
- Text editing utilities, like:
  - `Get-Content`/`Set-Content`/`Out-File`
  - PowerShell redirection operators like `>` and `>>`
  - `sed`/`awk`
- File transfer programs, like:
  - A web browser, when downloading scripts
  - A file share

Some of these deal in bytes rather than text, but others offer encoding configurations. In those
cases where you need to configure an encoding, you need to make it the same as your editor encoding
to prevent problems.

## Other resources on encoding in PowerShell

There are a few other nice posts on encoding and configuring encoding in PowerShell that are worth a
read:

- [@mklement0]'s [summary of PowerShell encoding on StackOverflow](https://stackoverflow.com/questions/40098771/changing-powershells-default-output-encoding-to-utf-8)
- Previous issues opened on vscode-PowerShell for encoding problems:
  - [#1308](https://github.com/PowerShell/vscode-powershell/issues/1308)
  - [#1628](https://github.com/PowerShell/vscode-powershell/issues/1628)
  - [#1680](https://github.com/PowerShell/vscode-powershell/issues/1680)
  - [#1744](https://github.com/PowerShell/vscode-powershell/issues/1744)
  - [#1751](https://github.com/PowerShell/vscode-powershell/issues/1751)
- [The classic *Joel on Software* writeup about Unicode](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/)
- [Encoding in .NET Standard](https://github.com/dotnet/standard/issues/260#issuecomment-289549508)

[@mklement0]: https://github.com/mklement0
[@rkeithhill]: https://github.com/rkeithhill
[Windows-1252]: https://wikipedia.org/wiki/Windows-1252
[latin-1]: https://wikipedia.org/wiki/ISO/IEC_8859-1
[UTF-8]: https://wikipedia.org/wiki/UTF-8
[byte-order mark]: https://wikipedia.org/wiki/Byte_order_mark
[UTF-16]: https://wikipedia.org/wiki/UTF-16
[Language Server Protocol]: https://microsoft.github.io/language-server-protocol/

reference/docs-conceptual/components/vscode/using-vscode.md

Lines changed: 22 additions & 5 deletions
@@ -65,7 +65,7 @@ To exit Visual Studio Code, **File->Exit**.

 Some systems are set up in a way that requires all code signatures to be checked and thus requires
 PowerShell Editor Services to be manually approved to run on the system.
-A Group Policy update that changes execution policy is a likely cause if you have installed the
+A Group Policy update that changes execution policy is a likely cause if you have installed the
 PowerShell extension but are reaching an error like:

 ```
@@ -117,17 +117,34 @@ We recommend the following configuration settings for Visual Studio Code:
 "editor.renderWhitespace": "all",
 "editor.renderControlCharacters": true,
 "omnisharp.projectLoadTimeout": 120,
-"files.trimTrailingWhitespace": true
+"files.trimTrailingWhitespace": true,
+"files.encoding": "utf8bom",
+"files.autoGuessEncoding": true
 }
 ```

+If you don't want these settings to affect all file types, VSCode also allows per-language
+configurations. Create a language-specific setting by putting settings in a `[<language-name>]`
+field. For example:
+
+```json
+"[powershell]": {
+"files.encoding": "utf8bom",
+"files.autoGuessEncoding": true
+}
+```
+
+For more information about file encoding in VS Code, see [Understanding file encoding](understanding-file-encoding.md).
+
 ## Debugging with Visual Studio Code

 ### No-workspace debugging

-As of Visual Studio Code version 1.9 you can debug PowerShell scripts without having to open the folder containing the PowerShell script.
-Simply open the PowerShell script file with **File->Open File...**, set a breakpoint on a line (press F9) and then press F5 to start debugging.
-You should see the Debug actions pane appear which allows you to break into the debugger, step, resume and stop debugging.
+As of Visual Studio Code version 1.9 you can debug PowerShell scripts without having to open the
+folder containing the PowerShell script. Open the PowerShell script file with **File->Open
+File...**, set a breakpoint on a line (press F9) and then press F5 to start debugging. You should
+see the Debug actions pane appear which allows you to break into the debugger, step, resume and stop
+debugging.

 ### Workspace debugging

reference/docs-conceptual/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -138,6 +138,8 @@
 href: components/vscode/How-To-Replicate-the-ISE-Experience-In-VSCode.md
 - name: Using Visual Studio Code for remote editing and debugging
 href: components/vscode/Using-VSCode-for-Remote-Editing-and-Debugging.md
+- name: Understanding file encoding in VSCode and PowerShell
+href: components/vscode/understanding-file-encoding.md
 - name: Web Access
 href: ''
 items:
