Commit f6d5a26

rjmholt authored and sdwheeler committed
Add documentation for configuring encoding with VSCode and PowerShell (#3743)
* style/editoral changes and added to TOC * Acrolinx edit * added metadata * review feedback * fix typo
1 parent c97a14a commit f6d5a26

3 files changed

+347
-5
lines changed

Lines changed: 323 additions & 0 deletions
@@ -0,0 +1,323 @@
1+
---
2+
title: Understanding file encoding in VSCode and PowerShell
3+
description: Configure file encoding in VSCode and PowerShell
4+
ms.date: 02/28/2019
5+
---
6+
# Understanding file encoding in VSCode and PowerShell
7+
8+
When using VS Code to create and edit PowerShell scripts, it is important that your files are saved
9+
using the correct character encoding format.
10+
11+
## What is file encoding and why is it important?
12+
13+
VSCode manages the interface between a human entering strings of characters into a buffer and
14+
reading/writing blocks of bytes to the filesystem. When VSCode saves a file, it uses a text
15+
encoding to do this.
16+
17+
Similarly, when PowerShell runs a script it must convert the bytes in a file to characters to
18+
reconstruct the file into a PowerShell program. Since VSCode writes the file and PowerShell reads
19+
the file, they need to use the same encoding system. This process of parsing a PowerShell script
20+
goes: *bytes* -> *characters* -> *tokens* -> *abstract syntax tree* -> *execution*.
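
The first step of that chain, *bytes* -> *characters*, depends on the encoding. As a minimal sketch (the variable names are illustrative, and `GetEncoding(1252)` assumes the Windows-1252 code page is available in your session), the following shows how one character becomes different bytes under three encodings. The character is built from its code point so that the example does not depend on how the example file itself is saved.

```powershell
# Build the character from its code point so this script's own encoding is irrelevant.
$text = [string][char]0x00C9   # U+00C9, LATIN CAPITAL LETTER E WITH ACUTE

$utf8    = [System.Text.Encoding]::UTF8.GetBytes($text)
$utf16le = [System.Text.Encoding]::Unicode.GetBytes($text)
# Assumption: code page 1252 can be resolved in this session.
$cp1252  = [System.Text.Encoding]::GetEncoding(1252).GetBytes($text)

# Format each byte array as hex for comparison.
foreach ($pair in @(@('UTF-8', $utf8), @('UTF-16 LE', $utf16le), @('Windows-1252', $cp1252))) {
    $hex = ($pair[1] | ForEach-Object { '0x{0:X2}' -f $_ }) -join ' '
    '{0}: {1}' -f $pair[0], $hex
}
# UTF-8: 0xC3 0x89
# UTF-16 LE: 0xC9 0x00
# Windows-1252: 0xC9
```

A reader that assumes the wrong encoding sees the same bytes but reconstructs different characters.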
21+
22+
Both VSCode and PowerShell are installed with a sensible default encoding configuration. However,
23+
the default encoding used by PowerShell has changed with the release of PowerShell Core (v6.x). To
24+
ensure you have no problems using PowerShell or the PowerShell extension in VSCode, you need to
25+
configure your VSCode and PowerShell settings properly.
26+
27+
## Common causes of encoding issues
28+
29+
Encoding problems occur when the encoding of VSCode or your script file does not match the expected
30+
encoding of PowerShell. There is no way for PowerShell to automatically determine the file encoding.
31+
32+
You're more likely to have encoding problems when you're using characters not in the [7-bit ASCII character set](https://ascii.cl/). For example:
33+
34+
- Accented latin characters (`É`, `ü`)
35+
- Non-latin characters like Cyrillic (`Д`, `Ц`)
36+
- Han Chinese (`世`, `界`)
37+
38+
Common reasons for encoding issues are:
39+
40+
- The encodings of VSCode and PowerShell have not been changed from their defaults. For PowerShell
41+
5.1 and below, the default encoding is different from VSCode's.
42+
- Another editor has opened and overwritten the file in a new encoding. This often happens with the
43+
ISE.
44+
- The file is checked into source control in an encoding that is different from what VSCode or
45+
PowerShell expects. This can happen when collaborators use editors with different encoding
46+
configurations.
47+
48+
### How to tell when you have encoding issues
49+
50+
Often encoding errors present themselves as parse errors in scripts. If you find strange character
51+
sequences in your script, this can be the problem. In the example below, an en-dash (`–`) appears as
the characters `â€“`:
53+
54+
```Output
Send-MailMessage : A positional parameter cannot be found that accepts argument 'Testing FuseMail SMTP...'.
At C:\Users\<User>\<OneDrive>\Development\PowerShell\Scripts\Send-EmailUsingSmtpRelay.ps1:6 char:1
+ Send-MailMessage â€“From $from â€“To $recipient1 â€“Subject $subject ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [Send-MailMessage], ParameterBindingException
+ FullyQualifiedErrorId : PositionalParameterNotFound,Microsoft.PowerShell.Commands.SendMailMessage
```
62+
63+
This problem occurs because VSCode encodes the character `–` in UTF-8 as the bytes `0xE2 0x80 0x93`.
When these bytes are decoded as Windows-1252, they are interpreted as the characters `â€“`.
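
The mismatch can be reproduced in a few lines. This is a sketch rather than a diagnostic tool: the variable names are illustrative, and `GetEncoding(1252)` assumes the Windows-1252 code page is available in your session.

```powershell
# Build the en-dash from its code point so this script's own encoding is irrelevant.
$enDash = [string][char]0x2013

# What VSCode writes to disk when saving as UTF-8.
$bytes = [System.Text.Encoding]::UTF8.GetBytes($enDash)
$hex = ($bytes | ForEach-Object { '0x{0:X2}' -f $_ }) -join ' '   # 0xE2 0x80 0x93

# What a reader sees when it decodes the same bytes as Windows-1252.
$cp1252 = [System.Text.Encoding]::GetEncoding(1252)
$misread = $cp1252.GetString($bytes)
$codePoints = ($misread.ToCharArray() | ForEach-Object { 'U+{0:X4}' -f [int]$_ }) -join ' '
# U+00E2 U+20AC U+201C: three characters instead of one

# Reversing the mistake recovers the original character.
$repaired = [System.Text.Encoding]::UTF8.GetString($cp1252.GetBytes($misread))
$repaired -eq $enDash   # True
```

The round trip works here because all three bytes map to defined Windows-1252 characters. That is not guaranteed for every byte sequence.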
65+
66+
Some strange character sequences that you might see include:
67+
68+
- `â€“` instead of `–`
- `â€”` instead of `—`
- `Ã„` instead of `Ä`
- `Â` instead of ` ` (a non-breaking space)
- `Ã©` instead of `é`
73+
74+
This handy [reference](https://www.i18nqa.com/debug/utf8-debug.html) lists the common patterns that
75+
indicate a UTF-8/Windows-1252 encoding problem.
76+
77+
## How the PowerShell extension in VSCode interacts with encodings
78+
79+
The PowerShell extension interacts with scripts in a number of ways:
80+
81+
1. When scripts are edited in VSCode, the contents are sent by VSCode to the extension. The [Language Server Protocol][]
82+
mandates that this content is transferred in UTF-8. Therefore, it is not possible for the
83+
extension to get the wrong encoding.
84+
2. When scripts are executed directly in the Integrated Console, they're read from the file by
85+
PowerShell directly. If PowerShell's encoding differs from VSCode's, something can go wrong here.
86+
3. When a script that is open in VSCode references another script that is not open in VSCode, the
87+
extension falls back to loading that script's content from the file system. The PowerShell
88+
extension defaults to UTF-8 encoding, but uses [byte-order mark][], or BOM, detection to select
89+
the correct encoding.
90+
91+
The problem occurs when assuming the encoding of BOM-less formats (like [UTF-8][] with no BOM and [Windows-1252][]).
92+
The PowerShell extension defaults to UTF-8. The extension cannot change VSCode's encoding settings.
93+
For more information, see [issue #824](https://github.com/Microsoft/vscode/issues/824).
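
The fallback described above can be sketched as follows. `Get-BomEncodingName` is a hypothetical helper written for illustration and is not the extension's actual code. It checks only the UTF-8 and UTF-16 signatures and ignores UTF-32.

```powershell
# Hypothetical helper: report the encoding implied by a leading BOM, if any.
function Get-BomEncodingName {
    param([byte[]]$Bytes)

    # UTF-8 BOM: EF BB BF
    if ($Bytes.Length -ge 3 -and $Bytes[0] -eq 0xEF -and $Bytes[1] -eq 0xBB -and $Bytes[2] -eq 0xBF) {
        return 'UTF-8 with BOM'
    }
    # UTF-16 little endian BOM: FF FE
    if ($Bytes.Length -ge 2 -and $Bytes[0] -eq 0xFF -and $Bytes[1] -eq 0xFE) {
        return 'UTF-16 LE'
    }
    # UTF-16 big endian BOM: FE FF
    if ($Bytes.Length -ge 2 -and $Bytes[0] -eq 0xFE -and $Bytes[1] -eq 0xFF) {
        return 'UTF-16 BE'
    }
    # No signature: the bytes alone cannot say which encoding was used, so fall back.
    return 'No BOM (assume UTF-8)'
}

# For a real file, pass [System.IO.File]::ReadAllBytes($path) instead.
Get-BomEncodingName -Bytes ([byte[]](0xEF, 0xBB, 0xBF, 0x41))   # UTF-8 with BOM
Get-BomEncodingName -Bytes ([byte[]](0x41, 0x42))               # No BOM (assume UTF-8)
```

The last branch is where BOM-less UTF-8 and Windows-1252 become indistinguishable. The function has to guess, and a reader that guesses differently produces the garbled sequences shown earlier.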
94+
95+
## Choosing the right encoding
96+
97+
Different systems and applications can use different encodings:
98+
99+
- In .NET Standard, on the web, and in the Linux world, UTF-8 is now the dominant encoding.
100+
- Many .NET Framework applications use [UTF-16][]. For historical reasons, this is sometimes called
101+
"Unicode", a term that now refers to a broad [standard](https://en.wikipedia.org/wiki/Unicode)
102+
that includes both UTF-8 and UTF-16.
103+
- On Windows, many native applications that predate Unicode continue to use Windows-1252 by default.
104+
105+
Unicode encodings also have the concept of a byte-order mark (BOM). BOMs occur at the beginning of
106+
text to tell a decoder which encoding the text is using. For multi-byte encodings, the BOM also
107+
indicates [endianness](https://en.wikipedia.org/wiki/Endianness) of the encoding. BOMs are designed
108+
to be bytes that rarely occur in non-Unicode text, allowing a reasonable guess that text is Unicode
109+
when a BOM is present.
110+
111+
BOMs are optional and their adoption isn't as popular in the Linux world because a dependable
112+
convention of UTF-8 is used everywhere. Most Linux applications presume that text input is
113+
encoded in UTF-8. While many Linux applications will recognize and correctly handle a BOM, a number
114+
do not, leading to artifacts in text manipulated with those applications.
115+
116+
**Therefore**:
117+
118+
- If you work primarily with Windows applications and Windows PowerShell, you should prefer an
119+
encoding like UTF-8 with BOM or UTF-16.
120+
- If you work across platforms, you should prefer UTF-8 with BOM.
121+
- If you work mainly in Linux-associated contexts, you should prefer UTF-8 without BOM.
122+
- Windows-1252 and latin-1 are essentially legacy encodings that you should avoid if possible.
123+
However, some older Windows applications may depend on them.
124+
- It's also worth noting that script signing is [encoding-dependent](https://github.com/PowerShell/PowerShell/issues/3466),
125+
meaning a change of encoding on a signed script will require resigning.
126+
127+
## Configuring VSCode
128+
129+
VSCode's default encoding is UTF-8 without BOM.
130+
131+
To set [VSCode's encoding][], go to the VSCode settings (<kbd>Ctrl</kbd>+<kbd>,</kbd>) and set the
`"files.encoding"` setting:
133+
134+
```json
"files.encoding": "utf8bom"
```
137+
138+
Some possible values are:
139+
140+
- `utf8`: [UTF-8] without BOM
141+
- `utf8bom`: [UTF-8] with BOM
142+
- `utf16le`: Little endian [UTF-16]
143+
- `utf16be`: Big endian [UTF-16]
144+
- `windows1252`: [Windows-1252]
145+
146+
You should get a dropdown for this in the GUI view, or completions for it in the JSON view.
147+
148+
You can also add the following to autodetect encoding when possible:
149+
150+
```json
"files.autoGuessEncoding": true
```
153+
154+
If you don't want these settings to affect all file types, VSCode also allows per-language
155+
configurations. Create a language-specific setting by putting settings in a `[<language-name>]`
156+
field. For example:
157+
158+
```json
"[powershell]": {
    "files.encoding": "utf8bom",
    "files.autoGuessEncoding": true
}
```
164+
165+
## Configuring PowerShell
166+
167+
PowerShell's default encoding varies depending on version:
168+
169+
- In PowerShell 6+, the default encoding is UTF-8 without BOM on all platforms.
170+
- In Windows PowerShell, the default encoding is usually Windows-1252, an extension of [latin-1][],
171+
also known as ISO 8859-1.
172+
173+
In PowerShell 5+ you can find your default encoding with this:
174+
175+
```powershell
[psobject].Assembly.GetTypes() | Where-Object { $_.Name -eq 'ClrFacade'} |
  ForEach-Object {
    $_.GetMethod('GetDefaultEncoding', [System.Reflection.BindingFlags]'nonpublic,static').Invoke($null, @())
  }
```
181+
182+
The following [script](https://gist.github.com/rjmholt/3d8dd4849f718c914132ce3c5b278e0e) can be
183+
used to determine what encoding your PowerShell session infers for a script without a BOM.
184+
185+
```powershell
# UTF-8 bytes for U+00C0; decoded as Windows-1252 they become two characters instead
$badBytes = [byte[]]@(0xC3, 0x80)
$utf8Str = [System.Text.Encoding]::UTF8.GetString($badBytes)
# Build a BOM-less script whose only non-ASCII content is those two bytes
$bytes = [System.Text.Encoding]::ASCII.GetBytes('Write-Output "') + $badBytes + [byte[]]@(0x22)
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'encodingtest.ps1'

try
{
    [System.IO.File]::WriteAllBytes($path, $bytes)

    # Run the script and compare its output with the UTF-8 decoding
    switch (& $path)
    {
        $utf8Str
        {
            return 'UTF-8'
        }

        default
        {
            return 'Windows-1252'
        }
    }
}
finally
{
    Remove-Item $path
}
```
215+
216+
It's possible to configure PowerShell to use a given encoding more generally using profile
217+
settings. See the following articles:
218+
219+
- [@mklement0]'s [answer about PowerShell encoding on StackOverflow](https://stackoverflow.com/a/40098904).
220+
- [@rkeithhill]'s [blog post about dealing with BOM-less UTF-8 input in PowerShell](https://rkeithhill.wordpress.com/2010/05/26/handling-native-exe-output-encoding-in-utf8-with-no-bom/).
221+
222+
It's not possible to force PowerShell to use a specific input encoding. PowerShell 5.1 and below
223+
default to Windows-1252 encoding when there's no BOM. For interoperability reasons, it's best to
224+
save scripts in a Unicode format with a BOM.
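
One way to save a script with a BOM regardless of PowerShell version is to call .NET directly. The following is a minimal sketch, assuming PowerShell 5 or later for the `::new()` syntax. The file name and script content are placeholders.

```powershell
# Placeholder path and content for illustration only.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'bom-example.ps1'
$text = 'Write-Output "' + [char]0x00C9 + '"'

try {
    # UTF8Encoding($true) emits the EF BB BF signature when the file is written.
    $utf8WithBom = [System.Text.UTF8Encoding]::new($true)
    [System.IO.File]::WriteAllText($path, $text, $utf8WithBom)

    # Confirm the signature is on disk.
    $firstBytes = [System.IO.File]::ReadAllBytes($path)[0..2]
    $signature = ($firstBytes | ForEach-Object { '0x{0:X2}' -f $_ }) -join ' '
    $signature   # 0xEF 0xBB 0xBF
}
finally {
    Remove-Item $path -ErrorAction SilentlyContinue
}
```

Passing the encoding object explicitly avoids relying on what `-Encoding UTF8` means in a given PowerShell version.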
225+
226+
> [!IMPORTANT]
227+
> Any other tools you have that touch PowerShell scripts may be affected by your
228+
> encoding choices or re-encode your scripts to another encoding.
229+
230+
### Existing scripts
231+
232+
Scripts already on the file system may need to be re-encoded to your new chosen encoding. In the
233+
bottom bar of VSCode, you'll see the label UTF-8. Click it to open the action bar and select **Save
234+
with encoding**. You can now pick a new encoding for that file. See [VSCode's encoding][] for full
235+
instructions.
236+
237+
If you need to re-encode multiple files, you can use the following script:
238+
239+
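```powershell
Get-ChildItem *.ps1 -Recurse | ForEach-Object {
    # Use FullName so that files in subdirectories resolve correctly.
    # Get-Content decodes BOM-less files with the session's default encoding.
    $content = Get-Content -Path $_.FullName
    Set-Content -Path $_.FullName -Value $content -Encoding UTF8 -PassThru -Force
}
```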
245+
246+
### The PowerShell Integrated Scripting Environment (ISE)
247+
248+
If you also edit scripts using the PowerShell ISE, you need to synchronize your encoding
249+
settings there.
250+
251+
The ISE should honor a BOM, but it's also possible to use reflection to
252+
[set the encoding](https://bensonxion.wordpress.com/2012/04/25/powershell-ise-default-saveas-encoding/).
253+
Note that this wouldn't be persisted between startups.
254+
255+
### Source control software
256+
257+
Some source control tools, such as git, ignore encodings; git just tracks the bytes.
258+
Others, like TFS or Mercurial, may not. Even some git-based tools rely on decoding text.
259+
260+
When this is the case, make sure you:
261+
262+
- Configure the text encoding in your source control to match your VSCode configuration.
263+
- Ensure all your files are checked into source control in the relevant encoding.
264+
- Be wary of changes to the encoding received through source control. A key sign of this is a diff
  that reports changes where nothing seems to have changed, because the bytes differ but the
  characters do not.
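
The last point can be illustrated with a small sketch. `ConvertFrom-TextBytes` is a hypothetical helper, and the sample text is a placeholder. The same script text is encoded with and without a BOM, which gives different bytes that decode to identical characters.

```powershell
# Hypothetical helper: decode bytes the way a BOM-aware reader would.
function ConvertFrom-TextBytes {
    param([byte[]]$Bytes)
    $stream = [System.IO.MemoryStream]::new($Bytes)
    # $true asks the reader to detect and skip a leading byte-order mark.
    $reader = [System.IO.StreamReader]::new($stream, $true)
    try { $reader.ReadToEnd() } finally { $reader.Dispose() }
}

$text = 'Write-Output "hello"'
$utf8 = [System.Text.UTF8Encoding]::new($true)

[byte[]]$withoutBom = $utf8.GetBytes($text)
[byte[]]$withBom = $utf8.GetPreamble() + $utf8.GetBytes($text)

# A byte-based tool sees a change: three extra bytes at the start of the file.
$byteDifference = $withBom.Length - $withoutBom.Length   # 3

# A text-based view sees no change at all.
$sameText = (ConvertFrom-TextBytes $withBom) -eq (ConvertFrom-TextBytes $withoutBom)   # True
```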
267+
268+
### Collaborators' environments
269+
270+
On top of configuring source control, ensure that your collaborators on any files you share don't
271+
have settings that override your encoding by re-encoding PowerShell files.
272+
273+
### Other programs
274+
275+
Any other program that reads or writes a PowerShell script may re-encode it.
276+
277+
Some examples are:
278+
279+
- Using the clipboard to copy and paste a script. This is common in scenarios like:
280+
- Copying a script into a VM
281+
- Copying a script out of an email or webpage
282+
- Copying a script into or out of a Microsoft Word or PowerPoint document
283+
- Other text editors, such as:
284+
- Notepad
285+
- vim
286+
- Any other PowerShell script editor
287+
- Text editing utilities, like:
288+
- `Get-Content`/`Set-Content`/`Out-File`
289+
- PowerShell redirection operators like `>` and `>>`
290+
- `sed`/`awk`
291+
- File transfer programs, like:
292+
- A web browser, when downloading scripts
293+
- A file share
294+
295+
Some of these tools deal in bytes rather than text, but others offer encoding configurations. In
296+
those cases where you need to configure an encoding, you need to make it the same as your editor
297+
encoding to prevent problems.
298+
299+
## Other resources on encoding in PowerShell
300+
301+
There are a few other nice posts on encoding and configuring encoding in PowerShell that are worth a
302+
read:
303+
304+
- [@mklement0]'s [summary of PowerShell encoding on StackOverflow](https://stackoverflow.com/questions/40098771/changing-powershells-default-output-encoding-to-utf-8)
305+
- Previous issues opened on vscode-PowerShell for encoding problems:
306+
- [#1308](https://github.com/PowerShell/vscode-powershell/issues/1308)
307+
- [#1628](https://github.com/PowerShell/vscode-powershell/issues/1628)
308+
- [#1680](https://github.com/PowerShell/vscode-powershell/issues/1680)
309+
- [#1744](https://github.com/PowerShell/vscode-powershell/issues/1744)
310+
- [#1751](https://github.com/PowerShell/vscode-powershell/issues/1751)
311+
- [The classic *Joel on Software* write up about Unicode](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/)
312+
- [Encoding in .NET Standard](https://github.com/dotnet/standard/issues/260#issuecomment-289549508)
313+
314+
315+
[@mklement0]: https://github.com/mklement0
316+
[@rkeithhill]: https://github.com/rkeithhill
317+
[Windows-1252]: https://wikipedia.org/wiki/Windows-1252
318+
[latin-1]: https://wikipedia.org/wiki/ISO/IEC_8859-1
319+
[UTF-8]: https://wikipedia.org/wiki/UTF-8
320+
[byte-order mark]: https://wikipedia.org/wiki/Byte_order_mark
321+
[UTF-16]: https://wikipedia.org/wiki/UTF-16
322+
[Language Server Protocol]: https://microsoft.github.io/language-server-protocol/
323+
[VSCode's encoding]: https://code.visualstudio.com/docs/editor/codebasics#_file-encoding-support

reference/docs-conceptual/components/vscode/using-vscode.md

Lines changed: 22 additions & 5 deletions
@@ -65,7 +65,7 @@ To exit Visual Studio Code, **File->Exit**.
6565

6666
Some systems are set up in a way that requires all code signatures to be checked and thus requires
6767
PowerShell Editor Services to be manually approved to run on the system.
68-
A Group Policy update that changes execution policy is a likely cause if you have installed the
68+
A Group Policy update that changes execution policy is a likely cause if you have installed the
6969
PowerShell extension but are reaching an error like:
7070

7171
```
@@ -117,17 +117,34 @@ We recommend the following configuration settings for Visual Studio Code:
117117
"editor.renderWhitespace": "all",
118118
"editor.renderControlCharacters": true,
119119
"omnisharp.projectLoadTimeout": 120,
120-
"files.trimTrailingWhitespace": true
120+
"files.trimTrailingWhitespace": true,
121+
"files.encoding": "utf8bom",
122+
"files.autoGuessEncoding": true
121123
}
122124
```
123125

126+
If you don't want these settings to affect all file types, VSCode also allows per-language
127+
configurations. Create a language-specific setting by putting settings in a `[<language-name>]`
128+
field. For example:
129+
130+
```json
131+
"[powershell]": {
132+
"files.encoding": "utf8bom",
133+
"files.autoGuessEncoding": true
134+
}
135+
```
136+
137+
For more information about file encoding in VS Code, see [Understanding file encoding](understanding-file-encoding.md).
138+
124139
## Debugging with Visual Studio Code
125140

126141
### No-workspace debugging
127142

128-
As of Visual Studio Code version 1.9 you can debug PowerShell scripts without having to open the folder containing the PowerShell script.
129-
Simply open the PowerShell script file with **File->Open File...**, set a breakpoint on a line (press F9) and then press F5 to start debugging.
130-
You should see the Debug actions pane appear which allows you to break into the debugger, step, resume and stop debugging.
143+
As of Visual Studio Code version 1.9 you can debug PowerShell scripts without having to open the
144+
folder containing the PowerShell script. Open the PowerShell script file with **File->Open
145+
File...**, set a breakpoint on a line (press F9) and then press F5 to start debugging. You should
146+
see the Debug actions pane appear which allows you to break into the debugger, step, resume and stop
147+
debugging.
131148

132149
### Workspace debugging
133150

reference/docs-conceptual/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -138,6 +138,8 @@
138138
href: components/vscode/How-To-Replicate-the-ISE-Experience-In-VSCode.md
139139
- name: Using Visual Studio Code for remote editing and debugging
140140
href: components/vscode/Using-VSCode-for-Remote-Editing-and-Debugging.md
141+
- name: Understanding file encoding in VSCode and PowerShell
142+
href: components/vscode/understanding-file-encoding.md
141143
- name: Web Access
142144
href: ''
143145
items:
