-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Don't change file encoding unnecessarily. #2009
Conversation
When a file doesn't exist locally, don't change it's encoding to `Encoding.Default`, just leave it be. Fixes #1908.
Codecov Report
@@ Coverage Diff @@
## master #2009 +/- ##
==========================================
+ Coverage 39.34% 39.36% +0.01%
==========================================
Files 410 410
Lines 17573 17577 +4
Branches 2425 2427 +2
==========================================
+ Hits 6914 6919 +5
- Misses 10112 10114 +2
+ Partials 547 544 -3
|
This LGTM 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. 👍
I believe we can simplify this by returning new UTF8Encoding(false)
. That way GetEncoding
is responsible for returning UTF8Encoding
either with or without preamble.
@@ -769,7 +769,7 @@ public Encoding GetEncoding(ILocalRepositoryModel repository, string relativePat | |||
} | |||
} | |||
|
|||
return Encoding.Default; | |||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe can we can simply replace this with the following and avoid having to spacial case null
.
return null; | |
return new UTF8Encoding(false); |
This makes the GetEncoding
responsible for returning Encoding.UTF8
if the target file has a preamble or new UTF8Encoding(false)
if the target file doesn't exist or have a preamble. UTF8 without preamble is Git's default text encoding.
Here is what I see if we return new UTF8Encoding(false)
:
It's possible we could always use new UTF8Encoding(false)
and avoid this issue completely. The question is what a diff looks like between UTF8 files with and without preamble. I didn't know new UTF8Encoding(false)
was a thing until today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re: wondering if we could always use new UTF8Encoding(false)
. Alas, this doesn't work. I tried the following:
File.WriteAllText("foo.txt", "foobar", new UTF8Encoding(true));
File.WriteAllText("bar.txt", "foobar", new UTF8Encoding(false));
When I diff them together, I see:
I think returning return new UTF8Encoding(false)
when there is no preamble or file is probably the cleanest solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been tested and works. 👍
It would be nice if we could simplify it at some point, because it isn't very obvious what's going on (or why it's necessary). We could use new UTF8Encoding(false)
encoding, which is UTF8 a encoder that doesn't write a preamble (this is similar to git's default).
When comparing files in a PR. we're currently taking a look at the file on disk and converting the encoding of the LHS of the comparison to match the on-disk file. However, if the file doesn't exist locally, we were returning
Encoding.Default
for this value, which was causing the unicode errors seen in #1908.This PR changes to behavior to return
null
if the file doesn't exist locally, and if anull
encoding is passed toPullRequestService.ExtractToTempFile
then just write the contents out without any encoding.Related
Fixes #1908.