Skip to content
This repository was archived by the owner on Jun 21, 2023. It is now read-only.

Don't change file encoding unnecessarily. #2009

Merged
merged 2 commits into from
Oct 29, 2018
Merged

Conversation

grokys
Copy link
Contributor

@grokys grokys commented Oct 23, 2018

When comparing files in a PR. we're currently taking a look at the file on disk and converting the encoding of the LHS of the comparison to match the on-disk file. However, if the file doesn't exist locally, we were returning Encoding.Default for this value, which was causing the unicode errors seen in #1908.

This PR changes to behavior to return null if the file doesn't exist locally, and if a null encoding is passed to PullRequestService.ExtractToTempFile then just write the contents out without any encoding.

Related

Fixes #1908.

When a file doesn't exist locally, don't change it's encoding to `Encoding.Default`, just leave it be. Fixes #1908.
@codecov
Copy link

codecov bot commented Oct 23, 2018

Codecov Report

Merging #2009 into master will increase coverage by 0.01%.
The diff coverage is 33.33%.

@@            Coverage Diff             @@
##           master    #2009      +/-   ##
==========================================
+ Coverage   39.34%   39.36%   +0.01%     
==========================================
  Files         410      410              
  Lines       17573    17577       +4     
  Branches     2425     2427       +2     
==========================================
+ Hits         6914     6919       +5     
- Misses      10112    10114       +2     
+ Partials      547      544       -3
Impacted Files Coverage Δ
src/GitHub.App/Services/PullRequestService.cs 35.29% <33.33%> (-0.24%) ⬇️
...nlineReviews/Services/PullRequestSessionService.cs 17.1% <0%> (+0.4%) ⬆️
...c/GitHub.InlineReviews/Tags/InlineCommentTagger.cs 80.53% <0%> (+0.88%) ⬆️
...nlineReviews/Services/PullRequestSessionManager.cs 83.75% <0%> (+1.01%) ⬆️

@meaghanlewis
Copy link
Contributor

This LGTM 👍

Copy link
Collaborator

@jcansdale jcansdale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. 👍

I believe we can simplify this by returning new UTF8Encoding(false). That way GetEncoding is responsible for returning UTF8Encoding either with or without preamble.

@@ -769,7 +769,7 @@ public Encoding GetEncoding(ILocalRepositoryModel repository, string relativePat
}
}

return Encoding.Default;
return null;
Copy link
Collaborator

@jcansdale jcansdale Oct 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe can we can simply replace this with the following and avoid having to spacial case null.

Suggested change
return null;
return new UTF8Encoding(false);

This makes the GetEncoding responsible for returning Encoding.UTF8 if the target file has a preamble or new UTF8Encoding(false) if the target file doesn't exist or have a preamble. UTF8 without preamble is Git's default text encoding.

Here is what I see if we return new UTF8Encoding(false):

image

It's possible we could always use new UTF8Encoding(false) and avoid this issue completely. The question is what a diff looks like between UTF8 files with and without preamble. I didn't know new UTF8Encoding(false) was a thing until today.

Copy link
Collaborator

@jcansdale jcansdale Oct 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: wondering if we could always use new UTF8Encoding(false). Alas, this doesn't work. I tried the following:

File.WriteAllText("foo.txt", "foobar", new UTF8Encoding(true));
File.WriteAllText("bar.txt", "foobar", new UTF8Encoding(false));

When I diff them together, I see:
image

I think returning return new UTF8Encoding(false) when there is no preamble or file is probably the cleanest solution.

@github github deleted a comment from StanleyGoldman Oct 29, 2018
Copy link
Collaborator

@jcansdale jcansdale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been tested and works. 👍

It would be nice if we could simplify it at some point, because it isn't very obvious what's going on (or why it's necessary). We could use new UTF8Encoding(false) encoding, which is UTF8 a encoder that doesn't write a preamble (this is similar to git's default).

@jcansdale jcansdale merged commit ee3a651 into master Oct 29, 2018
@jcansdale jcansdale deleted the fixes/1908-unicode-error branch October 29, 2018 15:57
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants