UpdateAssemblyInfo strips byte order mark #1074

kll · 2016-10-28T13:43:40Z

The AssemblyInfo files get overwritten using File.WriteAllText(file, fileContents), which writes UTF-8 without a byte order mark. All of my source files include the BOM to comply with the default StyleCop rules, so stripping it causes the files to get flagged.

The simple fix for me would be to use the WriteAllText overload that takes an encoding and pass in Encoding.UTF8, but then people who specifically don't want byte order marks would get them. The encoding of the existing file could be detected and reused to write the new contents, but based on my initial research that is a bit tricky to get right and may be overkill for this. Perhaps a configuration setting to choose the encoding?
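For readers following along, the difference between the overloads mentioned above can be checked with a small sketch. The class and method names (`WriteAllTextBomDemo`, `StartsWithUtf8Bom`, `Run`) are hypothetical and not part of GitVersion; only the `File.WriteAllText`, `Encoding.UTF8`, and `UTF8Encoding` calls are standard .NET APIs.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not GitVersion code.
public static class WriteAllTextBomDemo
{
    // True when the file begins with the UTF-8 byte order mark (EF BB BF).
    public static bool StartsWithUtf8Bom(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        return bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }

    // Writes the same text three ways and reports whether each result has a BOM.
    public static bool[] Run(string path, string contents)
    {
        var results = new bool[3];

        // Two-argument overload: UTF-8 without a BOM (the behavior reported here).
        File.WriteAllText(path, contents);
        results[0] = StartsWithUtf8Bom(path);

        // Encoding.UTF8 emits the BOM as its preamble.
        File.WriteAllText(path, contents, Encoding.UTF8);
        results[1] = StartsWithUtf8Bom(path);

        // UTF8Encoding(false) is explicit UTF-8 without a BOM.
        File.WriteAllText(path, contents, new UTF8Encoding(false));
        results[2] = StartsWithUtf8Bom(path);

        return results;
    }
}
```

Calling `WriteAllTextBomDemo.Run(Path.GetTempFileName(), "text")` should return `{ false, true, false }`, which is why simply switching to `Encoding.UTF8` would force a BOM on everyone.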

I can make the change and submit a PR but wanted to get opinions before deciding which way to go.


asbjornu · 2016-10-31T16:18:52Z

I think the byte order mark shouldn't be touched. Whatever it is (or isn't) shouldn't be changed by GitVersion. The best solution would be to build some heuristic detecting the current BOM before patching the file. Is it really that tricky to do?

kll · 2016-10-31T18:49:29Z

When there is a BOM present, it's fairly simple and straightforward. It only gets tricky when there is no BOM at all and you must look deeper into the file and essentially "guess" what the encoding is based on what the data looks like. Here is a gist where someone made a fairly thorough attempt at encoding detection, which shows how tricky it really is.
https://gist.github.com/TaoK/945127
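The "simple" case described above, where a BOM is present, can be sketched as a comparison of the file's leading bytes against the well-known Unicode byte order marks. This is a minimal illustration, not the code from the linked gist or from GitVersion; `BomSniffer` and `DetectFromBom` are hypothetical names.

```csharp
using System.Text;

// Hypothetical helper for illustration; not GitVersion code.
public static class BomSniffer
{
    // Returns the encoding indicated by a leading BOM, or null when no BOM is recognized.
    public static Encoding DetectFromBom(byte[] head)
    {
        // UTF-32 LE (FF FE 00 00) must be checked before UTF-16 LE (FF FE),
        // because the shorter mark is a prefix of the longer one.
        if (head.Length >= 4 && head[0] == 0xFF && head[1] == 0xFE && head[2] == 0x00 && head[3] == 0x00)
            return new UTF32Encoding(bigEndian: false, byteOrderMark: true);
        if (head.Length >= 4 && head[0] == 0x00 && head[1] == 0x00 && head[2] == 0xFE && head[3] == 0xFF)
            return new UTF32Encoding(bigEndian: true, byteOrderMark: true);
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return new UnicodeEncoding(bigEndian: false, byteOrderMark: true);
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return new UnicodeEncoding(bigEndian: true, byteOrderMark: true);

        // No BOM: the encoding would have to be guessed from the content.
        return null;
    }
}
```

A `null` result is the tricky case: nothing in the first bytes identifies the encoding, so anything further requires content heuristics. Note also that a UTF-16 LE file whose first character is U+0000 is indistinguishable from UTF-32 LE by its BOM alone.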

wahmedswl · 2017-01-03T07:05:46Z

Hi,
When GitVersion updated AssemblyInfo.cs, it changed the file's encoding. This might be related to this issue, as GitVersion should update only the necessary information without altering anything else.

Attaching screenshot for reference

asbjornu · 2017-01-03T09:18:58Z

@wahmedswl: The encoding is the same (UTF-8), but as @kll has mentioned, the "signature" (byte order mark, BOM) has been removed by GitVersion in the file on the right.

wahmedswl · 2017-01-03T09:29:51Z

@asbjornu thanks for your response. When is this going to be fixed?

asbjornu · 2017-01-03T10:25:30Z

@wahmedswl: When someone figures out how to fix it. 😃

matteo-mosca · 2017-01-10T13:59:09Z

I'm using GitVersion with C# Cake and this bug is breaking my build. After GitVersion removes the BOM from the AssemblyInfo.cs files, the .NET Core compiler gives me an error:

error SA1412: Store files as UTF-8 with byte order mark

And the build breaks.

kll · 2017-02-01T12:45:33Z

I'm finally back to taking a look at this. I had decided to try a minimal effort: preserve the encoding if it is easily detectable from a byte order mark and fall back to the existing behavior if not, rather than going into all the crazy English-language, heuristic-based detection. I think that is overkill and not worth the complexity and speed loss for a use case that may never even be needed. I would be willing to bet that everyone who actually cares about the encoding being used is using byte order marks to make it explicit. This proposed change would still be an improvement, and the heuristic detection could always be added later if there is demand for it.

However, once I started to dig into it I realized there are two different existing behaviors depending on whether you are using GitVersionExe or GitVersionTask. The exe creates UTF-8 without a BOM and the task creates UTF-8 with a BOM! That led me to issue #883, which points out the need to consolidate the two implementations, so I'm trying to decide whether I want to tackle that one while I'm in here.

@asbjornu What do you think of this middle-ground approach? And depending on whether I tackle #883, how should I reconcile the two existing behaviors for the fallback? If I leave them separate, I can either preserve both existing behaviors or standardize them. If I combine them, obviously one or the other will have to be chosen. I vote for standardizing on falling back to UTF-8 with a BOM across the board because it is more explicit.
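The middle-ground approach described above (reuse the encoding when a BOM identifies it, otherwise fall back to UTF-8 with a BOM) might look roughly like the following. This is a sketch under stated assumptions, not the code from the eventual PR: `EncodingPreservingWriter`, `Rewrite`, and `DetectFromBomOrNull` are hypothetical names, and the whole file is read into memory on the assumption that AssemblyInfo files are small.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not GitVersion code.
public static class EncodingPreservingWriter
{
    // Encodings whose preamble (BOM) we try to recognize.
    // UTF-32 LE comes before UTF-16 LE because FF FE is a prefix of FF FE 00 00.
    static readonly Encoding[] Candidates =
    {
        new UTF32Encoding(false, true),
        new UTF32Encoding(true, true),
        new UTF8Encoding(true),
        new UnicodeEncoding(false, true),
        new UnicodeEncoding(true, true),
    };

    // Overwrites the file, keeping its BOM-declared encoding when there is one
    // and falling back to UTF-8 with a BOM otherwise.
    public static void Rewrite(string path, string newContents)
    {
        Encoding encoding = DetectFromBomOrNull(path) ?? new UTF8Encoding(true);
        File.WriteAllText(path, newContents, encoding);
    }

    // Returns the candidate whose preamble matches the start of the file, or null.
    public static Encoding DetectFromBomOrNull(string path)
    {
        if (!File.Exists(path)) return null;
        byte[] data = File.ReadAllBytes(path);
        foreach (Encoding candidate in Candidates)
        {
            if (StartsWith(data, candidate.GetPreamble())) return candidate;
        }
        return null;
    }

    static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (prefix.Length == 0 || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

With this shape, a file that already has a UTF-16 or UTF-8 BOM keeps it after `Rewrite`, while a BOM-less file comes out as UTF-8 with a BOM. Anyone who needs BOM-less output preserved would need the content-based heuristics discussed earlier in the thread, which this sketch deliberately leaves out.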

asbjornu · 2017-02-01T14:52:00Z

@kll: If you could smack both this bug and fix #883 in the same PR, that would be awesome!

kll · 2017-02-01T17:49:46Z

I chickened out and did not attempt to refactor anything for #883. I simply don't have enough experience with MSBuild, and I have never even used GitVersionTask, so I'm not sure which of the differences between the two code paths are important and which are arbitrary. I might still look at that one later for fun, but I don't want it to hold up this fix.

matteo-mosca · 2017-02-02T11:07:45Z

I vote for saving with the BOM, as stripping the BOM generates an SA1412 violation in code analysis, and most people use "warnings as errors", so any violation prevents a successful build.

UTF-8 with BOM is also a more explicit format, so I see no reason to not use it.

JakeGinnivan · 2017-02-25T01:31:00Z

Nice investigations and PR everyone. Thanks!

kll mentioned this issue Feb 1, 2017

Preserve existing file encoding if it can be easily determined otherwise use UTF8 with BOM. #1149

Merged

JakeGinnivan closed this as completed in #1149 Feb 25, 2017

Tankatronic mentioned this issue Oct 17, 2017

Publish beta versions to VSTS marketplace #1150

Closed

2 tasks
