Skip to content

UpdateAssemblyInfo strips byte order mark #1074

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kll opened this issue Oct 28, 2016 · 12 comments · Fixed by #1149
Closed

UpdateAssemblyInfo strips byte order mark #1074

kll opened this issue Oct 28, 2016 · 12 comments · Fixed by #1149

Comments

@kll
Copy link
Contributor

kll commented Oct 28, 2016

The AssemblyInfo files get overwritten using File.WriteAllText(file, fileContents) which writes in UTF8 without a byte-order mark. All of my source files include the BOM to comply with default StyleCop rules and this strips it out causing the files to get flagged.

The simple fix for me would be to use the WriteAllText overload that takes an encoding and to pass in Encoding.UTF8, but then people that specifically don't want byte order marks would get them. The encoding of the existing file could be detected and re-used to write the new contents of the file, but based on my initial research into that it is a bit tricky to get right and may be overkill for this. Perhaps a configuration setting to set the encoding to use?

I can make the change and submit a PR but wanted to get opinions before deciding which way to go.

@asbjornu
Copy link
Member

I think the byte order mark shouldn't be touched. Whatever it is (or isn't) shouldn't be changed by GitVersion. The best solution would be to build some heuristic detecting the current BOM before patching the file. Is it really that tricky to do?

@kll
Copy link
Contributor Author

kll commented Oct 31, 2016

When there is a BOM present its fairly simple and straight forward. It only gets tricky when there is no BOM at all and you must look deeper into the file and essentially "guess" what the encoding is based on what the data looks like. Here is a GIST where someone did a fairly thorough attempt at encoding detection to show how tricky it really is.
https://gist.github.com/TaoK/945127

@wahmedswl
Copy link

Hi,
GitVersion when updated AssemblyInfo.cs, it changed its encoding. This might be related to this issue as it should just only update necessary information without altering anything else.

Attaching screenshot for reference

gitversion

@asbjornu
Copy link
Member

asbjornu commented Jan 3, 2017

@wahmedswl: The encoding is the same (UTF-8), but as @kll has mentioned, the "signature" (byte order mark; BOM) has been removed by GitVersion in the right file.

@wahmedswl
Copy link

@asbjornu thanks for your response. When this gonna be fixed?

@asbjornu
Copy link
Member

asbjornu commented Jan 3, 2017

@wahmedswl: When someone figures out how to fix it. 😃

@matteo-mosca
Copy link

I'm using GitVersion with c# Cake and this bug is breaking my build. After removing the BOM from the AssemblyInfo.cs files the .NetCore compiler gives me an error:

error SA1412: Store files as UTF-8 with byte order mark

And the build breaks.

@kll
Copy link
Contributor Author

kll commented Feb 1, 2017

I'm finally back to taking a look at this. I had decided to try a minimal effort of just preserving the encoding if it was easily detectable with byte order marks and falling back to the existing behavior if not, rather than going into all the crazy English language heuristic based detection. I think it is overkill and not worth the complexity and speed loss for a use case that may never even be needed. I would be willing to bet that everyone that actually cares about the encoding being used is using byte order marks to make it explicit. This proposed change would still be an improvement and the heuristic detection could always be added later if there is demand for it.

However, once I started to dig into it I realized there is two different existing behaviors depending on if you are using GitVersionExe or GitVersionTask. The exe creates UTF8 without BOM and the task creates UTF8 with BOM! That led me to finding issue #883 that points out the need to consolidate the two implementations so I'm trying to decide if I want to tackle that one while I'm into this or not.

@asbjornu What do you think of this middle ground approach? And depending on if I tackle #883 or not how should I reconcile the two existing behaviors for the fallback? If I leave them separate I can either preserve both existing behaviors or standardize them. If I combine them obviously one or the other will have to be chosen. I vote for standardizing on falling back to UTF8+BOM across the board because it is more explicit.

@asbjornu
Copy link
Member

asbjornu commented Feb 1, 2017

@kll: If you could smack both this bug and fix #883 in the same PR, that would be awesome!

@kll
Copy link
Contributor Author

kll commented Feb 1, 2017

I chickened out and did not attempt to refactor anything for #883. I simply don't have enough experience with MSBuild or have even used GitVersionTask so I'm not sure how many of the differences between the two code paths are important or arbitrary. I might still look at that one later for fun but I don't want it to hold up this fix.

@matteo-mosca
Copy link

I vote for saving with the BOM, as stripping the BOM generates SA1412 violation in code analysis, and most people use "warnings as errors" thus being unable to build successfully for any violation.

UTF-8 with BOM is also a more explicit format, so I see no reason to not use it.

@JakeGinnivan
Copy link
Contributor

Nice investigations and PR everyone. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants