-
Notifications
You must be signed in to change notification settings - Fork 655
UpdateAssemblyInfo strips byte order mark #1074
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I think the byte order mark shouldn't be touched. Whatever it is (or isn't) shouldn't be changed by GitVersion. The best solution would be to build some heuristic detecting the current BOM before patching the file. Is it really that tricky to do? |
When there is a BOM present its fairly simple and straight forward. It only gets tricky when there is no BOM at all and you must look deeper into the file and essentially "guess" what the encoding is based on what the data looks like. Here is a GIST where someone did a fairly thorough attempt at encoding detection to show how tricky it really is. |
@wahmedswl: The encoding is the same (UTF-8), but as @kll has mentioned, the "signature" (byte order mark; BOM) has been removed by GitVersion in the right file. |
@asbjornu thanks for your response. When this gonna be fixed? |
@wahmedswl: When someone figures out how to fix it. 😃 |
I'm using GitVersion with c# Cake and this bug is breaking my build. After removing the BOM from the AssemblyInfo.cs files the .NetCore compiler gives me an error: error SA1412: Store files as UTF-8 with byte order mark And the build breaks. |
I'm finally back to taking a look at this. I had decided to try a minimal effort of just preserving the encoding if it was easily detectable with byte order marks and falling back to the existing behavior if not, rather than going into all the crazy English language heuristic based detection. I think it is overkill and not worth the complexity and speed loss for a use case that may never even be needed. I would be willing to bet that everyone that actually cares about the encoding being used is using byte order marks to make it explicit. This proposed change would still be an improvement and the heuristic detection could always be added later if there is demand for it. However, once I started to dig into it I realized there is two different existing behaviors depending on if you are using GitVersionExe or GitVersionTask. The exe creates UTF8 without BOM and the task creates UTF8 with BOM! That led me to finding issue #883 that points out the need to consolidate the two implementations so I'm trying to decide if I want to tackle that one while I'm into this or not. @asbjornu What do you think of this middle ground approach? And depending on if I tackle #883 or not how should I reconcile the two existing behaviors for the fallback? If I leave them separate I can either preserve both existing behaviors or standardize them. If I combine them obviously one or the other will have to be chosen. I vote for standardizing on falling back to UTF8+BOM across the board because it is more explicit. |
I chickened out and did not attempt to refactor anything for #883. I simply don't have enough experience with MSBuild or have even used GitVersionTask so I'm not sure how many of the differences between the two code paths are important or arbitrary. I might still look at that one later for fun but I don't want it to hold up this fix. |
I vote for saving with the BOM, as stripping the BOM generates SA1412 violation in code analysis, and most people use "warnings as errors" thus being unable to build successfully for any violation. UTF-8 with BOM is also a more explicit format, so I see no reason to not use it. |
Nice investigations and PR everyone. Thanks! |
The AssemblyInfo files get overwritten using
File.WriteAllText(file, fileContents)
which writes in UTF8 without a byte-order mark. All of my source files include the BOM to comply with default StyleCop rules and this strips it out causing the files to get flagged.The simple fix for me would be to use the WriteAllText overload that takes an encoding and to pass in Encoding.UTF8, but then people that specifically don't want byte order marks would get them. The encoding of the existing file could be detected and re-used to write the new contents of the file, but based on my initial research into that it is a bit tricky to get right and may be overkill for this. Perhaps a configuration setting to set the encoding to use?
I can make the change and submit a PR but wanted to get opinions before deciding which way to go.
The text was updated successfully, but these errors were encountered: