This repository was archived by the owner on Apr 20, 2023. It is now read-only.


@adamgorMSFT adamgorMSFT commented Jun 10, 2016

Currently collected telemetry has no ability to accurately differentiate users or machines.

Adding a hashed machine ID to collected telemetry to improve the ability to differentiate distinct machines (users).
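For readers unfamiliar with the idea, here is a minimal sketch of what a "hashed machine ID" could look like, assuming the identifier is derived from a MAC address and hashed with SHA-256. Neither the hash algorithm nor the helper names (`hashed_machine_id`, `local_mac_address`) come from this PR; they are illustrative assumptions, and Python is used only for brevity.

```python
import hashlib
import uuid


def hashed_machine_id(mac_address: str) -> str:
    """Return a hex SHA-256 digest of a normalized MAC address.

    Hypothetical helper: the algorithm and normalization are assumptions,
    not what the PR implements.
    """
    normalized = mac_address.replace(":", "").replace("-", "").lower()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def local_mac_address() -> str:
    """Format the value from uuid.getnode() as a colon-separated MAC string."""
    node = uuid.getnode()  # 48-bit integer; may be random if no MAC is found
    return ":".join(f"{(node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


if __name__ == "__main__":
    print(hashed_machine_id(local_mac_address()))
```

The same MAC address always produces the same digest, which is what lets reports from one machine be grouped together across runs.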

"exclude": "compile"
}
},
"System.Net.NetworkInformation": "4.1.0-rc2-*"

This comment was marked as spam.

This comment was marked as spam.

@kevinchalet

Am I really the only guy having a problem with all these crazy telemetry-related PRs?

Just like command line arguments, sending machine/environment-specific things like MAC addresses sucks really hard from a privacy perspective, even if they are hashed.

If you really want to be able to correlate collected traces, why not simply generate a unique identifier when installing .NET CLI and store it somewhere in the user profile? Not perfect, but still much better than using MAC addresses for that.
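A minimal sketch of that alternative, assuming the identifier is a random UUID written once to a file under the user profile. The function name `get_or_create_install_id`, the file name `telemetry_id`, and the `.example_cli` directory are all hypothetical; nothing here reflects how the .NET CLI stores anything.

```python
import uuid
from pathlib import Path


def get_or_create_install_id(profile_dir: Path) -> str:
    """Return a random ID stored under profile_dir, creating it on first use."""
    id_file = profile_dir / "telemetry_id"  # hypothetical file name
    if id_file.exists():
        stored = id_file.read_text(encoding="utf-8").strip()
        if stored:
            return stored
    new_id = str(uuid.uuid4())  # random, not derived from any hardware value
    profile_dir.mkdir(parents=True, exist_ok=True)
    id_file.write_text(new_id, encoding="utf-8")
    return new_id


if __name__ == "__main__":
    print(get_or_create_install_id(Path.home() / ".example_cli"))
```

Deleting the file resets the identifier, which gives the user a control that a MAC-derived value does not offer.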

@guardrex

@PinpointTownes You're not the only one upset about these developments. It doesn't bother me that the program exists and provides MS with basic usage information, but it does bother me when sensitive, machine/location-identifiable information is sent to MS and the program is still being operated covertly.

The combination of ...

  1. Automatic activation of telemetry with no notice to the dev/sysop, and
  2. No explicit notice in the installer (or option to set the opt-out env var on install)

... makes the way that the program is being managed dangerous to Microsoft's objectives.
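To make the opt-out model concrete, here is a small sketch of how a tool might check such a variable. The variable name `DOTNET_CLI_TELEMETRY_OPTOUT` is not stated in this thread, and the set of accepted values is an assumption; `telemetry_enabled` is a hypothetical helper.

```python
import os
from typing import Mapping

OPT_OUT_VAR = "DOTNET_CLI_TELEMETRY_OPTOUT"  # name not stated in this thread


def telemetry_enabled(env: Mapping[str, str]) -> bool:
    """Opt-out model: telemetry stays on unless the variable is set to 1/true."""
    value = env.get(OPT_OUT_VAR, "").strip().lower()
    return value not in {"1", "true"}  # accepted values are an assumption


if __name__ == "__main__":
    print(telemetry_enabled(os.environ))
```

With an empty environment the function returns `True`, which is the behavior being criticized here: a user who was never told about the variable is enrolled by default.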

@benaadams
Member

Just store and reuse a GUID? So it's specific but more anonymous.

@Rutix

Rutix commented Jun 12, 2016

I have to agree with the ones before me. If the purpose really is only to correlate data, storing a GUID and using that would be way more privacy-friendly than using the MAC address.

@attilah

attilah commented Jun 12, 2016

It should be OPT_IN and not OPT_OUT. See the case of the Visual C++ team.

https://www.reddit.com/r/cpp/comments/4ibauu/visual_studio_adding_telemetry_function_calls_to/

People usually take telemetry as spying on them.

@Yantrio

Yantrio commented Jun 12, 2016

Why does Microsoft need to identify distinct machines? Is there any documentation or discussion we can see to find out how and why this conclusion was drawn?

@benaadams
Member

benaadams commented Jun 12, 2016

@Yantrio I'd guess because IP address isn't a very good measure, as you will get large activation clumping behind NATs/gateways/proxies. Much like you use a cookie in a browser rather than IP address to derive any sensible per-user website usage stats.
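A toy illustration of that clumping effect, counting distinct IP addresses against distinct per-install identifiers in the same set of events. The `Event` type, the `distinct_counts` helper, and the sample data are invented for this sketch; the addresses come from the documentation-reserved ranges.

```python
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    ip_address: str
    install_id: str


def distinct_counts(events: Iterable[Event]) -> tuple[int, int]:
    """Return (distinct IP addresses, distinct install IDs) seen in events."""
    events = list(events)
    ips = {e.ip_address for e in events}
    ids = {e.install_id for e in events}
    return len(ips), len(ids)


if __name__ == "__main__":
    # Three machines behind one NAT gateway plus one machine on its own address.
    sample = [
        Event("203.0.113.7", "a"),
        Event("203.0.113.7", "b"),
        Event("203.0.113.7", "c"),
        Event("198.51.100.4", "d"),
        Event("203.0.113.7", "a"),
    ]
    print(distinct_counts(sample))  # (2, 4)
```

Counting by IP reports two sources where four installs exist, which is the undercount a per-install identifier avoids.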

@Yantrio

Yantrio commented Jun 13, 2016

My question isn't so much why it's required to use MAC addresses vs. IP; it's more why at all? Is it important to know what a specific user is doing?

@adamgorMSFT adamgorMSFT changed the title from "Adamgor/telemetry machine ids" to "telemetry machine ids" Jun 13, 2016
@adamgorMSFT adamgorMSFT changed the title from "telemetry machine ids" to "add machine ids to telemetry" Jun 13, 2016
@adamgorMSFT adamgorMSFT changed the title from "add machine ids to telemetry" to "Adding machine ids to telemetry" Jun 13, 2016
@adamgorMSFT
Author

adamgorMSFT commented Jun 13, 2016

@Yantrio, as already said in the other discussions linked below, "We're only interested in aggregate data that we can use to identify trends". Aggregated data can be quite useful for many different reasons. For example, it can help engineers prioritize features that will make the product better.

Also, there are several other discussions and documentation on this and other topics mentioned.
#2145, https://github.com/dotnet/cli/issues/3093, https://github.com/dotnet/cli/issues/3404
https://blogs.msdn.microsoft.com/dotnet/2016/05/16/announcing-net-core-rc2/#telemetry

@kevinchalet

@Yantrio, as already said in the other discussions linked below, "We're only interested in aggregate data that we can use to identify trends". Aggregated data can be quite useful for many different reasons.

That doesn't explain why a random identifier wouldn't work in this case.

For example, it can help engineers prioritize features that will make the product better.

You have GitHub, Uservoice and a bunch of other channels to hear what your community needs or wants 😄

@benaadams
Member

benaadams commented Jun 17, 2016

You have GitHub, Uservoice and a bunch of other channels to hear what your community needs or wants

That doesn't capture what people use most, only what a louder subset talks about most :)

@adamgorMSFT
Author

adamgorMSFT commented Jun 17, 2016

@PinpointTownes, GitHub and other community/user feedback sources are indeed leveraged quite a lot already, though they typically convey a different set of information. It is more informative and accurate to have a complete picture from multiple sources instead of just one. Relying on a single source can paint an inaccurate and biased picture.

You have GitHub, Uservoice and a bunch of other channels to hear what your community needs or wants 😄

Here is proof that community feedback already influences decisions: you previously mentioned "command line arguments". They were ultimately dropped from that earlier pull request, and that decision was largely driven by the community response.

That doesn't explain why a random identifier wouldn't work in this case.

A random identifier would work for the core usages, but not for all of them. There are other scenarios where it doesn't, and that is a contributing factor in why it's done this specific way. For example, there is interest in VS Code/CLI correlation, since the two are commonly used together. If we want to correlate against that already collected data, that is one reason to collect a comparable value.

@richlander
Member

richlander commented Jun 30, 2016

Thanks everyone for the feedback. I'm closing this PR now. I'll tell you why.

A little bit of context first. We added telemetry to the .NET Core Tools in RC2. I appreciated the feedback that folks gave on that, which helped us to create a better product. Our #1 focus on the telemetry front is sharing usage data with you. We will not even consider making any more changes until that job is done. That's a promise.

Now, to this change. You can see that the change came in 20 days ago. This was during the height of shipping .NET Core 1.0. As a result, the PM team was super focussed on shipping 1.0 and didn't talk to the folks working on this PR. This is a poor excuse. I know you expect more from us.

With 1.0 out the door, we're going to take a closer look at our telemetry plan, both in terms of the actual telemetry and community engagement. We need telemetry for this product; however, "community engagement by PR" is not working. We need to adopt a different model for engagement. This is your .NET, and that needs to be more obvious from our engagement.

I ask for your continued engagement on this topic and to extend your patience a little bit longer.

Thanks everyone.

10 participants