Adding machine ids to telemetry #3494

adamgorMSFT · 2016-06-10T17:56:11Z

The telemetry currently collected cannot accurately differentiate users or machines.

This PR adds a hashed machine ID to the collected telemetry to improve the ability to differentiate distinct machines (users).
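
For readers who want a concrete picture, here is a minimal sketch of what a hashed machine ID derived from a network adapter's MAC address could look like. The PR's actual code is not shown in this thread, so the details are assumptions made for illustration: SHA-256 as the hash, a lowercase colon-separated MAC string as its input, and the helper names `format_mac` and `hashed_machine_id`.

```python
import hashlib
import uuid


def format_mac(node: int) -> str:
    """Render a 48-bit integer as a lowercase, colon-separated MAC string."""
    raw = f"{node:012x}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


def hashed_machine_id(mac: str) -> str:
    """Return the SHA-256 hex digest of the MAC string (assumed scheme)."""
    return hashlib.sha256(mac.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # uuid.getnode() returns a hardware address as a 48-bit integer, or a
    # random 48-bit value when no hardware address can be found.
    print(hashed_machine_id(format_mac(uuid.getnode())))
```

The same machine always produces the same digest, which is what allows events to be grouped per machine.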

src/dotnet/project.json

      "exclude": "compile"
-    }
+    },
+	"System.Net.NetworkInformation": "4.1.0-rc2-*"


kevinchalet · 2016-06-12T17:56:27Z

Am I really the only guy having a problem with all these crazy telemetry-related PRs?

Just like command line arguments, sending machine/environment-specific things like MAC addresses sucks really hard from a privacy perspective, even if they are hashed.

If you really want to be able to correlate collected traces, why not simply generate a unique identifier when installing .NET CLI and store it somewhere in the user profile? Not perfect, but still much better than using MAC addresses for that.
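
A minimal sketch of the random-identifier approach suggested above, under stated assumptions: the function name `get_or_create_install_id` and the storage location `~/.dotnet-example/telemetry-id` are hypothetical, not anything the .NET CLI actually uses. The identifier is generated once, stored in the user profile, and reused on later runs.

```python
import uuid
from pathlib import Path
from typing import Optional


def get_or_create_install_id(path: Optional[Path] = None) -> str:
    """Return a persisted random ID, creating it on first use."""
    if path is None:
        # Hypothetical location inside the user profile.
        path = Path.home() / ".dotnet-example" / "telemetry-id"
    try:
        existing = path.read_text(encoding="utf-8").strip()
        uuid.UUID(existing)  # raises ValueError if the file is corrupt
        return existing
    except (OSError, ValueError):
        pass  # missing or invalid file: fall through and create a new ID

    new_id = str(uuid.uuid4())  # random, not derived from any hardware
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(new_id, encoding="utf-8")
    return new_id
```

Because the value is random, deleting the file resets it, and it reveals nothing about the machine's hardware.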

guardrex · 2016-06-12T18:29:34Z

@PinpointTownes You're not the only one upset about these developments. It doesn't bother me that the program exists and provides MS with basic usage information, but it does bother me when sensitive, machine/location-identifiable information is sent to MS and the program is still being operated covertly.

The combination of ...

- Automatic activation of telemetry with no notice to the dev/sysop, and
- No explicit notice in the installer (or option to set the opt-out env var on install)

... makes the way that the program is being managed dangerous to Microsoft's objectives.

benaadams · 2016-06-12T19:13:54Z

Just store and reuse a GUID? So it's specific but more anonymous.

Rutix · 2016-06-12T19:17:45Z

I have to agree with the ones before me. If the only purpose really is to correlate data, storing a GUID and using that would be way more privacy-friendly than using the MAC address.
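
To make the privacy difference concrete: a MAC address has only 48 bits, and the upper 24 identify the hardware vendor, so someone who knows or guesses the vendor prefix has about 16.7 million candidates left to try. The sketch below is an illustration under assumptions, not a description of the PR's code: it assumes the ID is SHA-256 over a lowercase colon-separated MAC string, the address is made up, and only 16 bits are treated as unknown so the demo finishes quickly. The names `recover_mac`, `format_mac`, and `sha256_hex` are hypothetical.

```python
import hashlib
from typing import Optional


def sha256_hex(text: str) -> str:
    """SHA-256 hex digest of a string (assumed hashing scheme)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def format_mac(node: int) -> str:
    """Render a 48-bit integer as a lowercase, colon-separated MAC string."""
    raw = f"{node:012x}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


def recover_mac(target_hash: str, prefix: int, unknown_bits: int) -> Optional[str]:
    """Enumerate every address sharing the known high bits in `prefix`.

    Returns the matching MAC string, or None if no candidate matches.
    """
    base = prefix << unknown_bits
    for suffix in range(1 << unknown_bits):
        candidate = format_mac(base | suffix)
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


if __name__ == "__main__":
    secret = "00:11:22:33:be:ef"  # made-up address for the demo
    # The top 32 bits are treated as known, leaving 65,536 candidates.
    print(recover_mac(sha256_hex(secret), prefix=0x00112233, unknown_bits=16))
```

A randomly generated GUID has no such structure to exploit, which is the point being made in the comments above.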

attilah · 2016-06-12T21:17:58Z

It should be OPT_IN and not OPT_OUT. See the case of the Visual C++ team.

https://www.reddit.com/r/cpp/comments/4ibauu/visual_studio_adding_telemetry_function_calls_to/

People usually take telemetry as spying on them.

Yantrio · 2016-06-12T21:43:20Z

Why does Microsoft need to identify distinct machines? Is there any documentation or discussion we can see to find out how and why this conclusion was drawn?

benaadams · 2016-06-12T23:06:11Z

@Yantrio I'd guess because the IP address isn't a very good measure, as you will get large activation clumping behind NATs/gateways/proxies. Much like you use a cookie in a browser rather than the IP address to derive any sensible per-user usage stats for a website.
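
A small sketch of the clumping effect described above, using made-up events: several clients behind one NAT share a public IP address, so counting distinct IPs undercounts them, while counting a per-client identifier does not. The event shape and the function names are assumptions for illustration only.

```python
from typing import List, Tuple

# Each event is (source_ip, client_id); both fields are made-up sample data.
Event = Tuple[str, str]


def distinct_by_ip(events: List[Event]) -> int:
    """Count distinct source IP addresses."""
    return len({ip for ip, _ in events})


def distinct_by_client_id(events: List[Event]) -> int:
    """Count distinct per-client identifiers."""
    return len({client_id for _, client_id in events})


if __name__ == "__main__":
    events = [
        ("203.0.113.7", "a"),   # three clients behind the same NAT
        ("203.0.113.7", "b"),
        ("203.0.113.7", "c"),
        ("198.51.100.4", "d"),  # one client with its own address
        ("203.0.113.7", "a"),   # repeat event from client "a"
    ]
    print(distinct_by_ip(events))         # 2
    print(distinct_by_client_id(events))  # 4
```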

Yantrio · 2016-06-13T07:52:31Z

My question isn't so much why MAC addresses are needed rather than IP addresses; it's more why identify machines at all. Is it important to know what a specific user is doing?

adamgorMSFT · 2016-06-13T21:22:49Z

@Yantrio, as already said in the other discussions linked below, "We're only interested in aggregate data that we can use to identify trends". Aggregated data can be quite useful for many different reasons. For example, it can help engineers prioritize features that will make the product better.

There are also several other discussions and documents covering this and the other topics mentioned:
#2145, https://github.com/dotnet/cli/issues/3093, https://github.com/dotnet/cli/issues/3404
https://blogs.msdn.microsoft.com/dotnet/2016/05/16/announcing-net-core-rc2/#telemetry

kevinchalet · 2016-06-17T16:53:20Z

> @Yantrio, as already said in the other discussions linked below, "We're only interested in aggregate data that we can use to identify trends". Aggregated data can be quite useful for many different reasons.

That doesn't explain why a random identifier wouldn't work in this case.

> For example, it can help engineers prioritize features that will make the product better.

You have GitHub, Uservoice and a bunch of other channels to hear what your community needs or wants 😄

benaadams · 2016-06-17T18:51:35Z

> You have GitHub, Uservoice and a bunch of other channels to hear what your community needs or wants

That doesn't capture what people use most, only what a louder subset talks about most :)

adamgorMSFT · 2016-06-17T21:44:32Z

@PinpointTownes, GitHub and other community/user feedback sources are indeed already leveraged quite a lot, though they typically convey a different set of information. It is more informative and accurate to have a complete picture from multiple sources instead of just one. Having just one source can sometimes paint an inaccurate and biased picture.

> You have GitHub, Uservoice and a bunch of other channels to hear what your community needs or wants 😄

As proof that community feedback already influences decisions: you previously mentioned "command line arguments". They were ultimately dropped from that earlier pull request, and the decision to drop them was largely driven by the community response.

> That doesn't explain why a random identifier wouldn't work in this case.

A random identifier would work for the core usages, but not for all of them. There are other scenarios where it doesn't, and that is a contributing factor in why it's done this specific way. For example, there is interest in VS Code/CLI correlation, since the two are often used together. If we want to correlate against that already-collected data, that is one reason to collect a comparable value.

richlander · 2016-06-30T23:59:39Z

Thanks everyone for the feedback. I'm closing this PR now. I'll tell you why.

A little bit of context first. We added telemetry to the .NET Core Tools in RC2. I appreciated the feedback that folks gave on that, which helped us create a better product. Our #1 focus on the telemetry front is sharing usage data with you. We will not even consider making any more changes until that job is done. That's a promise.

Now, to this change. You can see that the change came in 20 days ago. This was during the height of shipping .NET Core 1.0. As a result, the PM team was super focussed on shipping 1.0 and didn't talk to the folks working on this PR. This is a poor excuse. I know you expect more from us.

With 1.0 out the door, we're going to take a closer look at our telemetry plan, both in terms of the actual telemetry and community engagement. We need telemetry for this product, however, "community engagement by PR" is not working. We need to adopt a different model for engagement. This is your .NET, and that needs to be more obvious from our engagement.

I ask for your continued engagement on this topic and to extend your patience a little bit longer.

Thanks everyone.

adamgorMSFT added 3 commits June 8, 2016 16:25

Adding MachineIds info.

2c250d0

Remove tabs, clean up, and fix casing bug

b7a32b7

Merge branch 'rel/1.0.0' into adamgor/TelemetryMachineIds

4a60d98

dnfclas added the cla-already-signed label Jun 10, 2016

Fixing previous bad refactor of names

ae78039

eerhardt reviewed Jun 10, 2016

src/dotnet/project.json Outdated

      "exclude": "compile"
-    }
+    },
+	"System.Net.NetworkInformation": "4.1.0-rc2-*"


adamgorMSFT added 3 commits June 10, 2016 15:04

Fix version dependency, encoding, and property

6ef307d

Adding comment for reason of less optimal approach.

311acdc

Changing sequential string change to a StringBuilder

f4f5ab3

adamgorMSFT changed the title ~~Adamgor/telemetry machine ids~~ telemetry machine ids Jun 13, 2016

adamgorMSFT changed the title ~~telemetry machine ids~~ add machine ids to telemetry Jun 13, 2016

adamgorMSFT changed the title ~~add machine ids to telemetry~~ Adding machine ids to telemetry Jun 13, 2016

Merge branch 'rel/1.0.0' into adamgor/TelemetryMachineIds

829ff3b

richlander closed this Jun 30, 2016

mmdriley mentioned this pull request Jul 3, 2016

Stop using sha256(MAC address) for telemetry machine ID microsoft/vscode#8688

Closed
