diff --git a/src/_data/sidenav/main.yml b/src/_data/sidenav/main.yml index 328e8041a5..fde195b726 100644 --- a/src/_data/sidenav/main.yml +++ b/src/_data/sidenav/main.yml @@ -180,10 +180,12 @@ sections: title: Identity Resolution Use Cases - path: /personas/identity-resolution/externalids title: Identity Resolution External IDs - - path: /personas/identity-resolution/identity-graph-rules - title: Identity Graph Rules + - path: /personas/identity-resolution/identity-resolution-settings + title: Identity Resolution Settings - path: /personas/identity-resolution/ecommerce-example title: Identity Resolution E-Commerce Example + - path: /personas/identity-resolution/personas-space-set-up + title: Personas Space Set Up - path: /personas/activation title: Activation - path: /personas/warehouses diff --git a/src/personas/identity-resolution/externalids.md b/src/personas/identity-resolution/externalids.md index e04ea78739..9b7802af0d 100644 --- a/src/personas/identity-resolution/externalids.md +++ b/src/personas/identity-resolution/externalids.md @@ -9,20 +9,20 @@ The Identity Graph creates or merges profiles based on externalIDs. 
ExternalIDs We automatically promote the following traits and IDs in track and identify calls to externalIDs: -| External ID Type | Message Location in Track or Identify Call | -|-----------------------|-----------------------------------------------------------------------| -| anonymous_id | anonymousId | -| user_id | userId | -| group_id | groupId | -| cross_domain_id | cross_domain_id | -| email | traits.email or context.traits.email | -| android.id | context.device.id when context.device.type = 'android' | -| ios.id | context.device.id when context.device.type = 'ios' | -| android.push_token | context.device.token when context.device.type = 'android' | -| ios.push_token | context.device.token when context.device.type = 'ios' | -| android.idfa | context.device.advertisingId when context.device.type = 'android' AND context.device.adTrackingEnabled = true | -| ios.idfa | context.device.advertisingId when context.device.type = 'ios' AND context.device.adTrackingEnabled = true -| ga_client_id | context.integrations['Google Analytics'].clientId when explicitly captured by users | +| External ID Type | Message Location in Track or Identify Call | +| ------------------ | ------------------------------------------------------------------------------------------------------------- | +| user_id | userId | +| email | traits.email or context.traits.email | +| android.id | context.device.id when context.device.type = 'android' | +| android.idfa | context.device.advertisingId when context.device.type = 'android' AND context.device.adTrackingEnabled = true | +| android.push_token | context.device.token when context.device.type = 'android' | +| anonymous_id | anonymousId | +| cross_domain_id | cross_domain_id | +| ga_client_id | context.integrations['Google Analytics'].clientId when explicitly captured by users | +| group_id | groupId | +| ios.id | context.device.id when context.device.type = 'ios' | +| ios.idfa | context.device.advertisingId when context.device.type = 'ios' 
AND context.device.adTrackingEnabled = true | +| ios.push_token | context.device.token when context.device.type = 'ios' | ## Custom ExternalIDs diff --git a/src/personas/identity-resolution/identity-graph-rules.md b/src/personas/identity-resolution/identity-graph-rules.md deleted file mode 100644 index cd06f04255..0000000000 --- a/src/personas/identity-resolution/identity-graph-rules.md +++ /dev/null @@ -1,27 +0,0 @@ ---- -title: Identity Graph Rules ---- -## Searching for Matching Profiles - -When a new Track or Identify call flows through a source to Personas, the Identity Graph automatically searches for any pre-existing user profiles with one or more matching externalIds. - -There are three cases that can occur: - -**Case One: Create New Profile** -When there are no pre-existing profiles that have matching identifiers to the event, we create a new user profile. - -**Case Two: Add Event to Existing Profile** -When there is only one profile that matches all identifiers in an event, we map the traits, identifiers and events on the call to that existing profile. - -**Case Three: Merge Existing Profiles** -When there are multiple profiles that match the identifiers in an event, we attempt to merge profiles. However, we first check the three rules we have in place to protect the identity graph from inaccurate merges. - -One common example of a use-case that can cause inaccurate merges is the Shared iPad setup. For example, many companies now have iPads available in-store for customers to register for an account or submit order information. If different users submit information on the same device, there will now be multiple events sent with the same deviceID. Without merge protection rules in place, we would see all these different users merged into the same user profile based on this common identifier. - -Thus, we have default checks set in place to ensure that user profiles maintain their integrity. 
- -First, we check to make sure that the profiles matching the identifiers won't have more than 100 merges in total. - -Next, we check to make sure that user_id is unique on these merged profiles. When there exists more than one user_id, we demote the lowest priority identifier from the event, and look again for profiles that now match the remaining identifiers. By default, the order of trust priority from highest to lowest is user_id, followed by email and then followed by all other identifiers. - -Lastly, we check to make sure that there are no more than 5 values per identifier on the event in the final profile. For example, if an event is sent in with email and ios.id, we'll make sure that we don't have more than five emails or ios.ids on the final merged profile. If we find that we have more than 5 values per matched identifier, we'll demote that inputted externalId and try to find matching profiles with the remaining identifiers. diff --git a/src/personas/identity-resolution/identity-resolution-settings.md b/src/personas/identity-resolution/identity-resolution-settings.md new file mode 100644 index 0000000000..8cf1eb87de --- /dev/null +++ b/src/personas/identity-resolution/identity-resolution-settings.md @@ -0,0 +1,157 @@ +--- +title: Identity Resolution Settings +--- +# Setting Up Identity Graph Rules +Before connecting a source to a Personas space, we recommend first reviewing our default Identity settings and configuring custom rules as needed. Configuration updates apply only to new data that flows through the space after the changes have been saved. Thus, if this is your first time setting up your Identity Graph, we recommend getting started with a *Dev* space [here](/docs/personas/identity-resolution/personas-space-set-up). + +> note "" +> **Note:** The Identity Resolution table can only be edited by workspace owners and users with the Identity Admin role. 
+ +## ExternalIDs +Segment creates and merges user profiles based on externalIDs we use as identifiers. You can view these externalIDs in the Identities tab of a User Profile in the User Explorer: + +![](images/jane_doe_new_identities.png) + +By default, Segment promotes the following traits and IDs in track and identify calls to be used as externalIDs: + +| External ID Type | Message Location in Track or Identify Call | +| ------------------ | ------------------------------------------------------------------------------------------------------------- | +| user_id | userId | +| email | traits.email or context.traits.email | +| android.id | context.device.id when context.device.type = 'android' | +| android.idfa | context.device.advertisingId when context.device.type = 'android' AND context.device.adTrackingEnabled = true | +| android.push_token | context.device.token when context.device.type = 'android' | +| anonymous_id | anonymousId | +| cross_domain_id | cross_domain_id | +| ga_client_id | context.integrations['Google Analytics'].clientId when explicitly captured by users | +| group_id | groupId | +| ios.id | context.device.id when context.device.type = 'ios' | +| ios.idfa | context.device.advertisingId when context.device.type = 'ios' AND context.device.adTrackingEnabled = true | +| ios.push_token | context.device.token when context.device.type = 'ios' | + +You'll notice that these identifiers have the *Provided by Segment* label next to them under *Identifier Type*. + +To create your own custom externalID, click on *Add Identifier*. + +These custom identifiers must be sent in the `externalIds` array in the `context` object of any call to our API. 
The four fields below (id, type, collection, encoding) are all required: + +| Key | Value | +| ---------- | ---------------------------------------------------------------------------- | +| id | value of the externalID | +| type | name of externalID type (`app_id`, `ecommerce_id`, `shopify_id`, etc) | +| collection | `users` if a user-level identifier or `accounts` if a group-level identifier | +| encoding | `none` | + +The following example payload adds a custom `phone` externalID type: + +``` js +analytics.track('Subscription Upgraded', { + plan: 'Pro', + mrr: 99.99 +}, { + externalIds: [ + { + id: '123-456-7890', + type: 'phone', + collection: 'users', + encoding: 'none' + } + ] +}) +``` +We recommend adding custom externalIDs to the Identity Resolution table *before* events containing this identifier flow through the space. Once an event with a new type of externalID flows into the space, the externalID will automatically be added to the table if it wasn't manually added. However, when the externalID is automatically added, it will default to our preset priority and limit, as explained below. + +## Flat Matching Logic +When a new event flows into Personas, we look for any profiles that match any of the identifiers on the event. + +There are three cases that can occur: + +**Case One: Create New Profile** +When there are no pre-existing profiles that have matching identifiers to the event, we create a new user profile. + +**Case Two: Add Event to Existing Profile** +When there is only one profile that matches all identifiers in an event, we attempt to map the traits, identifiers and events on the call to that existing profile. If there is an excess of any identifier on the final profile, we defer to our merge protection rules outlined below. + +**Case Three: Merge Existing Profiles** +When there are multiple profiles that match the identifiers in an event, we attempt to merge profiles and first check our merge protection rules as outlined below. 
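The three cases above can be sketched as a small classification function. This is a minimal in-memory illustration, not Segment's implementation: the names `findMatchingProfiles` and `classifyEvent`, the profile shape, and the returned labels are all hypothetical, and the sketch treats a profile as matching when it shares at least one identifier with the event. It stops before the merge protection rules, which would run next in cases two and three.

``` js
// Hypothetical sketch of the flat matching cases; not Segment's actual code.
// A profile is assumed to look like { id, externalIds: [{ type, value }] }.

// Return every profile that shares at least one identifier with the event.
function findMatchingProfiles(profiles, eventIds) {
  const keys = new Set(eventIds.map(({ type, value }) => JSON.stringify([type, value])));
  return profiles.filter((profile) =>
    profile.externalIds.some(({ type, value }) => keys.has(JSON.stringify([type, value])))
  );
}

// Map the number of matching profiles to one of the three cases.
function classifyEvent(profiles, eventIds) {
  const matches = findMatchingProfiles(profiles, eventIds);
  if (matches.length === 0) return 'create_new_profile';      // Case One
  if (matches.length === 1) return 'add_to_existing_profile'; // Case Two
  return 'merge_existing_profiles';                           // Case Three
}

// Example data (made up for illustration).
const profiles = [
  {
    id: 'p1',
    externalIds: [
      { type: 'user_id', value: 'abc123' },
      { type: 'email', value: 'jane@example1.com' }
    ]
  },
  { id: 'p2', externalIds: [{ type: 'anonymous_id', value: 'anon-1' }] }
];

// An event carrying identifiers from both profiles falls into Case Three.
console.log(classifyEvent(profiles, [
  { type: 'email', value: 'jane@example1.com' },
  { type: 'anonymous_id', value: 'anon-1' }
]));
```

In this sketch an event whose identifiers match nothing yields a new profile, while an event that bridges two profiles is only a merge candidate; whether the merge actually happens depends on the protection rules.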
+ +## Merge Protection Rules +Identity Admins should review and configure the merge protection rules in the Identity Resolution Settings page to protect the identity graph from inaccurate merges. + +One common example of a use-case that can cause inaccurate merges is the Shared iPad setup. For example, many companies now have iPads available in-store for customers to register for an account or submit order information. If different users submit information on the same device, there will now be multiple events sent with the same deviceID. Without merge protection rules in place, we might see all these different users merged into the same user profile based on this common identifier. + +Our three merge protection rules allow Identity Admins to block incorrect values from causing incorrect merges, to set the maximum number of values allowed per externalID, and to customize the priority of these externalIDs. + +### Blocked Values +We recommend proactively blocking certain values from ever being used as identifiers. While these values will remain in the payload on the event itself, they will not be promoted to the externalID object Segment uses to determine user profiles. + +This is extremely important when developers have a hard-coded value for fields like user_id during QA or development that then erroneously makes it to production. This can cause hundreds of profiles to merge incorrectly and can have costly consequences when these spaces are already feeding data into a production email marketing tool or push notification tool downstream. + +In the past, we've seen certain default values across many different customers cause large numbers of profiles to merge incorrectly. 
Segment suggests that for every externalID, customers opt into automatically blocking the following suggested values: + +| Value | Type | +| ----------------------------- | --------------- | +| Zeroes and Dashes (`^[0\-]*$`) | Pattern (REGEX) | +| -1 | Exact Match | +| null | Exact Match | +| anonymous | Exact Match | + +Before sending data through, we also recommend adding any default hard-coded values that your team uses during the development process. In the UI today, the Identity Admin can add blocks against only exact matches to the inputted value. + +However, we currently have Limited Availability for a feature that allows customers to create custom REGEX blockers as well. This can be useful for cases where an externalID was incorrectly implemented. To enable this, please reach out to your customer success manager. + +### Limit + +Identity Admins can specify the total number of values allowed per externalID type on a profile. This will vary depending on how companies define a user today. In most cases, companies rely on `user_id` to distinguish user profiles and Segment recommends the following default configurations: + +| ExternalID | Limit | +| --------------------- | ----- | +| user_id | 1 | +| all other identifiers | 5 | + +There are specific cases that will deviate from this default. For example, a case where a user can have more than one user_id but only one email, like when a user is defined by both their shopify_id and an internal UUID. In this case, an example setup may be: + +| ExternalID | Limit | +| --------------------- | ----- | +| email | 1 | +| user_id | 2 | +| all other identifiers | 5 | + +We offer a Limited Availability release of a new type of limit called "Time-Based Limits". These limits allow any profile to have a specified limit of values within a certain time range. This is particularly useful for externalIDs such as `anonymous_id` and `ga_client_id`, which are constantly renewed and collected over a lifetime. 
We know that customers can easily collect over 100 `anonymous_ids` over a few years of app usage. However, rather than setting the absolute limit of `anonymous_ids` to 100, we can now create a sliding range that intelligently gates how many `anonymous_ids` any user should reasonably collect within a specific time period. + +### Priority + +The priority of an identifier is taken into consideration once we exceed the limit of any identifier on the final profile. + +Let's take as an example a Personas space with the following Identity Resolution configurations: + +| ExternalID | Limit | Priority | +| ------------ | ----- | -------- | +| user_id | 1 | 1 | +| email | 5 | 2 | +| anonymous_id | 5 | 3 | + +A profile already exists with user_id `abc123` and email `jane@example1.com`. A new event comes in with new user_id `abc456` but the same email `jane@example1.com`. If we mapped this event to this profile, the resulting profile would then contain two user_ids and one email. Given that user_id has a limit of 1, we've now exceeded the limit of an identifier, so we check the priority of these identifiers. Because email and user_id are the only two identifiers on the event and email is ranked lower than user_id, we demote email as an identifier on the incoming event and try again. + +At this point, the event searches for any profiles that match just the identifier user_id `abc456`. Now there are no existing profiles with this identifier, so a new profile is created with user_id `abc456`. + +By default, we explicitly order user_id and email as rank `1` and `2`, respectively. All other identifiers are in alphabetical order beginning from rank `3`. 
This means that if the only identifiers ever sent in on events flowing into Personas are user_id, email, anonymous_id and ga_client_id, the rank would be as follows: + +| ExternalID | Priority | +| ------------ | -------- | +| user_id | 1 | +| email | 2 | +| anonymous_id | 3 | +| ga_client_id | 4 | + +If a new android.id identifier appeared without first giving it explicit order, the order would automatically reshuffle to: + +| ExternalID | Priority | +| ------------ | -------- | +| user_id | 1 | +| email | 2 | +| android.id | 3 | +| anonymous_id | 4 | +| ga_client_id | 5 | + +Thus, if you require an explicit order for all identifiers, configure this in the Identity Resolution settings page before sending in events. diff --git a/src/personas/identity-resolution/identity-warehouse.md b/src/personas/identity-resolution/identity-warehouse.md index 403ea435b9..dd32fffeab 100644 --- a/src/personas/identity-resolution/identity-warehouse.md +++ b/src/personas/identity-resolution/identity-warehouse.md @@ -35,7 +35,7 @@ To see all the identifiers associated with a certain user, first look up the `se with t1 AS (SELECT segment_id FROM personas_identities.users_identities - WHERE external_id_value = 'jane.doe@gmail.com') + WHERE external_id_value = 'jane.doe@example1.com') SELECT u.segment_id, created_source, external_id_type, external_id_value FROM personas_identities.users_identities u JOIN t1 on u.segment_id = t1.segment_id diff --git a/src/personas/identity-resolution/personas-space-set-up.md b/src/personas/identity-resolution/personas-space-set-up.md new file mode 100644 index 0000000000..5e3e08a0be --- /dev/null +++ b/src/personas/identity-resolution/personas-space-set-up.md @@ -0,0 +1,33 @@ +--- +title: Personas Space Set Up +--- +## Step One: Create a New Dev Space + +When starting with Personas, begin by creating a *Dev* space. 
This will be your sandbox instance of Personas to test new Identity settings, audiences and traits before applying the same changes to a *Prod* space that would immediately affect production data flowing to downstream destinations. + +## Step Two: Configure Identity Settings + +Before you connect any source to the Dev space, we recommend that you first review and configure your Identity settings, as changes to the Identity rules will only be applied to new events received following any updates. Read more on those settings [here](/docs/personas/identity-resolution/identity-resolution-settings). + +## Step Three: Set Up a Connection Policy + +If you haven't already, we highly recommend labeling all your sources with *Dev* or *Prod* [environments](/docs/segment-app/iam/labels). Once your sources have been labeled, navigate to the Connection Policy page in the Personas space settings. Here, you can enforce that only sources labeled *Dev* can be connected to your *Dev* Personas instance. + +![](images/connection-policy.png) + +> note "" +> **Note:** The Identity Resolution table can only be edited by workspace owners and users with the Identity Admin role. + +## Step Four: Connect Sources and Create Test Audiences + +Once your Connection Policy is in place, click on the Sources tab in space settings. Now you can connect a few Sources that will automatically begin to replay. + +Once the Sources have finished replaying, check user profiles to ensure that profiles are merging as expected. This would also be an ideal time to create test audiences and confirm that these populate the expected number of users. + +## Step Five: Connect Audiences to a Dev Instance of a Downstream Destination + +Connect test audiences or traits to a dev instance of your downstream destination. Confirm that users are appearing as expected. 
+ +## Step Six: Apply Changes to Prod Sources + +Once everything looks good to go, create a new *Prod* space, following all the same steps above, and connect a live instance of your downstream destination to your *Prod* space. diff --git a/src/personas/images/connection-policy.png b/src/personas/images/connection-policy.png new file mode 100644 index 0000000000..c34dd554e5 Binary files /dev/null and b/src/personas/images/connection-policy.png differ