A Guide to Salesforce Duplicate Management in the Age of Data Cloud

Despite the abundance of resources on Salesforce duplicate management, Customer 360, and Data Cloud, organizations continue to face recurring challenges.

Often, they dive into duplicate management without fully understanding their data, leading to mistakes. Many assume that consolidating information into a single record is the answer, but this approach can result in lost context and increased operational costs. Additionally, they tend to assess duplicates in isolation, missing out on potentially superior solutions within the broader Salesforce platform. How can these issues be mitigated?

Salesforce Match Rules

To manage duplicates effectively, it’s crucial to understand the basics of matching rules. These rules apply across Salesforce’s native tools, third-party solutions, and Data Cloud’s identity resolution features. By defining criteria using two or more fields, you can identify records as duplicates if they match or are sufficiently similar.

For example, names can be matched using Exact or Fuzzy logic (e.g., Bob vs. Robert). Tools like Salesforce Data Cloud also handle formatting differences, treating the same phone number as identical regardless of how it is written (e.g., +1 415-555-1212 vs. (415) 555-1212).
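The two behaviors described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Salesforce or Data Cloud implement it: the nickname table is a tiny hypothetical stand-in for a real dictionary, and the phone logic assumes 10-digit national numbers with a default country code of 1.

```python
import re

# Hypothetical nickname table; real matching tools ship far larger dictionaries.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william"}


def normalize_phone(raw: str, default_country_code: str = "1") -> str:
    """Reduce a phone number to digits, adding a country code to 10-digit numbers.

    Assumption: national numbers are 10 digits long (North American style).
    """
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:
        digits = default_country_code + digits
    return digits


def canonical_first_name(name: str) -> str:
    """Map a known nickname to its formal name; otherwise return the lowercased name."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def fuzzy_first_name_match(a: str, b: str) -> bool:
    """Treat two first names as a match if they share a canonical form."""
    return canonical_first_name(a) == canonical_first_name(b)
```

With these helpers, `normalize_phone("+1 415-555-1212")` and `normalize_phone("(415) 555-1212")` produce the same string, and `fuzzy_first_name_match("Bob", "Robert")` is true.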

The design of your match rules significantly impacts the outcome. Using multiple fields can enhance match accuracy, but overly stringent criteria may cause legitimate matches to be overlooked. On the other hand, minimal criteria can lead to false positives, particularly if the data in those fields is unreliable or incorrect, such as common placeholder emails like ‘noemail@noemail.com’ or ‘idk@idk.com’.

Consider a simple example: If you have multiple email or phone fields in your org, or need to account for regional or ethnic naming conventions (e.g., Anglo vs. Latin vs. Middle-Eastern), you’ll need to evaluate the outcomes for different match rule scenarios:

Match Rule Scenario	Outcome
Exact First Name, Exact Last Name, Exact Email	No records match. Omitting fields that could be used in matching leads to lost opportunities.
Fuzzy First Name, Exact Last Name, Exact Email, Exact Phone, Exact Mobile	No records match. Comparing values only within the same field ignores the denormalized data model (e.g., the same number stored in Phone on one record and Mobile on another), leading to lost opportunities.
Fuzzy First Name, Exact Last Name, Exact Email Or Fuzzy First Name, Exact Last Name, Exact Phone (normalized) Or Fuzzy First Name, Exact Last Name, Exact Address (normalized)	The first two records match even though the email addresses differ. This may or may not be a correct match, depending on whether the address or phone number is a personal or corporate contact point. The scenario you most want to avoid is one where three records match and merge, resulting in lost underlying details.
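The difference between the scenarios in the table can be illustrated with a minimal Python sketch. It pools every email and every phone number on a record into a set, so a number stored in Phone on one record can match the same number stored in Mobile on another, and it combines the rules with OR. The `Person` class, the prefix-based name comparison, and the sample values are all hypothetical; the prefix check is a crude stand-in for real fuzzy matching.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    first_name: str
    last_name: str
    emails: set = field(default_factory=set)   # all email fields pooled together
    phones: set = field(default_factory=set)   # all phone fields, already normalized


def names_match(a: Person, b: Person) -> bool:
    """Exact last name plus a crude 'fuzzy' first name (one is a prefix of the other)."""
    fa, fb = a.first_name.strip().lower(), b.first_name.strip().lower()
    if not fa or not fb:
        return False
    same_last = a.last_name.strip().lower() == b.last_name.strip().lower()
    return same_last and (fa.startswith(fb) or fb.startswith(fa))


def is_match(a: Person, b: Person) -> bool:
    """Name rule AND (any shared email OR any shared phone)."""
    if not names_match(a, b):
        return False
    return bool(a.emails & b.emails) or bool(a.phones & b.phones)


# Hypothetical records for illustration only.
a = Person("Sam", "Smith", {"sam@example.com"}, {"14155551212"})
b = Person("Samantha", "Smith", {"ssmith@example.org"}, {"14155551212"})
c = Person("Sam", "Smith", {"sam.smith@example.net"}, {"14155550000"})
```

Here `is_match(a, b)` is true because of the shared phone number even though the emails differ, which is exactly the kind of match that still needs a judgment call about whether the phone is a personal or corporate contact point.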

DO ensure that your match rules are comprehensive yet flexible enough to capture all potential duplicates without being overly restrictive.

DO NOT create overly stringent rules that might exclude valid matches or fail to distinguish between unique entries.

Identify Fields Impactful in Matching

As a CRM admin or architect, it’s important to familiarize yourself with the various contact points within Contact, Lead, or Account records. Beyond standard Email and Phone fields, additional URL or String fields might be useful for matching. Data profiling techniques, as detailed in this article, can help you identify the most effective fields for matching by analyzing data types, distinct ratios, and PII classifications.

[Figure: a custom dashboard based on Cuneiform for CRM data profiling statistics]

DO use data profiling to identify and utilize the most effective fields for duplicate matching.

DO NOT overlook additional fields that could provide crucial matching data.
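As a rough illustration of the profiling statistics mentioned above, the following Python sketch computes a fill rate, a distinct ratio, and the most frequent values for one field. It is not how any particular profiling product works; the function name, the returned keys, and the record shape (a list of dictionaries) are assumptions made for the example.

```python
from collections import Counter


def profile_field(records, field_name, top_n=3):
    """Return simple profiling statistics for one field across a list of dict records."""
    values = [r.get(field_name) for r in records]
    # Keep only non-blank strings, compared case-insensitively.
    populated = [v.strip().lower() for v in values if isinstance(v, str) and v.strip()]
    total = len(values)
    return {
        # Share of records where the field is populated.
        "fill_rate": len(populated) / total if total else 0.0,
        # Share of populated values that are unique; low values suggest defaults or junk.
        "distinct_ratio": len(set(populated)) / len(populated) if populated else 0.0,
        # Most frequent values, useful for spotting placeholders.
        "top_values": Counter(populated).most_common(top_n),
    }
```

A field with a high fill rate and a high distinct ratio is usually a better matching candidate than one dominated by a handful of repeated values.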

Once you’ve identified your key fields, the next step is to understand the type of data stored within them. For instance, if you’re working with a Contact or Lead record, is the Phone or Email field for the individual, the associated Account, or a mix of both?

If there are clear rules you can establish (e.g., for a specific record type, it’s 90% about the individual, but for another record type, it’s about the organization), you can define more precise matching rules based on the technology you’re using.

Identify Problematic Field Values in Matching

Common issues, like defaulting business Contact or Lead information to company addresses, can increase the risk of incorrect matches. Validation rules that force users to populate fields often produce invalid entries, and personal email addresses entered in place of official ones can further compromise data integrity.

Data profiling can uncover frequently occurring field values and guide efforts to clean up or reevaluate these entries.

DO identify contact point values that disproportionately appear in Phone, Email, or Address fields.

DO assess and classify field values as invalid (verifiable junk), wrong context (about the organization vs. person), or valid.

DO clean up your data in the system of record when possible.

DO NOT perform these actions in production without first taking a backup you can recover from.
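The classification step described above (invalid, wrong context, or valid) could be sketched as follows for email values. The two junk addresses come from the placeholder examples earlier in this guide; the list of shared-mailbox prefixes, the simple syntax pattern, and the extra "missing" category are assumptions for illustration, and a real list should come from your own profiling results.

```python
import re

# Placeholder addresses named earlier in this guide.
JUNK_EMAILS = {"noemail@noemail.com", "idk@idk.com"}
# Hypothetical prefixes that usually indicate an organization-level mailbox.
ORG_PREFIXES = {"info", "sales", "support", "office"}
# Deliberately loose syntax check: something@something.something
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def classify_email(value):
    """Classify an email as 'missing', 'invalid', 'wrong_context', or 'valid'."""
    v = (value or "").strip().lower()
    if not v:
        return "missing"
    if v in JUNK_EMAILS or not EMAIL_RE.match(v):
        return "invalid"
    if v.split("@", 1)[0] in ORG_PREFIXES:
        return "wrong_context"  # about the organization rather than the person
    return "valid"
```

Running a classifier like this against an export or a sandbox copy first makes it easier to review the results before any cleanup touches production data.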

To Merge or Not to Merge

Deciding whether to merge duplicates depends on several factors:

Whether the duplicates represent the same individual in different contexts or roles.
The accuracy of the matching outcomes.
The potential data loss from merging accurate matches.

In cases where merging could obscure or lose critical data, consider maintaining separate records or using a unified profile approach, like Data Cloud’s Key Ring, which preserves original records while linking them to a unified profile.

DO carefully evaluate each potential duplicate case to determine the appropriate action.

DO NOT rush into merging records without considering the broader implications for data integrity and user needs.

For example, you may want to have a unified understanding of your interactions with Sam and/or Samantha Smith. However, you must do this without losing various email addresses, phone numbers, or address details. If you merge the records, you’d need to choose what to keep. Adding more email or phone number fields can quickly become unwieldy.

If you don’t have Data Cloud, you can use in-platform matching rules to identify records that appear related without merging them. You can then use the match-link pattern instead of the match-merge pattern to show the related transactional records based on the dedupe key. While this approach works, it typically requires custom development and the investment that comes with it. Data Cloud offers a productized alternative, where the Key Ring approach keeps the source records intact, creates a unified profile, maintains the source record-to-unified profile relationship, and re-establishes the relationships as new information becomes available.
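To make the match-link idea concrete, here is a minimal Python sketch that groups record IDs sharing any dedupe key while leaving the records themselves untouched. It uses a union-find structure, which is one common way to do this and not necessarily what any Salesforce tool uses. The record shape (`id` plus a set of `keys`) and the key format are hypothetical.

```python
from collections import defaultdict


def link_records(records):
    """Group record ids that share at least one dedupe key. Nothing is merged."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent pointers to the group root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller id as the root so group labels are deterministic.
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}  # dedupe key -> first record id that carried it
    for r in records:
        for key in r["keys"]:
            if key in first_seen:
                union(r["id"], first_seen[key])
            else:
                first_seen[key] = r["id"]

    groups = defaultdict(list)
    for r in records:
        groups[find(r["id"])].append(r["id"])
    return dict(groups)
```

Note that linking is transitive: if A shares a key with B and B shares a different key with C, all three end up in one group. Because the source records are never changed, a questionable link can be undone by fixing the key, which is much harder after a merge.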

Data Cloud’s Unified Profile Approach – A Better Alternative to Merging Records

Creating a unified profile enables the maintenance of multiple contexts for the same entity, which is essential when dealing with the same person or organization across different scenarios. This approach allows for the easy mapping of various contact points from a denormalized record to a normalized profile.

Data Cloud uses match rules similar to those in Salesforce CRM, but with a key distinction: matched records contribute to a unified profile, and updates to source records are dynamically reflected in this profile. For example, new information can correct associations automatically, ensuring accuracy in profile management. This method ensures that only verified contact points remain, leading to clean and reliable data profiles.

In our earlier example, the practical implications of this data model would be:

We can map any denormalized contact point field in our source data model to the normalized contact point model for matching.
Data Cloud will normalize phone numbers whenever possible, providing a standardized, unified view from various input formats, resulting in a complete and consistent data set.
As new information becomes available at the source record level, Data Cloud will re-match updated source records, continuously providing the highest quality unified profile possible.
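The first two points above can be pictured with a small Python sketch that turns one denormalized record into normalized contact point rows while keeping track of where each value came from. This is a conceptual illustration only, not Data Cloud's mapping mechanism; the usage labels and the output row shape are assumptions, and the phone handling simply strips non-digit characters.

```python
import re

# Source field -> (contact point type, usage label). Labels are illustrative.
FIELD_MAP = {
    "Phone": ("phone", "business"),
    "MobilePhone": ("phone", "mobile"),
    "HomePhone": ("phone", "home"),
    "Email": ("email", "primary"),
}


def to_contact_points(record):
    """Flatten one denormalized record into a list of normalized contact point rows."""
    rows = []
    for field_name, (kind, usage) in FIELD_MAP.items():
        raw = record.get(field_name)
        if not raw:
            continue  # skip empty fields rather than emitting blank contact points
        if kind == "phone":
            value = re.sub(r"\D", "", raw)  # keep digits only
        else:
            value = raw.strip().lower()
        rows.append({
            "source_id": record["Id"],     # lineage back to the source record
            "source_field": field_name,    # lineage back to the source field
            "type": kind,
            "usage": usage,
            "value": value,
        })
    return rows
```

Because every row carries its source record and field, matching can compare any phone against any phone without losing the original context of each value.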

DO follow Data Cloud contact point mapping best practices to maintain data lineage while applying robust, scalable logic for data cleansing or standardization.

DO NOT lose other valid contact point information, even if it’s irrelevant for matching, such as Business Phone or Address details.

DO consider filtering out repeat values using a formula field in Data Streams.

Do You Really Need to Merge Records?

Often, the impulse to merge CRM records stems from the need for automation interfaces to correctly associate transactions with CRM records or to address user complaints about dealing with incomplete or inconsistent duplicate records. However, it’s crucial to understand your data thoroughly before merging to avoid errors:

DO assess whether a single-record approach is feasible without introducing errors or losing valuable information.

DO NOT overlook the effort required to correct, or prevent, data loss caused by previous integration mistakes.

DO consider shifting your automation and profile linking processes to utilize unified profiles in Data Cloud.

DO NOT neglect the security protocols that require data segregation, often preventing merging.

DO deliver a holistic view of business transactions in your CRM using Data Cloud Related Lists.

What Happens After Data Cleansing?

After cleansing your data and determining which fields are reliable for match rules, you might find that some records no longer have enough populated fields for effective matching. It’s essential to identify and classify these records based on their matchability and importance to your business operations.

To determine if a record has valid or invalid contact points, you can implement count or categorization formulas. This is applicable both in Salesforce CRM and Data Cloud, as both support formula fields.

Use your data profiling insights to design formulas that continuously identify bad data when it shows up in your org. Use lengths and string patterns as appropriate.
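The logic of such a count-and-categorize formula is sketched below in Python rather than in Salesforce formula syntax, purely to show the shape of the rules. The field list, the 10-digit phone threshold, and the "strong" / "weak" / "unmatchable" categories are assumptions for this example; the junk addresses are the placeholders mentioned earlier in this guide.

```python
import re

JUNK_EMAILS = {"noemail@noemail.com", "idk@idk.com"}
EMAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[^@\s]+"


def valid_contact_point_count(record, fields=("Email", "Phone", "MobilePhone")):
    """Count fields that hold a plausible, non-junk contact point."""
    count = 0
    for name in fields:
        value = (record.get(name) or "").strip().lower()
        if not value:
            continue
        if name == "Email":
            ok = value not in JUNK_EMAILS and re.fullmatch(EMAIL_PATTERN, value) is not None
        else:
            # Length check on digits only; assumes at least 10 digits for a usable number.
            ok = len(re.sub(r"\D", "", value)) >= 10
        if ok:
            count += 1
    return count


def matchability(record):
    """Categorize a record by how much reliable data it offers to match rules."""
    has_name = bool((record.get("LastName") or "").strip())
    n = valid_contact_point_count(record)
    if not has_name or n == 0:
        return "unmatchable"
    return "strong" if n >= 2 else "weak"
```

For example, a record with a last name, a placeholder email, and a well-formed phone number would be categorized as "weak", while one whose only contact point is a placeholder email would be "unmatchable".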

DO utilize the matchability formula as a critical filter in your duplicate management strategy. This approach