Data Silos: Joins
The good news about combining data from different sources is that every part of your business should have something in common. Perhaps it’s your product names, employees or customers, but the same pieces of information should appear across many different systems (if not all of them). We can use the common elements of data to join (hence “joins”) the two data sources together.
For example, let’s say we work at an e-commerce company and want to understand how our email marketing campaigns affect customer support request. Do some of our promotions create more support problems than others? To answer that, we’ll need to combine the records between our customer support system and email marketing system so we know which customers received emails and contacted customer support. Below are two example records, one from each system:
While much of the data between the two systems is different, you will notice that both use email address as part of their record. Since email addresses should be unique to a specific customer, we can use that information to combine everything we know from both systems together.
That was easy!
Unfortunately, you will rarely have the luxury of a unique identifier like email address available across systems by default. To ensure these identifiers are available, you will need to make sure that identifier is used as broadly as possible in both your workflows and in your data systems. It’s no coincidence that companies assign “Employee ID Numbers” and hospitals use “Patient ID Numbers” since they have high incentive to guarantee there is a unique identifier across every data system.
Tomorrow we’ll talk about more advanced ways to break down data silos even when there is no unique identifier available. Get out your detective kits, it’s time to track some fingerprints!
Quote of the Day: “I’d never join a club that would allow a person like me to become a member.” ― Woody Allen