This blog is written by Data3 Founder and CEO, Helen Tanner.


Most data projects will have multiple data sources and often they are disconnected and disparate. If you leave them disconnected, you’ll only be able to analyse them in isolation. Which could create valuable insight in isolation. But, by connecting the data sources, so they can be analysed together in combination, there is far more potential for you to generate greater and more valuable insight.

Think about being able to combine your financial data with your customer data to understand profit per customer or customer value. Think about connecting your sales data with your marketing data to understand cost per acquisition and ROI. If you don’t connect the data, you’ll never be able to generate cross-departmental business insight.


But where do you begin?

Here are 5 stages to connect your data together to help your business to make quicker, smarter decisions…

  1. DATA CONNECTOR STRATEGY
  2. IDENTIFYING THE UNIQUE IDENTIFIERS
  3. MATCHING DATA
  4. EXTRACTING, TRANSFORMING & LOADING DATA
  5. DEFINING NEW METRICS

Firstly, you need to decide how you want to approach your data project – you can either take a business-led or data-led approach:

Business-led approach – a business-led approach starts with the business requirements. You select all the data sources that seem relevant. You incorporate all data fields within these data sources. You end up with a large dataset. The risk with this approach is that you include more data than you need, increasing your data storage and data processing time. But it is more likely that you’ll deliver results quickly with this approach.

Data-led approach – a data-led approach starts with the specific data fields that are required in the final output. You only include the specific data fields. You, therefore, have a focused dataset. The risk with this approach is that the narrow scope restricts the flexibility to add more data fields in the future. But it is more likely that you’ll end up with a lean, speedy, low-cost end result.

Which one is the best data connector strategy for your business? It will depend on your data project and your business requirements. If you want the quickest solution, go for the business-led approach. If you want the leanest dataset you can get, go for the data-led approach.

Need a template or some help with this? Drop us a line.


Next, you need to determine what your unique identifiers are – these are data fields that have a unique number or code for each asset, whether they represent a unique customer, employee, partner, project, part or other business assets.

Usually a business will have unique identifiers such as:

  • CUSTOMER REFERENCE NUMBER
  • EMPLOYEE ID
  • CUSTOMER ORDER NUMBER
  • INVOICE NUMBER
  • TRANSACTION NUMBER
  • CUSTOMER SERVICE NUMBER
  • PART #

These unique identifiers must do what the name suggests – they must be unique for each asset that they represent. There can be no duplicates. So there needs to be a robust process, with a single source of unique identifiers, issued in your business. If you don’t have any, set them up, as you’ll need them to match and combine data sources together to generate advanced insight from your data. And using a unique identifier is by far the best way to do this.

Need a process or some help with this? Drop us a line.


Where there is a unique identifier available, data can be combined and matched across data sources. Where the unique identifier is missing, business logic can be applied instead including:

• Manual matching process – for instance, it’s possible to match business names but they could be spelt differently, so there might need to be a combination of matching rules covering the business name and the business postcode. It’s also possible to match customer names but, again, they could be spelt differently, so there might need to be a combination of matching rules covering the customer’s surname and their date of birth.

Fuzzy matching process – for instance, when matching names, you can set up fuzzy (or estimated) logic to allow different spellings to be permitted (eg John, Johnathon, Jonathon, J). Also, when matching postcodes, you can set up fuzzy logic to allow different lengths of the postcode to be permitted (eg BS1 XYZ, BS1, BS).

Master data sources – for instance, if there is a difference in data between the different data sources, business logic should be enforced to always expect one data source to be assumed to be more accurate than the other/s – this can be defined as the primary/secondary data source.

For some projects, this can be a quick process and can be delivered in hours. For a large business, with multiple data sources and legacy systems, this can take days or weeks to accomplish. And sometimes a manual process is required, particularly if the volumes are low and the matching rules are hard to define.


The next step is to extract data from each of the required data sources, transform it into a processed output and load it into a new database. This will enable you to analyse the combined dataset. This is ETL – extract, transform, load.

There are several things to consider at this stage:

a. Timeliness – does data need to be live and updated in real-time for the purpose of your data project? Or is daily/weekly/monthly sufficient?

b. Consistency – will the data be consistent for the life of your data project? Or will some data sources or data fields change?

c. Security – is the data source within your control with no security access procedures, such as internal data? Or are you connecting to a third-party data source where authentication is required?

d. Formatting – what formats are you going to use for dates, times, locations, and similar categorisation?

e. History – do you need to be able to show time-stamped data and changes in data over time?

The answers to these questions will impact how much data is extracted and stored, and how frequently. This will influence the speed and cost of your data project, both for setup and ongoing costs. So, it’s important to challenge the answers to these questions and not just do what the business has always done. You could save your business tens, hundreds, or thousands of pounds per month.

Need a template or some help with this? Drop us a line.


Now you know how to connect all your data sources together, you can now do interesting things with your combined dataset. You can create valuable business metrics including:

  • LTV – lifetime value of your customers…do you know which ones make/lose you money?
  • ROI – return on investment…do you know what your investment is delivering for your business?
  • CPA – cost per acquisition….do you know if your sales & marketing process is cost-effective?

Don’t worry. Most businesses aren’t calculating these metrics. Because their data sources are disconnected. Yet these metrics are absolutely key to you being able to track the performance of your business.

By connecting your data together, you can report on:

  • revenue/profit per client/employee/product/location
  • customer lifetime value
  • ranked clients by revenue/profit
  • and much more

This will enable you to power your business performance and make smarter, quicker decisions. Ultimately helping you to make or save more money.

Need a template or some help with this? Drop us a line.


Are you struggling to connect your data?

Don’t worry. We can help you. Businesses we work with have improved their business results by a minimum of 10% using data and see a minimum of 5 x ROI on their investments into data.

Visit data-cubed.co.uk, schedule a call with us or email us at hello@data-cubed.co.uk today.