#4 Let’s design the technology stack

This blog is part of an eight-part series – read our introduction to see how it fits into the series.

Now that you’ve understood your business requirements for data, what data sources you use today, and how data is presented, visualised, and shared today, you can consider how this can evolve and develop in future:

What new things need to be set up?
What things need to be changed?
And what things need to be stopped?

No matter how scary this might seem, it will never be easier to sort out your data challenges, and tackle your data opportunities, than today. The one truth about data is that your data problem is only getting bigger: you’re collecting more and more data every second, hour, and day, so any inefficiencies or gaps are growing at the same rate. So, don’t delay – today is the easiest day it will ever be to sort out your data.

What is the Art of the Possible?

First, we always start with the ultimate end goal – the big vision for collecting, processing, visualising, and sharing data within your organisation. Depending on the business requirements, this could include:

The use of best-in-class technology
The ability to automate all your manual processes end-to-end
The ability to offer new data-driven services to your clients
The ability to incorporate artificial intelligence into your business
The ability to monetise your data in a compliant and customer-friendly way
And the ability to lead your market and create unique competitive advantages

This vision is a great way to inspire the team and give everyone a clear view on where the business could end up in future. It excites people. It energises them. It provokes discussion and debate. And, most of all, it sets up the story for the detailed technology stack recommendations that come next.

Where should the technology stack be hosted? In the cloud. And, in the UK, there are three main options for cloud hosting:

Amazon Web Services (AWS) – this is the highest rated by independent experts like Gartner and comparison sites like G2, and it has the highest market share of the three (around a third of the global cloud infrastructure market, by most analyst estimates). In our experience, it’s also great value for money and the most intuitive to use of the three leading cloud providers listed here
Microsoft Azure – for Microsoft fans, this is often the first choice for businesses who want to maintain everything within the Microsoft suite, but tends to be more expensive and less easy to use
Google Cloud Platform (GCP) – this has the lowest market share (9%), but it’s backed by Google and offers good support

Which one is right for your business or project will depend on:

Whether you have any of these services set up already – if you have, it may make sense to stick with what you’ve got
Your business requirements – you should compare each provider’s capabilities, costs, pros, and cons, to inform your decision

How could you extract the data from all the Data Sources?

Most organisations will have 20+ data sources that they may want to connect to analyse and report on business patterns and trends. So, how can you connect 20+ data sources in the most efficient way? There are two options:

Use an Extract, Transform & Load (ETL) tool – this is a low-code approach that enables a team of non-programmers to connect data sources, receive automated alerts when there is a problem, send data wherever required, and make changes in future. But these tools have ongoing licence costs. So, whilst they are quicker to set up, and can be supported in-house by non-coders, they incur ongoing external spend. ETL tools include Matillion and Fivetran, both of which we rate highly.
Create bespoke code to perform the ETL process for you – this is a hand-crafted approach requiring programmers to connect data sources, send data wherever required, and make changes in future. Whilst it is more technical, time-consuming, and resource-intensive to set up, there are no third-party licence fees. However, the benefit of no licence fees needs to be weighed against a solution that will always require expensive programmers for maintenance and future development. Code can be developed in Python, which is our preferred language.
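
To give a feel for the bespoke route, here is a minimal sketch of an extract, transform, and load script in Python. It is an illustration under stated assumptions, not a production design: the sample CSV, the orders table, and the function names are all hypothetical, and an in-memory SQLite database stands in for wherever the data would really be sent. Rows that fail conversion are set aside and logged, which is a simple stand-in for the automated alerts an ETL tool would provide.

```python
import csv
import io
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl")

# Hypothetical sample standing in for one data source (e.g. a CSV export).
SAMPLE_CSV = """order_id,customer,amount
1001,Acme Ltd,250.00
1002,Bright Co,99.50
1003,Acme Ltd,not-a-number
"""


def extract(source):
    """Extract: read every row of the CSV source as a dictionary."""
    return list(csv.DictReader(source))


def transform(rows):
    """Transform: convert types, and set aside rows that cannot be converted."""
    clean, rejected = [], []
    for row in rows:
        try:
            clean.append(
                (int(row["order_id"]), row["customer"].strip(), float(row["amount"]))
            )
        except (KeyError, ValueError):
            rejected.append(row)
    return clean, rejected


def load(conn, rows):
    """Load: write the clean rows into the (hypothetical) orders table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(source, conn):
    """Run all three steps and log a warning if any rows were rejected."""
    clean, rejected = transform(extract(source))
    load(conn, clean)
    if rejected:
        log.warning("%d row(s) rejected during transform", len(rejected))
    return len(clean), len(rejected)


conn = sqlite3.connect(":memory:")  # stand-in for the real destination
loaded_count, rejected_count = run_pipeline(io.StringIO(SAMPLE_CSV), conn)
total_amount = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(loaded_count, rejected_count, total_amount)
```

Even in this small form, someone has to write, test, and maintain each step for every data source, which is the trade-off against licence fees described above.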

Which one is right for your business will depend on:

Your technology strategy – do you favour buying over building, or vice versa?
The skills currently available in-house and the plan for future recruitment – for instance, if you don’t have Python developers today, then a bespoke coding approach might be too big a jump to make.
Your business requirements – for instance, if there is a business requirement for low-code, self-serve tools that can be supported in-house in future, then an ETL tool is probably the best approach.

Once you’ve extracted, transformed, and loaded all the data, from the 20+ data sources, where does it all go? We usually consider a few options:

A data lake – this is a great place to send tons of unprocessed data to, as it’s low cost, but it doesn’t allow you to organise your data and prepare it for the creation of data outputs like reports and analytics.
A database – this is a great place to send data to when there are smaller data volumes.
A data warehouse – this is a great place to organise data, process it, and make it ready for data analytics, reporting and data science.
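
The difference between a data lake and a data warehouse can be sketched in a few lines of Python. This is a toy illustration only: the events are invented, a temporary folder stands in for the lake, and an in-memory SQLite database stands in for the warehouse. The point it tries to show is that the lake stores data exactly as it arrives, whereas the warehouse gives it a structure that can be queried for reporting.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical raw events, as they might arrive from a source system.
raw_events = [
    {"event": "sale", "region": "North", "value": "120.0"},
    {"event": "sale", "region": "South", "value": "80.0"},
    {"event": "refund", "region": "North", "value": "20.0"},
]

with tempfile.TemporaryDirectory() as lake_dir:
    # Lake style: land the data unprocessed, one JSON object per line.
    raw_file = Path(lake_dir) / "events.jsonl"
    raw_file.write_text("\n".join(json.dumps(event) for event in raw_events))

    # Warehouse style: give the data a schema so it can be organised and queried.
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE sales (region TEXT, value REAL)")
    for line in raw_file.read_text().splitlines():
        event = json.loads(line)
        if event["event"] == "sale":  # keep only the rows this table is about
            warehouse.execute(
                "INSERT INTO sales VALUES (?, ?)",
                (event["region"], float(event["value"])),
            )

# Once the data is organised, a report-ready question is a single query.
sales_by_region = dict(
    warehouse.execute("SELECT region, SUM(value) FROM sales GROUP BY region")
)
print(sales_by_region)
```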

Which one is right for your business will depend on:

The volume of data involved – some data solutions are specifically designed for ‘big data’ where there are high volumes, high frequency, and/or high complexity of data.
The latency of data required – some data solutions are specifically designed to perform better when there is a need for real-time data access and minimal latency.
The skills currently available in-house and the plan for future recruitment – for instance, if you have experts in certain data solutions already, you may want to stick with those solutions to remove the need for re-training or recruitment.
Your business requirements – specifically, how your data requirements may change in future, in terms of volumes, frequency, latency and similar. Whilst you may not need a data warehouse today, you might want to set one up in anticipation of future business requirements.

Where do you begin?

Start with a pilot project. It can feel overwhelming to think about 20+ data sources and get all the data connected, processed, and combined into multiple reports, analytics, and data science outputs. So, start with one. Choose a mini project – this could be for a high-priority business area, it could be for an area of the business that is simple and self-contained, or it could be in an area where you have the best stakeholder engagement and support. Once you’ve chosen one, follow these steps:

Focus on one data output (e.g. a Board report)
Set up and configure the technology stack you’ve selected – for instance, an ETL tool and data warehouse
Connect only the data you need for that one report, from 1-2 data sources
Prepare the data for modelling and visualisation
Create the first report
Test, iterate, then repeat for other business areas
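
The steps above can be sketched for a very small pilot as follows. Everything here is an assumption made for illustration: the two data sources (a CRM and a finance system), the table and column names, and the figures are invented, and an in-memory SQLite database stands in for the selected technology stack.

```python
import sqlite3

# Hypothetical extracts from two data sources: a CRM and a finance system.
crm_customers = [(1, "Acme Ltd", "North"), (2, "Bright Co", "South")]
finance_invoices = [(1, 500.0), (1, 250.0), (2, 300.0)]

# Connect only the data needed for this one report.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)")
db.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
db.executemany("INSERT INTO customers VALUES (?, ?, ?)", crm_customers)
db.executemany("INSERT INTO invoices VALUES (?, ?)", finance_invoices)

# Prepare the data: combine both sources into one row per region.
report_rows = db.execute(
    """
    SELECT c.region, SUM(i.amount) AS revenue
    FROM customers AS c
    JOIN invoices AS i ON i.customer_id = c.id
    GROUP BY c.region
    ORDER BY revenue DESC
    """
).fetchall()


def format_report(rows):
    """Create the first report as plain text, one line per region."""
    lines = ["Revenue by region"]
    for region, revenue in rows:
        lines.append(f"{region}: GBP {revenue:,.2f}")
    return "\n".join(lines)


report = format_report(report_rows)
print(report)
```

Keeping the pilot to one output and two sources makes it easy to test and iterate before repeating the pattern for other business areas.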

This process means you can deliver valuable business benefits quickly, in weeks. This will prove invaluable to win the trust and confidence of your stakeholders. You can then build on this as you connect more data sources and deliver more data solutions.

Now that you have designed the technology stack options, it’s time to design the reporting solution. So, check out our next blog in this series for some simple tips on how to compare, contrast and select the right reporting tool for your business.

Do YOU need an independent, objective review of your technology stack?

Well, you’re in the right place. We can run the Discovery & Design programme for your business. The benefits of outsourcing to us are:

OBJECTIVITY – we bring a fresh pair of eyes to your business and we’re unhindered by office politics, historical decisions, and legacy systems
INDEPENDENCE – we’re technology-agnostic, so we can give you an independent view, with no vested interest in you selecting, or staying with, a certain vendor, tool, or platform
AWARD-WINNING DATA CONSULTANTS – we’ve done this before…for 75+ projects and for 50+ businesses, so we can bring our wider experience to the mix

When we run a Discovery & Design programme for one of our clients, it typically takes 4 weeks and costs £9,950, depending on the scope of the project. Most businesses want results quickly and simply…so that’s what we do – we worry about the complexity, so you don’t have to.

Schedule a call with us for a free initial chat to see if/how/when we can help you to fast-track your data transformation or find out more at https://data-cubed.co.uk/services/.