Data transformation is defined as the process of converting data from one format to another. Typically, this will take place as data is moved from one system to another and is commonly a key component of Business Intelligence (BI) systems.
Data transformation can be one of four variants:
We’ve recently blogged about Data Warehouses, and there is a crossover between the two. Most data will undergo some degree of data transformation as it moves from its source, a transactional database for example, to a data warehouse where it is then ready for analysis.
The purpose of data transformation is to ensure that data from disparate sources can be aggregated into common formats, structures and values that are more easily readable by a system or user. In the context of data warehouses, data transformation is the process that allows data from different sources to fit into the designated schema of the warehouse.
How data transformation will work depends on the infrastructure of the organisation as well as the system of which it is a component.
The most common form of data transformation, the Extract, Transform, Load (ETL) process, sources and transforms data from one location to a data warehouse. As the name suggests, data transformation is the middle part of this process.
With more organisations using cloud-based data warehouses, some are transforming the data at the point of query, and so the process changes to Extract, Load, Transform (ELT). The scalability of the cloud allows raw data to be loaded into the data warehouse, although this is not always best practice.
An oversimplified version of data transformation would look like:
Data discovery assesses which data from which source needs to be extracted and transformed. Data mapping is where the actual transformation process is planned out, including the destination; usually a data warehouse. Code is then generated to automatically transform the data, which is then reviewed before the job is done.
Each stage of this process will likely split down into further stages, where data can be filtered, enriched, indexed, anonymised, encrypted, formatted and renamed. How much of that is necessary will depend on the individual data map and then end goal of the entire process.
Put very simply, data transformation is an essential process for any data analytics system. Without it, the data quality would be compromised, and systems could produce errors through null values, incorrect indexing and incompatible formats.
Transformed data is generally cleansed and checked for quality at the same time, so it is another failsafe to ensure that any decisions made from it are sound.
Instead of considering whether the benefits of data transformation are worth it, it should instead be viewed as an essential component of a larger process. Data analytics can’t happen without data transformation for example. It is simply one part of the journey that data goes on to provide your business with better insights and more clarity.