The difference between Transforming and Formatting Data
Google organizes Data Analysis into these seven stages: ask, prepare, process, analyze, share, and act. Within the “Analysis” step of the process we have four phases: 1. Organize data 2. Format and adjust data 3. Get input from Others 4. Transform data.
I don’t know if you are like me, but I tend to confuse formatting data with tranforming data, because they overlap especially when it comes number formatting.
Formatting Data is the act of structuring data tables in formats that are consistent, easily readable and understandable, and that are structured within the context of our Analysis Goals. Transforming data applies to performing calculations on our data in order to get a focused result for our Analysis.
Formatting & adjusting data includes the following steps:
- Table properties such as well-defined borders, readable and sticky headers, column names that are self-explanatory, columns that make sense in their sequence,
- Ordering Data to adjust to the analysis context, in ascending or descending order
- Combining cell values and strings to combine them to make more sense using functions with the help of LIMIT, CONVERT, ROUND, JOIN, CONCAT
- Filtering data with Pivot Tables, Conditional Formatting
- Unifying Date formats, currency, address formats,
- Changing numerics, strings, and dates, such as using the CAST() and CONVERT() functions
- Transforming Data includes the following actions:
- Making calculations such as counting, conditional counting, sums, etc and placing the results in new columns or temporary tables, using Aliasing and creating Summary Tables
- Finding Maximums and Minimums, Averages,
- Using Conditional formatting to highlight calculations
- Using Pivot Tables to perform and summarize calculations
- Perform Data aggregation using JOIN()(although this is also a Data Formatting action),