Concepts Comparison in Data Analytics

Eva Wang
May 26, 2021

Through work and study, I have learned that comparison is a good way to learn new concepts and techniques and to understand their nuances. If you can give the definition of a term but cannot explain how it differs from a similar one, you don't really understand either of them. Recently I came across several data analytics concepts that are easy to confuse, so I have organized the comparisons below.

Structured and Unstructured Data

Structured data has a fixed format. It is usually stored in a relational database or data warehouse. Whether the values are text or numbers, the data is easy to search and analyze.

Unstructured data comes in various formats, such as free text, images, or audio, and takes more work to process and understand. Data from social media is usually unstructured.
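To make the contrast concrete, here is a minimal Python sketch. The order records, the social media posts, and the idea of pulling out hashtags are all made up for illustration; the point is only that structured records can be filtered by field right away, while free text needs an extraction step first.

```python
import re

# Structured: every record has the same fields, so a question like
# "how much did Ann spend?" is a direct filter on a known column.
orders = [
    {"order_id": 1, "customer": "Ann", "amount": 30.0},
    {"order_id": 2, "customer": "Bob", "amount": 12.5},
    {"order_id": 3, "customer": "Ann", "amount": 7.5},
]
ann_total = sum(row["amount"] for row in orders if row["customer"] == "Ann")

# Unstructured: free text has no fields, so we first have to extract
# something usable (here, hashtags) before we can analyze it.
posts = [
    "Loved my order from the shop today! #happy",
    "Delivery was late again... #annoyed #late",
]
hashtags = [re.findall(r"#(\w+)", post) for post in posts]

print(ann_total)  # 37.5
print(hashtags)   # [['happy'], ['annoyed', 'late']]
```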

Data Warehouse and Data Lake

A data warehouse is a large collection of organizational data used for reporting and data analytics. The data is usually cleaned and stored in a dimensional format to facilitate slicing and dicing. To ensure data quality, ETL (extract, transform, load) is often performed before data moves into a data warehouse.

A data lake is a system for storing raw data. Data lakes grew out of the need to store big, fine-grained, and unstructured data for machine learning. As a result, data lakes are usually used by data scientists, while data warehouses are usually used by business users.
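The difference between raw and cleaned data can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions: the list raw_events stands in for a data lake, an in-memory SQLite database stands in for a data warehouse, and the transform function and the sales table are hypothetical names.

```python
import sqlite3

# "Lake": raw records kept exactly as they arrived, messy values included.
raw_events = [
    "2021-05-01, east ,100",
    "2021-05-02,WEST,forty",  # amount is not a number
    "2021-05-03,East,60",
]

def transform(line):
    """Clean one raw line; return None if it cannot be repaired."""
    day, region, amount = [part.strip() for part in line.split(",")]
    try:
        return day, region.title(), float(amount)
    except ValueError:
        return None

# "Warehouse": only cleaned, typed rows are loaded into a fixed table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (day TEXT, region TEXT, amount REAL)")
clean_rows = [row for row in map(transform, raw_events) if row is not None]
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean_rows)

loaded = warehouse.execute(
    "SELECT region, amount FROM sales ORDER BY day"
).fetchall()
print(loaded)  # [('East', 100.0), ('East', 60.0)]
```

The raw list still holds all three events, including the bad one, which is the kind of material a data scientist might want to inspect later.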

OLAP and OLTP

Online transaction processing (OLTP) captures, stores, and processes data from transactions in real time. An OLTP system helps manage transaction-oriented applications.

Online analytical processing (OLAP) uses complex queries to analyze aggregated historical data from OLTP systems. An OLAP system helps analyze data stored in a multidimensional format.

OLTP modifies data by inserting, updating, and deleting records, while OLAP does not modify data; it only queries existing systems for answers.
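Here is a small sketch of the two workloads using Python's built-in sqlite3 module. The sales table and its values are invented for illustration, and in practice the analytical query would usually run against a separate system such as a data warehouse rather than the transactional database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style work: short transactions that insert or update single rows.
with conn:  # commits on success, rolls back if an exception is raised
    conn.execute("INSERT INTO sales (region, amount) VALUES (?, ?)", ("East", 100.0))
    conn.execute("INSERT INTO sales (region, amount) VALUES (?, ?)", ("West", 40.0))
    conn.execute("INSERT INTO sales (region, amount) VALUES (?, ?)", ("East", 60.0))
with conn:
    conn.execute("UPDATE sales SET amount = ? WHERE id = ?", (50.0, 2))

# OLAP-style work: a read-only query that aggregates many rows at once.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('East', 160.0), ('West', 50.0)]
```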

Data Model, Data Modeling, and Data Mining Model

A data model describes how data is supposed to reflect reality, how it is structured, and how it interacts with other system components. You can create a data model without any data in it. Put simply, a data model is just a structure.

Data modeling is the process of creating a model for an information system using certain techniques. Data modeling builds a structure to connect data points.

A data mining model is the result of a data analysis process that analyzes a subset of data and produces useful information and insights about a subject. A data mining model must be used with data, because it involves two steps: 1) training (building) the model; 2) applying the model to new data.
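The two steps of a data mining model can be sketched with a tiny example. I am assuming a simple least-squares line fit as the "model" here, purely for illustration; the function names train and predict and the numbers are hypothetical, and real projects would normally use a library.

```python
def train(xs, ys):
    """Step 1: build the model (a slope and an intercept) from known data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Step 2: apply the trained model to a new data point."""
    slope, intercept = model
    return slope * x + intercept

# Training data that follows y = 2x + 1 exactly.
model = train([1, 2, 3, 4], [3, 5, 7, 9])
print(model)               # (2.0, 1.0)
print(predict(model, 10))  # 21.0
```

Unlike a data model, which can exist with no data in it, this model cannot even be created without the training data.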

Star Schema and Dimensional Model

A star schema is the simplest form of a dimensional model, with data organized into facts and dimensions. It is called a star schema because the structure looks like a star: a central fact table surrounded by dimension tables.

A dimensional model is a database structure that is optimized for online queries and data warehousing tools. So, a star schema is a type of dimensional model.
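A minimal star schema can be sketched with sqlite3 as follows. The table names (fact_sales, dim_product, dim_date) and the data are my own illustrative choices, not a standard; the fact table holds the measures and points to each dimension by key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "who/what/when" of each fact.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);

-- The fact table sits in the middle of the star and stores the measures.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);

INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Toys');
INSERT INTO dim_date VALUES (1, 2020), (2, 2021);
INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 15.0), (2, 2, 20.0), (2, 2, 5.0);
""")

# "Slice and dice": join the facts to their dimensions, then group.
rows = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()
print(rows)  # [('Books', 2020, 10.0), ('Books', 2021, 15.0), ('Toys', 2021, 25.0)]
```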

Data Analysis and Data Analytics

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making.

Data analytics is a broader term for the field of analyzing data, transforming data into information, and transforming information into insight. Data analysis is an important component of data analytics.

Schema and Metadata

A schema is a collection of database objects, such as tables, views, and indexes, associated with a database. It is like a blueprint of how the database is constructed.

Metadata is data that describes other data. Metadata summarizes basic information about data, which can make finding and working with particular instances of data easier.

Metadata is "data about data", while a schema is the structure or layout of the database.
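A rough way to see both in one place is to ask SQLite about a table. In this sketch the customer table is hypothetical; the stored CREATE TABLE statement plays the role of the schema (the blueprint), while the column list and row count are examples of metadata describing the data. The boundary is blurry in practice, since a database typically stores its schema as part of its metadata.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customer (name) VALUES ('Ann')")

# Schema: the blueprint SQLite keeps for how the table is constructed.
schema_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'customer'"
).fetchone()[0]

# Metadata: descriptive facts about the data, such as column names,
# declared types, and how many rows the table currently holds.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")]
row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

print(schema_sql)
print(columns)    # [('id', 'INTEGER'), ('name', 'TEXT')]
print(row_count)  # 1
```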
