100 Top Data Warehouse Job Interview Questions and Answers

Data Warehouse Interview Questions with Answers:

1. Define a data warehouse?

A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data that supports management’s decision-making process.

2. What is data warehousing?

A data warehouse is a repository of data that is used as a management decision support system. It holds a wide variety of data that gives a high-level picture of business conditions at a single point in time.

In a single sentence, it is a repository of integrated information that is available for queries and analysis.

3. What is Business Intelligence?

Business Intelligence, also known as DSS (decision support system), refers to the technologies, applications, and practices for the collection, integration, and analysis of business-related information or data. It also helps users see the data behind the information itself.

4. What is a dimension table?

A dimension table contains the attributes that describe the measurements stored in fact tables. It holds hierarchies, categories, and logic that can be used to traverse the nodes of those hierarchies.

5. What are the stages of data warehousing?

There are four stages of data warehousing:

  1. Offline Operational Database
  2. Offline Data Warehouse
  3. Real-Time Data Warehouse
  4. Integrated Data Warehouse

6. What is data mining?

Data mining is the process of analyzing data from different dimensions or perspectives and summarizing it into useful information. The data can then be queried and retrieved from the database in its own format.

7. What is OLTP?

OLTP stands for Online Transaction Processing. It is an application that modifies data as soon as it is received and serves a large number of simultaneous users.

8. What is OLAP?

OLAP stands for Online Analytical Processing. It is a system that collects, manages, and processes multi-dimensional data for analysis and management purposes.

9. What is the difference between OLTP and OLAP?

The differences between OLTP and OLAP are as follows:

OLTP:

  • Data comes from the original data source
  • Simple queries by users
  • Normalized, small database
  • Fundamental business tasks

OLAP:

  • Data comes from various data sources
  • Complex queries by the system
  • De-normalized, large database
  • Multi-dimensional business tasks
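
The contrast in query style can be sketched with Python's built-in sqlite3 module. This is only an illustration: the orders table and its columns are hypothetical, and a real OLAP workload would normally run against a separate, de-normalized warehouse rather than the transactional database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style work: many small writes and single-row lookups.
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("north", 10.0), ("south", 20.0), ("north", 5.0)],
)
conn.execute("UPDATE orders SET amount = 12.0 WHERE order_id = 1")
single_order = conn.execute(
    "SELECT region, amount FROM orders WHERE order_id = 1"
).fetchone()

# OLAP-style work: one query that scans and summarizes many rows.
sales_by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region").fetchall()
)
```

The first group of statements touches one row at a time, while the final query reads the whole table to produce a summary.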

10. What is an ODS?

ODS stands for Operational Data Store. It is a repository of real-time operational data rather than long-term trend data.

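11. What is the difference between a view and a materialized view?

A view is a virtual table defined by a query. It stores no data of its own; the query runs each time the view is read, and the view can be used in place of a table.

A materialized view stores the results of its query physically, often in a separate schema, so the table data is accessed indirectly through that stored copy. The stored results have to be refreshed to pick up changes in the base tables.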
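
The difference in behavior can be sketched as follows. SQLite has no materialized views, so this example emulates one with a table built from the query result; that emulation, the sales table, and the helper names are assumptions made for illustration, not how any particular database implements the feature.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("north", 10.0), ("south", 20.0)]
)

SUMMARY_QUERY = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"

# A view stores only the query text; it is re-evaluated on every read.
conn.execute(f"CREATE VIEW v_sales AS {SUMMARY_QUERY}")


def refresh_materialized():
    """Emulate a materialized view: store the query result as a real table."""
    conn.execute("DROP TABLE IF EXISTS mv_sales")
    conn.execute(f"CREATE TABLE mv_sales AS {SUMMARY_QUERY}")


def totals(name):
    """Read region totals from the given view or table."""
    return dict(conn.execute(f"SELECT region, total FROM {name}").fetchall())


refresh_materialized()
conn.execute("INSERT INTO sales VALUES ('north', 5.0)")

view_totals = totals("v_sales")    # reflects the new row immediately
stale_totals = totals("mv_sales")  # still the stored snapshot
refresh_materialized()
fresh_totals = totals("mv_sales")  # matches the view after a refresh
```

Reading the stored copy avoids re-running the aggregation, at the cost of returning stale results until the next refresh.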

12. What is ETL?

ETL stands for Extract, Transform, and Load. ETL software reads data from a specified data source and extracts the desired subset of data. Next, it transforms the data using rules and lookup tables and converts it to the desired state.

Finally, the load function writes the resulting data to the target database.
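
The three steps can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the inline CSV stands in for a real source system, the country lookup and the "drop unknown codes" rule are invented, and the target is an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would be a file or a source system.
SOURCE_CSV = """customer,country_code,amount
alice,us,10.50
bob,de,7.25
carol,xx,3.00
"""

# Lookup table used by the transform step (illustrative values).
COUNTRY_LOOKUP = {"us": "United States", "de": "Germany"}


def extract(text):
    """Read the rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Apply rules and lookups; drop rows whose country code is unknown."""
    out = []
    for row in rows:
        country = COUNTRY_LOOKUP.get(row["country_code"])
        if country is None:
            continue
        out.append((row["customer"].title(), country, float(row["amount"])))
    return out


def load(conn, rows):
    """Write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT customer, country, amount FROM sales ORDER BY customer"
).fetchall()
```

Keeping the three steps as separate functions makes each one easy to test and to replace, for example when the source changes from a file to a database.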

13. What is a VLDB?

VLDB stands for Very Large Database, generally taken to mean a database larger than one terabyte. These are typically decision support systems that serve a large number of users.

14. What is real-time data warehousing?

Real-time data warehousing captures business data as it occurs. As soon as a business activity is completed, its data enters the flow and becomes available for use immediately.

15. What are aggregate tables?

Aggregate tables contain existing warehouse data that has been grouped to a certain level of the dimensions. It is easier to retrieve data from an aggregate table than from the original table, which has many more records.

An aggregate table reduces the load on the database server and improves query performance.
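
As a rough sketch of the idea, the following builds a monthly aggregate from a detailed fact table using sqlite3. The table names, columns, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (sale_date TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [
        ("2024-01-03", "brick", 100.0),
        ("2024-01-17", "brick", 50.0),
        ("2024-02-02", "brick", 25.0),
        ("2024-02-09", "tile", 40.0),
    ],
)

# Build the aggregate once, grouped to the (month, product) level.
conn.execute(
    """
    CREATE TABLE agg_sales_monthly AS
    SELECT substr(sale_date, 1, 7) AS month, product, SUM(amount) AS total_amount
    FROM fact_sales
    GROUP BY substr(sale_date, 1, 7), product
    """
)

# Reports can now read the smaller aggregate instead of scanning every fact row.
monthly = conn.execute(
    "SELECT month, product, total_amount FROM agg_sales_monthly ORDER BY month, product"
).fetchall()
```

Four detail rows collapse into three aggregate rows here; with real volumes the reduction is usually far larger.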

16. What are factless fact tables?

Factless fact tables are fact tables that do not contain any numeric fact columns. They hold only dimension keys and typically record that an event occurred, such as a student attending a class.

17. How can we load the time dimension?

Time dimensions are usually loaded with every possible date in the covered range, which can be done by a program. For example, 100 years can be represented with one row per day, which is roughly 36,500 rows.
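
A program of this kind might look like the sketch below. The column names (date_key, quarter, and so on) and the chosen date range are assumptions; real date dimensions often carry many more attributes, such as fiscal periods and holiday flags.

```python
from datetime import date, timedelta


def build_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append(
            {
                "date_key": int(current.strftime("%Y%m%d")),  # e.g. 20240131
                "full_date": current.isoformat(),
                "year": current.year,
                "quarter": (current.month - 1) // 3 + 1,
                "month": current.month,
                "day": current.day,
                "iso_weekday": current.isoweekday(),  # 1 = Monday ... 7 = Sunday
            }
        )
        current += timedelta(days=1)
    return rows


# 100 years at one row per day is a small table by warehouse standards.
dim_date = build_date_dimension(date(2000, 1, 1), date(2099, 12, 31))
```

The resulting list can be bulk-inserted into the dimension table once, since calendar dates do not change after loading.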

18. What are non-additive facts?

Non-additive facts are facts that cannot be summed across any of the dimensions present in the fact table, such as ratios and percentages. The same facts are still useful when the dimensions change, but they have to be recomputed rather than summed.
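
A small worked example shows why summing such a fact gives the wrong answer. The stores, revenue, and profit figures are made up; profit margin is used here as a typical non-additive fact.

```python
# Hypothetical fact rows: revenue and profit are additive, margin is not.
rows = [
    {"store": "A", "revenue": 100.0, "profit": 10.0},   # margin 10%
    {"store": "B", "revenue": 900.0, "profit": 270.0},  # margin 30%
]


def margin(profit, revenue):
    """Profit margin as a fraction of revenue."""
    return profit / revenue


# Wrong: adding up (or naively averaging) the per-row ratios.
summed_margins = sum(margin(r["profit"], r["revenue"]) for r in rows)
naive_average = summed_margins / len(rows)

# Right: sum the additive components first, then recompute the ratio.
total_margin = margin(
    sum(r["profit"] for r in rows),
    sum(r["revenue"] for r in rows),
)
```

The summed value (about 0.40) and the naive average (about 0.20) both differ from the true combined margin of 0.28, which is why the additive components are usually stored alongside or instead of the ratio.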

19. What is a conformed fact?

A conformed fact (often misspelled "confirmed fact") is a fact with the same definition and units wherever it appears, so it can be used across multiple data marts in combination with multiple fact tables.

20. What is a data mart?

A data mart is a specialized version of a data warehouse. It contains a snapshot of operational data that helps business people make decisions based on the analysis of past trends and experiences. A data mart emphasizes easy access to relevant information.

21. What is Active Data warehousing?

An active data warehouse is a data warehouse that enables decision makers within a company or organization to manage customer relationships effectively and efficiently.

22. What is the difference between a data warehouse and OLAP?

A data warehouse is where all the data is stored for analysis, whereas OLAP is used for analyzing the data, managing aggregations, and partitioning information into lower-level detail.

23. What is an ER diagram?

ER diagram stands for Entity-Relationship diagram. It illustrates the interrelationships between the entities in a database, showing the structure of each table and the links between the tables.

24. What are the key columns in Fact and dimension tables?

Foreign keys of dimension tables are the primary keys of entity tables. Foreign keys of fact tables are the primary keys of the dimension tables.

25. What is SCD?

SCD stands for slowly changing dimensions. It applies to cases where a record changes over time.

26. What are the types of SCD?

There are three types of SCD:

SCD 1 – The new record replaces the original record, so no history is kept

SCD 2 – A new record is added to the existing customer dimension table, so the full history is kept

SCD 3 – The original record is modified to include the new data alongside the old value
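
The three types can be sketched on a single customer row. This is a simplified illustration, not a production pattern: the column names (customer_key, valid_from, is_current, previous_city) and the sample values are assumptions, and real implementations usually run as SQL or ETL-tool logic against the dimension table.

```python
from copy import deepcopy

# Hypothetical customer dimension row; column names are illustrative.
original = {
    "customer_key": 1, "customer_id": "C100", "city": "Pune",
    "valid_from": "2020-01-01", "valid_to": None, "is_current": True,
}


def scd_type1(rows, customer_id, new_city):
    """Type 1: overwrite in place; no history is kept."""
    rows = deepcopy(rows)
    for row in rows:
        if row["customer_id"] == customer_id:
            row["city"] = new_city
    return rows


def scd_type2(rows, customer_id, new_city, change_date):
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    rows = deepcopy(rows)
    next_key = max(r["customer_key"] for r in rows) + 1
    for row in rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["valid_to"], row["is_current"] = change_date, False
            rows.append({**row, "customer_key": next_key, "city": new_city,
                         "valid_from": change_date, "valid_to": None,
                         "is_current": True})
            break
    return rows


def scd_type3(rows, customer_id, new_city):
    """Type 3: keep the previous value in an extra column on the same row."""
    rows = deepcopy(rows)
    for row in rows:
        if row["customer_id"] == customer_id:
            row["previous_city"], row["city"] = row["city"], new_city
    return rows


type1 = scd_type1([original], "C100", "Mumbai")
type2 = scd_type2([original], "C100", "Mumbai", "2024-06-01")
type3 = scd_type3([original], "C100", "Mumbai")
```

Type 1 ends with one row, Type 2 with two rows (the expired one and the current one), and Type 3 with one row that carries both the old and new city.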

27. What is a bus schema?

A bus schema consists of a suite of conformed dimensions and standardized definitions of the facts in the fact tables.

28. What is a star schema?

A star schema organizes the tables so that results can be retrieved from the database quickly in a data warehouse environment. A central fact table is joined directly to the dimension tables around it, which gives the layout its star shape.
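
A minimal star layout might look like the following sketch, written with sqlite3. The table names, columns, and sample rows are hypothetical; the point is only that the fact table references each dimension directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, city TEXT);
    -- The fact table sits at the centre and references each dimension directly.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product (product_key),
        store_key   INTEGER REFERENCES dim_store (store_key),
        quantity    INTEGER
    );
    INSERT INTO dim_product VALUES (1, 'brick'), (2, 'tile');
    INSERT INTO dim_store   VALUES (1, 'Pune'), (2, 'Delhi');
    INSERT INTO fact_sales  VALUES (1, 1, 10), (1, 2, 5), (2, 1, 7);
    """
)

# A typical star query: join the fact table to a dimension,
# then group by one of that dimension's attributes.
quantity_by_city = conn.execute(
    """
    SELECT s.city, SUM(f.quantity)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_key = f.store_key
    GROUP BY s.city
    ORDER BY s.city
    """
).fetchall()
```

Because every dimension is one join away from the fact table, queries stay short and predictable.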

29. What is a snowflake schema?

A snowflake schema has a primary dimension table to which one or more further dimension tables can be joined. The primary dimension table is the only table that can be joined with the fact table.

30. What is a core dimension?

A core dimension is a dimension table that is dedicated to a single fact table or data mart.

31. What is data cleaning?

The name is largely self-explanatory: data cleaning is the removal or correction of orphan records, data that breaches business rules, inconsistent data, and missing information in a database.
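
The kinds of problems listed above can be detected with simple checks, as in this sketch. The records, field names, and the "amounts must be non-negative" rule are all assumptions made for the example.

```python
# Hypothetical parent and child records; field names are illustrative.
customers = [{"customer_id": 1, "name": "Alice"}, {"customer_id": 2, "name": "Bob"}]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.0},
    {"order_id": 11, "customer_id": 9, "amount": 40.0},  # orphan: no customer 9
    {"order_id": 12, "customer_id": 2, "amount": None},  # missing information
    {"order_id": 13, "customer_id": 2, "amount": -5.0},  # breaks a business rule
]


def classify(order, known_customer_ids):
    """Return the reason an order should be rejected, or None if it is clean."""
    if order["customer_id"] not in known_customer_ids:
        return "orphan"
    if order["amount"] is None:
        return "missing"
    if order["amount"] < 0:  # assumed rule: amounts must be non-negative
        return "rule_violation"
    return None


known_ids = {c["customer_id"] for c in customers}
clean = [o for o in orders if classify(o, known_ids) is None]
rejected = {
    o["order_id"]: classify(o, known_ids)
    for o in orders
    if classify(o, known_ids) is not None
}
```

Recording the rejection reason, rather than silently dropping rows, makes it easier to fix the problem at the source.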

32. What is metadata?

Metadata is defined as data about data. It contains information such as the number of columns used, fixed width and limited width, the ordering of fields, and the data types of the fields.

33. What are loops in data warehousing?

In data warehousing, loops can exist between tables. If there is a loop between the tables, query generation takes more time and creates ambiguity. It is advisable to avoid loops between tables.

34. Can a dimension table have numeric values?

Yes, a dimension table can have numeric values, because they are descriptive elements of the business.

35. What is a cube in data warehousing?

Cubes are a logical representation of multidimensional data. The edges of the cube hold the dimension members, and the body of the cube contains the data values.
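
One way to picture this is a mapping from a tuple of dimension members to a data value, as in the sketch below. The dimensions (product, region, year), the sample figures, and the roll_up helper are assumptions for illustration; OLAP engines store and aggregate cubes far more efficiently.

```python
from collections import defaultdict

# Hypothetical fact rows with three dimensions: product, region, year.
facts = [
    ("brick", "north", 2023, 100),
    ("brick", "south", 2023, 60),
    ("tile", "north", 2023, 30),
    ("brick", "north", 2024, 120),
]

# The cube body: one cell per combination of dimension members.
cube = defaultdict(int)
for product, region, year, units in facts:
    cube[(product, region, year)] += units


def roll_up(cells, keep):
    """Aggregate the cube down to the dimensions whose positions are in `keep`."""
    result = defaultdict(int)
    for cell, value in cells.items():
        result[tuple(cell[i] for i in keep)] += value
    return dict(result)


units_by_product = roll_up(cube, keep=(0,))         # collapse region and year
units_by_region_year = roll_up(cube, keep=(1, 2))   # collapse product
```

Each key tuple corresponds to a position along the cube's edges, and each value is a cell in its body.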

36. What is dimensional modeling?

Dimensional modeling is a concept that data warehouse designers use to build their own data warehouse. The model is stored in two types of tables: fact tables and dimension tables.

The fact table holds the facts and measurements of the business, and the dimension table holds the context of those measurements.

37. What are the types of dimensional modeling?

There are three types of dimensional modeling:

  • Conceptual Modeling
  • Logical Modeling
  • Physical Modeling

38. What is a surrogate key?

A surrogate key is a substitute for the natural primary key. It is a unique identifier for each row that can be used as the primary key of a table.
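
As a small sketch, the following lets the database assign the surrogate key while the natural key from the source system is kept as an ordinary column. The dim_customer table, its columns, and the add_customer_version helper are hypothetical names used only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id  TEXT NOT NULL,                      -- natural key from the source
        city         TEXT
    )
    """
)


def add_customer_version(customer_id, city):
    """Insert a row and return the surrogate key the database assigned."""
    cur = conn.execute(
        "INSERT INTO dim_customer (customer_id, city) VALUES (?, ?)",
        (customer_id, city),
    )
    return cur.lastrowid


# The same natural key can appear more than once; each row gets its own surrogate key.
first_key = add_customer_version("C100", "Pune")
second_key = add_customer_version("C100", "Mumbai")
```

Because the surrogate key is independent of the source data, the warehouse can hold several versions of the same customer without key conflicts.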

39. What is the difference between ER modeling and dimensional modeling?

ER modeling has both a logical and a physical model, whereas dimensional modeling has only a physical model.

ER modeling is used for normalizing the OLTP database design, whereas dimensional modeling is used for de-normalizing the ROLAP and MOLAP design.

40. What are the steps to build the data warehouse?

Following are the steps to be followed to build the data warehouse:

  1. Gathering business requirements
  2. Identifying the necessary sources
  3. Identifying the facts
  4. Defining the dimensions
  5. Defining the attributes
  6. Redefine the dimensions and attributes if required
  7. Organize the Attribute hierarchy
  8. Define Relationships
  9. Assign unique Identifiers

41. What are the different types of data warehousing?

The different types of data warehousing are as follows:

  • Enterprise Data warehousing
  • Operational Data Store
  • Data Mart

42. What needs to be done when starting the database?

The following steps are needed to start the database:

  1. Start an Instance
  2. Mount the database
  3. Open the database

43. What needs to be done when the database is shut down?

The following steps are needed when the database is shut down:

  • Close the database
  • Dismount the database
  • Shut down the instance

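44. Can we take a backup when the database is open?

No, we cannot take a consistent full (cold) backup while the database is open. An online ("hot") backup is possible only where the database supports it, for example Oracle running in ARCHIVELOG mode.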

45. What is a partial backup?

A partial backup is any operating system backup short of a full backup. It can be taken while the database is open or shut down.

46. What is the goal of the optimizer?

The goal of the optimizer is to find the most efficient way to execute SQL statements.

47. What is an execution plan?

An execution plan is the combination of steps that the optimizer selects to execute a SQL statement.

48. What approaches does the optimizer use to choose an execution plan?

There are two approaches:

  1. Rule-Based
  2. Cost-Based

49. What are the tools available for ETL?

The following ETL tools are available:

  1. Informatica
  2. DataStage
  3. Oracle Warehouse Builder
  4. Ab Initio
  5. Data Junction

50. What is the difference between metadata and a data dictionary?

Metadata is defined as data about data. A data dictionary, by contrast, contains project information, graphs, Ab Initio commands, and server information.

51. What is a fact table?

A fact table contains the measurements of business processes, along with foreign keys to the dimension tables.

Example: if the business process is the manufacturing of bricks, the average number of bricks produced by one person or machine is a measure of that business process.
