In a large company, with large databases or large systems for separate functions, such as manufacturing, sales, and accounting, special capabilities and tools are required for analyzing vast quantities of data and for accessing data from multiple systems. These capabilities include data warehousing, data mining, and tools for accessing internal databases through the Web.
A data warehouse is a repository of multiple heterogeneous data sources organized under a unified schema at a single site to facilitate management decision making. A data warehouse is a subject-oriented, integrated, time-variant and nonvolatile collection of data in support of management‘s decision-making process.
- Subject-Oriented: A data warehouse can be used to analyze a particular subject For example, "sales" can be a particular subject.
- Integrated: A data warehouse integrates data from multiple data For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.
- Time-Variant: Historical data is kept in a data For example, one can retrieve data from 3 months, 6 months, 12 months, or even older data from a data warehouse. This contrasts with a transactions system, where often only the most recent data is kept. For example, a transaction system may hold the most recent address of a customer, where a data warehouse can hold all addresses associated with a customer.
- Non-volatile: Once data is in the data warehouse, it will not So, historical data in a data warehouse should never be altered.
A data warehouse is a repository of current and historical data of an organization that are organized to facilitate reporting and analysis. The data originate in many core operational transaction systems, such as systems for sales, customer accounts, and manufacturing, and may include data from Web site transactions. The data warehouse consolidates and standardizes information from different operational databases so that the information can be used across
the enterprise for management analysis and decision making. Figure below illustrates how a data warehouse works. The data warehouse makes the data available for anyone to access as needed, but it cannot be altered. A data warehouse system also provides a range of ad hoc and standardized query tools, analytical tools, and graphical reporting facilities. Many firms use intranet portals to make the data warehouse information widely available throughout the firm.
How does a data warehouse differ from a database?
There are a number of fundamental differences which separate a data warehouse from a database. The biggest difference between them is that most database place an emphasis on a single application, and this application will generally be one that is based on transaction. If the data is analyzed, it will be done within a single domain. In contrast, data warehouses deal with multiple domains simultaneously.
Because data warehouse deals with multiple subject areas, the data warehouse finds connections between them. This allows the data warehouse to show how the company is performing as a whole, rather than in individual areas.
Another powerful aspect of data warehouse is their ability to support the analysis of trends. They are not volatile, and the information stored in them doesn‘t change as much as it would in a common database. Some of the major differences between them are listed below:
1. In database tables and joins of different tables are complex since they are normalized for RDBMS. This is done to reduce redundant data and to save storage space.
2. Entity Relational modeling techniques are used for RDBMS database design.
3. Performance is low for analysis queries.
4. Database is the place where the data is taken as a base and managed to get available fast and efficient access.
5. Optimized for write operation.
6. Used for Online Transaction Processing (OLTP) but can be used for other purpose such as data warehousing. This records the data from the user for history.
1. In data warehouse tables and joins are simple since they are de- normalized. This is done to reduce the response time for analytical queries.
2. Data modeling techniques are used for Data Warehouse design.
3. High performance for analytical queries
4. Data warehouse is the place where the application data is managed for analysis and reporting purpose.
5. Optimized for read operations.
6. Used for Online Analytical Processing (OLAP). This reads the historical data for the users for business decision.
Data mart is a database that contains a subset of data present in a data warehouse. We can divide a data warehouse into data marts after the data warehouse has been created. A data mart is a subset of a data warehouse in which a summarized or highly focused portion of the organization‘s data is placed in a separate database for a specific population of users. For example, a company might develop marketing and sales data marts to deal with customer information. A data mart typically focuses on a single subject area or line of business, so it usually can be constructed more rapidly and at lower cost than an enterprise-wide data warehouse.
Reasons for creating a data mart
- Creates collective view by a group of users
- Easy access to frequently needed data
- Ease of creation
- Improves end-user response time
- Lower cost than implementing a full data warehouse
- Potential users are more clearly defined than in a full data warehouse
- Contains only business essential data and is less cluttered