Bài giảng nhà kho dữ liệu (Data warehouse)
Bài giảng bằng powerpoint bao gồm khái niệm chung, khái niệm cơ bản, thiết kế triển khai, thách thức khi xây dựng data warehouse, vận hành 1 data warehouse và Oracle data warehouse
25 trang |
Chia sẻ: lvcdongnoi | Lượt xem: 3881 | Lượt tải: 1
Bạn đang xem trước 20 trang tài liệu Bài giảng nhà kho dữ liệu (Data warehouse), để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên
Lesson 4 Data Warehouse Architecture and Models Objectives Differentiate between an enterprise-wide data warehouse and localized data marts Recognize the difference between independent and dependent data marts Identify the data that is stored in a data warehouse Explain the features of each type of data by examining where and why it is used List the data models that may already exist in a company and describe where they may be useful to the warehouse model Explain the two common data warehouse models Warehouse Architectures Enterprise-wide solution Data mart solution Combined solution Enterprise Data Warehouse Solution Large warehouse containing all business data Incremental implementation First increment provides proof-of-concept Funded at corporate level Data Mart Solution Independent Use a consistent approach Avoid disjointed development Consider the big picture Dependent Localization Subsets of summary data Oracle Data Mart Suite for a pre-configured solution Funded departmentally Independent Dependent Create a Project Team Staff with experts Database designers Database administrators Network specialists ETT specialists Project managers Consider training plans Identify the Data Requirements Successful warehouses provide the right information Analyze business users’ needs Interview users Examine data needed Ascertain data availability Determine data frequency Decide the refresh cycle Types of Warehouse Data Fact data - Measures Dimension data - Query drivers Summary data - Pre-calculated data Fact Data and Tables Many fact tables in the warehouse The bulk of the warehouse data Measures (units sold, sales figures, calls) Millions of rows Multi-part primary keys Summaries Normalized data Partitioning Data Breaking tables into smaller units Horizontal or vertical Horizontal Partitioning Time Region Geography Organization Line of business Vertical Partitioning Vertical partitioning by column Oracle servers support both types of partitioning Dimension Data Provides query criteria Links to fact tables with keys Needs update strategy Dimension Tables Are determined by user requirements Vary in number May contain hierarchies May share fact tables Time Dimension Important required dimension Contains special dates Provides flexible and accurate analysis by time Summary Data Aggregated facts Lightly summarized Highly summarized Cumulative or rolling data Key columns to dimensions Summary Tables Provide immediate answers to a query Improve query performance Requirements change and need managing Modeling Define the model Use graphical modeling tools Use a tool capable of prototyping Proof of concept essential New design techniques The Enterprise Model Defines an overall scope A good start point for analysis An information framework for the warehouse A guide to integration Never use directly for the warehouse The Corporate Data Model Current operational data structures Source of mapping rules for the warehouse Never use directly for the warehouse A Typical Modeling Approach 1. Analyze the subject area. 2. Gather the requirements. 3. Develop the models. 4. Map the entities, attributes, and relationships. 5. Re-engineer the source data. 6. Design the database. 7. Integrate the model into the warehouse architecture repository. 8. Review with client and revise. Oracle Data Warehouse Method supports an iterative approach. Physical model Denormalized Fast query response Flexible design Many query tools Star Model Snowflake Model Sales Fact Table Item_id Store_id Sales_dollars Sales_units Store Table Store_id Store_desc District_id Item Table Item_id Item_desc Dept_id Time Table Week_id Period_id Year_id Totsales Month_id Store_id Item_id Total_dollars District Table District_id District_desc Dept Table Dept_id Dept_desc Mgr Table Dept_id Mgr_id Snowflake Model A normalized star schema Easier to model requirements Flexible dimension structures Readily maps to existing data Used directly by tools Database servers Star queries Star joins VLDB support Which Model Do you Use? Design for simplicity Design for relevance Simple star model Flexible snowflake model Summary Enterprise data warehouses Data marts Independent Dependant Warehouse data Data models Models compared