MCA 202, Data Warehousing & Data Mining
UNIT-1
© Bharati Vidyapeeth’s Institute of Computer Applications and Management, New Delhi-63, by Deepali Kamthania, Associate Professor.
Learning Objectives
• Escalating need for strategic information
• Building blocks of a data warehouse
• Defining the business requirements
Why do enterprises really need data warehouses?
• Operational computer systems
  • Provide information to run the day-to-day business
  • Event driven
  • Not directly suitable for review from different points of view
• Executives
  • Need a different kind of information for strategic decisions, e.g. which product line to expand, which market should be strengthened
  • Trends over time
  • Review: sales quantities by product, salesperson, region, etc.
© Bharati Vidyapeeth’s Institute of Computer Applications and Management, New Delhi-63, by Shivendra Goel
Organizations’ use of data warehousing
• Retail: customer loyalty, market planning
• Manufacturing: cost reduction, logistics management
• Financial: risk management, fraud detection
• Utilities: asset management, resource management
• Airlines: route profitability, yield management
• Government: manpower planning, cost control
Escalating Need for Strategic Information
• Failures of past decision-support systems
• Operational versus decision-support systems
• Data warehousing: the only viable solution
Need for strategic information
• After the 1990s, business grew more complex.
• Corporations spread globally.
• Competition increased.
• Operational systems provided information to run day-to-day operations, but managers and executives needed different kinds of information that could be used to make strategic decisions.
• The data warehouse (DW) is a new paradigm specifically intended to provide vital strategic information.
• Why do enterprises really need a DW? Because of the escalating need for strategic information: the executives and managers responsible for keeping the enterprise competitive need information to make proper decisions. They need it to formulate business strategies, establish goals, set objectives, and monitor results.
Escalating need for Strategic Information
• Who needs strategic information in an enterprise?
  • Executives and managers
  • To make proper decisions
  • To keep the enterprise competitive
  • To formulate and execute business strategies
  • To establish goals, set objectives, and monitor results
• What exactly do we mean by strategic information?
Some business objectives
• Retain the present customer base
• Increase the customer base by 15% over the next 5 years
• Introduce a new product within 2 years
• Improve product quality levels in the top 5 product groups
• Gain market share by 10% in the next 3 years
• Increase sales by 10% in the East division
Cont..
• To set business objectives, managers need information for the following purposes:
  • Gain in-depth knowledge of the company’s operations.
  • Monitor how business factors change over time.
  • Compare the company’s performance with the competition and industry benchmarks.
Strategic information
• Executives and managers need to focus their attention on customers’ needs and preferences, emerging technologies, sales and marketing results, and the quality levels of products and services.
• This type of information is needed to make decisions in the formulation and execution of business strategies and objectives.
• All this essential information taken together is called strategic information.
• Strategic information is not for running the day-to-day operations of the business. It is important for the continued growth and survival of the corporation.
Characteristics of Strategic Information
• Integrated
  • Must have a single, enterprise-wide view.
• Data integrity
  • Information must be accurate and must conform to business rules.
• Accessible
  • Easily accessible with intuitive access paths, and responsive for analysis.
• Credible
  • Every business factor must have one and only one value.
• Timely
  • Information must be available within the stipulated time frame.
Escalating need for strategic information
• Information crisis
• Technology trends
• Opportunities and risks
• Failure of past decision support systems
Information Crisis
• Consider the IT department of any organization, big or small: the various computer applications in the company, the databases, and the quantities of data that support the company’s operations.
• How many years’ worth of customer data is saved and available?
• How many years’ worth of financial data is kept in storage? 10 years or 15 years?
• Where is all this data? On one platform? In legacy systems? In client/server applications?
Information Crisis cont..
• Facts faced by organizations:
  • Organizations have lots of data.
  • IT systems are NOT effective at turning all that data into useful strategic information.
• If organizations have so much data, why can’t executives and managers use it to make strategic decisions? This is the information crisis:
  • The data is available but not accessible.
  • It sits on old technology and different platforms.
• Proper decision making on overall corporate strategies and objectives needs information integrated from all systems.
• Data needed for strategic decision making must be in a format suitable for analyzing trends.
Technology Trends
[Figure: Growth of Information Technology, 1950 to 2000. Computing technology: mainframe, mini, PC and networking, client/server. Human/machine interface: punch card, video display, GUI, voice. Processing options: batch, online, networked.]
Opportunities and Risks
• Examples of the opportunities made available to companies through the use of strategic information:
• A community-based pharmacy that competes on a national scale with more than 800 franchised pharmacies coast to coast gains:
  • in-depth understanding of what customers buy
  • reduced inventory levels
  • improved effectiveness of promotions and marketing campaigns
  • improved profitability for the company
Opportunities and Risks cont..
• Consider the cases where risks and threats of failure existed before strategic information was made available for analysis and decision making.
• Example: a world-leading supplier of systems and components to automobile and light truck equipment manufacturers, operating nearly 100 plants, faced:
  • Inability to benchmark quality metrics
  • Time-consuming manual collection of data
  • Reports needed to support decision making that took weeks
  • Difficulty in getting company-wide integrated information
Failures of Past Decision Support Systems
• The marketing department is concerned about the performance of the West Coast region. The marketing Vice President wants reports from the IT department to analyze performance over the past two years, product by product, and compared to monthly targets.
• The CEO wants the reports delivered as soon as possible, so the request goes to the manager, who immediately passes it to a subordinate.
• No such report is available, so the data must be gathered from multiple applications (on different platforms) and the report built from scratch.
• These reports lack a clear agenda, which leads to inconsistencies among the data obtained from different applications.
• It is also possible that the person from the IT department creates a report from a single application for his or her convenience, so the information may not be helpful in strategic decision making.
• The scenario shows that when information is scattered in different places and in different forms, it is difficult to use it for strategic decisions.
Operational vs. Decision Support Systems
• The fundamental reason for the inability to provide strategic information is that we have been trying all along to provide it from the operational systems.
• These operational systems, such as order processing, inventory control, claims processing, outpatient billing, and so on, are not designed or intended to provide strategic information.
Cont..
• Making the wheels of business turn: get data in
  • Take an order
  • Process a claim
  • Make a shipment
  • Generate an invoice
  • Receive cash
  • Reserve an airline seat
• Operational systems support the basic business processes of the company (day-to-day business).
• Watching the wheels of business turn: get information out
  • Show the top-selling products
  • Show the problem regions
  • Show the highest margins
  • Alert whenever a district sells below target
• Decision support systems (DSS) do not run the core business processes, and they offer no immediate payout.
• DSS are developed to get strategic information out of the database, whereas OLTP systems are designed to put data into the database.
History of decision support systems
# Ad hoc reports
• This was the earliest stage.
• Users would send requests to the IT department for special reports.
• IT would write special programs, typically one for each request, and produce the ad hoc reports.
# Special extract programs
• This stage was an attempt by IT to anticipate the reports that would be requested from time to time.
• IT would write a suite of programs and run them periodically to extract data from the various applications.
• IT would create and keep the extract files to fulfill any request for special reports.
Cont..
# Small applications
• In this stage, IT formalized the extract process.
• IT created simple applications based on the extracted files.
• Users could specify the parameters for each special report.
• The report printing programs would print the reports based on user-specified parameters.
# Information center
• In the early 1970s, major corporations created information centers.
• The information center was a place where users could go to request ad hoc reports or view special reports on screen.
• These were predetermined reports or screens.
• IT personnel were there to help the users obtain the desired information.
Cont..
# Decision support systems
• In this stage, companies began to build more sophisticated systems to provide strategic information.
• Systems were menu driven and provided online information.
• Systems were supported by extracted files.
• Users could specify the parameters for each special report.
• Reports could be printed.
# Executive information systems
• This was the first attempt to bring strategic information to the executive desktop.
• Systems were designed to display key information every day.
• Reports were straightforward.
• Only preprogrammed screens and reports were available.
• It was not possible to see analysis by region, by product, or by any other dimension unless such breakdowns were already programmed.
• These limitations caused frustration, and executive information systems did not last long in many companies.
What is the basic reason for the failure of all previous attempts by IT to provide strategic information?
• The fundamental reason for the inability to provide strategic information is that we have been trying all along to provide it from operational systems.
• These information systems, such as order processing, inventory control, and claims processing, are not designed to provide strategic information.
• We must get the information from a different type of system. Only specially designed decision support systems can provide strategic information.
Typical OLAP Operations
• Roll up (drill-up): summarize data by climbing up a hierarchy or by dimension reduction
• Drill down (roll down): reverse of roll-up, from a higher-level summary to a lower-level summary or detailed data, or introducing new dimensions
• Slice and dice: project and select
• Pivot (rotate): reorient the cube for visualization, e.g. 3D to a series of 2D planes
• Other operations
  • Drill across: involving (across) more than one fact table
  • Drill through: through the bottom level of the cube to its back-end relational tables (using SQL)
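The operations above can be sketched on a tiny in-memory fact table. This is only an illustration: the sample rows, the `DIMS` mapping, and the helper names (`aggregate`, `slice_`, `dice`, `pivot`) are assumptions made for this sketch, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, units_sold)
SALES = [
    (2023, "Q1", "East", "Widget", 10),
    (2023, "Q1", "West", "Widget", 7),
    (2023, "Q2", "East", "Gadget", 5),
    (2023, "Q2", "West", "Widget", 3),
    (2024, "Q1", "East", "Gadget", 8),
]
DIMS = {"year": 0, "quarter": 1, "region": 2, "product": 3}


def aggregate(rows, dims):
    """Sum units_sold grouped by the named dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMS[d]] for d in dims)
        totals[key] += row[4]
    return dict(totals)


def slice_(rows, dim, value):
    """Slice: select on a single value of one dimension."""
    return [r for r in rows if r[DIMS[dim]] == value]


def dice(rows, **conditions):
    """Dice: select on sets of allowed values for several dimensions."""
    return [
        r for r in rows
        if all(r[DIMS[d]] in allowed for d, allowed in conditions.items())
    ]


def pivot(cells):
    """Pivot: turn {(row_key, col_key): v} into {row_key: {col_key: v}}."""
    table = defaultdict(dict)
    for (row_key, col_key), value in cells.items():
        table[row_key][col_key] = value
    return dict(table)


# Drill-down level: year, quarter, and region are all visible.
detail = aggregate(SALES, ["year", "quarter", "region"])
# Roll-up: climb from quarter to year and drop the region dimension.
by_year = aggregate(SALES, ["year"])
east = slice_(SALES, "region", "East")
widgets_2023 = dice(SALES, year={2023}, product={"Widget"})
region_by_product = pivot(aggregate(SALES, ["region", "product"]))
```

Roll-up and drill-down are the same `aggregate` call at different levels of detail, which is why a warehouse that keeps detailed data can answer both.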
Data Warehousing: The Only Viable Solution
• A different type of DSS is needed to provide strategic information for analysis, discerning trends, and monitoring performance.
• Given the escalating need for strategic information, data warehousing is the only viable solution for providing it.
New System Environment
• Desirable features and processing requirements of the new type of system environment:
  • Database designed for analytical tasks
  • Data from multiple applications
  • Easy to use and conducive to long interactive sessions by users
  • Content updated periodically and stable
  • Content includes current and historical data
  • Ability for users to run queries and get results online
  • Ability for users to initiate reports
Processing Requirements in the New Environment
• The new environment for strategic information is analytical. The 4 levels of analytical processing requirements are:
  • Running simple queries and reports against current and historical data.
  • Ability to perform “what if” analysis in many different ways.
  • Ability to query, step back, analyze, and then continue the process to any desired length.
  • Ability to spot historical trends and apply them to future results.
Business Intelligence at the Data Warehouse
[Figure: operational systems (basic business processes) feed the data warehouse through extraction, cleansing, aggregation, and data transformation; the warehouse provides key measurements and business dimensions.]
Definition
• A data warehouse is an information environment that:
  • Provides an integrated and total view of the enterprise
  • Makes the enterprise’s current and historical information easily available for decision making
  • Makes decision-support transactions possible without hindering operational systems
  • Renders the organization’s information consistent
  • Presents a flexible and interactive source of strategic information
Conclusion
• Operational systems are not designed to provide strategic information.
• A data warehouse is a computing environment, not a product, that provides strategic information. It is:
  • For data analysis and decision support
  • Flexible and interactive
  • User driven
Let’s Discuss
1. How can readily available strategic information increase quality and help realize opportunities in:
  • an insurance company
  • an airline company
2. You are a senior analyst in the IT department of a company manufacturing automobile parts. The marketing VP complains about poor IT response in providing strategic information. Write a proposal that explains the problems and their reasons, and why a data warehouse is viable.
Data Warehouse: Building Blocks
• Defining features
• Data warehouses and data marts
• Overview of the components
• Metadata in the data warehouse
Defining Features
• The key defining features of the data warehouse follow from these definitions.
• What is the nature of the data in the data warehouse?
• How is this data different from the data in any operational system?
• Why does it have to be different?
• How is the data content in the data warehouse used?
What is a Data Warehouse?
• Defined in many different ways, but not rigorously.
  • A decision support database that is maintained separately from the organization’s operational database
  • Supports information processing by providing a solid platform of consolidated, historical data for analysis
• “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” (W. H. Inmon)
• Data warehousing: the process of constructing and using data warehouses
Data Warehouse: Subject-Oriented
• Organized around major subjects, such as customer, product, sales.
• Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
• Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
• Operational systems
  • Data is stored by individual applications.
  • For example, the data sets for an order processing application provide the data for all of its functions: entering orders, checking stock, verifying customers’ credit, and assigning the order for shipment.
• Subject-oriented data
  • In the data warehouse, by contrast, data is stored by subject.
  • Business subjects differ from organization to organization.
Data Warehouse: Integrated
• Constructed by integrating multiple, heterogeneous data sources
  • Relational databases, flat files, on-line transaction records
• Data cleaning and data integration techniques are applied.
  • Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
  • E.g., hotel price: currency, tax, breakfast covered, etc.
  • When data is moved to the warehouse, it is converted.
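The hotel price example can be sketched as a conversion step applied when data moves to the warehouse. Everything specific here is an assumption for illustration: the two source records, the exchange rates, the 10% tax rate, and the target field names are invented, not real values.

```python
# Hypothetical source records that follow different conventions.
SOURCE_A = {"hotel": "Alpha", "price": 100.0, "currency": "USD",
            "tax_included": False, "breakfast": "Y"}
SOURCE_B = {"hotel": "Beta", "price": 9000.0, "currency": "INR",
            "tax_included": True, "breakfast": "no"}

# Assumed illustrative values, not real rates.
RATE_TO_USD = {"USD": 1.0, "INR": 0.0125}
TAX_RATE = 0.10


def to_warehouse_format(rec):
    """Convert one source record to a single agreed representation."""
    price_usd = rec["price"] * RATE_TO_USD[rec["currency"]]
    if not rec["tax_included"]:
        price_usd *= 1 + TAX_RATE  # bring every price to a tax-inclusive basis
    return {
        "hotel": rec["hotel"],
        "price_usd_incl_tax": round(price_usd, 2),
        "breakfast_included": rec["breakfast"].strip().lower() in {"y", "yes", "true"},
    }


integrated = [to_warehouse_format(r) for r in (SOURCE_A, SOURCE_B)]
```

After conversion, both prices share one currency, one tax basis, and one encoding for the breakfast flag, so they can be compared directly.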
Data Warehouse: Time Variant
• The time horizon for the data warehouse is significantly longer than that of operational systems.
  • Operational database: current value data.
  • Data warehouse data: provides information from a historical perspective (e.g., past 5-10 years).
• Every key structure in the data warehouse contains an element of time, explicitly or implicitly, whereas the key of operational data may or may not contain a time element.
• The time-variant nature of the data in a data warehouse:
  • Allows for analysis of the past.
  • Relates information to the present.
  • Enables forecasts for the future.
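A minimal sketch of the difference in key structure, assuming a hypothetical account-balance feed: the operational store keeps one current value per account, while the warehouse key includes the snapshot date so history is preserved. The names `record_balance` and `history` are invented for this example.

```python
from datetime import date

# Operational store: one current value per key, overwritten on each change.
operational = {}
# Warehouse store: the key carries a time element, so every snapshot is kept.
warehouse = {}


def record_balance(account, as_of, balance):
    operational[account] = balance          # current value only
    warehouse[(account, as_of)] = balance   # (account, date) key keeps history


record_balance("ACC-1", date(2023, 1, 31), 500)
record_balance("ACC-1", date(2023, 2, 28), 650)
record_balance("ACC-1", date(2023, 3, 31), 600)


def history(account):
    """Return the (date, balance) snapshots for one account, oldest first."""
    return sorted((d, b) for (a, d), b in warehouse.items() if a == account)
```

Only the warehouse structure can answer a question such as "what was the balance at the end of January?", which is what makes analysis of the past possible.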
Data Warehouse: Non-Volatile
• A physically separate store of data transformed from the operational environment.
• Operational update of data does not occur in the data warehouse environment.
  • Does not require transaction processing, recovery, and concurrency control mechanisms
  • Requires only two operations in data accessing: initial loading of data and access of data
• Operational databases hold volatile data:
  • Data in an operational system is added and deleted as each transaction happens.
  • Data updates are commonplace in an operational database.
• The data warehouse holds non-volatile data:
  • Once the data is captured in the data warehouse, it is not updated.
  • Individual transactions are not run to change the data there.
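The two permitted operations, loading and access, can be sketched as an append-only store. The class `WarehouseTable` and its sample rows are hypothetical and only illustrate the idea; a real warehouse enforces this through its load procedures rather than a small class.

```python
class WarehouseTable:
    """Append-only store: rows are loaded in batches and then only read."""

    def __init__(self):
        self._rows = []

    def load(self, batch):
        # Store immutable copies so later source changes do not leak in.
        self._rows.extend(tuple(row) for row in batch)

    def query(self, predicate=lambda row: True):
        # Read access only; there is deliberately no update or delete method.
        return [row for row in self._rows if predicate(row)]


table = WarehouseTable()
source_batch = [["order-1", 120], ["order-2", 80]]
table.load(source_batch)

# An operational update after the load does not change the warehouse copy.
source_batch[0][1] = 999
big_orders = table.query(lambda row: row[1] > 100)
```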
Data Granularity
• Operational system
  • Lowest level of detail
  • A lot of data
  • Daily details
• Data warehouse
  • Data granularity in a data warehouse refers to the level of detail.
  • Data is summarized at different levels.
  • Monthly or quarterly summaries
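As a rough sketch of granularity levels, the same daily details can be summarized to monthly and quarterly totals. The sample rows and the function name `summarize` are assumptions made for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily detail rows: (sale_date, amount)
DAILY_SALES = [
    (date(2024, 1, 5), 100),
    (date(2024, 1, 20), 50),
    (date(2024, 2, 3), 70),
    (date(2024, 4, 9), 30),
]


def summarize(rows, level):
    """Summarize daily amounts at the 'month' or 'quarter' level."""
    totals = defaultdict(int)
    for day, amount in rows:
        if level == "month":
            key = (day.year, day.month)
        elif level == "quarter":
            key = (day.year, (day.month - 1) // 3 + 1)
        else:
            raise ValueError(f"unknown level: {level}")
        totals[key] += amount
    return dict(totals)


monthly = summarize(DAILY_SALES, "month")
quarterly = summarize(DAILY_SALES, "quarter")
```

The coarser the level, the fewer rows are stored and scanned, at the cost of losing the daily detail.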
Data Warehouse vs. Heterogeneous DBMS
• Traditional heterogeneous DB integration:
  • Build wrappers/mediators on top of heterogeneous databases
  • Query-driven approach
    • When a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved, and the results are integrated into a global answer set
    • Complex information filtering; queries compete for resources
• Data warehouse: update-driven, high performance
  • Information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis
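The contrast between the two approaches can be sketched with two hypothetical source sites holding units sold per region. The site contents and function names are invented for illustration; real mediators also translate query syntax, which is omitted here.

```python
SITE_A = {"east": 10, "west": 4}   # hypothetical source 1
SITE_B = {"east": 6, "north": 9}   # hypothetical source 2


def query_driven(region):
    """Mediator style: visit every source at query time and merge results."""
    return sum(site.get(region, 0) for site in (SITE_A, SITE_B))


def build_warehouse():
    """Update-driven: integrate the sources in advance, once per refresh."""
    merged = {}
    for site in (SITE_A, SITE_B):
        for region, units in site.items():
            merged[region] = merged.get(region, 0) + units
    return merged


WAREHOUSE = build_warehouse()


def update_driven(region):
    """Answer directly from the pre-integrated store."""
    return WAREHOUSE.get(region, 0)
```

Both functions return the same totals; the difference is when the integration work happens and whether each query competes with the source systems for resources.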
Data Warehouse vs. Operational DBMS
• OLTP (on-line transaction processing)
  • Major task of traditional relational DBMS
  • Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
• OLAP (on-line analytical processing)
  • Major task of data warehouse system
  • Data analysis and decision making
• Distinct features (OLTP vs. OLAP):
  • User and system orientation: customer vs. market
  • Data contents: current, detailed vs. historical, consolidated
  • Database design: ER + application vs. star + subject
  • View: current, local vs. evolutionary, integrated
  • Access patterns: update vs. read-only but complex queries
OLTP vs. OLAP
                     OLTP                                  OLAP
users                clerk, IT professional                knowledge worker
function             day-to-day operations                 decision support
DB design            application-oriented                  subject-oriented
data                 current, up-to-date, detailed,        historical, summarized, multidimensional,
                     flat relational, isolated             integrated, consolidated
usage                repetitive                            ad hoc
access               read/write, index/hash on prim. key   lots of scans
unit of work         short, simple transaction             complex query
# records accessed   tens                                  millions
# users              thousands                             hundreds
DB size              100MB-GB                              100GB-TB
metric               transaction throughput                query throughput, response
Why a Separate Data Warehouse?
• High performance for both systems
  • DBMS: tuned for OLTP (access methods, indexing, concurrency control, recovery)
  • Warehouse: tuned for OLAP (complex OLAP queries, multidimensional view, consolidation)
• Different functions and different data:
  • Missing data: decision support requires historical data, which operational DBs do not typically maintain
  • Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources
  • Data quality: different sources typically use inconsistent data representations, codes, and formats, which have to be reconciled
Data Warehouses and Data Marts Cont..
Data Warehouse
• Enterprise-wide
• Union of all data marts
• Data received from the staging area
• Structure for a corporate view of data
• Organized on the E-R model
Data Mart
• Departmental
• A single business process
• Facts and dimensions
• Technology optimal for data access and analysis
• Structure to suit the departmental view of data
Data Warehousing and OLAP Technology for Data Mining
• What is a data warehouse?
• A multi-dimensional data model
• Data warehouse building blocks
Overview of Components
Data Warehouse Components
• Source data component
• Data staging component
• Data storage component
• Information delivery component
• Metadata component
• Management and control component
Data Warehouse Components cont..
1. Source data component: grouped into four broad categories
• Production data:
  • This category of data comes from the various operational systems of the enterprise.
• Internal data:
  • In every organization, users keep their “private” spreadsheets, documents, customer profiles, and sometimes even departmental databases.
Data Warehouse Components cont..
• Archived data:
  • Operational systems periodically take the old data and store it in archived files. The data in these archived files is referred to as archived data.
• External data:
  • This category includes data from external sources.
  • For example: market share data of competitors.
Data Warehouse Components cont..
2) Data staging component:
• Data is extracted from the various operational systems and external sources.
• The staging component prepares the data for storing in the data warehouse.
• The data extracted from several disparate sources needs to be changed and converted so that it is ready to be stored in a format suitable for querying and analysis.
Cont..
• Three major functions need to be performed to get the data ready:
• Data extraction: extract the data for the data warehouse, using appropriate techniques, from the large amount of data received from the operational systems.
• Data transformation: involves many forms of combining pieces of data from the different sources, including merging and sorting on a large scale in the staging area.
• When the data transformation function ends, the collection of integrated data is cleaned, standardized, and summarized, and the data is ready to be loaded into the data warehouse.
• Data loading: the initial load moves large volumes of data and uses up a substantial amount of time.
• Once the data warehouse is functioning, the staging component continues to extract the changes to the source data, transform the data revisions, and feed the incremental data revisions.
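The three staging functions can be sketched as three small functions chained together. This is a toy illustration under stated assumptions: the two source systems (`ORDERS`, `BILLING`), their field names, and the cleaning rules are all invented for this example.

```python
def extract(source_rows):
    """Extract: keep only the fields the warehouse needs."""
    return [{"cust": r["cust"], "amount": r["amount"]} for r in source_rows]


def transform(rows):
    """Transform: clean, standardize, merge, sort, and summarize per customer."""
    totals = {}
    for r in rows:
        name = r["cust"].strip().upper()        # standardize the customer key
        if not name or r["amount"] is None:     # drop rows that fail cleaning
            continue
        totals[name] = totals.get(name, 0) + r["amount"]
    return [{"customer": k, "total_amount": v} for k, v in sorted(totals.items())]


def load(warehouse, rows):
    """Load: append the prepared rows to the warehouse store."""
    warehouse.extend(rows)
    return warehouse


# Two hypothetical operational sources with extra, source-specific fields.
ORDERS = [{"cust": " acme ", "amount": 10, "clerk": "x"},
          {"cust": "Zen", "amount": 4, "clerk": "y"}]
BILLING = [{"cust": "ACME", "amount": 5, "terminal": 3},
           {"cust": "", "amount": 1, "terminal": 1}]

warehouse = load([], transform(extract(ORDERS) + extract(BILLING)))
```

Here " acme " and "ACME" from different sources are merged into one customer, and the row with an empty customer name is rejected during cleaning.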
Data Movement in the Data Warehouse
[Figure: data moves from the data sources to the data warehouse through a base data load followed by daily, monthly, quarterly, and yearly refreshes.]
• The initial load moves large volumes of data and is time consuming.
• Business conditions determine the refresh cycle.
Cont.
3) Data Storage Component:
• The data storage for the data warehouse is a separate repository.
• The operational systems of the enterprise support the day-to-day operations.
• The data repositories of the operational systems typically contain only current data, whereas the data warehouse repository must keep large volumes of historical data for analysis.
• So the data in the data warehouse needs to be kept in structures suitable for analysis, not for quick retrieval of individual pieces of information.
Cont...
4) Information Delivery Component:
• Who are the users who need information from the data warehouse?
• The component provides information to the wide community of data warehouse users.
• Novice user
  - No training
  - Prefabricated reports and preset queries
• Casual user
  - Needs information once in a while
  - Needs prepackaged information
  - Navigates through the data warehouse, creates custom reports and ad hoc queries
• The information delivery component includes a variety of delivery mechanisms; for example, we may provide for online queries and reports.
Information delivery Component
[Figure: the information delivery component connects the data warehouse and data marts to its users (novice, casual user, business analyst, senior manager, high-level managers) through ad hoc reports, complex queries, MD analysis, statistical analysis, Executive Info System (EIS) feed and data mining, delivered online or over the intranet, Internet and e-mail.]
Data Warehouse Components cont..
5) Meta Data Component:
• Metadata in a data warehouse is similar to the data dictionary or the data catalog in a database management system.
• The data dictionary holds information about the logical data structures, the files and addresses, and the indexes.
Cont..
6) Management and Control Component:
• This component of the data warehouse architecture sits on top of all the other components.
• It coordinates the services and activities within the data warehouse.
• It moderates the information delivery to the users.
• It works with the database management systems and enables data to be properly stored in the repositories.
• It monitors the movement of data into the staging area and from there into the data warehouse storage.
• It interacts with the metadata component to perform the management and control functions.
• Metadata: the source of information for the management module
Meta Data in the Data Warehouse
• The metadata component serves as a directory of the contents of the data warehouse.
• Metadata in a data warehouse falls into three major categories.
1) Operational Meta Data:
• Operational metadata describes the operational data sources from which the warehouse gets its data.
• These sources contain different data structures for storing the data of the various operational systems.
Meta Data in the Data Warehouse cont..
2) Extraction and Transformation Meta Data:
• Extraction and transformation metadata contains data about the extraction of data from the source systems, such as extraction frequencies and extraction methods.
• It also contains information about all the data transformations that take place in the data staging area.
3) End-User Meta Data:
• The end-user metadata is the navigational map of the data warehouse.
• It enables the end users to find information in the data warehouse.
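As a rough illustration of the three metadata categories, the sketch below keeps them in one Python dictionary and uses the end-user metadata as a navigational map. Every system name, table name and value is hypothetical; real warehouses keep metadata in a dedicated repository or tool.

```python
# A toy metadata catalog; every system, table and value here is hypothetical.
catalog = {
    "operational": {
        "orders_system": {"structure": "relational table",
                          "fields": ["cust", "amount", "day"]},
    },
    "extraction_transformation": {
        "orders_system": {"frequency": "daily",
                          "method": "incremental capture",
                          "transformations": ["trim names", "sum by day"]},
    },
    "end_user": {
        "Daily sales by customer": {"warehouse_table": "sales_daily",
                                    "source": "orders_system"},
    },
}


def find_information(catalog, keyword):
    """End-user navigation: business names that mention the keyword."""
    return sorted(name for name in catalog["end_user"]
                  if keyword.lower() in name.lower())


def lineage(catalog, business_name):
    """Trace a business name back to its source and extraction details."""
    entry = catalog["end_user"][business_name]
    source = entry["source"]
    return {
        "table": entry["warehouse_table"],
        "source_fields": catalog["operational"][source]["fields"],
        "frequency": catalog["extraction_transformation"][source]["frequency"],
    }


matches = find_information(catalog, "sales")
trace = lineage(catalog, matches[0])
print(matches, trace)
```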
Conclusion
The data warehouse is an informational environment that
• Provides an integrated and total view of the enterprise.
• Makes the enterprise's current and historical information easily available for decision making.
• Makes decision-support transactions possible without hindering operational systems.
• Renders the organization's information consistent.
• Presents a flexible and interactive source of strategic information.
Let's Discuss
1. You are the data analyst on a project building a data warehouse for an insurance company. List all possible data sources from which data will be brought into the data warehouse (state your assumptions).
2. For an airline company, identify three operational applications that would feed into the data warehouse. What would be the data load and refresh cycles?
3. Identify potential users and information delivery methods for a data warehouse supporting a large national grocery chain.
Defining The Business Requirements
• Dimensional analysis
• Information packages
• Requirements gathering methods
• Requirements definition
Dimensional Analysis
• A data warehouse is an information delivery system.
• It is not about technology, but about solving users' problems and providing strategic information to the user.
  - The requirements definition phase concentrates on what information users need, not on how the information will be provided.
• Building a data warehouse is different from building an operational system.
  - Users cannot fully describe what they want in a data warehouse, but they provide important insights into how they think about the business.
  - Analysis required: business dimensions, measurement units
Managers think in business dimensions (number)
• Marketing VP
  - How much did the new product generate?
  - Month by month, in the southern division, by user demographic, by sales office, relative to the previous version and to plan
• Marketing Manager
  - Sales statistics
  - By product, summarized by product categories, daily, weekly, monthly, by sales districts, by distribution channel
• Financial Controller
  - Show expenses
  - Listing actual vs budget, by months, quarters, annual, by budget line item, by district, by division, summarized for the whole company
From Tables and Spreadsheets to Data Cubes
• A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.
• A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions.
  - Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter, year)
  - The fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables
• In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.
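A short Python sketch can make the lattice concrete: each cuboid corresponds to one subset of the dimensions, so n dimensions give 2^n cuboids, from the 0-D apex cuboid to the n-D base cuboid. The four dimension names used here are illustrative.

```python
from itertools import combinations


def cuboids(dimensions):
    """Return every cuboid (subset of dimensions), grouped by dimensionality."""
    lattice = {}
    for k in range(len(dimensions) + 1):
        lattice[k] = list(combinations(dimensions, k))
    return lattice


lattice = cuboids(["time", "item", "location", "supplier"])
total = sum(len(level) for level in lattice.values())

for k, level in lattice.items():
    print(f"{k}-D cuboids: {len(level)}")
print("total cuboids:", total)
```

The empty tuple at level 0 is the apex cuboid (everything summarized into one value) and the single tuple at level 4 is the base cuboid.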
Multidimensional Data
[Figure: sales volume as a function of time, city and product, drawn as a cube with dates (3/1 to 3/4) on one axis, cities (NY, LA, SF) on another and products (Juice, Cola, Milk, Cream) on the third; the sample values shown next to Juice, Cola, Milk and Cream are 10, 47, 30 and 12.]
Cube: A Lattice of Cuboids
[Figure: lattice of cuboids over the dimensions time, item, location and supplier.]
• 0-D (apex) cuboid: all
• 1-D cuboids: time; item; location; supplier
• 2-D cuboids: time,item; time,location; time,supplier; item,location; item,supplier; location,supplier
• 3-D cuboids: time,item,location; time,item,supplier; time,location,supplier; item,location,supplier
• 4-D (base) cuboid: time, item, location, supplier
Dimensional nature of business data
[Figure: cube with the axes Product, Geography and Time; the cell for TV sets, Delhi and Jan is a slice of product sales information (units sold).]
• Can be extended to multiple dimensions
• Multidimensional cubes: hypercube
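Slicing a cube can be sketched in a few lines of Python. Here the cube is a dictionary keyed by (product, geography, time); the unit counts and the extra cities, products and months are invented for illustration.

```python
# Units sold keyed by (product, geography, time); all values are invented.
cube = {
    ("TV sets", "Delhi", "Jan"): 40,
    ("TV sets", "Delhi", "Feb"): 35,
    ("TV sets", "Mumbai", "Jan"): 50,
    ("Radios", "Delhi", "Jan"): 15,
}


def slice_cube(cube, product=None, geography=None, time=None):
    """Keep only the cells that match every dimension value that was given."""
    wanted = (product, geography, time)
    return {
        key: units
        for key, units in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }


jan_slice = slice_cube(cube, time="Jan")
delhi_tv = slice_cube(cube, product="TV sets", geography="Delhi")
print(sum(jan_slice.values()), sum(delhi_tv.values()))
```

Adding a fourth dimension only means lengthening the key tuple, which is the sense in which a cube extends to a hypercube.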
Examples of business dimensions
[Figure: business dimensions for three example businesses: an airlines company (frequent flights), an insurance business (claims) and a supermarket chain (sales units). The dimension labels shown include customer, time, agent, flight, fare class, type, airport, status, policy, insured party, promotion, product and store.]
Information Packages - A New Concept
• Information packages: a methodology for determining the requirements for a data warehouse based on the business dimensions used for analysis. It incorporates the basic measurements and the business dimensions.
• An information package enables us to
  - Define the common subject areas
  - Design key business metrics
  - Decide how data must be presented
  - Determine how users will aggregate or roll up
  - Decide the quantity of data for user analysis or query
  - Decide how data will be accessed
  - Establish data granularity
  - Estimate data warehouse size
  - Determine the frequency of data refreshing
An Information Package
Information Subject: Sales Analysis

Dimensions:   Time Period | Locations | Products | Age Groups
Hierarchies:  Year        | Country   | Class    | Group 1

Measured Facts: Forecast Sales, Budget Sales, Actual Sales
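An information package can also be written down as a small data structure. The sketch below uses a Python dataclass; the class and method names are hypothetical, while the subject, dimensions, hierarchy levels and measured facts are those of the Sales Analysis package.

```python
from dataclasses import dataclass, field


@dataclass
class InformationPackage:
    subject: str
    # dimension name -> hierarchy levels, ordered from highest to lowest
    dimensions: dict = field(default_factory=dict)
    measured_facts: list = field(default_factory=list)

    def describe(self):
        """One-line summary: the facts analyzed along the dimensions."""
        dims = ", ".join(self.dimensions)
        facts = ", ".join(self.measured_facts)
        return f"{self.subject}: {facts} by {dims}"


sales = InformationPackage(
    subject="Sales Analysis",
    dimensions={
        "Time Period": ["Year"],
        "Locations": ["Country"],
        "Products": ["Class"],
        "Age Groups": ["Group 1"],
    },
    measured_facts=["Forecast Sales", "Budget Sales", "Actual Sales"],
)
print(sales.describe())
```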
Cont..
• Business dimensions are the basis of an information package (IP)
• Hierarchical levels for further processing
  - Drilling down and rolling up for analysis
• Categories: data elements within business dimensions, e.g. sales on a holiday
• Key business metrics or facts (numbers)
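Rolling up and drilling down along a hierarchy can be sketched as aggregation at different levels. The year/quarter/month hierarchy and the unit counts below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, quarter, month, units sold).
facts = [
    (2024, "Q1", "Jan", 10),
    (2024, "Q1", "Feb", 20),
    (2024, "Q2", "Apr", 5),
    (2025, "Q1", "Jan", 7),
]
HIERARCHY = ["year", "quarter", "month"]  # highest level first


def aggregate(facts, level):
    """Sum units for each value of the given hierarchy level."""
    depth = HIERARCHY.index(level) + 1
    totals = defaultdict(int)
    for *path, units in facts:
        totals[tuple(path[:depth])] += units
    return dict(totals)


by_year = aggregate(facts, "year")        # rolled up to the top level
by_quarter = aggregate(facts, "quarter")  # drilled down one level
print(by_year, by_quarter)
```

Moving from `by_year` to `by_quarter` is a drill-down; going the other way is a roll-up.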
Business dimensions for auto sales analysis
• Hierarchies and categories for each dimension
• Product: model name, model year, package styling, product line, product category, exterior color, interior color, first model year
• Dealer: dealer name, city, state, single brand flag, date of first operation
• Customer demographics: age, gender, income, marital status, household size, vehicles owned, home value, own or rent
• Payment method: financial type, term in months, interest rate, agent
• Time: date, month, quarter, year, day of week, day of month, season, holiday flag
Cont..
• Metrics for analyzing automobile sales
  - Actual sale price
  - Option price
  - Full price
  - Dealer add-ons
  - Dealer credits
  - Dealer invoice
  - Amount of down payment
  - Amount financed
An Information Package
Information Subject: Automaker Sales

Dimensions and their hierarchies/categories (as far as shown):
• Time: Year, Quarter, Month, Date, Day of Week, Day of Month, Season, Holiday Flag
• Product: Model Name, Model Year, Package
• Payment Method: Financial type
• Customer Demographics: Age, Gender
• Dealer: Dealer Name, City, State, Single Brand flag

Measured Facts: Actual sale price, Option price, Full price, Dealer add-ons, etc.
Classification of users of data warehouse
• Senior executives (including sponsors)
  - Have a sense of direction; involved in the focused area
• Key departmental managers
  - Report to the executives in the area of focus
• Business analysts
  - Prepare reports and analyses for executives and managers
• Operational system DBAs
  - Only provide information
• Others nominated by the above
What requirements to gather?
Broad list:
• Data elements: fact classes, dimensions
• Recording of data in terms of time
• Data extracts from source systems
• Business rules: attributes, ranges, domains, operational records
Requirements Gathering Methods
• Interviews
  - One-to-one sessions
  - Group sessions: not good at the initial stage, useful for confirming requirements
• JAD (Joint Application Development) sessions
  - A joint approach by the concerned groups for a well-defined purpose
• Review of the existing documents
  - Documentation from user departments
  - Documentation from IT
Interview process: tasks before the project launches
• Select and train the team members conducting the interviews
• Assign roles to the team members
• Prepare the questionnaire
  - Current information sources
  - Subject areas
  - Key performance metrics
  - Information frequency
• Pre-interview research
  - History and current structure of the business unit
  - Number of employees, their roles and responsibilities
  - Location of users
  - Primary purpose of the business unit
  - Company market
  - Competitors in the market
• List the users to be interviewed
• List expectations
Initial document for requirement definition
• Interview write-ups
• User profile
• Background and objective
• Information requirements
• Analysis requirements
• Current tools used
• Success criteria
• Useful business metrics
• Relevant business dimensions
Expectations from interviews
• Senior executives
  - Organization objectives
  - Criteria for measuring success
  - Key business issues, current and future
  - Problem identification
  - Vision and direction of the organization
  - Anticipated usage of the DW
• Departmental managers / analysts
  - Departmental objectives
  - Success metrics
  - Factors limiting success
  - Key business issues
  - Products and services
  - Useful business dimensions for analysis
  - Anticipated usage of the DW
• IT department professionals
  - Key operational source systems
  - Current information delivery processes
  - Types of routine analysis
  - Known quality issues
  - Current IT support for information requests
  - Concerns about the proposed DW
JAD five phased approach
• Project definition
  - Complete high level interviews
  - Conduct management interviews
  - Prepare management definition guide
• Research
  - Become familiar with the business area and systems
  - Document user information requirements
  - Document business processes
  - Gather preliminary information
  - Prepare agenda for the session
• Preparation
  - Create working documents from the previous phase
  - Train the scribes
  - Prepare visual aids
  - Conduct pre-session meetings
  - Set up a venue for the session
  - Prepare checklist for objectives
Cont..
• JAD sessions
  - Open with a review of the agenda and purpose
  - Review assumptions
  - Review data requirements
  - Review business metrics and dimensions
  - Discuss dimension hierarchies and roll-ups
  - Resolve open issues
  - Close the sessions with the list of action items
• Final document
  - Convert the working documents
  - Map the gathered information
  - List all data sources
  - Identify all business dimensions and hierarchies
  - Assemble and edit the document
  - Conduct review sessions
  - Get final approvals
  - Establish a procedure to change requirements
• The success of a project using JAD depends on the JAD team
JAD team
• Executive sponsor
  - Person controlling the funding, providing direction and empowering the team members
• Facilitator
  - Person guiding the team through the JAD process
• Scribe
  - Person designated to record all decisions
• Full-time participants
  - Involved in decision making for the data warehouse
• On-call participants
  - Persons affected by the project, but only in specific areas
• Observers
  - Persons attending specific sessions without participating in decisions
Requirements Definition: Scope And Content:
• Formal documentation is often neglected.
• Requirements definition phase:
  - Conduct interviews and group discussions (GD)
  - Review the existing documentation
• The requirements definition document is the basis for the next phases in the system development life cycle, yet the detailed documentation of the requirements definition is often skipped.
Data Sources
• The requirements definition document should include the following information:
  - Available data sources
  - Data structures within the data sources
  - Location of the data sources
  - Data extraction procedures
  - Availability of historical data
Cont..
• Data Transformation: data transformation necessarily involves mapping the source data to the data in the data warehouse.
• Data Storage: the requirements definition document must include sufficient details about the storage requirements.
• Information Delivery:
  - Drill-down analysis
  - Roll-up analysis
  - Slicing
  - Ad hoc reports
• Information Package Diagram
Information Package Diagrams
• The information package diagrams crystallize the information requirements for the data warehouse.
• They contain the critical metrics measuring the performance of the business units, the business dimensions along which the metrics are analyzed, and the details of how drill-down and roll-up analyses are done.
Requirements Definition Document Outline
1. Introduction (purpose and scope of the project)
2. General requirements description (source system review, e.g. interview summary); state what types of information are required in the data warehouse
3. Specific requirements (data transformation and storage requirements)
4. Information packages (in the form of information package diagrams)
5. Other requirements (data extract frequency, data loading methods, locations for information delivery, etc.)
6. User expectations (how the users expect to use the data warehouse)
7. User participation (list of tasks in which users are expected to participate throughout the development life cycle)
8. General implementation plan (a high level plan for implementation)
Let's Discuss
1. You are the VP of marketing for a nationwide appliance manufacturer with three production plants. Describe three ways to analyze sales. What are the business dimensions for analysis?
2. BigBook Inc. is a large book distributor with domestic and international distribution to all leading booksellers. Initially, build a data warehouse to analyze shipments made from the company's many warehouses. Determine the metrics and business dimensions. Prepare an information package diagram.
3. For a data warehouse on AuctionsPlus.com, an upscale Internet auction site for works of art, gather requirements for sales analysis. Find the key metrics, business dimensions, hierarchies and categories. Draw the information package diagram.
4. Create a detailed outline of a formal requirements definition document for a data warehouse to analyze the profitability of a large department store chain.
Business Requirements as the driving force
[Figure: business requirements drive every phase: planning & management; design and construction (each covering architecture, infrastructure, data acquisition, data storage and information delivery); deployment; and maintenance.]
Data Design
• In the design phase, data models are required for
  - Staging area: transform, cleanse and integrate data from the source systems
  - Data warehouse repository
Requirements driving the data model
[Figure: the information package diagram leads to the dimensional model for the data marts (conformed/dependent); the enterprise data model leads to the relational model for the enterprise data warehouse.]
Composition of the components
• Source data
  - Operational source systems
  - Computing platforms, O/S, database files
  - Departmental data such as files, documents & spreadsheets
  - External data sources
• Data staging
  - Data mapping between data sources and staging area data structures
  - Data transformation
  - Data cleansing
  - Data integration
• Data storage
  - Size of extracted and integrated data
  - DBMS features
  - Growth potential
  - Centralized or distributed
Cont…
• Information delivery
  - Types and number of users
  - Types of queries and reports
  - Classes of analysis
  - Front end DSS applications
• Metadata
  - Operational metadata
  - ETL (data extraction/transformation/loading) metadata
  - End-user metadata
  - Metadata storage
• Management & control
  - Data loading
  - External sources
  - Alert systems
  - End-user information delivery
Impact of requirement on architecture
[Figure: business requirements shape each architectural component: source data, data staging, data storage, information delivery, metadata, and management & control.]
Data Quality
Bad data leads to bad decisions
• Data pollution sources
  - System conversions & migrations
  - Heterogeneous system integration
  - Inadequate database design of source systems
  - Data aging
  - Incomplete information from customers
  - Input errors
  - Internationalization/localization of systems
  - Lack of data management policies/procedures
• Types of data quality problems
  - Dummy values in source system fields
  - Absence of data in source system fields
  - Multipurpose fields
  - Cryptic data
  - Contradicting data
  - Improper use of names
  - Violation of rules
  - Reused primary keys
  - Non-unique identifiers
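A few of these problem types can be detected mechanically. The sketch below checks only three of them (dummy values, absent values and non-unique identifiers); the set of dummy values and the sample customer rows are assumptions made for illustration.

```python
from collections import Counter

# Assumed placeholder values; a real source system would have its own list.
DUMMY_VALUES = {"999-99-9999", "N/A", "XXXX", "0000000000"}


def quality_report(rows, key_field, checked_fields):
    """Count dummy values, absent values and non-unique identifiers."""
    report = {"dummy": 0, "missing": 0, "duplicate_keys": []}
    for row in rows:
        for name in checked_fields:
            value = row.get(name)
            if value is None or str(value).strip() == "":
                report["missing"] += 1
            elif str(value).strip() in DUMMY_VALUES:
                report["dummy"] += 1
    counts = Counter(row[key_field] for row in rows)
    report["duplicate_keys"] = sorted(k for k, n in counts.items() if n > 1)
    return report


customers = [
    {"id": 1, "name": "Asha", "phone": "0000000000"},
    {"id": 2, "name": "", "phone": "9810012345"},
    {"id": 2, "name": "Ravi", "phone": None},
]
report = quality_report(customers, "id", ["name", "phone"])
print(report)
```

Problems such as cryptic or contradicting data usually need business rules and human review rather than a simple scan like this.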
Impact of requirement on metadata
[Figure: business requirements determine the data warehouse metadata.]
• Operational: source system data structures, external data formats
• Extraction/Transformation: data cleansing, conversion, integration
• End-user: querying, reporting, analysis, OLAP, special apps
Data Storage specifications
• The DBMS should be compatible with the back end and the front end
• Business elements that affect the choice of DBMS
  - Level of experience
  - Types of queries
  - Need for openness
  - Data loads
  - Metadata management
  - Data repository location
  - Data warehouse growth
• Size estimation
  - Data staging area
  - Overall corporate data warehouse
  - Data marts, dependent or conformed
  - Multidimensional database
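One common back-of-the-envelope way to estimate the size of a fact table is to multiply the number of possible dimension combinations by the share that is actually populated and by the row width. The formula and every figure below are assumptions for illustration, not values from the course material.

```python
def estimate_fact_table_bytes(dimension_sizes, density, bytes_per_row):
    """Rough fact-table size: possible cells x share populated x row width."""
    possible_rows = 1
    for size in dimension_sizes.values():
        possible_rows *= size
    return round(possible_rows * density * bytes_per_row)


# Invented figures: 3 years of daily data, 300 stores, 40,000 products,
# 10% of the combinations actually occurring, 60 bytes per fact row.
sizes = {"time": 3 * 365, "store": 300, "product": 40_000}
estimate = estimate_fact_table_bytes(sizes, density=0.1, bytes_per_row=60)
print(f"{estimate / 1e9:.1f} GB")
```

Indexes, aggregates, the staging area and the data marts add to this figure, so an estimate like this is only a starting point.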
Impact of business requirement on Information delivery
[Figure: business requirements (the requirements definition on users, locations, queries, reports and analysis) shape the information delivery component, which serves novice users, casual users, business analysts, senior managers and high-level managers through ad hoc reports, complex queries, MD analysis, statistical analysis, Executive Info System (EIS) feed and data mining, delivered online or over the intranet, Internet and e-mail.]
Conclusion
• Gathering requirements for a data warehouse is not the same as for an operational system.
• The requirements definition guides the whole process of system design and development.
• A data warehouse environment is an information delivery system where users themselves access the data repository and create their own outputs, whereas in an operational system the user is provided with predefined outputs.
• It is essential to have the right elements of information in the optimal format.
Review Questions
Objective Questions:
1) A data warehouse is which of the following?
   a) Can be updated by end users.
   b) Contains numerous naming conventions and formats.
   c) Organized around important subject areas.
   d) Contains only current data.
2) An operational system is which of the following?
   a) A system that is used to run the business in real time and is based on historical data.
   b) A system that is used to run the business in real time and is based on current data.
   c) A system that is used to support decision making and is based on current data.
   d) A system that is used to support decision making and is based on historical data.
Review Questions cont..
3) The generic two-level data warehouse architecture includes which of the following?
   a) At least one data mart
   b) Data that can be extracted from numerous internal and external sources
   c) Near real-time updates
   d) All of the above.
4) The active data warehouse architecture includes which of the following?
   a) At least one data mart
   b) Data that can be extracted from numerous internal and external sources
   c) Near real-time updates
   d) All of the above.
Review Questions cont..
5) Reconciled data is which of the following?
   a) Data stored in the various operational systems throughout the organization.
   b) Current data intended to be the single source for all decision support systems.
   c) Data stored in one operational system in the organization.
   d) Data that has been selected and formatted for end-user support applications.
6) Transient data is which of the following?
   a) Data in which changes to existing records cause the previous version of the records to be eliminated
   b) Data in which changes to existing records do not cause the previous version of the records to be eliminated
   c) Data that are never altered or deleted once they have been added
   d) Data that are never deleted once they have been added
Review Questions cont..
7) The extract process is which of the following?
   a) Capturing all of the data contained in various operational systems
   b) Capturing a subset of the data contained in various operational systems
   c) Capturing all of the data contained in various decision support systems
   d) Capturing a subset of the data contained in various decision support systems
8) Data scrubbing is which of the following?
   a) A process to reject data from the data warehouse and to create the necessary indexes
   b) A process to load the data in the data warehouse and to create the necessary indexes
   c) A process to upgrade the quality of data after it is moved into a data warehouse
   d) A process to upgrade the quality of data before it is moved into a data warehouse
Review Questions (cont.)
9) The load and index process is which of the following?
a) A process to reject data from the data warehouse and to create the necessary indexes
b) A process to load the data in the data warehouse and to create the necessary indexes
c) A process to upgrade the quality of data after it is moved into a data warehouse
d) A process to upgrade the quality of data before it is moved into a data warehouse
10) Data transformation includes which of the following?
a) A process to change data from a detailed level to a summary level
b) A process to change data from a summary level to a detailed level
c) Joining data from one source into various sources of data
d) Separating data from one source into various sources of data
Review Questions (cont.)
Short answer type questions
Q1. Explain the need for metadata in a data warehouse.
Q2. What do you mean by strategic information?
Q3. Differentiate between a data warehouse and a data mart.
Q4. What do you mean by a Web-enabled data warehouse?
Q5. Define OLTP.
Q6. What type of processing takes place in a data warehouse?
Q7. Define an ETL routine.
Q8. What data does an information package contain?
Q9. In which situations can the JAD methodology be successful for collecting requirements?
Q10. List the various data sources that feed the data warehouse.
Review Questions (cont.)
Long answer type questions
Q1. Explain data warehouse architecture in detail.
Q2. Explain business dimensions. Why and how can business dimensions be useful for defining requirements for the data warehouse?
Q3. State any three factors that indicate the continued growth in data warehousing. Can you think of some examples?
Q4. Discuss the top-down and bottom-up approaches to creating a data warehouse.
Review Questions (cont.)
Q5. For a commercial bank, name five types of strategic objectives and explain each objective in detail.
Q6. What do you mean by information packages? Also explain the need for information packages.
Q7. A data warehouse is an environment, not a product. Discuss.
Q8. Explain the various types of data warehouse metadata in detail.
Review Questions (cont.)
Q9. For an airline company, how can strategic information increase the number of frequent flyers? Discuss, giving specific details.
Q10. Examine the opportunities that can be provided by strategic information for a medical center. Can you explain five such opportunities?
Suggested Reading/References
1. Paulraj Ponniah, “Fundamentals of Data Warehousing”, John Wiley & Sons, 2003.
2. Sam Anahory, “Data Warehousing in the Real World: A Practical Guide for Building Decision Support Systems”, John Wiley, 2004.
3. W. H. Inmon, “Building the Operational Data Store”, 2nd Ed., John Wiley, 1999.
4. Kamber and Han, “Data Mining: Concepts and Techniques”, Harcourt India P. Ltd., 2001.