PART
IV
Managerial and Decision Support Systems
10. Know Knowledg ledge e Manage Management ment
11.
Data Management: Management: War Warehousing, ehousing, Analyzing, Mining, and Visualization
12. Management Decision Support and Intelligent Systems
CHAPTER
11
Data Management: Warehousing, Analyzing, Mining, and Visualization Harrah’s Entertainment
11.1 Data Management: A Critical Success Factor 11.2 Data Warehousing 11.3 Information and Knowledge Discovery with Business Intelligence 11.4 Data Mining Concepts and applications 11.5 Data Visualization Technologies 11.6 Marketing Databases in Action 11.7 Web-based Data Management Systems Minicases: (1) Sears / (2) Dall Dallas as Area Rapid Rapid Trans Transit it
490
LEARNING OBJECTIVES
After studying this chapter, you will be able to:
³ Recognize the importance of data, their managerial issues, and their life cycle. · Describe the sources of data, their collection, and quality issues. » Describe document management systems. ¿ Explain the operation of data warehousing and its role in decision support. ´ Describe information and knowledge discovery and business intelligence. ² Understand the power and benefits of data mining. ¶ Describe data presentation methods and explain geographical information systems, visual simulations, and virtual reality as decision support tools. º Discuss the role of marketing databases and provide examples. ¾ Recognize the role of the Web in data management.
FINDING DIAMONDS BY DATA MINING AT HARRAH’S
➥
THE PROBLEM
Harrah’s Entertainment (harrahs.com) is a very profitable casino chain. With 26 casinos in 13 states, it had $4 billion sales in 2002 and net income of $235 million. One of Harrah’s casinos is located on the Las Vegas strip, which typifies the marketing issues that casino owners face. The problem is very simple, how to attract visitors to come and spend money in your casino, and to do it again and again. There is no other place like the Las Vegas strip, where dozens of mega casinos and hundreds of small ones lure visitors by operating attractions ranging from fiery volcanoes to pirate ships. Most casino operators use intuition to plan inducements for customers. Almost all have loyalty cards, provide free rooms to frequent members, give you free shows, and more. The problem is that there is only little differentiation among the casinos. Casinos believe they must give those incentives to survive, but do they help casinos to excel? Harrah’s is doing better than most competing casinos by using management and marketing theories facilitated by information technology, under the leadership of Gary Loveman, a former Harvard Business School professor.
➥
THE SOLUTION
Harrah’s strategy is based on technology-based CRM and the use of customer data base marketing to test promotions. This combination enables the company to finetune marketing efforts and service-delivery strategies that keep customers coming back. Noting that 82.7 percent of its revenue comes from slot machines, Harrah’s started by giving each player a loyalty smart card. A smart-card reader on each slot machinein all 26 of its casinos records each customer’s activities. (Readers are also available in Harrah’s restaurants, gift shops, etc., to record any spending.) Logging your activities, you earn credit, as in other loyalty programs, for which you get free hotel rooms, dinners, etc. Such programs are run by most competitors, but Harrah’s goes a step further: It uses a 300-gigabyte transactional database, known as a data warehouse, to analyze the data recorded by the card readers. By tracking millions of individual transactions, the IT systems assemble a vast amount of data on customer habits and preferences. These data are fed into the enterprise data warehouse, which contains not only millions of transactional data points about customers (such as names, adddresses, ages, genders) but also details about their gambling, spending, and preferences. This database has become a very rich repository of customer information, and it is mined for decision support. The information found in Harrah’s database indicated that a loyalty strategy based on same-store (same casino, in this case) sales growth could be very beneficial. The goal is to get a customer to visit your establishment regularly. For example, analysis discovered that the company’s best customers were middle-aged and senior adults with discretionary time and income, who enjoyed playing slot machines. These customers did not typically stay in a hotel, but visited a casino on the way home from work or on a weekend night out. These customers responded better to an offer of $60 of casino chips than to a free room, two steak 491
492
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
meals, and $30 worth of chips, because they enjoyed the anticipation and excitement of gambling itself (rather than seeing the trip as a vacation get-away). This strategy offered a way to differentiate Harrah’s brand. Understanding the lifetime value of the customers became critical to the company’s marketing strategy. Instead of focusing on how much people spend in the casinos during a single visit, the company began to focus on their total spending over a long time. And, by gathering more and more specific information about customer preferences, running experiments and analyses on the newly collected data, and determining ways of appealing to players’ interests, the company was able to increase the amount of money customers spent there by appealing to their individual preferences. As in other casinos with loyalty programs, players are segregated into three tiers, and the biggest spenders get priorities in waiting lines and in awards. There is a visible differentiation in customer service based on the three-tier hierarchy, and every experience in Harrah’s casinos was redesigned to drive customers to want to earn a higher-level card. Customers have responded by doing what they can to earn the higher-tiered cards. However, Harrah’s transactional database is doing much more than just calculating gambling spending. For example, the casino knows which specific customers were playing at particular slot machines and at what time. Using data mining techniques, Harrah’s can discover what specific machines appealed to specific customers. This knowledge enabled Harrah’s to configure the casino floor with a mix of slot machines that benefited both the customers and the company. In addition, by measuring all employee performance on the matrices of speed and friendliness and analyzing these results with data mining, the company is able to provide its customers with better experiences as well as earn more money for the employees. Harrah’s implemented a bonus plan to reward hourly workers with extra cash for achieving improved customer satisfaction scores. (Bonuses totaling $43 million were paid over three years.) The bonus program worked because the reward depends on everyone’s performance. The general manager of a lower-scoring property might visit a colleague at a higher-scoring casino to find out what he could do to improve his casino’s scores.
➥
THE RESULTS
Harrah’s experience has shown that the better the experience a guest has and the more attentive your are to t o him or her, the more money will be made. For Harrah’s, good customer service is not a matter of an isolated incident or two but of daily routine. So, while somewhere along the Las Las Vegas Vegas strip a “V “Vesuvian” esuvian” volcano erupts loudly every 15 minutes, a fake British frigate battles a pirate ship at regular intervals, and sparkling fountains dance in a lake, Harrah’s continues to enhance benefits to its Total Rewards program, improves customer loyalty through customer service supported by the data mining, and of course makes lots of money. Sources: Complied from Loveman (2003) and from Levinson (2001).
➥
LESSONS LEARNED FROM THIS CASE
The opening case about Harrah’s illustrates the importance of data to a large entertainment company. company. It shows that it is necessary to collect vast amount of data,
11.1
DATA MANAGEMENT: A CRITICAL SUCCESS FACTOR
493
organize and store it proprely in one place, and then analyze it and use the results for better make marketing and other corporate decisions. The case shows us that new data go through a process and stages: Data are collected, processed, and stored in a data warehouse. Then, data are processed by analytical tools such as data mining and decision modeling. The findings of the data analysis direct promotional and other decisions. Finally, continous collection and analysis of fresh data provide management with feedback regarding the success of management strategies. In this chapter we explain how this process is executed with the help of IT. We will also deal with some additional topics that typically supplement the data management process.
11.1
D ATA M ANAGEMENT: A CRITICAL SUCCESS F ACTOR As illustrated throughout this textbook, IT applications cannot be done without using some kind of data. In other words, without data you cannot have most IT applications, nor can you make good decisions. Data, as we have seen in the opening case, are at the core of management and marketing operations. However, there are increasing difficulties in acquiring, keeping, and managing data.
The Difficulties Since data are processed in several stages and possibly places, they may be subof Managing Data, ject to some problems and difficulties. and Some Potential Solutions DATA PROBLEMS AND DIFFICULTIES. Managing data in organizations is difficult for various reasons: The amount of data increases exponentially with time. Much past data must be kept for a long time, and new data are added rapidly. However, However, only small portions of an organization’s data are relevant for any specific application, and that relevant data must be identified and found to be useful. ● Data are scattered throughout organizations and are collected by many individuals using several methods and devices. Data are frequently stored in several servers and locations, and in different computing systems, databases, formats, and human and computer languages. ● An ever-increasing amount of external data needs to be considered in making organizational decisions. ● Data security, quality, and integrity are critical, yet are easily jeopardized. In addition, legal requirements relating to data differ among countries and change frequently frequently.. ● Selecting data management tools can be a major problem because of the huge number of products available. ●
These difficulties, and the critical need for timely and accurate information, have prompted organizations to search for effective and efficient data management solutions. Historically, data management has been geared to supporting transaction processing by organizing the data in a hierarchical format in one location. This format supports secured and efficient high-volume SOLUTIONS TO MANAGING DATA.
494
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
processing; however, it may be inefficient for queries and other ad-hoc applications. Therefore, relational databases, based on organization of data in rows and columns, were added to facilitate end-user computing and decision support. With the introduction of client/server environments and Web technologies, databases became distributed throughout organizations, creating problems in finding data quickly and easily. This was the major reason that Harrah’s sought the creation of a data warehouse. As we will see later, the intranet, extranets and Web technologies can also be used to improve data management. It is now well recognized that data are an asset, although they can be a burden to maintain. Furthermore, the use of data, converted to information and knowledge, is power. The purpose of appropriate data management is to ease the burden of maintaining data and to enhance the power from their use. To see how this is done, let’s begin by examining how data are processed during their life cycle.
Data Life Cycle Process
Businesses do not run on raw data. They run on data that have been processed to information and knowledge, which mangers apply to businesses problems and opportunities. As seen at the Harrah’s case, knowledge fuels solutions. Everything from innovative product designs to brilliant competitive moves relies on knowledge (see Markus et al., 2002). However, because of the difficulties of managing data, cited earlier, deriving knowledge from accumulated data may not be simple or easy. Transformation Trans formation of data into knowledge and solutions is accomplished in several ways. In general, it resembles the process shown in Figure 11.1. It starts with new data collection from various sources. These data are stored in a data base(s). Then the data is preprocessed to fit the format of a data warehouse or data marts, where it is stored. Users then access the warehouse or data mart and take a copy of the needed data for analysis. The analysis is done with data analysis and mining tools (see Chopoorian et al., 2001) which look for patterns, and with intelligent systems, which support data interpretation. Note that not all data processing follows this process. Small and medium companies do not need data warehouses, and even many large companies do not need them. (We will see later who needs them.) In such cases data go directly from data sources or databases to an analysis. An example of direct processing is an application that uses real-time data. These can be processed as they are collected and immediately analyzed. Many Web data are of this type. In such a case, as we will see later, we use Web mining instead of data mining.
Data Sources
Data Da ta Ana naly lysi sis s
Internal Data External Data Personal Data
FIGURE 11.1 Data life cycle.
Data Marts Data Warehouse
Legacy, Mainframe
Solutions
Data Visualization
SCM CRM
Decisions Data Marts
Meta Data
OLAP, Queries EIS, DSS
Resu Re sult lt
Data Mining
EC Strategy
Knowledge Management
Business Intelligence
Others
11.1
DATA MANAGEMENT: A CRITICAL SUCCESS FACTOR
495
The result of all these activities is the generating of decision support and knowledge. Both the data (at various times during the process) and the knowledge (derived at the end of the process) may need to be presented to users. Such a presentation can be accomplished by using different visualization tools. The created knowledge may be stored in an organizational knowledge base (as shown in Chapter 10) and used, together with decision support tools, to provide solutions to organizational problems. The elements and the process shown in Figure 11.1 are discussed in the remaining sections of this chapter.
Data Sources
The data life cycle begins with the acquisition of data from data sources. Data sources can be classified as internal, personal, and external. An organization’s internal data are about people, products, services, and processes. Such data may be found in one or more places. For example, data about employees and their pay are usually stored in the corporate database. Data about equipment and machinery may be stored in the maintenance department database. Sales data can be stored in several places— aggregate sales data in the corporate database, and details at each regional data base. Internal data are usually accessible via an organization’s intranet. INTERNAL DATA SOURCES.
IS users or other corporate employees may document their own expertise by creating personal data. These data are not necessarily just facts, but may include concepts, thoughts, and opinions. They include, for example, subjective estimates of sales, opinions about what competitors are likely to do, and certain rules and formulas developed by the users. These data can reside on the user’s PC or be placed on departmental or business units’ databases or on the corporate knowledge bases. PERSONAL DATA.
There are many sources for external data, ranging from commercial databases to sensors and satellites. Government reports constitute a major source for external data. Data are available on CD-ROMs and memory chips, on Internet servers, as films, and as sound or voices. Pictures, diagrams, atlases, and television television are other sources of external data. Hundreds Hundreds of thousands of organizations worldwide place publicly accessible data on their Web servers, flooding us with data. Most external data are irrelevant to any single application. Yet, much external data must be monitored and captured to ensure that important data are not overlooked. Large amounts of external data are available on the Internet. EXTERNAL DATA SOURCES.
Many thousands of databases all over the world are accessible through the Internet. Much of the database access is free. A user can access Web pages of vendors, clients, and competitors. He or she can view and download information while conducting research. Some external data flow to an organization on a regular basis through EDI or through other company-to-company channels. Much external data are free; other data are available from commercial database services. A commercial online database publisher sells access to specialized databases, newspapers, magazines, bibliographies, and reports. Such a service can provide external data to users in a timely manner and at a reasonable cost. Many commercial database publishers will customize the data for each user. Several thousand services are currently available, most of which are accessible via the Internet. Many consulting companies sells reports online (e.g., aberdeen.com). THE INTERNET AND COMMERCIAL DATABASE SERVICES.
496
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
Methods forr Co fo Coll llec ecti ting ng Raw Data
The diversity of data and the multiplicity of sources make the task of data collection fairly complex. Sometimes it is necessary to collect raw data in the field. In other cases it is necessary to elicit data from people. Raw data can be collected manually or by instruments and sensors. Some examples of manual data collection methods are time studies, surveys, observations, and contributions from experts. Data can also be scanned or transferred electronically. Although a wide variety of hardware and software exists for data storage, communication, and presentation, much less effort has gone into developing software tools for data capture in environments where complex and unsta ble data exist. Insufficient methods for dealing with such situations may limit the effectiveness of IT development and use. One exception is the Web. Clickstream data are those that can be collected automatically, using special software, from a company’s Web site or from what visitors are doing on the site (see Chapter 5, and Turban et al., 2004). In addition, the use of online polls and questionnaires is becoming very popular (see Baumer, 2003, and Ray and Tabor, 2003). The collection of data from multiple external sources may be an even more complicated task. One way to improve it is to use a data flow manager (DFM), which takes information from external sources and puts it where it is needed, when it is needed, in a usable form (e.g., see smartdraw.com ). A DFM consists of (1) a decision support system, (2) a central data request processor, (3) a data integrity component, (4) links to external data suppliers, and (5) the processes used by the external data suppliers. The complexity of data collection can create data-quality problems. Therefore, regardless of how they are collected, data need to be validated. A classic expression that sums up up the situation is “garbage in, garbage out” (GIGO). Safeguards for data quality are designed to prevent data problems.
Data Quality and an d In Inte tegr grit ity y
Data quality (DQ) is an extremely important issue since quality determines the data’s usefulness as well as the quality of the decisions based on the data (Creese and Veytsel, 2003). Data are frequently found to be inaccurate, incomplete, or ambiguous, particularly in large, centralized databases. The economical and social damage from poor-quality data has actually been calculated to have cost organizations billions of dollars (see Redman, 1998). According to Brauer (2001), data quality is the cornerstone of effective business intelligence. Interest in data quality has been known for generations. For example, according to Hasan (2002), treatment of numerical data for quality can be traced to the year 1881. An example of typical data problems, their causes, and possible solutions is provided in Table 11.1. For a discussion of data auditing and controls, see Chapter 15. Strong et al. (1997) conducted extensive research on data quality problems. Some of the problems are technical ones such as capacity, while others relate to potential computer crimes. The researchers divided these problems into the following four categories and dimensions. Accuracy,, objectivity objectivity,, believability believability,, and reputation. Intrinsic DQ: Accuracy ● Accessibility DQ: Accessibility and access security. Relevancy,, value added, a dded, timeliness, completeness, amount of ● Contextual DQ: Relevancy data. ● Representation DQ: Interpretability, ease of understanding, concise representation, consistent representation. ●
11.1
TABLE 11.1
DATA MANAGEMENT: A CRITICAL SUCCESS FACTOR
497
Data Problems and Possible Solutions
Problem
Typical Cause
Possible Solutions (in Some Cases)
Data Da ta ar are e no nott co corr rrec ect. t.
Raw Ra w da data ta we were re en ente tere red d in inac accu cura rate tely ly..
Develo Deve lop p a sy syst stem emat atic ic wa way y to to en ensu sure re th the e accuracy of raw data. Automate (use scanners or sensors). Carefully mo monitor bo both th the da data va values an and the manner in which the data have been generated. generate d. Check Check for complianc compliance e with collection rules. Modi diffy th the sys yste tem m for ge gen nerat atin ing g th the e data da ta.. Mov Move e to a cli clien ent/ t/se serv rver er sy syst stem em.. Automate. Deve De vellop a sys syste tem m for for re ressca callin ing g or or recombining the improperly indexed data da ta.. Use Use in inte tell llig igen entt se sear arch ch ag agen ents ts..
Data de derived by by an an in individual were generated carelessly.
Data Da ta ar are e not tim imel ely y.
Data ar Data are e no not mea meassure red d or indexed properly.
Needed data simply do not exist.
The me The meth tho od fo forr genera rattin ing g the data wass not wa not ra rapi pid d en enou ough gh to me meet et th the e need for the data. Raw da datta wer were e ga gath the ered acc cco ordi din ng to a logic or periodicity that was nott con no consi sist sten entt wit with h th the e pu purp rpos oses es of the analysis. No one ever stored the data needed now. Required data never existed.
Whether or not it is useful now, store data for future use. Use the Internet to search for similar data. Use experts. Make an effort to generate the data or to estimate them (use experts). Use neural computing for pattern recognition.
Source: Compiled and modified from Alter (1980).
Strong et al. (1997) have suggested that once the major variables and relationships in each category are identified, an attempt can be made to find out how to better manage the data. Different categories of data quality are proposed by Brauer (2001). They are: standardization (for consistency), matching (of data if stored in different places), verification (against the source), and enhancement (adding of data to increase its usefulness). An area of increasing importance is data quality that is done very fast in real time. Many decisions are being made today in such an environment. For how to handle data in such a case, see Creese and Veytsel (2003) and Bates (2003). Another major data quality issue is data integrity. This concept means that data must be accurate, accessible, and up-to-date. Older filing systems may lack integrity. That is, a change made in the file in one place may not be made in a related file in another place. This results in conflicting data. Data are collected on a routine basis or for a special application on the Internet. In either case, it is necessary to organize and store the data before they can be used. This may be a difficult task especially when media-rich Web sites are involved. See Online File W11.1 for an example of a multimedia Web-based system. For a comprehensive approach on how to ensure quality of Internet-generated data, see Creese and Veytsel (2003). DATA QUALITY IN WEB-BASED SYSTEMS.
Document Management
There are several major problems with paper documents. For example, in maintaining paper documents, we can pose the following questions: (1) Does everyone
498
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
have the current version? version? (2) How often often does it need to be updated? (3) How secure are the documents? and (4) How can the distribution of documents to the appropriate individuals be managed in a timely manner? The answers to these and similar questions may be difficult. Electronic data processing overcome some of these problems. One of the earliest IT-enabled tools of data management is called document management . When documents are provided in electronic form from a single repository (typically a Web server), only the current version is provided. For example, many firms maintain their telephone directories in electronic form on an intranet to eliminate the need to copy and distribute hard copies of a directory that requires constant corrections. Also, with data stored in electronic form, access to various documents can be restricted as required (see Becker, 2003). Document management is the automated control of electronic documents, page images, spreadsheets, word processing documents, and other complex documents through their entire life cycle within an organization, from initial creation to final archiving. Document management offers various benefits: It allows organizations to exert greater control over production, storage, and distribution of documents, yielding greater efficiency in the reuse of information, the control of a document through a workflow process, and the reduction of product cycle times. Electronic delivery of documents has been around since 1999, with UPS and the U.S. Post Office playing a major role in such service. They deliver documents electronically over a secured system (e-mail is not secured), and they are able to deliver complex complex “documents” such as large files and multimedia videos videos exchange.ups.com (which can be difficult to send via e-mail). (See , and take the test drive there.) The need for greater efficiency in handling business documents to gain an edge on the competition has fueled the increased availability of document management systems, also known as electronic document management. Essentially, document management systems (DMSs) provide information in an electronic format to decision makers. The full range of functions that a document management system may perform includes document identification, storage, and retrieval; tracking; version control; workflow management; and presentation. The Thomas Cook Company, for example, uses a document management system to handle travel-refund applications. The system works on the PC desktop and has automated the workflow process, helping the firm double its volume of business while adding only about 33 percent more employees (see Cole, 1996). Another example is the Massachusetts Department of Revenue, which uses imaging systems to increase productivity of tax return processing by about 80 percent (see civic.com/pubs, 2001). Document management deals with knowledge in addition to data and information. See Asprev and Middleton (2003) for an overview and for the relationship of document management with knowledge management. The major tools of document management are workflow software, authoring tools, scanners, and databases (object-oriented mixed with relational, known as object-relational database management systems; see Technology Guide 3). Document management systems usually include computerized imaging systems that can result in substantial savings, as shown in Online File W11.2. One of the major vendors of document management is Lotus Development Corporation. Its document databases and their replication property provide WHAT IS DOCUMENT MANAGEMENT?
11.1
DATA MANAGEMENT: A CRITICAL SUCCESS FACTOR
499
A CLOSER LOOK 11.1 HOW COMPANIES USE DOCUMENT MANAGEMENT SYSTEMS ere are some examples of how companies use docenables emergency crews to solve problems, and even save lives, much faster. Laptop computers are installed in ument management systems to manage data and each departmental vehicle, loaded with maps, drawings, documents: and historical repair data. (see laserfiche.com/newsroom/ The Surgery Center of Baltimore stores all of its medtorantoworks.html ). ical records electronically, providing instant patient inforThe University of Cincinnati, a state university in Ohio, mation to doctors and nurses anywhere and any time. is required to provide authorized access to the personnel The system also routes charts to the billing department, files of 12,000 active employees and tens of thousand of whose employees can scan and e-mail any related information to insurance providers and patients. The DMS also retirees. There are over 75,000 queries about the personnel records every year, and answers need to be found helps maintain an audit trail, including providing records among 2.5 million records. Using antiquated microfilm for legal purposes or action. Business processes have been system to find answers tooks days. The solution was a expedited by more than 50 percent, the cost of such DMS that digitized all paper and microfilm documents, processes is significantly lower, and morale of office employees in the center is up (see laserfiche.com/newsroom/ making them available via the Internet and the intranet. An authorized employee can now use a browser and baltimore.html ). American Express is using a DMS to collect and process access a document in seconds (see imrgold.com/en/case_ studies/edu_Univ_of_Cin.as p). over one million customer satisfaction surveys each year. The European Court of Human Rights (44 countries in The data are collected in templates of over 600 different Europe) created a Web-based document and KM system survey forms, in 12 languages, in 11 countries. The syswhich was originally stored on an intranet and now is tem (TELEform from Alchemy and Cardiff Software) is stored in a separate organizational knowledge base. The integrated with AMEX’s legacy system and is capable of DMS have had over 20 million hits in 2002 (Canada distributing processed results to many managers. Staff NewsWire, 2003). Millions of euros are saved each year who process these forms has been reduced from 17 to 1, saving AMEX over $500,000 each year (see imrgold.com/ just on printing and mailing documents. McDonnell-Douglas (now part of the Boeing Comen/case-studies/fin_AMEX_Cardiff.asp ). pany) distributed aircraft service bulletins to its customers LifeStar, an ambulance service in Tulare, California, is around the world using the Internet. The company used keeping all historical paper documents on optical disks. to distribute a staggering volume of bulletins to over 200 Hundreds of boxes with documents were digitized, and so airlines, using over 4 million pages of documentation are all new documents. Furthermore, all optical disks are every year. Now it is all on the Web, saving money and backed up and are kept in different locations for security time both for the company and for its customers. purposes (see laserfiche.com/newsroom/tulare.html ). Toronto, Canada, Works and Emergency Services Motorola uses a DMS not only for document storage and retrieval, but also for small-group collaboration and Department uses a Web-based record document-retrieval companywide knowledge sharing. It develops virtual solution. With it, employees have immediate access to communities where people can discuss and publish infordrawings and the documents related to roads, buildings, mation, all with the Web-enabled DMS. utility lines, and more. Quick access to these documents
H
many advantages for group work and information sharing (see lotus.com). For further discussion see imrgold.com and docuvantage.com. In many organizations, documents are now viewed as multimedia objects with hyperlinks. The Web provides easy access to pages of information. DMSs excel in this area. (see examples in A Closer Look 11.1). Web-enabled DMSs also make make it easy to put information on intranets, since many of them provide instantaneous instantaneous conversion of documents documents to HTML. BellSouth, BellSouth, for example, WEB-BASED DMS.
500
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
saves an estimated $17.5 million each year through its intranet-enabled formsmanagement system. For an example of Web-enabled document management systems see Delcambre et al. (2003). Other examples of how companies use document management systems, both on and off the Web, are shown in A Closer Look 11.1. Closerr Look Look 11. 11.1, 1, time and In all of the examples cited in the text and in A Close money are saved. Also, documents are not lost or mixed up. An issue related to document management systems is how to provide the privacy and security of personal data. We We address that issue in Chapters 15 and 16.
11.2
D ATA W AREHOUSING Many large and even medium size companies are using data warehousing to make it easy and faster to process, analyze, and query data.
Transactional versus Analytical Processing
Data processing in organizations can be viewed either as transactional or analytical . The data in transactions processing systems (TPSs) are organized mainly in a hierarchical structure (Technology Guide 3) and are centrally processed. The databases and the processing systems involved are known as operational systems, and the results are mainly transaction reports. This is done primarily for fast and efficient processing of routine, repetitive data. Today, however, the most successful companies are those that can respond quickly and flexibly to market changes and opportunities, and the key to this response is the effective and efficient use of data and information as shown in the Harrah’s case. This is done not only via transaction processing, but also through a supplementary activity, called analytical processing, which involves analysis of accumulated data, frequently by end users. Analytical processing, also refered to as business intelligence , includes data mining, decision support systems (DSS), enterprise information systems (EIS), Web applications, quering, and other end-user activities. Placing strategic information in the hands of decision makers aids productivity and empowers users to make better decisions, leading to greater competitive advantage. A good data delivery system should be able to support easy data access by the end users themselves, quick, accurate, flexible and effective decision making. There are basically two options for conducting analytical processing. One is to work directly with the operational systems (the “let’s use what we have” approach), using software tools and components known as front-end tools, and middleware (see Technology Guide 3). The other is to use a data warehouse. The first option can be optimal for companies that do not have a large number of end users running queries and conducting analyses against the operating systems. It is also an option for departments that consist mainly of users who have the necessary technical skills for an extensive use of tools such as spreadsheets (see BIXL, 2002), and graphics. (See Technology Guide 2.) Although it is possible for those with fewer technical skills to use query and reporting tools, they may not be effective, flexible, or easy enough to use in many cases. Since the mid-1990s, there has been a wave of front-end tools that allow end users to ease these problems by directly conducting queries and reporting on data stored in operational databases. The problem with this approach, however, is that the tools are only effective with end users who have a medium to high
11.2
DATA WAREHOUSING
501
degree of knowledge about databases. This situation improved drastically with the use of Web-based tools. Yet, when data are in several sources and in several formats, it is difficult to bring them together to conduct an analysis. The second option, a data warehouse, overcomes these limitations and provides for improved analytical processing. It involves three concepts: 1. A business representation of data for end users 2. A Web-based environment that gives the users query and reporting capa bilities 3. A server-based repository (the data warehouse) that allows centralized security and control over the data
The Data Warehouse
The Harrah’s case illustrates some major benefits of a data warehouse, which is a repository of subject-oriented historical data that is organized to be accessible in a form readily acceptable for analytical processing activities (such as data mining, decision support, querying, and other applications). The major benefits of a data warehouse are (1) the ability to reach data quickly, since they are located in one place, and (2) the ability to reach data easily and frequently by end users with Web browsers. To aid the accessibility of data, detail-level operational data must be transformed to a relational form, which makes them more amenable to analytical processing. Thus, data warehousing is not a concept by itself, but is interrelated with data access, retrieval, analysis, and visualization. (See Gray and Watson, 1998, and Inmon, 2002.) The process of building and using a data warehouse is shown in Figure 11.2. The organization’s data are stored in operational systems (left side of the figure). Using special software called ETL (extraction, transformation, load), data
Business Intelligence Data Access Extraction, Transformation, Load (ETL)
POS
Replication
Data Mart
DSS
Custom-Built Applications (4GL tools)
Marketing ERP Metadata Repository Finance
Legacy
Enterprise Data Warehouse Misc. OLTP
External Web documents Operational Systems/Data
Supply Chain Data
Marketing
Data Mart
Federated Data Warehouse
Management
M i d d l e w a r e
EIS, Reporting
I n t e r n e t
Web Browser
Relational Query Tools
External EDI
OLAP/ROLAP
Data Mart Finance Data Mining
Source: Drawn by E. Turban.) FIGURE 11.2 Data warehouse framework and views. ( Source:
502
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
are processed and then stored in a data warehouse. Not all data are necessarily transferred to the data warehouse. Frequently only a summary of the data is transferred. The data that are transferred are organized within the warehouse in a form that is easy for end users to access. The data are also standardized. Then, the data are organized by subject, such as by functional area, vendor, or product. In contrast, operational data are organized according to a business process, such as shipping, purchasing, or inventory control and/or functional department. (Note that ERP data can be input to data warehouse as well as ERP and SCM decisions use the output from data warehouse. See Grant, 2003.) Data warehouses provide for the storage of metadata, meaning data about data. Metadata include software programs about data, rules for organizing data, and data summaries that are easier to index and search, especially with Web tools. Finally, middleware tools enable access to the data warehouse (see Technology Guide 3, and Rundensteiner et al., 2000). CHARACTERISTICS OF A DATA WAREHOUSE.
The major characteristics of data
warehousing are: 1. Organization. Data are organized by subject (e.g., by customer, vendor, product, price level, and region), and contain information relevant for decision support only. 2. Consistency. Data in different operational databases may be encoded differently. For example, gender data may be encoded 0 and 1 in one operational system syste m and “m” and “f” in another another.. In the warehouse warehouse they they will be coded coded in a consistent manner. 3. Time variant. The data are kept for many years so they can be used for trends, forecasting, and comparisons over time. 4. Nonvolatile. Once entered into the warehouse, data are not updated. 5. Relational. Typically the data warehouse uses a relational structure. 6. Client/server. The data warehouse uses the client/server architecture mainly to provide the end user an easy access to its data. 7. Web-based. Today’s data warehouses are designed to provide an efficient computing environment for Web-based applications (Rundensteiner et. al., 2000). Moving information off the mainframe presents a company with the unique opportunity to restructure its IT strategy. Companies can reinvent the way in which they shape and form their application data, empowering end users to conduct extensive analysis with these data in ways that may not have been possible before (e.g., see Minicase 1, about Sears). Another immediate benefit is providing a consolidated view of corporate data, which is better than providing many smaller (and differently formatted) views. For example, separate applications may track sales and coupon mailings. Combining data from these different applications may yield insights into the cost-efficiency of coupon sales promotions that would not be immediately evident from the output data of either applications alone. Integrated within a data warehouse, however, such information can be easily extracted and analyzed. Another benefit is that data warehousing allows information processing to be offloaded from expensive operational systems onto low-cost servers. Once this is BENEFITS.
11.2
DATA WAREHOUSING
503
done, the end-user tools can handle a significant number of end-user information requests. Furthermore, some operational system reporting requirements can be moved to decision support systems, thus freeing up production processing. These benefits can improve business knowledge, provide competitive advantage (see Watson et. al., 2002), enhance customer service and satisfaction (see Online File W11.3), facilitate decision making, and help in streamlining business processes. The cost of a data warehouse can be very high, both to build and to maintain. Furthermore, it may difficult and expensive to incorporate data from obsolete legacy systems. Finally, there may be a lack of incentive to share data. Therefore, a careful feasibility study must be undertaken before a commitment is made to data warehousing. COST.
There are several basic architectures for data warehousing. Two common ones are two-tier and three-tier architectures. (See Gray and Watson, 1998.) In three-tier architecture, data from the warehouse are processed twice and deposited in an additional multidimensional database, organized for easy multidimensional analysis and presentation (see Section 11.3), or replicated in data marts. For a Web-based architecture see Rundensteiner et al., 2000. The architecture of the data warehouse determines the tools needed for its construction (see Kimball and Ross, 2002).
ARCHITECTURE AND TOOLS.
Delivery of data warehouse content to decision makers throughout the enterprise can be done via an intranet. Users can view, query, and analyze the data and produce reports using Web browsers. This is an extremely economical and effective method of delivering data (see Kimball and Ross, 2002, and Inmon, 2002). PUTTING THE WAREHOUSE ON THE INTRANET.
Data warehousing is most appropriate for organizations in which some of the following apply: large amounts of data need to be accessed by end users (see the Harrah’s and Sears Cases); the operational data are stored in different systems; an information-based approach to management is in use; there is a large, diverse customer base (such as in a utility company or a bank); the same data are represented differently in different systems; data are stored in highly technical formats that are difficult difficult to decipher; and extensive end-user end-user computing is performed (many end users performing many activities; for example, Sears has 5,000 users). Some of the successful applications are summarized in Table 11.2. Hundreds of other successful applications are reported (e.g., see client success stories and case studies at Web sites of vendors such as Brio Technology Inc., Business Objects, Cognos Corp., Information Builders, NCR Corp., Oracle, Platinum Technology, Software A&G, and Pilot Software). For further discussion see Gray and Watson (1998) and Inmon (2002). Also visit the Data Warehouse Institute ( dw-institute.org ). SUITABILITY.
Data Marts, Operational Data Stores, and Multidimensional Databases
Data warehouses are frequently supplemented with or substituted by the following: data marts, operational data stores, and multidimensional databases. The high cost of data warehouses confines their use to large companies. An alternative used by many other firms is creation of a lower cost, DATA MARTS.
504
TABLE 11.2
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
Summary of Strategic Uses of Data Warehousing
Industry
Functional Areas of Use
Strategic Use
Airline
Operations and Marketing
Apparel Banking
Health Care Investment and Insurance Personal Care Products
Distribution and Marketing Product Development, Operatio Oper ations, ns, and Marketing Product Development and Marketing Operations Product Development, Operatio Oper ations, ns, and Marketing Marketing Distribution and Marketing
Crew assignment, aircraft deployment, mix of fares, analysis of route profitability, frequent flyer program promotions Merchandising, and inventory replenishment Customer service, trend analysis, product and service promotions. Reduction of IS expenses
Public Sector Retail Chain
Operations Distribution and Marketing
Steel Telecommunications
Manufacturing Product Development, Operatio Oper ations, ns, and Marketing Marketing
Credit Card
Customer service, new information service for a fee, fraud detection Reduction of operational expenses Risk management, market movements analysis, customer tendencies tendencies analysis, analysis, portfolio portfolio manageme management nt Distribution decision, product promotions, sales decision, pricing policy Intelligence gathering Trend analysis, buying pattern analysis, pricing policy, inventory control, sales promotions, optimal distribution channel Pattern analysis (quality control) New product and service promotions, reduction of IS budget, profitability analysis
Source: Park (1997), p. 19, Table 2.
scaled-down version of a data warehouse called a data mart. A data mart is a small warehouse designed for a strategic business unit (SBU) or a department. The advantages of data marts include: low cost (prices under $100,000 versus $1 million or more for data warehouses); significantly shorter lead time for implementation, often less than 90 days; local rather than central control, conferring power on the using group. They also contain less information than the data warehouse. Hence, they have more rapid response and are more easily understood and navigated than an enterprisewide data warehouse. Finally, they allow a business unit to build its own decision support systems without relying on a centralized IS department. There are two major types of data marts: 1. Replicated (dependent) data marts. Sometimes it is easier to work with a small subset of the data warehouse. In such cases one can replicate some subsets of the data warehouse in smaller datamarts, each of which is dedicated to a certain area, as was shown in Figure 11.2. In such a case the data mart is an addition to the data warehouse. 2. Standalone data marts. A company can have one or more independent data marts without having a data warehouse. Typical Typical data marts are for marketing, finance, and engineering applications. An operational data store is a database for transaction processing systems that uses data warehouse concepts to provide OPERATIONAL DATA STORES.
11.3
INFORMATION AND KNOWLEDGE DISCOVERY WITH BUSINESS INTELLIGENCE
505
clean data. That is, it brings the concepts and benefits of the data warehouse to the operational portions of the business, at a lower cost. It is used for shortterm decisions involving mission-critical applications rather than for the medium- and long-term decisions associated with the regular data warehouse. These decisions depend on much more current information. For example, a bank needs to know about all the accounts for a given customer who is calling on the phone. The operational data store can be viewed as situated between the operational data (in legacy systems) and the data warehouse. A comparison between the two is provided by Gray and Watson (1998). Multidimensional databases are specialized data stores that organize facts by dimensions, such as geographical region, product line, salesperson, pr time. The data in multidimensional databases are usually preprocessed and stored in what is called a (multi-dimensional) data cube. Facts, such as quantities sold, are placed at the intersection of the dimensions. One such intersection might be the quantities of widgets sold by Ms. Smith in the Morristown, New Jersey, branch of XYZ Company in July 2003. Dimensions often have a hierarchy. Sales figures, for example, might be presented by day, by month, or by year. They might also roll up an organizational dimension from store to region to company. Multidimensional databases can be incorporated in a data warehouse, sometimes as its core, or they can be used as an additional layer. See Online File W11.4 for details. MULTIDIMENSIONAL DATABASES.
11.3
INFORMA NFORMATION TION AND KNOWLEDGE DISCOVERY WITH BUSINESS INTELLIGENCE Business Once the data are in the data warehouse and/or data marts they can be accessed Intelligence by managers, and analysts, and other end users. Users can then conduct several activities. These activities are frequently referred to as analytical processing or more commonly as business intellignce . (Note: For a glossary of these and other terms, see Dimensional Insight , 2003.) Business intellignce (BI) is a broad category of applications and techniques for gathering, storing, analyzing and providing access to date to help enterprise users make better business and strategic decisions (see Oguz, 2003, and Moss and Atre, 2003). The process of BI usually, but not necessarily, involves the use of a data warehouse, as seen in Figure 11.3. Operational raw data are usually kept in corporate databases. For example, a national retail chain that sells everything from grills and patio furniture to plastic utensils, has data about inventory, customer information, data about past promotions, and sales numbers in various databases. Though all this information may be scattered across multiple systems— and may seem unrelated—a data warehouse building software can bring it together to the data warehouse. In the data warehouse (or mart) tables can be linked, and data cubes (another term for multidimensional databases) are formed. For instance, inventory information is linked to sales numbers and customer data bases, allowing for extensive analysis of information. Some data warehouses have a dynamic link to the databases; others are static. HOW BUSINESS INTELLIGENCE WORKS.
506
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
Raw Data and Databases
Point of Sale
s e l a S
Data Warehouse
Business Intelligence Tools
Insight
Results
Graph er ot
S Supply b Chain e W
Call Center
y r o t n e v n I
P = X3 2V Report
Alert
r e m o t s u C
Forecasting
Query
Data Mining
FIGURE 11.3 How business intelligence Works.
Using a business intelligence software the user can ask queries, request adhoc reports, or conduct any other analyses. For example, analysis can be carried out by performing multilayer queries. Because all the databases are linked, you can search for what products a store has too much of. You can then determine which of these products commonly sell with popular items, based on previous sales. After planning a promotion to move the excess stock along with the popular products (by bundling them together, for example), you can dig deeper to see where this promotion would be most popular (and most profitable). The results of your request can be reports, predictions, alerts, and/or graphical presentations. These can be disseminated to decision-making tasks. More advanced applications of business intelligence include outputs such as financial modeling, budgting, resource allocation, and competitive intelligence. Advanced business intelligence systems include components such as decision models, business performance analysis, metrics, data profiling and reengineering tools, and much more. (For details see dmreview.com ) BI employs large number of tools and techniques. The major applications include the activities of query and reporting, online analytical processing (OLAP), DSS, data mining, forecasting and statistical analysis. A major BI vendor is SAS ( sas.com). Other vendors include Microstrategy, Congos, SPSS, and Business Objects. We have divided the BI tools into two major categories: (1) information and knowledge discovery and (2) decision support and intelligent analysis . In each category there are THE TOOLS AND TECHNIQUES OF BUSINESS INTELLIGNCE.
11.3
INFORMATION AND KNOWLEDGE DISCOVERY WITH BUSINESS INTELLIGENCE
507
Business Intelligence
Information and Knowledge Discovery
FIGURE 11.4 Categories of business intelligence.
Decision Support and Intelligent Systems
Ad-hoc query
DSS
Management Science Analysis
OLAP
Group DSS
Profitability and other analysis
Data Mining
Executive and Enterprise support
Web Mining
Integrated Decision Support
Applied Artificial Intelligence
Metrics scorecard BPI, BPM
several tools and techniques, as shown in Figure 11.4. In this chapter we will describe the information and knowledge discovery category, while Chapter 12 is dedicated to decision support and intelligent systems.
The Rools and Techniques of Information and Knowledge Discovery
Information and knowledge discovery differs from decision support in its main objective: Discovery. Once discovery is done the results can be used for decision support. Let’s distinguish first between information and knowledge discovery. Information discovery started in the late sixties and early seventies with data collection techniques. It was basically simple data collection and answered queries that involved one set of historical data. This analysis was extended to answer questions that involved several sets of data with tools such as SQL and relational database management systems (see Table 11.3 for the evolution). During the 1990s, a recognition for the need of better tools to deal with the ever increasing amount of data, was initiated. This resulted in the creation of the data warehouse and the appearance of OLAP and multidimensional databases and presentation. When the amount of data to be analyzed exploded in the mid 1990s knowledge discovery emerged as an important analytical tool. The process of extracting useful knowledge from volumes of data is known as knowledge discovery in databases (KDD), or just knowledge discovery, and it is the subject of extensive research (see Fayyad et al., 1996). KDD’s major objective is to identify valid, novel, potentially useful, and ultimately understandable patterns in data. KDD is useful because it is supported by three technologies that are now sufficiently mature: massive data collection, powerful multiprocessor computers, and data mining and other algorithms. KDD processes have appeared under various names and have shown different characteristics. As time has passed, KDD has become able to answer more complex THE EVOLUTION OF INFORMATION AND KNOWLEDGE DISCOVERY.
508
TABLE 11.3
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
Stages in the Evolution of Knowledge Discovery
Evolutionary Stage
Business Quesstion
Enabling Technologies
Characteristics
Data collection (1960s)
What was my total revenue in the last five years? What were unit sales in New England last March?
Computers, tapes, disks
Retrospective, static data delivery Retrospective, dynamic data delivery at record level Retrospective, dynamic data delivery at multiple levels Prospective, proactive information delivery
Data access (1980s)
Data warehousing and decision support (early 1990s) Intelligent data mining (late 1990s) Advanced in intellegenet system Complete integration (2000 – 2004)
What were the sales in region A, by product, by salesperson? What’s likely to happen to the Boston unit’s sales next month? Why? What is is th the be best pl plan to to follow? How did we perform compared to metrics?
Relational databases (RDBMS), structured query language (SQL) OLAP, multidimensional databases, data warehouses Advanced glorithms, multiprocessor computers, massive databases Neural co computing, ad advanced AI models, complex optimization, Web services
Proactive, in integrative; multiple business partners
Source: Based on material from accure.com (Accure Software).
business questions. In this section we will describe two tools of information discovery: Ad-hoc queries, and OLAP. OLAP. Data mining as a KDD is described in Section 11.4. We discuss multidimensionality in Section 11.5. Web-based query tools are described in Section 11.7. Ad-hoc queries allow users to request, in real time, information from the computer that is not available in periodic reports. Such answers are needed to expedite decision making. The system must be intelligent enough to understand what the user wants. Simple ad-hoc query systems are often based on menus. More intelligent systems use structured query language (SQL) and query-by-example approaches, which are described in Technology Guide 3. The most intelligent systems are based on natural language understanding (Chapter (Chapter 12) and some some can communicate with users using voice recognition. Later on we will describe the use of Web tools to facilitate queries. Quering systems are frequently combined with reporting systems that generate routine reports. For an example of such a combination in a radio rental store see Amato-McCoy, 2003. For Web-based information discovery tools see Online File W11.5. AD-HOC QUERIES AND REPORTING.
The term online analytical processing (OLAP) was introduced in 1993 by E. F. Codd, to describe a set of tools that can analyze data to reflect actual business needs. These tools were based on a set of 12 rules: (1) multidimensional view, (2) transparency to the user, (3) easy accessibility, (4) consistent reporting, (5) client/server architecture, (6) generic dimensionality, (7) dynamic sparse matrix handling, (8) multiuser support, (9) cross-dimensional operations, (10) (10) intuitive data manipulation, manipulation, (11) flexible ONLINE ANALYTICAL PROCESSING.
11.3
INFORMATION AND KNOWLEDGE DISCOVERY WITH BUSINESS INTELLIGENCE
509
reporting, and (12) unlimited levels of dimension and aggregation. For details see Codd et al., 1993. Let’s see how these rules may work: Assume that a business might organize its sales force by regions, say the Eastern and Western. These two regions might then be broken down into states. In an OLAP database, this organization would be used to structure the sales data so that the VP of sales could see the sales figures for each region. The VP might then want to see the Eastern region broken down by state so that the performance of individual state sales managers could be evaluated. Thus OLAP reflects the business in the data structure. The power of OLAP is in its ability to create these business structures (sales regions, product categories, fiscal calendar, partner channels, etc.) and combine them in such a way as to allow users to quickly answer business questions. “How many blue sweaters were were sold via mail-order mail-order in New York York so far this year?” year?” is the kind of question that OLAP is very good at answering. Users can interactively slice the data and drill down to the details they are interested in. In terms of the technology, an OLAP database can be implemented on top of an existing relational database (this is called ROLAP, for Relational OLAP) or it can be implemented via a specialized multidimensional data store (this is called MOLAP, for Multidimensional OLAP. In ROLAP, the data request is translated into SQL and the relational database is queried for the answer. In MOLAP, the specialized data store is preloaded with the answers to (all) possible queries so that any request for data can be returned quickly. Obviously there are performance and storage tradeoffs between these two approaches. (Another technology called HOLAP attempts to combine these two approaches.) Unlike online transaction processing (OLTP) applications, OLAP involves examining many data items (frequently many thousands or even millions) in complex relationships. In addition to answering users’ queries, OLAP may analyze these relationships and look for patterns, trends, and exceptions. A typical OLAP query might access a multigigabyte, multiyear sales database in order to find all product sales in each region for each product type. (See the Harrah’s case.) After reviewing the results, an analyst might further refine the query to find sales volume for each sales channel within region or product classifications. As a last step, the analyst might want to perform year-to-year or quarterto-quarter comparisons, for each sales channel. This whole process must be carried out online with rapid response time so that the analysis process is undisturbed. The 12 original rules of Codd are reflected in today’s software. For example, today’s software permits access to very large amounts of data, such as several years of sales data, usually with a browser, and analysis of the relationships between many types of business elements, such as sales, products, regions, and channels. It enables users to process aggregated data, such as sales volumes, budgeted dollars, and dollars spent, to compare aggregated data over time, and to present data in different perspectives, such as sales by region versus sales by product or by product within each region. Today’s software also ienables complex calculations between data elements, such as expected profit as calculated as a function of sales revenue for each type of sales channel in a particular region; and responds quickly to user requests so that users can pursue an analytical thought process without being stymied by the system. OLAP can be combined with data mining to conduct a very sophisticated multidimensional-based decision support as illustrated by Fong et al., 2002. By
CHAPTER 11
510
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
attaching a rule-based component to the OLAP, one can make such integrated system an integlligent data mine system system (see Lau et al., 2001). For more information, information, products and vendors visit olapreport.com and olap.com. Although OLAP and ad-hoc queries are very useful in many cases, they are retrospective in nature and cannot provide the automated and prospective knowledge discovery that is done by advanced data mining techniques.
11.4
D ATA MINING CONC ONCEPT EPTS S
AND
APPLICATIONS
Data mining is becoming a major tool for analyzing large amount of data, usually in a data warehouse, (Nemati and Barko., 2002) as well for analyzing Web data. Data mining derives its name from the similarities between searching for valuable business information in a large database, and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material or intelligently probing it to find exactly where the value resides (see the Harrah’s case at the start of the chapter). In some cases the data are consolidated in a data warehouse and data marts (e.g., see Chopoorian et al., 2001). In others they are kept on the Internet and intranet servers.
Capabilities of Da Data ta Mi Mini ning ng
Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities: Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered directly and quickly from the data. A typical example of a predictive problem is targeted marketing . Data mining can use data on past promotional mailings to identify the targets most likely to respond favorably to future mailings. Other predictive examples include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events. ● Automated discovery of previously unknown patterns. Data mining tools identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together, such as baby diapers and beer. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying invalid (anomalous) data that may represent data entry keying errors. ●
When data mining tools are implemented on high-performance parallelprocessing systems, they can analyze massive databases in minutes. Often, these databases will contain data stored for several years. Faster processing means that users can experiment with more models to understand complex data. High speed makes it practical for users to analyze huge quantities of data. Larger databases, in turn, yield improved predictions (see Winter, 2001, and Hirji, 2001). Data mining mining also can be conducted conducted by nonprogrammer nonprogrammers. s. The “miner” “miner” is often an end user, user, empowered by “data drills” drills” and other power query query tools to ask ad-hoc questions and get answers quickly, with little or no programming skill. Data mining tools can be combined with spreadsheets and other end-user software development tools, making it relatively easy to analyze and process the mined data. Data mining appears under different names, such as knowledge
11.4
DATA MINING CONCEPTS AND APPLICATIONS
511
A CLOSER LOOK 11.2 DA DATA TA MINING TECHNIQUES AND INFORMATION TYPES he most commonly used techniques for data mining are the following.
algorithms that sort through large data sets and express statistical rules among items. (See Moad, 1998, for details.)
T ●
Case-based reasoning. The case-based reasoning approach uses historical cases to recognize patterns (see Chapter 12). For example, customers of Cognitive Systems, Inc., utilize such an approach for helpdesk applications. One company has a 50,000-query case library. New cases are matched quickly against the 50,000 samples in the library, providing more than 90 percent accurate and automatic answers to queries. computing. Neural computing is a machine learning approach by which historical data can be examined for pattern recognition. These patterns can then be used for making predictions and for decision support (details are given in Chapter 12). Users equipped with neural computing tools can go through huge databases and, for example, identify potential customers of a new product or companies whose profiles suggest that they are heading for bankruptcy. Most practical applications are in financial services, in marketing, and in manufacturing.
●
The most common information types are: ●
Classification. Implies the defining characteristics of a certain group (e.g., customers who have been lost to competitors).
●
Clustering. Identifies groups of items that share a particular characteristic. Clustering differs from classification in that no predefining characteristic is given.
● Neural
●
●
Intelligent agents. One of the most promising approaches to retrieving information from the Internet or from intranet-based databases is the use of intelligent agents. As vast amounts of information become available through the Internet, finding the right information is more difficult. This topic is further discussed in Chapters Chapters 5 and 12. 12. Association analysis. Association analysis is a relatively new approach that uses a specialized set of
Other tools. Several other tools can be used. These include decision trees, genetic algorithms, nearestneighbor method, and rule induction. For details, see Inmon, 2002.
Identifies relationships between events that occur at one time (e.g., the contents of a shopping basket). For an application in law enforcement see Brown and Hagen, 2003.
● Association.
●
Sequencing. Similar to association, except that the relationship exists over a period of time (e.g., repeat visits to a supermarket or use of a financial planning product).
● Forecasting.
Estimates future values based on patterns within large sets of data (e.g., demand forecasting).
There are a large number of commerical products available for conducting data mining (e.g., dbminer.com, dataminer.com, and spss.com). For a directory see kdnuggets.com/ software.
extraction, data dipping, data archeology, data exploration, data pattern processing, cessi ng, data dredging, dredging, and information information harvesting. harvesting. “Striking “Striking it rich” in data mining often involves finding unexpected, valuable results.
The Tools of Da Data ta Mi Mini ning ng
Data mining consists of two steps, building a data cube and using the cube to extract data for the mining functions that the mining tool supports. A data cube is multidimensional as shown in Online File W11.6. Data miners can use several tools and techniques; see a list and definitions in A Closer Look 11.2.
Data Mining Applications
Large number of applications applications exist in data mining both in in business (see Apte et al., 2002) and other fields. According to a GartnerGroup report ( gartnergroup.com), more than half of all the Fortune 1000 companies worldwide are using data mining technology.
512
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
Data mining is used extensively today for many business applications (see Apte et al., 2002), as illustrated by the 12 representative examples that follow. Note that the intent of most of these examples is to identify a business opportunity in order to create a sustainable competitive advantage.
A SAMPLER OF DATA MINING APPLICATIONS.
1. Retailing and sales. Predicting sales; determining correct inventory levels and distribution schedules among outlets and loss prevention. For example, retailers such as AAFES (store in military bases) use data mining to combat fraud done by employees at their 1400 stores, using Fraud Watch solution from a Canadian Company, Triversity (see Amato-McCoy, 2003b). Eddie Bauer (see Online File W11.7 ) uses data mining for serveral applications. 2. Banking. Forecasting levels of bad loans and fraudulent credit card use, credit card spending by new customers, and which kinds of customers will best respond to (and qualify for) new loan offers. 3. Manufacturing and production. Predicting machinery failures; finding key factors that control optimization of manufacturing capacity. 4. Brokerage and securities trading. Predicting when bond prices will change; forecasting the range of stock fluctuations for particular issues and the overall market; determining when to buy or sell stocks. 5. Insurance. Forecasting claim amounts and medical coverage costs; classifying the most important elements that affect medical coverage; predicting which customers will buy new insurance policies. 6. Computer hardware and software. Predicting disk-drive failures; forecasting how long it will take to create new chips; predicting potential security violations. 7. Policework. Tracking crime patterns, locations, and criminal behavior; identifying attributes to assist in solving criminal cases. 8. Government and defense. Forecasting the cost of moving military equipment; testing strategies for potential military engagements; predicting resource consumption. 9. Airlines. Capturing data on where customers are flying and the ultimate destination of passengers who change carriers in hub cities; thus, airlines can identify popular locations that they do not service and check the feasibility of adding routes to capture lost business. 10. Health care. Correlating demographics of patients with critical illnesses; developing better insights on symptoms and their causes and how to provide proper treatments. 11. Broadcasting. Predicting what is best to air during prime time and how to maximize returns by interjecting advertisements. 12. Marketing. Classifying customer demographics that can be used to predict which customers will respond to a mailing or buy a particular product (as illustrated in Section 11.6).
Text Mining and Web Mining
Text mining is the application of data mining to nonstructured or less-structured text files (see Berry, 2002). Data mining takes advantage of the infrastructure of stored data to extract predictive information. For example, by mining a customer database, an analyst might discover that everyone who TEXT MINING.
11.4
DATA MINING CONCEPTS AND APPLICATIONS
513
buys product A also buys products B and C, but does so six months later. Text mining, however, operates with less structured information. Documents rarely have strong internal infrastructure, and when they do, it is frequently focused on document format rather than document content. Text mining helps organizations to do the following: following: (1) find the “hidden” content of documents, including additional useful relationship and (2) group documents by common themes (e.g., identify all the customers of an insurance firm who have similar complaints). Web mining is the application of data mining techniques to discover actionable and meaningful patterns, profiles, and trends from Web resources (see Linoff and Berry, 2002). The term Web mining is used to refer to both Web-content mining and Web-usage mining. Web-content mining is the process of information of analyzing Web access logs and other information connected to user browsing and access patterns on one or more Web localities. Web mining is used in the following areas: information filtering (e-mails, magazimes, and newspaper); surveillance (of competitors, patents, technological development); mining of Web-access logs for analyzing usage (clickstream analysis); assisted browsing, and services that fight crime on the Internet. In e-commerce, Web content mining is especially critical, due to the large number of visitors to e-commerce sites, about 2.5 billion during the Christmas 2002 season (Weiss, 2003). For example, when you look for a certain book on Amazon.com, the site will also provide you with a lot of books purchased by the customers who have purchased the specific book you are looking for. By providing such mined information, the Amazon.com site minimizes the need for additional search by providing customers with valuable service. According to Etzioni (1996), Web mining can perform the following functions: WEB MINING.
Resource discovery: locating unfamiliar documents and services on the Web. ● Information extraction: automatically extracting specific information form newly discovered Web resources. ● Generalization: uncovering general patterns at individual Web sites and across multiple sites. Miner3D (miner3d.com) is a suite of visual data analysis tools including a Web-mining tool that displays hundreds and even thousands of search hits on a single screen. The actual search for Web pages is performed through any major search engine, and this add on tool presents the resulting search in the form of a 3-D graphic instead of displaying links to the first few pages. For details on a number of Web-mining products see Kdnuggets.com/ software/web.html . Also see spss.com, and bayesia.com (free downloads). ●
Failures of Data Warehouses and Data Mining
Since their early inceptions, data warehouses and mining have produced many success stories. However, there have also been many failures. Carbone (1999) defined levels of data warehouse failures as follows: (1) warehouse does not meet the expectations of those involved; (2) warehouse was completed, but went severely over budget in relation to time, money, or both; (3) warehouse failed one or more times but eventually was completed; and (4) warehouse failed with no effort to revive it. Carbone provided examples and identified a number of reasons for failures (which are typical for many other large information systems): These are summarized in Table 11.4. Suggestions on how to avoid data warehouse failure are
CHAPTER 11
514
TABLE 11.4 ●
● ● ● ● ●
● ●
● ● ●
●
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
The Reasons Data Warehouses and Mining Fail
Unrealistic expectations—overly optimistic time schedule or underestimation of cost. Inappropriate architecture. Vendors overselling capabilities of products. Lack of training and support for users. Omitted information. Lack of coordination (or requires too much coordination). Cultural issues were ignored. Using the warehouse only for operational, not informational, purposes. Not enough summarization of data. Poor upkeep of technology. Improperly managing multiple users with various needs. Failure to align data marts and data warehouses.
●
● ● ● ●
●
● ●
●
●
Unclear business objectives; not knowing the information requirements. Lack of effective project sponsorship. Lack of data quality. Lack of user input. Using data marts instead of data warehouses (and vice versa). Inexperienced/untrained/inadequate number of personnel. Interfering corporate politics. Insecure access to data manipulation (users should not have the ability to change any data). Inappropriate format of information—not a single, standard format. Poor upkeep of information (e.g., failure to keep information current).
Source: Compiled from Carbone (1999).
provided at datawarehouse.com, at bitpipe.com, and at teradatauniversotynetwork. com . Suggestions how to properly implement data mining are provided by Hirji (2001). 11.5
D ATA VISUALIZATION TECHNOLOGIES Once data have been processed, they can be presented to users as text, graphics, tables, and so on, via several data visualization technologies. A variety of methods and software packages are available to do visualization for supporting decision making (e.g., see I/S Analyzer, 2002 and Li, 2001).
Data Visualization
Visual technologies make pictures worth a thousand numbers and make IT applications more attractive and understandable to users. Data visualization refers to presentation of data by technologies such as digital images, geographical information systems, graphical user interfaces, multidimensional tables and graphs, virtual reality, three-dimensional presentations, vedios and animation. Visualization is becoming more and more popular on the Web not only for entertainment, but also for decision support (see spss.com, microstrategy.com). Visualization software packages offer users capabilities for self-guided exploration and visual analysis of large amounts of data. By using visual analysis technologies, people may spot problems that have existed for years, undetected by standard analysis methods. Data visualization can be supported in a dynamic way (e.g., by video clips). It can also be done in real time (e.g., Bates, 2003). Visualization technologies can also be integrated among themselves to create a variety of presentations, as demonstrated by the IT At Work 11.1. Data visualization is easier to implement when the necessary data are in a data warehouse. Our discussion here will focused mainly on the data visualization techniques of multidimensionality multidimensionality,, geographical information systems, visual interactive modeling, and virtual reality. Related topics, such as multimedia (see informatica.com) and hypermedia, are presented in Technology Guide 2.
11.5
IT At At
DATA VISUALIZATION TECHNOLOGIES
Work 11.1
DATA VISUALIZATION HELPS HAWORTH TO COMPETE anufacturing office furniture is an extremely competitive business. Haworth Corporation ( haworth.com ) operates in this environment and has been able to survive and even excel with the help of IT. To compete, Haworth allows its customers to customize what they want to buy (see demo at haworth.com). It may surprise you to learn that an office chair can be assembled in 200 different ways. The customization of all Haworth’s products resulted in 21 million potential product combinations, confusing customers who were not able to visualize, until the item was delivered, how the customized furniture would look. The solution was computer visualization software that allowed sales representatives with laptop computers to show customers exactly what they were ordering. Thus, the huge parts catalogs became more easily understood, and sales representatives were able to configure different options by entering the corporate database, showing what a product would look like, and computing its price. The customers can now make changes until the furniture design meets their needs. The salesperson can do all this from the customer’s office by connecting to the corporate
M
Multidimensionality Visualization
515
POM
intranet via the Internet and using Web tools to allow customers to make the desired changes. As of 2000 the company operates, on the Web, a design studio in which customers can interact with the designers. These programs allow the company to reduce cycle time. After the last computer-assisted design (CAD) mockup of an order has been approved, the CAD software is used to create a bill of materials that goes to Haworth’s factory for manufacturing. This reduces the time spent between sales reps and CAD operators, increasing the time available for sales reps to make more sales calls and increasing customer satisfaction with quicker delivery. By using this visualization computer program, Haworth has increased its competitive advantage. Sources: Compiled from Infoworld , 1997, p. 92 and from haworth.com (2003).
For Further Exploration: How can the intranet be used to improve the process? How does this topic relate to car customization at jaguar.com? How can this case be related to wireless?
Modern data and information may have several dimensions. For example, management may be interested in examining sales figures in a certain city by product, by time period, by salesperson, and by store (i.e., in five dimensions). The common tool for such sitations is OLAP, and it often includes a visual presentation. The more dimensions involved, the more difficult it is to present multidimensional information in one table or in one graph. Therefore, it is important to provide the user with a technology that allows him or her to add, replace, or change dimensions quickly and easily in a table and/or graphical presentation. Such changes are known as “slicing and dicing” dicing” of data. The technology of slicing, dicing, and similar manipulations is called OLAP multidimensionality, and it is available in most business intelligence packages (e.g., brio.com) Figure 11.5 shows three views of the same data, organized in different ways, using multidimensional software, usually available with spreadsheets. Part a shows travel hours of a company’s employees by means of transportation and by country. country. The “next year” year” column gives projections projections automatically automatically generated by an embedded formula. In part b the data are reorganized, and in part c they are reorganized again and manipulated as well. All this is easily done by the end user with one or two clicks of the mouse. The major advantage of multidimensionality is that data can be organized the way managers like to see them rather than the way that the system analysts do. Furthermore, different presentations of the same data can be arranged and rearranged easily and quickly.
CHAPTER 11
516
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
Travel Hours
Canada Japan France Germany Country
Planes T hi hi s Ye ar ar N ex ex t Ye ar ar 74 0 88 8 43 0 51 6 32 0 38 4 42 5 51 0
Trains T hi hi s Ye ar ar N ex ex t Ye ar ar 14 0 16 8 29 0 34 8 46 0 55 2 43 0 51 6
Au t o m o b i l e s T hi hi s Ye ar ar N ex ex t Ye a r 64 0 76 8 15 0 18 0 21 0 25 2 32 5 39 0
(a )
Canada Japan
Planes
This Th is Yea earr 74 0 43 0
France Germany
Canada Japan
Trains
France Germany
Automobiles
Travel
Canada Japan France Germany
Hours Nextt Year Nex 88 8 51 6
32 0 425
38 4 51 0
14 0 29 0 46 0
16 8 34 8
430
55 2 51 6
64 0 15 0
76 8 18 0
21 0 32 5
25 2 39 0
Country
(b )
Worksheet1-View1-TUTORIAL
Canada Japan Planes Europe
France Germany Total
Canada Japan Trains
Automobiles
Europe Canada Japan
Europe Travel
France Germany Total
France Germany Total
This Th is Yea earr 74 0 43 0 32 0 42 5 74 5 14 0 29 0 46 0 43 0 89 0 64 0 15 0 21 0 32 5 53 5
Country Next Year = (This Year)*1.2 Year)*1.2
Shows that formula 2 calculates all Total cells.
(c )
Multidimensionality onality views. FIGURE 11.5 Multidimensi
Hours Nextt Year Nex 88 8 51 6 38 4 51 0 89 4 16 8 34 8 55 2 51 6 1068 76 8 18 0 25 2 39 0 64 2
• The software adds a
Total Item
• The software adds formula 2 and calculates Total
• Auto-making shades the formulas using two shades of gray
Shows how formula 1 calculates cells (in this case, the cells Total:Next Year )
11.5
DATA VISUALIZATION TECHNOLOGIES
517
Three factors are considered in multidimensionality: dimensions, measures, and time. 1. Examples of dimensions: products, salespeople, market segments, business units, geographical locations, distribution channels, countries, industries 2. Examples of measures: money, sales volume, head count, inventory profit, actual versus forecasted results 3. Examples of time: daily, weekly, monthly, quarterly, yearly For example, a manager may want to know the sales of product M in a certain geographical area, by a specific salesperson, during a specified month, in terms of units. Although the answer can be provided regardless of the database structure, it can be provided much faster, and by the user himself or herself, if the data are organized in multidimensional databases (or data marts), or if the query tools are designed for multidimensionality (e.g., via OLAP). In either case, users can navigate through the many dimensions and levels of data via tables or graphs and then conduct a quick analysis to find significant deviations or important trends. Multidimensionality is available with different degrees of sophistication and is especially popular in business intellignece software (e.g., see Campbell, 2001). There are several types of software from which multidimensional systems can be constructed, and they often work in conjunction with OLAP tools.
Geographical Information Systems
A geographical information system (GIS) is a computer-based system for capturing, storing, checking, integrating, manipulating, and displaying data using digitized maps. Its most distinguishing characteristic is that every record or digital object has an identified geographical location. By integrating maps with spatially oriented databases and other databases (called geocoding), users can generate information for planning, problem solving, and decision making, increasing their productivity and the quality of their decisions. GIS software varies in its capabilities, from simple computerized mapping systems to enterprisewide tools for decision support data analysis (see Minicase 2). As a high-quality graphics display and high computation and search speeds are necessary, most early GIS implementations were developed for mainframes. Initially, Initially, the high cost of GISs prevented their use outside experimental facilities and government agencies. Since the 1990s, however, the cost of GIS software and its required hardware has dropped dramatically. Now relatively inexpensive, fully functional PC-based packages are readily available. Representative GIS software vendors are ESRI, Intergraph, and Mapinfo. GIS SOFTWARE.
GIS data are available from a wide variety of sources. Government sources (via the Internet and CD-ROM) provide some data, while vendors provide diversified commercial data as well Some are Ffee (see CD-ROMs from MapInfo, and downable materal from esri.com and gisdatadepot.com.) The field of GIS can be divided into two major categories: functions and applications. There are four major functions: design and planning, decising modeling, database management, and spatial imaging. These functions support six areas of applications as shown in Figure 11.6. Note that the functions (shown as pillars) can support all the applications. The applications they support the most are shown closest to each pillar. GIS DATA.
518
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
GIS Applications
Design and Engineering
Design and Planning
Strategic Planning and Decision Making
Transportation and Logistics
Decision Modeling
Demogrpahics and Market Analysis
Facilities Management
Database Management Function
Surveying and Mapping
Spatial Imaging
FIGURE 11.6 GIS functions and applications.
GISs provide a large amount of extremely useful information that can be analyzed and utilized in decision making. Its graphical format makes it easy for managers to visualize the data. For example, as Janet M. Hamilton, market research administrator for Dow Elanco, a $2 billion maker of agricultural chemicals based in Indianapolis, Indiana, explains, “I can put 80page spreadsheets with thousands of rows into a single map. It would take a couple of weeks to comprehend all of the information from the spreadsheet, but in a map, the story story can be told in seconds” seconds” (Hamilton, 1996, p. 21). There are countless applications of GISs to improve decision making in the public or private sector (see Nasirin and Birks, 2003). They include the dispatch of emergency vehicles, transit management (see Minicase 2), facility site selection, and wildlife management. GISs are extremely popular in local governments, where the tools are used not only for mapping but for many decisionmaking applications (see O’Looney, 2000). States, cities and counties are using a GIS application related to property assessment, mapping, and flood control. (e.g., Hardester, 2002). Banks have been using GIS for over a decade to support expansion and marketing decision making (See Online File W11.8.) For many companies, the intelligent organization of data within a GIS can provide a framework to support the process of decision making and of designing alternative strategies especially when location decisions are involved (Church, 2002).. Some examples of 2002) of successful successful GIS applications applications are provided provided by Korte (2000) and Hamilton (1996). Other examples of successful GIS applications are summarized in A Closer Look 11.3. GIS AND DECISION MAKING.
Most major GIS software vendors are providing Web access, such as embedded browsers, or a Web/Internet/intranet server that hooks directly into their software. Thus, users can access dynamic maps and data via the Internet or a corporate intranet. A number of firms are deploying GISs on the Internet for internal use or for use by their customers. For example, Visa Plus, which operates a network of automated teller machines, has developed a GIS application that lets Internet users call up a locator map for any of the company’ company’ss 300,000 ATM machines worldwide. A common application on the Internet is a store locator. Not only do you get an address near you, but you also are told how to get there in the GIS AND THE INTERNET OR INTRANET.
11.5
DATA VISUALIZATION TECHNOLOGIES
519
A CLOSER LOOK 11.3 GIS SAMPLE APPLICATIONS COMPANY
APPLICA PPLICATION TION
Pepsi Cola Inc., Super Value, Acordia Inc.
Used in site selection for new Taco Bell and Pizza Hut restaurants; combining demographic data and traffic patterns used in deciding on new facilities and on marketing strategy decision
CIGNA (health insurance)
Uses GIS to answer such questions as “How many CIGNAaffiliated physicians are available within an 8-mile radius of a business?”
Western Auto (a subsidiary of Sears)
Integrates data with GIS to create a detailed demographic profile of a store’s neighborhood to determine the best product mix to offer at the store. (Narisin and Birks, 2003)
Sears, Roebuck & Co.
Uses GIS to support planning of truck routes.
Health maintenance organizations
Using GIS to support drought risk manmagement (Goddard et al., 2003)
Wood Personnel Services (employment agencies)
Tracks cancer rate and other diseases to determine expansion strategy and allocation of expensive equipment in their facilities.
Wilkening & Co. (consulting services)
Maps neighborhoods where temporary workers live; for locating marketing and recruiting cities.
CellularOne Corp.
Map Real estate for tax assessments, property appraisals, flood control, land surveys and related applications (see Hardester, 2002)
Sun Microsystems
Designs optimal sales territories and routes for their clients, reducing travel costs by 15 percent.
Consolidated Rail Corp.
Maps its entire cellular network to identify clusters of call disconnects and to dispatch technicians accordingly.
Federal Emergency Management Agency
Making decisions regarding location of facilities (Church, 2002)
Toyota (other car manufacturers)
Manages leased property in dozens of places worldwide. Monitors the condition of 20,000 miles of railroad track and thousands of parcels of adjoining land. Assesses the damage of hurricanes, floods, and other natural disasters by relating videotapes of the damage to digitized maps of properties. Combines GIS and GPS as a navigation tool. Drivers are directed to destinations in the best possible way.
OF
GIS
FOR
DECISION SUPPORT
shortest way (e.g., try frys.com). As GIS Web server software is deployed by vendors, more applications will be developed. Maps, GIS data, and information about GISs are available over the Web through a number of vendors and public agencies. (For design issues see Sikder and Gangopadhyay, 2002.) The integration of GISs and global positioning systems (GPSs) has the potential to help restructure and redesign the aviation, EMERGING GIS APPLICATIONS.
520
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
transportation, and shipping industries. It enables vehicles or aircraft equipped with a GPS receiver to pinpoint their location as they move (Steede-Terry, 2000). Emerging applications of GPSs include personal automobile mapping systems, vehicle tracking (Terry and Kolb, 2003), and earth-moving equipment tracking. The price of these applications is dropping with improvements in hardware, increased demand, and the availability of more competing vendors. (A simple GPS cost less than $50 in 2003.) GPSs have also become a major source of new GIS data (see Group Assignment 1). Some researchers have developed intelligent GISs that link a GIS to an expert system or to intelligent agents (Tang et al., 2001). L-Commerce. In Chapter Chapter 6 we introduced introduced the concept concept of locationlocation-based based commerce (l-commerce), a major part of mobile-commerce (m-commerce). In l-commerce, advertising is targeted to an individual whose location is known (via a GPS and GIS combination). Similarly, emergency medical systems identify the location of a car accident in a second, and the attached GIS helps in directing ambulances to the scene. For other interesting applications, see Sadeh (2002). Improvements in the GIS user interface have substantially altered altere d the GIS GIS “look” “look” and “feel.” “feel.” Advan Advanced ced visualiza visualization tion (three(three-dimen dimensiona sionall graphics) is increasingly integrated with GIS capabilities, especially in animated and interactive maps. GISs can provide information for virtual reality engines, and they can display complex information to decision makers. Multimedia and hypermedia play a growing role in GISs, especially in help and training systems. Object linking and embedding is allowing users to import maps into any document. More GISs will be deployed to provide data and access data over the Web and organizational intranets as “Web-ready” “Web-ready” GIS software becomes more more affordable. See Korte (2000) for an overview of GISs, their many capabilities, and potential advances. CONCLUSIONS.
Visual Interactive Models and Simulation
Visual interactive modeling (VIM) uses computer graphic displays to represent the impact of different management or operational decisions on goals such as profit or market share. VIM differs from regular simulation in that the user can intervene in the decision-making process and see the results of the intervention. A visual model is much more than a communication device, it is an integral part of decision making and problem solving. A VIM can be used both for supporting decisions and for training. It can represent a static or a dynamic system. Static models display a visual image of the result of one decision alternative at a time. (With computer windows, several results can be compared on one screen.) Dynamic models use animation or video clips to show systems that evolve over time. These are also used in realtime simulations. VIM has been used with DSSs in several operations management decision processes (see Beroggi, 2001). The method loads the current status of a plant (or a business process) into a virtual interactive model. The model is then run rapidly on a computer, allowing management to observe how a plant is likely to operate in the future. One of the most developed areas in VIM is visual interactive simulation (VIS), a method in which the end user watches the progress of the simulation
11.5
IT At At
DATA VISUALIZATION TECHNOLOGIES
Work 11.2
COMPUTER TRAINING IN COMPLEX LOGGING MACHINES AT PARTEK FOREST he foresting industry is extremely competitive, and countries with high labor costs must automate tasks such as moving, cutting, delimbing, and piling logs. A new machine, called called the “Harvester “Harvester,” ,” can replace 25 lumberlumber jacks. The Harvester is a highly complex machine that takes six months to learn how to operate. The trainee t rainee destroys a sizeable amount of forest in the process, and terrain suitable to practice on is decreasing. In unskilled hands, this expensive machine can also be damaged. Therefore, extensive and expensive training is needed. Sisu Logging of Finland (a subsidiary of Parter Forest) found a solution to the training problem by using a real-time simulation ( partekforest.com ). In this simulation, the chassis, suspension, and wheels of the vehicles have to be modeled, together with the forces acting on them (inertia and friction), and the movement equations linked with them have to be solved in real time. This type of simulation is mathematically complex, and until recently it required equipment investment running into millions of dollars. However, with the help of a visual simulations program, simulation training can now be carried out for only 1 percent of the cost of the traditional method.
T
521
HRM
Inside the simulator are the Harvester’s actual controls, which are used to control a virtual model of a Harvester plowing its way through a virtual forest. The machine sways back and forth on uneven terrain, and the grapple of the Harvester grips the trunk of a tree, fells it, delimbs it, and cuts it into pieces very realistically in real time. The simulated picture is very sharp: even the structures of the bark and annual growth rings are clearly visible where the tree has been cut. In traditional simulators, the traveling path is quite limited beforehand, but in this Harvester simulator, you are free to move in a stretch of forest covering two hectares (25,000 square yards). In addition, the system can be used to simulate different kinds of forest in different parts of the world, together with the different tree species and climatic conditions. An additional advantage of this simulator is that the operations can be videotaped so that training sessions can be studied afterward. Moreover, it is possible to practice certain dangerous situations that cannot be done using a real machine. Source: Condensed from Finnish Business Report, April 1997
For Further Exploration: Why is the simulated training time shorter? Why is visualization beneficial?
model in an animated form using graphics terminals. The user may interact with the simulation and try different decision strategies. (See Pritsker and O’Reilly, 1999.) VIS is an approach that has, at its core, the ability to allow decision makers to learn about their own subjective values and about their mistakes. Therefore, VIS can be used for training as well as in games, as in the case of flight At Work 11. 11.2. 2. simulators, and as shown in IT At Animation systems that produce realistic graphics are available from many simulation software vendors (e.g., see SAS.com and vissim.com). The latest visual simulation technology is tied in with the concept of virtual reality, where an artificial world is created for a number of purposes—from training to entertainment to viewing data in an artificial landscape.
Virtual Reality
There is no standard definition of virtual reality. The most common defvirtu tual al realit reality y (V (VR) R) is interactive, computerinitions usually imply that vir generated, three-dimensional graphics delivered to the user through a headmounted display. Defined technically, virtual reality is an environment and/or technology that provides artificially generated sensory cues sufficient to engender in the user some willing suspension of disbelief. So in VR, a person “believes” that what he or she is doing is real real even though it is artificially artificially created.
522
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
More than one person and even a large group can share and interact in the same artificial environment. VR thus can be a powerful medium for communication, entertainment, and learning. Instead of looking at a flat computer screen, the VR user interacts with a three-dimensional computer-generated environment. To see and hear the environment, the user wears stereo goggles and a headset. To interact with the environment, control objects in it, or move around within it, the user wears a computerized display and hand position sensors (gloves). Virtual reality displays achieve the illusion of a surrounding medium by updating the display in real time. The user can grasp and move virtual objects. Most VR applications to date have been used to support decision making indirectly. For example, Boeing has developed a virtual aircraft mockup to test designs. Several other VR applications for assisting in manufacturing and for converting military technology to civilian technology are being utilized at Boeing. At Volvo, VR is used to test virtual cars in virtual accidents; Volvo also uses VR in its new model-designing process. British Airways offers the pleasure of experiencing first-class flying to its Web site visitors. For a comprehensive discussion of virtual reality in manufacturing, see Banerjee and Zetu (2001). Another VR application area is data visualization. VR helps financial decision makers make better sense of data by using visual, spatial, and aural immersion virtual systems. For example, some stock brokerages have a VR application in which users surf over a landscape of stock futures, with color, hue, and intensity indicating deviations from current share prices. Sound is used to convey other information, such as current trends or the debt/equity ratio. VR allows side-by-side comparsions with a large assortment of financial data. It is easier to make intuitive connections with three-dimensional support. Morgan Stanley & Co. uses VR to display the results of risk analyses. VIRTUAL REALITY AND DECISION MAKING.
A platform-independent standard for VR called virtual reality markup language (VRML) ( vrmlsite.com, and Kerlow, 2000) makes navigation through online supermarkets, museums, and stores as easy as interacting with textual information. VRML allows objects to be rendered as an Internet user “walks” through a virtual room. room. At the moment, moment, users can utilize regular browsers, but VRML browsers will soon be in wide circulation. Extensive use is expected in e-commerce marketing (see Dalgleish, 2000). For example, Tower Records offers a virtual music store on the Internet where customers can “meet” each other in front of the store, go inside, inside, and preview CDs and videos. They select and purchase their choices electronically and interactively from a sales associate. Applications of virtual reality in other areas are shown in Table 11.5. Virtual supermarkets could spark greater interest in home grocery shopping. In the future, shoppers will enter a virtual supermarket, walk through the virtual aisles, select virtual products and put them in their virtual carts. This could help remove some of the resistance to virtual shopping. Virtual malls, which can be delivered even on a PC ( synthonics.com), are designed to give the user a feeling of walking into a shopping mall. Virtual reality is just beginning to move into many business applications. An interactive, three-dimensional world on the Internet should prove popular because it is a metaphor to which everyone can relate. VIRTUAL REALITY AND THE WEB.
11.6
TABLE 11.5
11.6
MARKETING DATABASES IN ACTION
523
Examples of Virtual Reality Applications
Applications in Manufacturing
Applications in Business
Training Desi De sign gn tes esti tin ng and inte terrpr pre eta tattio ion n of results Safety analysis Virtual prototyping Engineering analysis Ergonomic analysis Virtual simulation of assembly, production, and maintenance
Real estate presentation and evaluation Adve Ad vert rtis isin ing g
Applications in Medicine
Applications in Research and Education
Train iniing sur surg geon onss (w (wit ith h sim simu ula lato torrs) Interpretation of of me medical da data Planning surgeries Physical therapy
Vir irtu tua al ph phys ysiics lab lab Representation of of co complex ma mathematics Galaxy configurations
Amusement Applications
Applications in Architecture
Virtual museums Three-dimensional race car games (on PCs) Air combat simulation (on PCs) Virtual reality arcades and parks Ski simulator
Design of building and other structures
M ARKETING D AT ATABASES
Presentation in e-commerce Presentation of financial data
ACTION
IN
Data warehouses and data marts serve end users in all functional areas. However, the most dramatic applications of data warehousing and mining are in market keting ing dat dataamarketing, as seen in the Harrah’s case, in what is referred to as mar bases (also referred to as database marketing). In this section we examine how data warehouses, their extensions, and data mining are used, and what role they play in new marketing strategies, such as the use of Web-based marketing transaction databases in interactive marketing.
The Marketing Transaction Database
Most current databases are static: They simply gather and store information about customers. They appear in the following categories: operations databases, data warehouses, and marketing databases. Success in marketing today requires a new kind of database, oriented toward targeting the personalizing marketing messages in real time. Such a database provides the most effective means of capturing information on customer preferences and needs. In turn, enterprises can use this knowledge to create new and/or personalized products and services. Such a data base is called a marketing transaction database (MTD). The MTD combines many of the characteristics of the current databases and marketing data sources into a new database that allows marketers to engage in real-time personalization and target every interaction with customers. The MTD provides dynamic, or interactive, functions not available with traditional types of marketing databases. In marketing terms, a transaction occurs with the exchange of information. With interactive media, MTD’S CAPABILITIES.
524
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
each exposure to the customer becomes an opportunity to conduct a marketing “transaction.” “transaction.” Exchanging information information (whether (whether gathered actively through registration or user requests, or passively by monitoring customer behavior) allows marketers to refine their understanding of each customer continuously and to use that information to target him or her specifically with personalized marketing messages. This is done most frequently on the Web. Comparing various characteristics of MTDs with other marketing-related databases shows the advantages of MTDs. For example, marketing databases focus on understanding customers’ behavior, but do not target the individual customer nor personalize the marketing approach, as do MTDs. Additionally, data in MTDs can be updated in real time, as opposed to the periodic (weekly, monthly, or quarterly) updates that are characteristic of data warehouses and marketing databases. Also, the data quality in an MTD is focused, and is verified by the individual customers. It thus is of much higher quality than data in many operations databases, where legacy systems may offer only poor assurance of data quality. Further, MTDs can combine various types of data—behavioral, descriptive, and derivative; other types of marketing databases may offer only one or two of these types. Note that MTDs do not eliminate the need for traditional databases. They complement them by providing additional capabilities (see Online Minicase 1, Dell Computers). Data mining, data warehousing, and MTDs are delivered on the Internet and intranets. The Internet does not simply represent another advertising venue or a different medium for catalog sales. Rather, it contains new attributes that smart marketers can exploit to their fullest degree. Indeed, the Internet promises to revolutionize sales and marketing. Dell Computer (see Online Minicase 1) offers an example of how marketing professionals can use the Internet’s electronic sales and marketing channels for market research, advertising, information dissemination, product management, and product delivery. For an overview of marketing databases and the Web, see Grossnickle and Raskin (2000). THE ROLE OF THE INTERNET.
Implementation Examples
Fewer and fewer companies can afford traditional marketing approaches, which include big-picture strategies and expensive marketing campaigns. Marketing departments are being scaled down, (and so is the traditional marketing approaches) and new approaches such as one-to-one marketing, speed marketing, interactive marketing, and relationship marketing are being employed (see Strauss et al., 2003). The following examples illustrate how companies use data mining and warehousing to support the new marketing approaches. For other examples, see Online File W11.9. ●
●
Through its online registry for expectant parents, Burlington Coat Factory tracks families as they grow. The company then matches direct-mail material to the different stages of a family’s development over time. Burlington also identifies, on a daily basis, top-selling styles and brands. By digging into reams of demographic data, historical buying patterns, and sales trends in existing stores, Burlington determines where to open its next store and what to stock in each store. Au Bon Pain Company, Inc., a Boston-based chain of cafes, discovered that the company was not selling as much cream cheese as planned. When it analyzed point-of-sale data, the firm found that customers preferred small,
11.6
MARKETING DATABASES IN ACTION
525
one-serving packaging (like butter). As soon as the package size of the cream cheese was changed, sales shot up. ● Bank of America gets more than 100,000 telephone calls from customers every day. Analyzing customers’ banking activities, the bank determines what may be of interest to them. So when a customer calls to check on a balance, the bank tries to sell the customer something in which he or she might be interested. ● Supermarket chains regularly analyze reams of cash register data to discover what items customers are typically buying at the same time. These shopping patterns are used for issuing coupons, designing floor layouts and products’ location, and creating shelf displays. ● In its data warehouse, the Chicago Tribune stores information about customer behavior as customers move through the various newspaper Web sites. Data mining helps to analyze volumes of data ranging from what browsers are used to what hyperlinks are clicked on most frequently. The data warehouses in some companies include several terabytes or more of data (e.g., at Sears, see Minicase 1). They need to use supercomputing to sift quickly through the data. Wal-Mart, the world’s largest discount retailer, has a gigantic database, as shown in IT At Work 11.3.
IT At At
Work 11.3
DATA MINING POWERS AT WAL-MART ith more than 50 terabytes of data (in 2003) on two
W NCR (National Cash Register) systems, Wal-Mart
(walmart.com ) manages one of the world’s largest data warehouses. Besides the two NCR Teradata databases, which handle most decision-support applications, WalMart has another 6 terabytes of transaction processing data on IBM and Hitachi mainframes. Wal-Mart’s formula for success—getting the right product on the appropriate shelf at the lowest price— owes much to the company’s multimillion-dollar investment in data warehousing. “Wal-Mart “Wal-Mart can be more detailed than most of its competitors on what’s going on by product, by store, by day—and day— and act on it,” says Richard Winter Winter,, a database consultant in Boston. “That’s a tremendously powerful thing.” The systems house data on point of sale, inventory, products in transit, market statistics, customer demographics, finance, product returns, and supplier performance. The data are used for three broad areas of information discovery and decision support: analyzing trends, managing inventory, and understanding customers. What emerges are “personality “personality traits” for each of Wal-Mart’ Wal-Mart’ss 3,500 or so outlets, which Wal-Mart managers can use to determine product mix and inventory levels for each store.
MKT
Wal-Mart is using a data mining-based demand-forecasting application that employes on neural networking software and runs on a 4,000-processor parallel computer. The application looks at individual items for individual stores to decide the seasonal sales profile of each item. The system keeps a year’s worth of data on the sales of 100,000 products and predicts which items will be needed in each store and when. Wal-Mart is expanding its use of market-basket analysis. Data are collected on items that comprise a shopper’s total purchase so that the company can analyze relationships and patterns in customer purchases. The data warehouse is available over an extranet to store managers and suppliers. In 2003, 6,000 users made over 40,000 database queries each day. “What Wal-Mart is doing is letting an army of people use the database database to make tactical decisions,” decisions,” says consultconsultant Winter. “The cumulative impact is immense.” Sources: This information is courtesy of NCR Corp. (2000) and walmart.com.. walmart.com
For Further Exploration: Since small retailers cannot afford data warehouses and data mining, will they be able to compete?
526
11.7
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
CHAPTER 11
WEB-B ASED D ATA M ANAGEMENT S YSTEMS Data management and business intelligence activities—from data acquisition (e.g., Atzeni et al., 2002), through warehousing, to mining—are often performed with Web tools, or are interrelated with Web technologies and e-business (see Liautaud, 2001). Users with browsers can log onto a system, make inquiries, and get reports in a real-time setting. This is done through intranets, and for outsiders via extranets (see remedy.com ). E-commerce software vendors are providing Web Web tools that connect the data warehouse with EC ordering and cataloging systems. Hitachi’s EC tool suite, Tradelink (at hitachi.com), combines EC activities such as catalog management, payment applications, mass customization, and order management with data warehousesand martsand ERP systems. Oracle (see Winter, 2001) and SAP offer similar products. Data warehousing and decision support vendors are connecting their products with Web technologies and EC. Examples include Brio’s Brio One, Web Intelligence from Business Objects, and Cognos’s DataMerchant. Hyperion’s Appsource “wired for OLAP” OLAP” product connects OLAP OLAP with Web Web tools. IBM’s IBM’s Decision Edge i makes OLAP capabilities available on the intranet from anywhere in the corporation using browsers, search engines, and other Web technologies. MicroStrategy offers DSS Agent and DSS Web for help in drilling down for detailed information, providing graphical views, and pushing information to users’ desktops. Oracle’s Financial Analyzer and Sales Analyzer, Hummingbird’s Bi/Web and Bi/Broker, and several of the products products cited above bring bring interactive querying, reporting, reporting, and other OLAP tasks to many users (both company employees and business partners) via the Web. Also, for a comprehensive discussion of business intelligence on the Web, see the white paper at businessobjects.com. The systems described in the previous sections of this chapter can be integrated on by Web-based platforms, such as the one shown in Figure 11.7. The Web-based system is accessed via a portal, and it connects the following parts:
PORTAL
Y
BI SERVICES Ad Hoc Query Data Mining Analysis Scorcarding Managed Reporting Visualization TI R U C E S
META DATA
BI DATA MART CREATION
FIGURE 11.7 Web-based data management system. ( Source: Source: cognos.com. Platform for Enterprise Business Intelligence. © Cognos Inc. 2001.)
e-Business, ERP, CRM, SCM, Legacy
BI READY DATA INFRA STRUCTURE
E - A P P L C I A T O I N S
11.7
WEB-BASED DATA DATA MANAGEMENT SYSTEMS
527
Warehouse management ERP application S t r u c t u r e d
On-line ETL and transaction data database quality
Analytical application
OLAP Analysis Query and reporting
D a t a
Data mining
Point of sale data
Data U n s t r u c t u r e d
FIGURE 11.8 Sources of content for an enterprise Source: information portal. ( Source: Merrill Lynch, 1998.)
D a t a
Understanding Enterprise Information Portals
Information
Web documents E-mail Product plans Legal contracts Word processing documents Advertisements Audio files Product specifications Purchase orders Paper invoices Video files Research reports Graphics/graphic designs
Content Management Application Content Management Reposition
Database
the business intelligence (BI) services, the data warehouse and marts, the corporate applications, and the data infrastructure. A security system protects the corporate proprietary data. Let’s examine how all of these components work together via the corporate portal.
Enterprise BI suites and Corporate Portals
Enterprise BI suites (EBISs) integrate query, reporting, OLAP, and other tools. They are scalable, and offered by many vendors (e.g., IBM, Oracle, Microsoft, Hyperion Solution, Sagent Technology, AlphaBlox, MicroStrategy and Crystal Decisions). EBISs are offered usually via enterprise portals. In Chapter 4 we introduced the concept of corporate portals as a Web-based Web-based gateway to data, information, and knowledge. As seen in Figure 11.8, the portal integrates data from many sources. It provides end users with a single Web based point of personalized access to BI and other applications. Likewise, it provides IT with a single point of delivery and management of this content. Users are empowered to access, create, and share valuable information.
Intelligent Data Warehouse Webbased Systems
The amount of data in the data warehouse can be very large. While the organization of data is done in a way that permits easy search, it still may be useful to have a search engine for specific applications. Liu (1998) describes how
528
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
an intelligent agent can improve the operation of a data warehouse in the pulp and paper industry. This application supplements the monitoring and scanning of external strategic data. The intelligent agent application can serve both managers’ ad-hoc query/reporting information needs and the external data needs of a strategic management support system for forest companies in Finland.
Clickstream Data Warehouse
➥
Large and ever increasing amounts of B2C data about consumers, products, etc. can be collected. Such data come from several sources: internal data (e.g., sales data, payroll data etc.), external data (e.g., government and industry reports), and clickstream data. Clickstream data are those that occur inside the Web environment, when customers visit a Web site. They provide a trail of the users’ activities in the Web site, including user behavior and browsing patterns. By looking at clickstream data, an e-tailer can find out such things as which promotions are effective and which population segments are interested in specific products. According to Inmon (2001), clickstream data can reveal information to answer questions such as the following: What goods has the customer looked at or purchased? What items did the customer buy in conjunction with other items? What ads and promotion were effective? Which were ineffective? Are certain products too hard to find? Are certain products too expensive? Is there a substitute product that the customer find first? The Web is an incredibly rich source of business intelligence, and many enterprises are scrambling to build data warehouses that capture the knowledge contained in the clickstream data from their Web sites. By analyzing the user behavior patterns contained in these clickstream data warehouses, savvy business can expand their markets, improve customer relationships, reduce costs, streamline operations, strengthen their Web sites, and hone their business strategies. One has two options: incorporate Web-based data into preexisting data warehouses, or to build new clickstream data warehouses that are capa ble of showing both e-business activities and the non-Web aspects of the business in an integrated fashion. (see Sweiger at el., 2002).
MANAGERIAL ISSUES 1. Cost-benefit issues and justification. Some of the data management solutions discussed in this chapter are very expensive and are justifiable only in large corporations. Smaller organizations can make the solutions cost effective if they leverage existing databases rather than create new ones. A careful cost benefit analysis must be undertaken before any commitment to the new technologies is made. 2. Where to store data physically. phy sically. Should data be distributed close to theirusers? This could potentially speed up data entry and updating, but adds replication and security risks. Or should data be centralized for easier control, security, and disaster recovery? This has communications and single point of failure risks.
11.7
WEB-BASED DATA MANAGEMENT SYSTEMS
529
3. Legal issues. Data mining may suggest that a company send catalogs or promotions to only one age group or one gender. A man sued Victoria’s Secret Corp. because his female neighbor received a mail order catalog with deeply discounted items and he received only the regular catalog (the discount was actually given for volume purchasing). Settling discrimination charges can b e very expensive. 4. Internal or external? Should a firm invest in internally collecting, storing, maintaining, and purging its own databases of information? Or should it subscribe to external databases, where providers are responsible for all data management and data access? 5. Disaster recovery. Can an organization’s business processes, which have become dependent on databases, recover and sustain operations after a natural or other type of information system disaster? (See Chapter 15.) How can a data warehouse be protected? At what cost? 6. Data security and ethics. Are the company’s competitive data safe from external snooping or sabotage? Are confidential data, such as personnel details, safe from improper or illegal access and alteration? A related question is, Who owns such personal data? (See Smith, 1997.) 7. Ethics: Paying for use of data. Compilers of public-domain information, such as Lexis-Nexis, face a problem of people lifting large sections of their work without first paying royalties. The Collection of Information Antipiracy Act (Bill HR 2652 in the U.S. Congress) will provide greater protection from online piracy. This, and other intellectual property issues, are being debated in Congress and adjudicated in the courts. 8. Privacy. Collecting data in a warehouse and conducting data mining may result in the invasion of individual privacy. What will companies do to protect individuals? What can individuals do to protect their privacy? (See Chapter 16.) 9. The legacy data problem. One very real issue, often known as the legacy data acquisition problem, is what to do with the mass of information already stored in a variety of systems and formats. Data in older, perhaps obsolete, databases still need to be available to newer database management systems. Many of the legacy application programs used to access the older data simply cannot be converted into new computing environments without considerable expense. Basically, there are three approaches to solving this problem. One is to create a database front end that can act as a translator from the old system to the new. The second is to cause applications to be integrated with the new system, so that data can be seamlessly accessed in the original format. The third is to cause the data to migrate into the new system by reformatting it. A new promising approach is the use of Web Services (see Chapter 14). 10. Data delivery. Moving data efficiently around an enterprise is often a ma jor problem. The inability to communicate effectively and efficiently among different groups, in different geographical locations is a serious roadblock to implementing distributed applications properly, especially given the many remote sites and mobility of today ’s workers. Mobile and wireless computing (Chapter 6) are addressing some of these difficulties.
530
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
ON THE WEB SITE… Additional resources, including quizzes; online files of additional text, tables, figures, and cases; and frequently updated Web links to current articles and information can be found on the book’s Web site. (wiley.com/college/turban).
KEY TERMS Analytical processing ••• Business intelligence (BI) •••
Geographical information system (GIS) •••
Clickstream data •••
Knowledge discovery •••
Data integrity •••
Operational data store •••
Data mart •••
Knowledge discovery in databases (KDD) •••
Data mining •••
Marketing database •••
Virtual reality (VR) •••
Data quality (DQ) ••• Data visualization •••
Marketing transaction database (MTD) •••
Data warehouse •••
Metadata •••
Document management •••
Multidimensional database •••
Document management system (DMS) •••
Multidimensionality ••• Multimedia database •••
Object-oriented database ••• Online analytical processing (OLAP) ••• Text mining ••• Virtual reality markup language (VRML) ••• Visual interactive modeling (VIM) ••• Visual interactive simulation (VIS) •••
CHAPTER HIGHLIGHTS (Numbers Refer to Learning Objectives) ³ Data are the foundation of any information system ´ Online analytical processing is a data discovery method
· · · » »
that uses analytical approaches.
and need to be managed throughout their useful life cycle, which convert data to useful information, knowledge and a basis for decision support.
´
Data exist in internal and external sources. Personal data and knowledge are often stored in people’s minds.
Business intelligence is an umbrella name for large number of methods and tools used to conduct data analysis.
²
The Internet is a major source of data and knowledge. Other sources are databases, paper documents, videos, maps, pictures and more.
Data mining for knowledge discovery is an attempt to use intelligent systems to scan volumes of data to locate necessary information and knowledge.
¶
Visualization is important for better understanding of data relationships and compression of information. Several computer-based methods exist.
¶
A geographical information system captures, stores, manipulates, and displays data using digitized maps.
¶
Virtual reality is 3-D, interactive, computer-generated graphics that provides users with a feeling that they are inside a certain environment.
º
Marketing databases provide the technological support for new marketing approaches such as interactive marketing.
Many factors that impact the quality of data must be recognized and controlled. Today data and documents are managed electroni cally. They are digitized, stored and used in electronic management systems. Electronic document management, the automated control of documents, is a key to greater efficiency in handling documents in order to gain an edge on the competition.
»
Multidimensional presentation enables quick and easy multiple viewing of information in accordance with people’s needs.
º
Marketing transaction databases provide dynamic interactive functions that facilitate customized advertisement and services to customers.
¿
Data warehouses and data marts are necessary to support effective information discovery and decision making. Relevant data are indexed and organized for easy access by end users.
¾
Web-based systems are used extensively in supporting data access and data analysis. Also, Web-based systems are an important source of data. Finally, data visualization is frequently combined with Web systems.
EXERCISES
531
QUESTIONS FOR REVIEW 1. List the major sources of data.
10. Define business intelligence.
2. List some of the major data problems.
11. Define online analytical processing (OLAP).
3. What is a terabyte? (Write the number.)
12. Define data mining and describe its major characteristics.
4. Review the steps of the data life cycle and explain them.
13. Define data visualization.
5. List some of the categories of data available on the Internet.
14. Explain the properties of multidimensionality and its vizualization.
6. Define data quality.
15. Describe GIS and its major capabilities.
7. Define document management.
16. Define visual interactive modeling and simulation.
8. Describe a data warehouse.
17. Define a marketing transaction database.
9. Describe a data mart.
18. Define virtual reality.
QUESTIONS FOR DISCUSSION 1. Compare data quality to data integrity. How are they related?
10. Why is the combination of GIS and GPS becoming so popular? Examine some applications.
2. Discuss the relationship between OLAP and multidimentionality.
11. Discuss the advantages of terabyte marketing data bases to a large corporation. Does a small company need a marketing database? Under what circumstances will it make sense to have one?
3. Discuss business intelligence and distinguish between decision support and information and Knowledge discovery. 4. Discuss the factors that make document management so valuable. What capabilities are particularly valuable? 5. Relate document management to imaging systems. 6. Describe the process of information and knowledge discovery, and discuss the roles of the data warehouse, data mining, and OLAP in this process. 7. Discuss the major drivers and benefits of data warehousing to end users. 8. Discuss how a data warehouse can lessen the stov epipe problem. (See Chapter 9.) 9. A data mart can substitute for a data warehouse or supplement it. Compare and discuss these options.
12. Discuss the benefits managers can derive from visual interactive simulation in a manufacturing company. 13. Why is the mass-marketing approach may not be effective? What is the logic of targeted marketing? 14. Distinguish between operational databases, data warehouses, and marketing data marts. 15. Relate the Sears minicase to the phases of the data life cycle. 16. Discuss the potential contribution of virtual reality to e-commerce. 17. Discuss the interaction between marketing and management theories and IT support at Harrah’s case.
EXERCISES 1. Review the list of data management difficulties in Section 11.1. Explain how a combination of data warehousing and data mining can solve or reduce these difficulties. Be specific. 2. Interview a knowledge worker in a company you work for or to which you have access. Find the data problems they have encountered and the measures they have taken to solve them. Relate the problems to Strong’s four categories. 3. Ocean Spray Cranberries is a large cooperative of fruit growers and processors. Ocean Spray needed data to determine the effectiveness of its promotions and its
advertisements and to make itself able to respond strat egically to its competitors’ promotions. The company also wanted to identify trends in consumer preferences for new products and to pinpoint marketing factors that might be causing changes in the selling levels of certain brands and markets. Ocean Spray buys marketing data from InfoScan ( infores.com ), a company that collects data using bar code scanners in a sample of 2,500 stores nationwide and from A. C. Nielsen. The data for each product include sales volume, market share, distribution, price information, and information about promotions (sales, advertisements).
532
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
The amount of data provided to Ocean Spray on a daily basis is overwhelming (about 100 to 1,000 times more data items than Ocean Spray used to collect on its own). All the data are deposited in the corporate marketing data mart. To analyze this vast amount of data, the company developed a DSS. To give end users easy access to the data, the company uses an expert system–based data-mining process called CoverStory, which summarizes information in accordance with user preferences. CoverStory interprets data processed by the DSS, identifies trends, discovers cause and effect relationships, presents hundreds of displays, and provides any information required by the decision
makers. This system alerts managers to key problems and opportunities. a. Find information about Ocean Spray by entering Ocean Spray’s Web site ( oceanspray.com ). b. Ocean Spray has said that it cannot run the business without the system. Why? c. What data from the data mart are used by the DSS? d. Enter infores.com or scanmar.nl and review the marketing decision support information. How is the company related to a data warehouse? e. How does Infoscan collect data? (Check the Data Wrench product.)
GROUP ASSIGNMENTS 1. Several applications now combine GIS and GPS. a. Survey such applications by conducting literature and Internet searches and query GIS vendors. b. Prepare a list of five applications, including at least two in e-commerce (see Chapter 6). c. Describe the benefit of such integration. 2. Prepare a report on the topic of “data management and the intranet.” intranet.” Specific Specifically ally,, pay attention to the role of the data warehouse, the use of browsers for query, and data mining. Also explore the issue of GIS and the Internet. Finally, describe the role of extranets in support of business partner collaboration. Each student will visit one or two vendors’ sites, read the white papers, and examine products (Oracle, Red Bricks, Brio, Siemens Mixdorf IS, NCR, SAS, and Information Advantage). Also, visit the Web site of the Data Warehouse Institute (dw-institute.org ).
(ISD) activities that have supported accounting and finance in the past are shifting to marketing. According to Tucker (1997), some people think that the ISD should report to marketing. Do you agree or disagree? Debate this issue. 4. In 1996, Lexis-Nexis, the online information service, was accused of permitting access to sensitive information on individuals. Using data mining, it is possible not only to capture information that has been buried in distant courthouses, but also to manipulate and crossindex it. This can benefit law enforcement but invade privacy. The company argued that the firm was targeted unfairly, since it provided only basic residential data for lawyers and law enforcement personnel. Should LexisNexis be prohibited from allowing access to such information or not? Debate the issue.
3. Companies invest billions of dollars to support data base marketing. The information systems departments’
INTERNET EXERCISES 1. Conduct a survey on document management tools and applications by visiting dataware.com , documentum.com , and aiim.org/aim/publications . 2. Access the Web sites of one or two of the major data management vendors, such as Oracle, Informix, and Sybase, and trace the capabilities of their latest products, including Web connections. 3. Access the Web sites of one or two of the major data warehouse vendors, such as NCR or SAS; find how their products are related to the Web.
4. Access the Web site of the GartnerGroup ( gartnergroup. com). Examine some of their research notes pertaining to marketing databases, data warehousing, and data management. Prepare a report regarding the state of the art. 5. Explore a Web site for multimedia database applications. Visit such sites as leisureplan.com , illustra.com , or adb.fr . Review some of the demonstrations, and prepare a concluding report. 6. Enter microsoft.com/solutions/BI/customer/biwithinreach_ demo.asp and see how BI is supported by Microsoft’s tools. Write a report.
INTERNET EXERCISES
7. Enter teradatauniversitynetwork.com . Prepare a summary on resources available there. there . Is it valuable to a student? 8. Enter visual mining.com and review the support they provide to business intelligence. Prepare a report. 9. Survey some GIS resources such as geo.ed.ac.uk/home/ hiswww.html and prenhall. prenhall.com/stratgis com/stratgis/sites.html /sites.html . Identify GIS resources related to your industry, and prepare a report on some recent developments or applications. See http://nsdi.usgs.gov/nsdi/pages/what_is_ gis.html . 10. Visit the sites of some GIS vendors (such as mapinfo. com,, esr com esri.c i.com, om, au autod todesk esk.co .com m or bently.com). Join a newsgroup and discuss new applications in marketing, banking, and transportation. Download a demo. What
533
are some of the most important capabilities and new applications? 11. Enter websurvey.com , clearlearning.com , and tucows.com/ webforms , and prepare a report about data collection via the Web. 12. Enter infoscan.com . Find all the services related to dynamic warehouse and explain what it does. 13. Enter ibm.com/software and find their data mining products, such as DB2 Intelligent Miner. Prepare a list of products and their capabilities. Mining 101,” 101,” “T “Text ext 14. Enter megapuker.com, Read “Data Mining Analyst,” and “We “WebAnalyst.” bAnalyst.” Compare the two products. (Look at case studies.)
534
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
Minicase 1 Precision Buying, Merchandising, and Marketing at Sears The Problem. Sears, Roebuck and Company, the largest department store chain and the third-largest retailer in the United States, was caught by surprise in the 1980s as shoppers defected to specialty stores and discount mass merchandisers, causing the firm to lose market share rapidly. In an attempt to change the situation, Sears used several response strategies, ranging from introducing its own specialty stores (such as Sears Hardware) to restructure its mall-based stores. Recently, Sears has moved to selling on the Web. It discontinued its paper catalog. Accomplishing the transformation and restructuring, required the retooling of its information systems. Sears had 18 data centers, one in each of 10 geographical regions as well as one each for marketing, finance, and other departments. The first problem was created when the reorganization effort produced only seven geographical regions. Frequent mismatches between accounting and sales figures and information scattered among numerous data bases forced users to query multiple systems, even when they needed an answer to a simple query. Furthermore, users found that data that were already summarized made it difficult to conduct analysis at the desired level of detail. Finally, errors were virtually inevitable when calculations were based on data from several sources. The Solution. To solve these problems Sears constructed a single sales information data warehouse. This replaced the 18 old databases which were packed with
redundant, conflicting, and sometimes obsolete data. The new data warehouse is a simple repository of relevant decision-making data such as authoritative data for key performance indicators, sales inventories, and profit margins. Sears, known for embracing IT on a dramatic scale, completed the data warehouse and its IT reengineering efforts in under one year—a perfect IT turnaround story. Using an NCR enterprise server, the initial 1.7 terabyte (1.7 trillion bytes) data warehouse is part of a project dubbed the Strategic Performance Reporting System (SPRS). By 2003, the data warehouse had grown to over 70 terabytes. SPRS includes comprehensive sales data; information on inventory in stores, in transit, and at distri bution centers; and cost per item. This has enabled Sears to track sales by individual items(skus) in each of its 1,950 stores (including 810 mall-based stores) in the United States and 1,600 international stores and catalog outlets. Thus, daily margin by item per store can be easily computed, for example. Furthermore, Sears now fine tunes its buying, merchandising, and marketing strategies with previously unattainable precision. SPRS is open to all authorized employees, who now can view each day’s sales from a multidimensional perspective (by region, district, store, product line, and individual item). Users can specify any starting and ending dates for special sales reports, and all data can be accessed via a highly user-friendly graphical interface. Sears managers
MINICASE 1
can now monitor the precise impact of advertising, weather, and other factors on sales of specific items. This means that buyers and other specialists can examine and adjust if needed, inventory quantities, merchandising, and order placement, along with myriad other variables, almost immediately, so they can respond quickly to environmental changes. SPRS users can also group together widely divergent kinds of products, for example, tracking sales of items marked as “gifts under $25.” Advertisin Advertising g staffers can follow so-called so-called “great items,” drawn from vastly different different departments, that are splashed on the covers of promotional circulars. SPRS enables extensive data mining, but only on sku and location related analysis. In 1998 Sears created a large customer database, dubbed LCI (Leveraging Customer Information) which contained customer-related sale information (which was not available on SPRS). The LCI enable hourly records of transactions, for example, guiding hourly promotion (such as 15% discounts for early birds). In the holiday season of 2001 Sears decided to replace its regular 10% discount promotion by offering deep discount during early shopping hours. This new promotion, which was based on SPRS failed, and only when LCI was used the problem was corrected. This motivated Sears to combine LCI and SPRS in a single platform, which enables sophisticated analysis (in 2002). By 2001 Sears also had the following Web initiatives: e-commerce home improvement center, a B2B supply exchange for the retail industry, a toy catalog (wishbook. com), an e-procurement system, and much more. All of these Web-marketing initiatives feed data into the data
535
warehouse, and their planning and control are based on accessing the data in the data warehouse. The Results. The ability to monitor sales by item per store enables Sears to create a sharp local market focus. For example, Sears keeps different shades of paint colors in different cities to meet local demands. Therefore, sales and market share have improved. Also, Web-based data monitoring of sales at LCI helps Sears to plan marketing and Web advertising. At its inception, the data warehouse had been used daily by over 3,000 buyers, replenishers, marketers, strategic planners, logistics and finance analysts, and store managers. By 2003, there were over 5000 users, since users found the system very beneficial. Response time to queries has dropped from days to minutes for typical requests. Overall, the strategic impact of the SPRS-LC data warehouse is that it offers Sears employees a tool for making better decisions, and Sears retailing profits have climbed more than 20 percent annually since SPRS was implemented. Sources: Compiled from Amato-McCoy (2002), Beitler and Leary (1997); and press releases of Sears (2001–2003).
Questions for Minicase 1 1. What were the drivers of SPRS? 2. How did the data wareshouse solve Sear’s problems? 3. Why was it beneficial to integrate the customers’ data base with SPRS?
536
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
Minicase 2 GIS at Dallas Area Rapid Transit Public transportation in Dallas and its neighboring communities is provided by Dallas Area Rapid Transit (DART), which operates buses, vans, and a train system. The service area has grown very fast. By the mid-1980s, mid-1980s , the agency was no longer able to respond properly to customer requests, make rapid changes in scheduling, plan properly, or manage security. The solution to these problems was discovered using GISs. A GIS digitizes maps and maplike information, integrates it with other database information, and uses the combined information for planning, problem solving, and decision making. DART ( dart.org ) maintains a centralized graphical database of every object for which it is responsible. The GIS presentation makes it possible for DART’s managers, consultants, and customers to view and analyze data on digitized maps. Previously, DART created service maps on paper showing bus routes and schedules. The maps were updated and redistributed several times a year, at a high cost. Working Working with paper maps made it difficult to respond quickly and accurately to the nearly 6,000 customer inquiries each day. For example, to answer a question concerning one of the more than 200 bus routes or a specific schedule, it was often necessary to look at several maps and routes. Planning a change was also a time-consuming task. Analysis of the viability of bus route alternatives made it necessary to photocopy maps from map books, overlay tape to show proposed routes, and spend considerable time gathering information on the demographics of the corridors surrounding the proposed routes. The GIS includes attractive and accurate maps that interface with a database containing information about bus schedules, routes, bus stops (in excess of 15,000), traffic surveys, demographics, and addresses on each street in the database. The system allows DART employees to: ●
Respond rapidly to customer inquiries (reducing response time by at least 33 percent).
●
Perform the environmental impact studies required by the city.
●
Track where the buses are at any time using a global positioning system.
●
Improve security on buses.
●
Monitor subcontractors quickly and accurately.
●
Analyze the productivity and use of existing routes.
For instance, a customer wants to know the closest bus stop and the schedule of a certain bus to take her to a certain destination. The GIS automatically generates the answer when the caller says where she is by giving an address, a name of an intersection, or a landmark. The computer can calculate the travel time to the desired destination as well. Analyses that previously took days to complete are now executed in less than an hour. Special maps, which previously took up to a week to produce at a cost of $13,000 to $15,000 each, are produced in 5 minutes at the cost of 3 feet of plotter paper. In the late 1990s, the GIS was combined with a GPS. The GPS tracks the location of the buses and computes the expected arrival time at each bus stop. Many maps are on display at the Web site, including transportation lines and stops superimposed on maps. Sources: Condensed from GIS World, July 1993; updated with information compiled from dart.org (2003).
Questions for Minicase 2 1. Describe the role of data in the DART system. 2. What are the advantages of computerized maps? 3. Comment on the following statement: “Using GIS, users can improve not only the inputting of data but als o their use.” 4. Speculate on the type of information provided by the GPS.
MINICASE 2
537
538
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
Virtual Company Assignment Data Management You were intrigued by the opening story of Harrah’s use of CRM data to improve the customer experience, and it seems to you that there are many parallels between satisfying Harrah’s customers and The Wireless Café’s customers (aside from the slot machines, of course). You have noticed that Jeremy likes to review The Wireless Café’s financial and customer data using Excel charts and graphs, so you think he might be interested in discussing ways to exploit existing The Wireless Café data for improving the customer experience in the diner. 1. As you’ve grown quite fond of Marie-Christine’s cooking, you would like to help her develop even more popular menu hits. The idea of using business intelligence and data mining to better predict customer ordering trends and behaviors strikes you as something that would help. For example, do customers buy more desserts on Friday and Saturday than during the week? Do some research on business intelligence software and
prepare a brief report for Jeremy and Marie-Christine to describe the kinds of business intelligence that could be used to prepare smash-hit menus. 2. The Wireless Café does not have a significant marketing budget; however, some inexpensive Web-based marketing strategies can be effective. Earlier, Barbara expressed interest in creating a CRM system to better know and serve The Wireless Café’s customers. Describe a strategy whereby you could combine CRM data and Web-based marketing to increase the business at The Wireless Café. 3. Business intelligence, data warehousing, data mining, and visualization are activities that could benefit The Wireless Café’s bottom line. Many CIOs these days expect an IT project to pay for itself within 12 months or less. How would you measure the benefits of introducing new data management software to determine which competing projects to implement and whether you could expect a 12-month payback on the software?
REFERENCES Adjeroh, D. A., and K. C. C. Nwosu, “Multimedia “Multimedia Database ManManagement—Requirements and Issues,” IEEE Multimedia, Multimedia , July— September 1997. Alter, S. L., Decision Support Systems, Systems, Reading, MA: Addison Wesley, 1980. Amato-McCoy, D. M., “Sears Combines Retail Reporting And Customer Databases on a Single Platform,” Stores, November 2002. Amato-McCoy, D. M., “Movie Gallery Mines Data to Monitor Associate Activities,” Stores Stores,, May 2003a. Amato-McCoy, D. M., “AAFES Combats Fraud with Exception Reporting Solution,” Stores Stores,, May 2003b. Apte, C., et al., “Business Application of Data Mining ,” ,”CommunicaCommunications of the ACM, August 2002. Asprev, L., and M. M. Middlet Middleton, on, (eds.),Integrative (eds.), Integrative Document and Content Management , Hershey, PA: The Idea Group, 2003. Atzeni, P., et al., “Managing Web-Based Data,” IEEE Internet Com2002. puting,, July– August, 2002. puting Banerjee,, P., Banerjee P., and D. Zetu, Virtual Manufacturing: Virtual Reality and Computer Vision Techniques. New York: Wiley, 2001. Bates, J., “Business in Real Time–Realizing the Vision,”DM Vision,” DM Review , May 2003. Baumer, D., “Innovative Web Use to Learn about Consumer Behavior and Online Privacy,” Communications of the ACM , April, 2003. Becker, S. A., (Ed.) Effective Database for Text and Document Management , Hershey, PA: IRM Press, 2003.
Beitler, S. S., and R. Leary, “Sears’ Epic Transformation: Transformation: Converting from Mainframe Legacy Systems to OLAP,” Journal of Data Warehousing,, April 1997. Warehousing Beroggi, G.E., “Visual Interactive Decision Modeling in Policy Management,” Eurpoean Journal of Operational Research, Research , January 2001. Berry, M., Survey of Text Mining: Clustering, Classification and Retrieval. Berlin: Springer–Verlag, 2002. Best’s Review Magazine, May 1993. BIXL, Business Intelligence for Excel , a white paper, Business Intelligence Technologies, Inc. 2002 (BIXL.com (BIXL.com). ). Brauer, J. R., “Data Quality Is the Cornerstone of Effective Business Intelligence,” DM Review , October 5, 2001. Brown, D. E., and S. Hagen, “Data Association Association Methods with AppliApplications to Law Enforcement,” Decision Support Systems, Systems, March 2003. Campbell, D., “Visualization for the Real World,” DM Review, September 7, 2001. Canada NewsWire, “European Court of Human Rights Saves Time and Money for a News Wire,” April 29, 2003 NAICS#922110. Carbone, P. L., “Data Warehousing: Many of the Common Failures,” Presentation, mitre.org/support/papers/tech…9_00/d-warehoulse_ presentation.htm (May 3, 1999). Church, R. L., “Geographical Information Systems and Location Science,” Computers and Operations Research, May 2002. Chopoorian, J. A., et al., “Mind Your Business by Mining Your Data,” SAM Advanced Management Journal , spring 2001.
REFERENCES
Civic.com/pubs (accessed March 2001).
539
Codd, E. F., et al., “Beyond Decision Support,” Computerworld , July 1993.
Lau, H. C. W., et al., “Development of an Intelligent Data-Mining System for a Dispered Manufacturing Network,” Expert Systems, September 2001.
Cognos.com. “Platform for Enterprise Business Business Intelligence,” Cognos Inc., 2001.
Levinson, M., “Jackpot! Harrah’s Entertainment,” CIO Magazine, Magazine, February 1, 2001.
Cohen, J. B., “‘Virtual Newsstand’ Debuts Online,”Editor Online,” Editor and Publisher , Vol. 129, No. 24, June 15, 1996, pp. 86–87.
Li, T., et al., “Information Visualization for Intelligent DSS,” Knowledge Based Systems, August 2001. Liautaud, B., E-Business Intelligence, Intelligence, New York: McGraw-Hill, 2001.
Cole, B., “Document Management on a Budget,” Network World , Vol. 13, No. 8, September 16, 1996. Creese, G., and A. Veytsel, “Data Quality at a Real-Time Tempo,” Tempo,” InSight , An Aberdeen Group publication, January 9, 2003, aberdeen.com/2001/research/01030003.asp . Dalgleish, J., Customer-Effective Web Sites. Sites . Upper Saddle River, NJ: Pearson Technology Group, 2000. Datamation, May 15, 1990. Datamation, Delcambre, L., et al., “Harvesting Information to Sustain Forests,” Communications of the ACM , January 2003. Dimensional Insight, Business Intelligenc and OLAP Terms: An Online Glossary, dimins.com/Glossory1.html (accessed dimins.com/Glossory1.html (accessed June 15, 2003). Etzioni, O., “The WWW: Quagmire or Gold Mine ,” ,”Communications Communications of the ACM, November 1996. Fayyad, U. M., et al., “The KDD Process for Extracting Useful Knowledge from Volumes of Data,” Communications of the ACM , November 1996. Finnish Business Report, April 1997. Fong, A. C. M., et al., “Data Mining for Decision Support,” IT Pro, March/April, 2002. Technology Guide,” August 1993. Fortune, “1994 Information Technology Fortune, GIS World , July 1993. Goddard, S., et al., “Geospatial Decision Support for Drought Risk Management,” Communications of the ACM , January 2003.
Linoff, G. S., and J. A. Berry, Mining Linoff, Mining the Web: Transforming Customer Data.. New York: Wiley, 2002. Data Liu, S., “Data Warehousing Agent: To Make the Creation and Maintenance of Data Warehouse Easier,” Journal of Data Warehousing,, Spring 1998. ing Loveman, G., “Diamonds in the Data,” Harvard Business Review , May 2003. Markus, M. L., et al., “A Design Theory for Systems that Support Emergent Knowledge Processes,” MIS Processes,” MIS Quarterly, September 2002. Mattison, R., Winning Telco Customers Using Marketing Databases. Norwood, MA: Artech House, 1999. Merrill Lynch, 1998. MIS Quarterly, December 1991. Moad, J., “Mining a New Vein,” PC Week, Week, January 5, 1998. Moss,, L. R., Moss R., and S. S. Atre Atre,, Business Intelligence Road Map, Map , Boston: Addison Weeky, 2003. Nasirin, S., and D. F. F. Birks, “DSS Implementation in the UK Retail Organizations: A GIS Perspective,” Information and Management , March, 2003. NCR Corp., 2000. Nemati, H. R., and C. D. Barko, “Enhancing “Enhancing Enterprise Enterprise Decision Decision through Organizational Data Mining,” Journal of Computer Information Systems, Systems, Summer, 2002.
Grant, G., ed., ERP and Datawarehousing in Organizations: Issues and Challenges, Hershey, PA: IRM Press, 2003. Gray, P., P., and H. J. Wat Watson, son, Decision Support in the Data Warehouse. Warehouse . Upper Saddle River, NJ, Prentice-Hall, 1998.
Nwosu, K. C., et al., “Multimedia Database Systems: A New Frontier,” IEEE Multimedia, Multimedia , July–September 1997. O’Looney, J. A., Beyond Maps: GIS Decision Making in Local Governments. Redlands, CA: ESRI Press, 2000.
Grossnickle, J., and O. Raskin, Grossnickle, Raskin,The The Handbook of Marketing Research. Research . New York: McGraw-Hill, 2000.
Oguz, M. T., “Strategic Intelligence: Business Intelligence in Competitive Strategy,” DM Review , May 31, 2003.
Hamilton, J. M., “A Mapping Feast,” CIO CIO,, March 15, 1996. Hardester, K. P., “Au Enterprise GIS Solution for Integrating GIS and CAMA,” Assessment CAMA,” Assessment Journal , November/December, 2002. Hasan, B., “Assesing Data Authenticity with Benford’s Law,” Information Systems Control Journal , July 2002. Hirji, K. K., “Exploring Data Mining Implementation,”CommunicaImplementation,” Communications of the ACM , July 2001. IBM Systems Journal , Vol. 29, No. 3, 1990. Information Week (May 14, 2001). Infoworld , January 27, 1997.
Park, Y. T., “Strategic Uses of Data Warehouses,” Journal of Data Warehousing,, April 1997. Warehousing Pritsker, Pritske r, A. A. B., and J. J. J. O’Reil O’Reilly, ly, Simulation with Visual SLAM York: Wiley, Wiley, 1999. and Awesim, Awesim, 2nd ed. New York: Radding, A., “Going with GIS,” Bank Management , December 1991; updated February 2003. Ray, N., and S.W. S.W. Tabor, “Cyber-Surveys “Cyber-Surveys Come of Age,” Marketing Age,” Marketing Research,, Spring 2003. Research Redman, T. C., “The Impact of Poor Data Quality on the Typical Enterprise,” Communications of the ACM , February 1998.
Inmon, W. H., Building The Data Warehouse, New York: York: Warehouse, 3rd ed. New Wiley, 2002.
Rudnensteiner, E. A., et al., “Maintaining Data Warehousing over Changing Information Sources,” Communications of the ACM, June 2000.
Inmon, W. H., “Why Clickstream Data Counts,” e-Business Advisor , April 2001.
Sadeh, N., M-Commerce N., M-Commerce.. New York: Wiley, 2002. Sears (2001– 2003) 2003)..
I/S Analyze Analyzer, r, “Visualization “Visualization Software Software Aids in Decision Making,” Sikder, I., and A. Gangopadhyay, “Design and Implementation of I/S Analyzer , July 2002. a Web-Based Collaborative Spatial Decision Support System: Organizational and Managerial Implications,” Information Resources Kimball,, R., and M. Kimball M. Ross, The Data Warehouse Tool Kit , 2nd ed. New New York: Wiley, 2002. Management Journal , Journal , October–December 2002. Smith, H. J., “Who Owns Personal Data?” Beyond Computing, Computing, Kerlow, I. V., The Art of 3D, York: Wiley, Wiley, 2000. 3D, 2nd ed. New York: Korte, G. B., The GIS Book, Albany, NY: Onward Press, November–December 1997. Book, 5th ed. Albany, Southam.com, “Southam on the Web,” Web,” 2001. 2000.
540
CHAPTER 11
DATA MANAGEMENT: WAREHOUSING, ANALYZING, MINING,
Steede-Terry, K., Integrating GIS and GPS. GPS. Redlands, CA: Environmental Systems Research (eSRI.com (eSRI.com), ), 2000. Strauss, J., et al., E-Marketing E-Marketing.. Upper Saddle River, NJ: Prentice Hall, 2003. Strong, D. M., et al., “Data Quality in Context,” Communications of the ACM , May 1997. Sweiger, M., et al., Clickstream Data Warehousing, Warehousing , New York: John Wiley and Sons, 2002. Tang Ta ng C., et al., “An Agent-Based Geographical Geographical Information Information SysSystem,” Knowledge-Ba Knowledge-Based sed Systems, Vol. 14, 2001. Terry, K., and D. Kolb, “Integrated Vehicle Vehicle Routing and Tracking Tracking Using GIS-Based Technology,” Logistics Logistics,, March/April, 2003.
Tucker, M. J., “Poppin’ “Poppin’ Fresh Dough,” (Database Marketing), Marketing), Datamation,, May 1997. Datamation Turban, E., et al., Electronic Commerce: A Manageria Manageriall Perspective. Perspective . 3rd ed., Upper Saddle River, NJ: Prentice-Hall, 2004. usaa.com (2001). Watson, H. J., et al., “The Effects of Technology-Enabled Business Strategy At First American Corporation,” Organizationa Organizationall Dynamics, Winter, 2002. Weiss, T. R., “Online Retail Sales On the Rise,” PC World , January 2003. Winter, R., Large Scale Data Warehousing with Oracle 9i Database, Database , Special Report, Waltham MA: Winter Corp., 2001.