Data Warehousing For Dummies

Data Warehousing For Dummies

by Thomas C. Hammergren
Data Warehousing For Dummies

Data Warehousing For Dummies

by Thomas C. Hammergren

eBook

$24.00 

Available on Compatible NOOK devices, the free NOOK App and in My Digital Library.
WANT A NOOK?  Explore Now

Related collections and offers

LEND ME® See Details

Overview

Data warehousing is one of the hottest business topics, and there’s more to understanding data warehousing technologies than you might think. Find out the basics of data warehousing and how it facilitates data mining and business intelligence with Data Warehousing For Dummies, 2nd Edition.

Data is probably your company’s most important asset, so your data warehouse should serve your needs. The fully updated Second Edition of Data Warehousing For Dummies helps you understand, develop, implement, and use data warehouses, and offers a sneak peek into their future. You’ll learn to:

  • Analyze top-down and bottom-up data warehouse designs
  • Understand the structure and technologies of data warehouses, operational data stores, and data marts
  • Choose your project team and apply best development practices to your data warehousing projects
  • Implement a data warehouse, step by step, and involve end-users in the process
  • Review and upgrade existing data storage to make it serve your needs
  • Comprehend OLAP, column-wise databases, hardware assisted databases, and middleware
  • Use data mining intelligently and find what you need
  • Make informed choices about consultants and data warehousing products

Data Warehousing For Dummies, 2nd Edition also shows you how to involve users in the testing process and gain valuable feedback, what it takes to successfully manage a data warehouse project, and how to tell if your project is on track. You’ll find it’s the most useful source of data on the topic!


Product Details

ISBN-13: 9780470482926
Publisher: Wiley
Publication date: 04/13/2009
Series: For Dummies Books
Sold by: JOHN WILEY & SONS
Format: eBook
Pages: 398
File size: 3 MB

About the Author

Thomas C. Hammergren has been involved with business intelligence and data warehousing since the 1980s. He has helped such companies as Procter & Gamble, Nike, FirstEnergy, Duke Energy, AT&T, and Equifax build business intelligence and performance management strategies, competencies, and solutions. Alan R. Simon is a data warehousing expert and author of many books on data warehousing.

Read an Excerpt

Chapter 4
Data Marts: Your Retail Data Outlet

In This Chapter

  • Revolutionizing the data warehouse world with data marts
  • Getting past the data mart hype and focusing on business value
  • Looking at different architectures for data marts
  • Determining what should be in your data mart
  • Developing a data mart

About two weeks before writing this chapter, I was on a business trip to Los Angeles and driving down the Ventura freeway on a nice summer day. (I always call it the Ventura highway, after the America song from the 1970s: "Ventura highway, in the sunshine; where the days are longer...." Oops, I got a little carried away there.) An ad was on the radio for a regional (I think) hardware chain based on this premise: "Shop at our stores because we have fewer products than the big warehouse-like competition. It's much easier to get in and out of here more quickly with what you need."

Interestingly, another hardware chain on the east coast ran ads a few years ago with almost the identical theme. This chain used to make fun of the warehouse-size competition by featuring radio ads with helicopter search parties looking for shoppers lost in a distant department and references to shuttle buses having to take shoppers between departments in the warehouse-size stores. The premise was the same: "We have less merchandise than the other guy, so shop with us because it's easier."

That's the idea of the data mart.

The Idea behind Data Marts

Forget all the hype. The idea of a data mart is hardly revolutionary, despite what you may read in the computer trade press and hear at conferences and seminars.

A data mart is simply a scaled-down data warehouse, that's all.

No official, standard definition exists for a data mart, just as no official, standard definition exists for a data warehouse or operational data store or data mining or virtually any other concept in this realm of the technology landscape. Vendors do their best to define data marts in the context of their products; consultants and analysts usually define data marts in a way that's advantageous to their particular offerings and specialties. That's the way this business goes, and there's nothing wrong with it; be prepared, however, to ask the tough questions.

Architectural Approaches to Data Marts

You can take one of three main approaches to creating a data mart:

  • Sourced by a data warehouse (most or all of the data mart's contents come from a data warehouse)

  • Tactical, quickly developed, and created from scratch

  • Developed from scratch with an eye toward eventual integration

Data marts sourced by a data warehouse

Many data warehousing experts would argue (and I'm one of them, in this case) that a true data mart is a "retail outlet" with its contents provided from a data warehouse, as shown in Figure 4-1.

In an environment like the one shown in Figure 4-1, look at the relationship between the data sources, the data warehouse, the data mart, and the user in this way:

  1. The data sources, acting as suppliers of raw materials, send data into the data warehouse.

  2. The data warehouse then serves as a consolidation and distribution center, collecting the raw materials in much the same way as in any data warehouse.

  3. Instead of the user (the consumer) going straight to the data warehouse, though, the data warehouse serves as a wholesaler with the premise of "we sell only to retailers, not directly to the public." In this case, the retailers are the data marts.

  4. The data marts order data from the warehouse and, after stocking the newly acquired information, make it available to consumers (users).

Before moving on to the next approach, though, I want to show you a variation of the sourced-from-the-warehouse model. Sometimes the data warehouse that serves as the source for the data mart does not have all the information the data mart's users need. You can solve this problem in one of two ways:

  • Supplement the missing information directly into the data warehouse before sending the selected contents to the data mart, as shown in Figure 4-2.

  • Don't touch the data warehouse; instead, add the supplemental information to the data mart in addition to what it receives from the data warehouse, as shown in Figure 4-3.

If your data mart needs data that's not in the data warehouse, which of these two approaches should you choose? If your data mart is the only one within your company that needs that additional data (be sure to ask around), leave the warehouse alone and bring the supplemental data directly into your data mart. If other data marts or other projects served by the data warehouse can use the additional information, you should add it to the data warehouse first and then send it, along with the other contents you need, to your data mart.

Tactical, quick-strike data marts

Sometimes you just don't have a data warehouse from which to get data for your data mart, so you have to do it yourself. In many (probably most) of these situations, you create a tactical, quick-strike data mart that is, in effect, a miniature data warehouse. You follow the same methodology and complete the same processes of data extraction, transformation, quality assurance, and loading. The difference is that you're doing it on a smaller scale than with a full-blown data warehouse.

What does a smaller scale mean? As shown in Figure 4-4, data brought into a tactical data mart is often necessary to answer a specific set of business questions within relatively narrow confines. Some examples are a specific region or territory within a company, a subset of a company's overall product line, or some other subsetting model.

So the question must be asked: If you have to start from scratch and don't have a data warehouse to provide data to your data mart, why not build a full-scale data warehouse instead? Here are three reasons to go the data mart route:

  • Speed: A tactical data mart is typically completed in 90 to 120 days rather than the much longer time required for a full-scale data warehouse.

  • Cost: Doing the job faster means that you spend less money; it's that simple.

  • Complexity and risk: When you work with less data and fewer sources over a shorter period, your environment is likely to be significantly less complex -- and have fewer associated risks.

Bottom-up, integration-oriented data marts

What happens, though, if pressing business needs steer you toward a tactical data mart and you have a longer-term vision of its contents being integrated with other data? Have you created an architectural dead end? Will you have to throw away your data mart at some point and start over with a "real" data warehousing effort? Will Batman arrive in time to save Robin from the Riddler? (Sorry about that -- the reruns are getting to me.)

Theoretically, you can design data marts so that they're eventually integrated in a bottom-up manner, by building a data warehousing environment (in contrast to a single, monolithic data warehouse).

Pay close attention to the word theoretically. Bottom-up integration of data marts isn't for the fainthearted. You can do it, but it's more difficult than creating a tactical data mart that will always remain stand-alone. Can this approach be successful? The answer is a definite maybe.

What Should Be in a Data Mart

If a data mart is a smaller-scale version of a data warehouse, the question comes up again: What does "smaller scale" mean in reference to the contents of a data mart?

This section describes some ways you can select subsets of information for a data mart and the circumstances under which you may want to try each approach.

Geography-bounded data

A data mart may contain only the information relevant to a certain geographical area, such as a region or territory within your company. Figure 4-5 illustrates an example of geography-bounded data.

Although the use of a geography-bounded data mart is technically feasible and relatively straightforward, it's often not such a good idea. Why? Because a cross-geography comparison (for example, "How are our Arizona stores doing versus our Pennsylvania stores?") is a natural use of any data warehouse environment. When you create separate data marts for various geographical reasons, you make it much more difficult to make these types of comparisons.

Organization-bounded data

Another approach to deciding what should be in your data mart is to base decisions on what information a specific organization needs when it's the sole (or at least primary) user of the data mart. As shown in Figure 4-6, a bank may create one data mart for consumer checking-account analysis and another data mart for commercial checking accounts.

This approach works well when the overwhelming majority of inquiries and reports are organization-oriented. That is, the commercial checking group has no need whatsoever to analyze consumer checking accounts and vice versa. It pays to dig into the business needs during the scope phase of a data warehousing or data mart project. Outsiders, for example, may think, "Okay, put all checking-account information, both consumer and commercial, into the same environment because, this way, reports can be run comparing average balances and other information for the entire checking-account portfolio at the bank." After additional analysis, though, you may notice that the bank doesn't do this type of comparison, so why not keep the two areas separate and avoid unnecessary complexity?

Function-bounded data

Using another approach, one that crosses organizational boundaries, you establish a data mart's contents based on a specific function (or set of related functions) within the company. A multinational chemical company, for example, may create a data mart exclusively for the sales and marketing functions across all organizations and across all product lines, as shown in Figure 4-7.

Competitor-bounded data

A company may occasionally be so focused on a specific competitor that it may make sense to create a data mart oriented toward that particular competitor. As shown in Figure 4-8, this type of environment may include competitive sales, all available public information about the competitor (particularly if the information is available over the Internet), and industry analysts' reports, for example.

To truly provide the business intelligence that's necessary in a competitor-driven situation, you should construct the data mart to include multimedia information in addition to the traditional data types typically found in a data warehouse. (Chapter 24 describes multimedia data and data warehousing.)

Answers to specific business questions

An organization's operations occasionally are driven by the answers to a selected number (often a handful) of business questions. Based on the answers, a company may speed up or slow down production lines, start up extra shifts to increase production or initiate layoffs, or, possibly, choose to acquire or not acquire other companies.

Business questions with this degree of weighty importance have traditionally caused nightmares for the in-house employees chartered with digging out data and reports, consolidating and checking the information, and reporting the results to executive management. Sounds like a job for a data warehouse, you say? Before constructing a full-scale data warehouse that can answer these (and many other) business questions, however, it's probably worth considering that the answer may be a small-scale data mart designed specifically to answer those high-impact, high-value "How are we doing?" type of questions.

Later, this type of environment may grow into a larger-scale data warehouse. It often makes more sense, however, to concentrate your efforts on supporting a data mart with known business value instead of on supplementing it with volumes of additional data that may provide business value (but can also slow response time or significantly complicate the end-to-end architecture). Again, the job you do up-front, in the early phases of your project, make a big difference in the direction you take and your level of success.

Anything!

Any set of criteria you can dream up can determine a data mart's contents. Some make sense; others don't. Some take you into an architectural dead end because you get only limited value and have to start all over to expand your capabilities.

Data mart or data warehouse?

Now that you (presumably) have read an entire chapter about the concept of data marts, I want to make one important point. If you start a project from the outset with either of the following premises, you already have two strikes against you:

  • "We're building a real data warehouse, not a puny little data mart."

  • "We're building a data mart, not a data warehouse."

By labeling your project as one or the other of these terms, you already have made some preconceived notions about the work you will do, before you have even begun to dig into the business problem. Until you understand the following three issues, you have no foundation on which to classify your impending project as either a data mart or a data warehouse:

  • The volumes and characteristics of data you need

  • The business problems you're trying to solve and the questions you're trying to answer

  • The business value you expect to gain when your system is successfully built

As mentioned at the beginning of this chapter, what's more important is that no formal definitions distinguish a data mart from a data warehouse. If you're extracting and rehosting a subset of data from an existing application into another environment, you can accurately deem what you're building as a data mart.

If you're starting from scratch, however, and extracting data from one or more source systems, handling the quality assurance and transformation, and copying that data into a separate environment, what determines whether you're building a data warehouse or a data mart? Although some guidelines exist, such as number of subject areas and volumes of data, it all comes down to this statement: As soon as you start labeling your environment as one or the other, you're adding preconceived notions and beliefs about its characteristics that may not fit your business needs.

Here's the answer: Forget about the terms data warehouse and data mart. Concentrate instead on your business problem and its possible solution: What data do you need in order to perform certain informational and analytical functions; where is that data now and in what form; and what do you have to do to make it available to your users?

Leave the terminology wars to the vendors and analysts. Don't get caught up in the hype.

Implementing a Data Mart -- Quickly

No matter how you decide to divide the universe of possible contents into some subset for your data mart, it's important to remember that in order to obtain maximum business value from your data mart, you must implement it quickly.

Here are the three keys to speedy implementation:

  • Follow a phased methodology. As described in Chapter 13, you spend the majority of your up-front time on the project focusing on the specific business value you want.

  • Hold to a fixed time for each phase. If you set aside two weeks for your scope, for example, stick to that window. Don't extend any phase (especially the early ones) unless the project is doomed to failure.

  • Avoid scope creep at all costs. Though costly and dangerous in any project -- data warehousing or otherwise -- scope creep (when additional feature requests keep creeping in long past the cutoff point) is devastating in a data mart effort. You probably will add complexity with only marginal incremental business value (if any) and do little other than put your project at risk.

Table of Contents

Introduction.

Part I: The Data Warehouse: Home for Your Data Assets.

Chapter 1: What’s in a Data Warehouse?

Chapter 2: What Should You Expect from Your Data Warehouse?

Chapter 3: Have It Your Way: The Structure of a Data Warehouse.

Chapter 4: Data Marts: Your Retail Data Outlet.

Part II: Data Warehousing Technology.

Chapter 5: Relational Databases and Data Warehousing.

Chapter 6: Specialty Databases and Data Warehousing.

Chapter 7: Stuck in the Middle with You: Data Warehousing Middleware.

Part III: Business Intelligence and Data Warehousing.

Chapter 8: An Intelligent Look at Business Intelligence.

Chapter 9: Simple Database Querying and Reporting.

Chapter 10: Business Analysis (OLAP).

Chapter 11: Data Mining: Hi-Ho, Hi-Ho, It’s Off to Mine We Go.

Chapter 12: Dashboards and Scorecards.

Part IV: Data Warehousing Projects: How to Do Them Right.

Chapter 13: Data Warehousing and Other IT Projects: The Same but Different.

Chapter 14: Building a Winning Data Warehousing Project Team.

Chapter 15: You Need What? When? — Capturing Requirements.

Chapter 16: Analyzing Data Sources.

Chapter 17: Delivering the Goods.

Chapter 18: User Testing, Feedback, and Acceptance.

Part V: Data Warehousing: The Big Picture.

Chapter 19: The Information Value Chain: Connecting Internal and External Data.

Chapter 20: Data Warehousing Driving Quality and Integration.

Chapter 21: The View from the Executive Boardroom.

Chapter 22: Existing Sort-of Data Warehouses: Upgrade or Replace?

Chapter 23: Surviving in the Computer Industry (and Handling Vendors).

Chapter 24: Working with Data Warehousing Consultants.

Part VI: Data Warehousing in the Not-Too-Distant Future.

Chapter 25: Expanding Your Data Warehouse with Unstructured Data.

Chapter 26: Agreeing to Disagree about Semantics.

Chapter 27: Collaborative Business Intelligence.

Part VII: The Part of Tens.

Chapter 28: Ten Questions to Consider When You’re Selecting User Tools.

Chapter 29: Ten Secrets to Managing Your Project Successfully.

Chapter 30: Ten Sources of Up-to-Date Information about Data Warehousing.

Chapter 31: Ten Mandatory Skills for a Data Warehousing Consultant.

Chapter 32: Ten Signs of a Data Warehousing Project in Trouble.

Chapter 33: Ten Signs of a Successful Data Warehousing Project.

Chapter 34: Ten Subject Areas to Cover with Product Vendors.

Index.

From the B&N Reads Blog

Customer Reviews