June 30, 2009

Sun and Oracle


The recent acquisition of Sun by Oracle has raised a lot of speculative discussion about the latter vendor’s strategic pursuits. The move may or may not result in a power triumvirate of HP-IBM-Oracle. But Oracle expanding its portfolio to include hardware could be a game-changer.

Oracle has a dubious record with hardware plays. The nCube investment (circa 1988) and the network computer idea (circa 1996) both presented an interesting vision, but didn’t deliver tactically. nCube’s video-on-demand push (circa 1994) ended with the product being decommissioned (circa 2001).

While many are focused on the state of Sun’s numerous DBMS partnerships, I’m more interested in the fate of Storage Technology Corporation (StorageTek), which was acquired by Sun (circa 2005). Do a little research and you’ll see that EMC stores the lion’s share of DBMS data across enterprise data centers. If Oracle keeps the StorageTek products it might shave some revenue from EMC and gain an even larger wallet share with IT organizations. Oracle’s intentions are equally unclear around the Exadata product, which had previously relied on an HP partnership that’s now certainly strained. With the acquisition of Sun, Oracle is better able to go head-to-head with the likes of HP’s Neoview and Teradata.

Clearly the company has the option of producing a database appliance on its own. Personally I’m waiting to see the level of fear, uncertainty, and doubt Oracle stirs up in the data warehouse appliance market. Oracle hasn’t differentiated its DBMS in years. The differentiation has always been about the company’s size, the number of Fortune 500 customers, its broad array of application offerings, and the fact that its products work on every conceivable hardware platform. Focus on non-database products has fanned the flames of the market’s perception that databases are a mere commodity.

I can only imagine what’s going on in Oracle’s slideware development organization right now. Here are some of the messaging scenarios that are likely to be on the table:

Scenario 1: “Through our acquisition of Sun, we can now deliver a more fully-functional database appliance.”

In reality, the whole point of an appliance is to reduce complexity and configuration effort. Prepackaging Oracle on a hardware platform already occurs with companies like Sun, HP, and Dell. This isn’t simpler or better.

Scenario 2: “Oracle can now be your de facto desktop and development tool provider.”

This one could actually be true. Oracle can leverage Sun’s vast software capabilities in two significant ways. With Sun’s desktop office suite, StarOffice, Oracle could provide a captivating alternative to the Microsoft Office monopoly. Any executive would find it difficult to ignore an Oracle office option, particularly in cases where they’ve made significant investments in Oracle as the corporate database standard. Plus, Oracle can monetize open source software by dramatically improving support revenue from these customers. Microsoft does not deliver customer service and support the way Oracle does—and enterprise clients expect more sophisticated and consistent support than the channel usually delivers.

Scenario 3: “Our Java-based toolset covers the spectrum of development needs without forcing your reliance on a specific vendor. Whether it’s middleware, server development, or reporting, we have the tools to support a multi-tier network enabled environment.  You can now come to a single company for a single set of tools regardless of your platform type, desktop, server, or operating system.”

For IT organizations that still rely on custom development, this may dramatically reduce the number of suppliers they need. Over the past few years the number of middleware and application tool vendors has diminished—with Oracle being the buyer of many of them. Most IT organizations prefer fewer vendors. Whether open source or proprietary, the combined Oracle-Sun toolset offers Oracle a significant revenue stream in the support arena.

I’m fascinated that little or no attention has been paid to the software assets that Sun has. This combined with Oracle’s DBMS, middleware, and application toolsets offers an unexpected alternative to the ongoing IBM and Microsoft battles for enterprise development. Moreover, with Sun’s Java leadership and the popularity of Java in consumer electronics, Oracle can now enter into the world of consumer software, a la Apple. The opportunity for Oracle to support media companies that sell directly to the end consumer is wide open.

If it’s not careful, Oracle’s future may be in milking the legacy product cow instead of exploiting its newfound software assets.  The real question is, is Oracle a company of innovators or bean counters?

June 09, 2009

No Data Warehouse Required: BI Reporting Extends Its Reach


It’s rare these days to find clients who haven’t already decided on a standard BI platform. Most of the new BI tool discussions we get into with clients are with companies who’ve decided that it’s time to broaden their horizons beyond Microsoft.

The dirty little secret in most companies is that the BI reporting team has morphed into a de facto enterprise reporting team. Why is this?

When it comes to reporting, there’s a difference between the BI team and the rest of IT. The fact is that BI teams are successful not because of the infrastructure technologies, but because of the technologies in front of the users: the actual BI tool. To the end user, data visualization and access are much more important than database management and storage infrastructure.  So when a new operational system is introduced, users expect the same functionality, look and feel as their other reports.

An insurance company we’re working with is replacing its operational systems. The company’s management has already decided not to use the vendor’s reports—they’re too limited and brittle. They expect these reports to dovetail into the company’s information portal and work alongside their BI reporting. Companies are refreshing their operational platforms every seven to ten years. It’s now 2009, and the last time they refreshed their operational systems was in reaction to Y2K. It’s once again time to revisit those operational systems.

If you look at the challenges BI tool vendors are facing, there is limited growth in data warehousing. Most companies have standardized their BI tool suite. Absent disruptive technology or new functionality, there’s limited growth opportunity for BI tools in the data warehousing space.

But for every data warehouse or data mart within a company, there are likely dozens of operational systems that users need access to. The opportunity for BI vendors now is delivering operational information to business users. This isn’t about complex analytics or advanced computation. This is the retrieval of operational information from where it lives.

Photo by jakeliefer (via Flickr)

May 19, 2009

The Rise of the Columnar Database

photo by eflon

I’m continually surprised that more vendors haven’t hurled themselves onto the columnar database bandwagon. The more this space matures, the more evident it becomes that analytics is a perfect match for column-based database architectures.

One of the most frustrating phenomena to IT is adherence to a theoretical view. In the 1970s the entire relational database industry implemented what was really an academic precept. For those pragmatists who haven’t dusted off their textbooks recently, I’ll recall the writings of Codd and Date. They introduced the concepts of organizing data in tuples, organizing primary values along with their descriptive details (aka: attributes). Vendors interpreted this to mean that data should be physically stored in this fashion, architecting their products to store data in tables, populated with rows consisting of columns. If you wanted to access a value, you had to retrieve the entire row.

With all due respect, this approach has been cumbersome since Day 1. The fact is, storing data the way the business looks doesn’t lend itself to the way people ask questions. When I create an outbound marketing list, I need a name, a phone number, and an address. I don’t need information on household, demographic segment, or the name of a customer’s dog.

While I do need to store all the customer data, I don’t want to be bogged down by processing all that data in order to answer my question. Herein lies the quandary: do I structure the data based on all the information we have, or based on the information I might access?

Vendors have tried to bridge the gap. We’ve seen partitioning, star indexes, query pre-processing, bitmap and index joins, and even hashing in an attempt to support more specific data retrieval. Such solutions still require examining the contents of the entire row.

Although my background is in engineering, I know enough about Occam’s razor to know that it applies here: the simplest solution is the best one. Vendors like Kickfire, Vertica, Paraccel, and Sybase (whose pioneering IQ product launched over a dozen years ago) went back to the drawing board and fixed the problem, architecting their products to structure and store the data the way people ask questions: in columns.

For you SQL jockeys, most of the heavy lifting in database processing is in the where clause. Columnar databases are faster because their processing isn’t inhibited by unnecessary row content. Because many database tables can have upwards of 100 columns, and because most business questions only request a handful of them, this just makes business sense. And in these days of multi-billion row tables and petabyte-sized systems, columnar databases make more sense than ever.
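To make the row-versus-column point concrete, here is a minimal Python sketch. It is a toy, not how any of the vendors above actually implement storage; the customer fields, the sample records, and the "values touched" counter are all assumptions made up for illustration. It builds the same outbound marketing list (name, phone, address) from a row layout and from a column layout, and counts how many stored values each approach has to read.

```python
# Toy illustration only: the same customer data in a row layout and a column layout.
# All field names and records are hypothetical.
rows = [
    {"name": "Ann", "phone": "555-0101", "address": "1 Elm St", "segment": "A", "dog": "Rex"},
    {"name": "Bob", "phone": "555-0102", "address": "2 Oak Ave", "segment": "B", "dog": "Fido"},
    {"name": "Cy", "phone": "555-0103", "address": "3 Pine Rd", "segment": "A", "dog": "Spot"},
]

# Column layout: one list per column, aligned by position.
columns = {key: [row[key] for row in rows] for key in rows[0]}

WANTED = ("name", "phone", "address")


def marketing_list_from_rows(rows):
    """Row layout: each whole row is read even though only three fields are needed."""
    values_touched = 0
    result = []
    for row in rows:
        values_touched += len(row)  # the entire row comes along for the ride
        result.append(tuple(row[c] for c in WANTED))
    return result, values_touched


def marketing_list_from_columns(columns):
    """Column layout: only the three requested columns are read."""
    values_touched = sum(len(columns[c]) for c in WANTED)
    result = list(zip(*(columns[c] for c in WANTED)))
    return result, values_touched


row_result, row_cost = marketing_list_from_rows(rows)
col_result, col_cost = marketing_list_from_columns(columns)
print(row_cost, col_cost)
```

Both functions return the same list; the difference is only in how much data gets read along the way. With five columns the gap is modest, but it widens as the table gets wider and the question stays narrow.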

As the data warehouse market continues to consolidate through acquisitions, look for column-based startups—including several open-source solutions—to fill the void. If you ask me, there’s plenty of room.

May 05, 2009

MDM and M&A




A lot of our new clients have asked us to build MDM business cases to support their merger and acquisition strategies. Specifically, they’re looking to support the following four activities:

  • Recent corporate mergers
  • Acquisitions
  • Reorganizations
  • Spin-offs


Collectively, these activities can roll up into a category called corporate restructuring. Contrary to popular belief, restructuring isn’t just a financial challenge. It includes realignment of marketing activities (for instance, reconciling promotions and re-aligning diverse product sets), sales (reorganizing territories and compensation plans), and operational issues (company locations, product inventories).

Most companies approach restructuring as a one-time-only activity in which an army of analysts tries to reconcile financial structures from organizational hierarchies, to budgets, to the accounts themselves. The fact is these activities aren’t just part of high-profile M&A events. They occur every year as companies go through their annual budget processes. During a corporate restructuring the process usually takes longer than the acquisition itself.

Three principal MDM features lend themselves to this restructuring work: matching, grouping, and linking. MDM excels at matching “like” items from disparate sources, tracking and managing hierarchies and groupings, and linking disparate data sources to enable ongoing data integration. The point is that the act of merging organizations also means consolidating details across the companies. Most people consider this a one-time-only activity. The fact is, it must be an ongoing process.
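Here is a rough Python sketch of what matching, grouping, and linking might look like in miniature. It is not how any particular MDM product works; the matching rule (lowercase, drop punctuation and common company suffixes), the system names, and the sample records are all assumptions for illustration. Real matching is far fuzzier than this.

```python
import re
from collections import defaultdict

# Hypothetical list of suffixes to ignore when comparing company names.
SUFFIXES = {"inc", "corp", "co", "llc", "ltd"}


def match_key(name):
    """Hypothetical matching rule: lowercase, strip punctuation, drop common suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    words = [w for w in cleaned.split() if w not in SUFFIXES]
    return " ".join(words)


def build_links(*sources):
    """Group records that share a match key and keep a link back to each source system."""
    groups = defaultdict(list)
    for system, records in sources:
        for record_id, name in records:
            groups[match_key(name)].append((system, record_id))
    return dict(groups)


# Made-up customer records from the acquiring and the acquired company.
acquirer = [("A-1", "Acme Corp."), ("A-2", "Globex, Inc.")]
acquired = [("X-9", "ACME Corp"), ("X-7", "Initech LLC")]

links = build_links(("acquirer", acquirer), ("acquired", acquired))
print(links["acme"])
```

Note that neither source system is changed. The links live alongside the systems, which is what lets this run again every time either side adds or changes a record.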

When one company buys another, it’s typical to allow the acquired company to continue to operate using the same systems and methods it always has. The acquiring company simply needs to know how to integrate the information into their existing business. Consider Berkshire Hathaway. They acquire companies frequently, but don’t change how they run their business. They simply know how to reconcile and roll up the details.

Ideally, corporate restructuring means establishing a process to allow organizations to continue their operations using their existing systems. IT systems reconciliation simply cannot get in the way of running business operations. All too often, the answer is, “Replace their systems with ours.” This statement means that the new organization should reengineer its business. This simply takes too long.

MDM provides a company the capability to link the data content from disparate systems within and across companies. I’m not talking about linking Linux with Windows, I’m talking about matching and linking business content across dozens or even hundreds of systems. This way invoices continue going out, sales people continue getting commissions, and customers can still get product support in a seamless way. 

Next time you’re discussing corporate restructuring and someone says the word “re-platform,” ask the question, “If we can link and move the data to continue to support core business processes, then we wouldn’t have to disrupt our operational systems, right?” Matching and linking the data across core systems can save a lot in terms of software and labor costs. But improving it where it lies? Priceless.

April 21, 2009

Your Company’s Data Supply Chain

photo by BotheredByBees


At Baseline Consulting we've been talking for several years about the concept of a data supply chain. But IT executives are only now starting to catch on to its importance.

Over the past 15 years there has been a big push to standardize on off-the-shelf software. This allowed IT organizations to buy instead of build. We've migrated from proprietary architectures to Windows and Linux standards. We've gone from custom-built applications to packaged CRM and ERP applications. IT adopted this approach because its value lies in automating business processes and supporting analysis, not in inventing new technologies. The problem is that moving data between all of these "packaged systems" still requires custom code.

There's no question that middleware provides value: it delivers the pre-built data pipes. Unfortunately, these are toolkits requiring developers to write code to connect their packages to the pipes. Most CIOs are blissfully unaware of the amount of custom coding middleware requires. Trust me: IT spends an enormous amount of money on supporting such data migration solutions. Many IT shops still view middleware as sacred ground.

The data warehousing world has enthusiastically adopted ETL tools to reduce custom coding so it can focus on the issues of data accuracy and usability. One fact lost in translation is that ETL integrates data; it's more than just a pipe. The application world has adopted EAI, ESB, and orchestration to move data quicker. However, there's no integration. Each application is responsible for integrating the data it receives.

So, there's even more custom code. Code to connect an application to the pipes. Code to integrate and clean up the data it receives from the pipes. Custom code to move data around isn't the answer. Orchestration, message passing, and data movement just create a labyrinth of pipes. There are no economies of scale. The data doesn't get better.

Walmart learned years ago that it was impractical to have a custom (and separate) distribution system for every supplier. They knew the cost benefits of a standard distribution system; this meant they needed to standardize the size of the trailers, the size of the boxes, and the way the boxes were packed and shipped. The benefit of a supply chain is that standardization occurs at the most cost-effective point: the source. Walmart's distribution success was measured by its ability to accept new suppliers and manage more shipments.

Most CIOs don't recognize that they have a data supply chain. Instead of building a custom distribution system for each supplier (each business application), they should be focused on a single data supply chain. Middleware supports the creation of custom distribution solutions, but not the standardization of data. A data supply chain can only be successful if the data is standardized. Otherwise everyone is forced to write custom code to standardize, clean, and integrate the data.

April 07, 2009

Blurring the Line Between SOA and BI


photo by Siomuzzz

I recently read with interest an article in the Microsoft Architect Journal on so-called Service-Oriented Business Intelligence or, as the article’s authors call it, “SoBI.” The article was well-intentioned but confusing. What it confirmed to me is that plenty of experienced IT professionals are struggling to reconcile Service Oriented Architecture (SOA) concepts with business intelligence.

SOA is certainly a valuable tool in the architecture and development toolbox; however, I think it’s only fair to keep SOA in perspective. It’s an evolutionary technology in IT that has numerous benefits to developer productivity and application connectivity. I’m not sure that injecting SOA into a data warehouse environment or framework will do anything more than freshen a few low-level building blocks that have been neglected in some data warehouse environments. I’m certainly not challenging the value of SOA; I’m just trying to put it in perspective for those folks who are focused on data warehouse and business intelligence activities.

The idea around SOA is to create services (or functions, procedures, etc.) that can be used by other systems. The idea is simple: build once, use many times. This ensures that important (and possibly complicated) application processes can be used by numerous disparate applications. It’s like an application processing supply chain: let the most efficient resource build a service and provide it to everyone else for use. SOA provides a framework for allowing multiple applications access to common, well-defined services. These services can contain code and/or data.

The question for most data warehouse environments isn’t whether SOA can improve (or benefit) the data warehouse; it’s understanding how SOA can benefit a data warehouse.

We’ve got lots of clients leveraging SOA to support their data warehouse.  They’ve learned they can leverage SOA techniques and coding to deliver standardized data cleansing and data validation to a range of business applications.  They have also upgraded the operational system data extraction code to leverage SOA which allowed other application systems (or data marts) to reuse their code.
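As a small illustration of "build once, use many times," here is a Python sketch of a single cleansing routine shared by two consumers. This is only an in-process stand-in for a real service (no transport, no service registry); the phone-number rule and the two caller functions are hypothetical and not drawn from any client's code.

```python
import re


def standardize_phone(raw):
    """Hypothetical shared cleansing service: return a 10-digit US number, or None if invalid."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    return digits if len(digits) == 10 else None


def load_call_center_contacts(raw_numbers):
    """Consumer 1 (hypothetical): an operational app that rejects bad numbers."""
    return [n for n in map(standardize_phone, raw_numbers) if n is not None]


def load_warehouse_dimension(raw_numbers):
    """Consumer 2 (hypothetical): a warehouse load that keeps every row and flags bad ones."""
    return [(raw, standardize_phone(raw)) for raw in raw_numbers]


incoming = ["(555) 010-1234", "+1 555 010 1234", "12345"]
print(load_call_center_contacts(incoming))
print(load_warehouse_dimension(incoming))
```

The two consumers make different decisions about bad data, but they agree on what a clean phone number is, because that rule lives in exactly one place.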

However, their use of SOA hasn’t been focused on enhancing the data warehouse environment as much as it has been focused on packaging their development efforts for others to use. Most data warehouse developers invest heavily in navigating IT’s labyrinth of operational systems and application data in order to identify, cleanse, and load data into their warehouses. What they’ve learned is that for every new ETL script, there are probably 20 other systems that have had to custom-develop their own data retrieval code and never documented it. The value that many data warehouse developers find with SOA isn’t that they are improving their data warehouse; they’re just addressing the limitations of the application systems.

March 24, 2009

Not MDM, Not Data Governance: Data Management.

photo by garybirnie


Has everyone forgotten database development fundamentals?

In the hubbub of MDM and data governance, everyone’s lost track of the necessity of data standards and practices. All too often when my team and I get involved with a data warehouse review or BI scorecard project, we confront inconsistent column names in tables, meaningless table names, and different representations of the same database object. It’s as though the concepts of naming conventions and value standards never existed.

And now the master data millennium has begun! Every Tom, Dick, and Harry in the software world is espousing the benefits of their software to support MDM. “We can store your reference list!” they say. “We can ensure that all values conform to the same rules!” “Look, every application tied to this database will use the same names!”

Unfortunately this isn’t master data management. It’s what people should have been doing all along, and it’s establishing data standards. It’s called data management.

It’s not sexy, it’s not business alignment, and it doesn’t require a lot of meetings. It’s not data governance. Instead, it’s the day-to-day management of detailed data, including the dirty work of establishing standards. Standardizing terms, values, and definitions means that as we move data around and between systems it’s consistent and meaningful. This is Information Technology 101. You can’t go to IT 301—jeez, you can’t graduate!—without data management. It’s just one of those fundamentals.
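For the skeptics, here is how little code a basic standards check takes. This Python sketch assumes a hypothetical naming convention (lowercase words separated by underscores) and a hypothetical value standard for one made-up column; your own standards will differ, and that is fine as long as you have some.

```python
import re

# Assumed convention for illustration: lowercase words separated by single underscores.
COLUMN_NAME = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

# Assumed value standard for a hypothetical status column: A = active, I = inactive.
STANDARD_VALUES = {"status_cd": {"A", "I"}}


def nonconforming_columns(names):
    """Return the column names that break the naming convention."""
    return [n for n in names if not COLUMN_NAME.match(n)]


def nonconforming_values(column, values):
    """Return the distinct values that are not in the column's standard value list."""
    return sorted(set(values) - STANDARD_VALUES[column])


bad_names = nonconforming_columns(["customer_id", "CustName", "cust-phone", "status_cd"])
bad_values = nonconforming_values("status_cd", ["A", "I", "active", "A"])
print(bad_names, bad_values)
```

The hard part was never the code. It is agreeing on the convention and then applying it every day.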

March 10, 2009

The Flaw of the Hub-and-Spoke Architecture

By Evan Levy

I recently talked to a client who was fixated on a hub-and-spoke solution to support his company’s analytical applications. This guy had been around the block a few times and had some pretty set paradigms about how BI should work. In the world of software and data, the one thing I’ve learned is that there are no absolutes. And there’s no such thing as a universal architecture.

photo by John-Morgan


The premise of a hub-and-spoke architecture is to have a data warehouse function as the clearing house for all the data a company’s applications might need. This can be a reasonable approach if data requirements are well-defined, predictable, and homogeneous across the applications, and if data latency isn’t an issue.

First-generation data warehouses were originally built as reporting systems. But people quickly recognized the need for data provisioning (e.g., moving data between systems), and data warehouses morphed into storehouses for analytic data. This was out of necessity: developers didn’t have the knowledge or skills to retrieve data from operational systems. The data warehouse was rendered a data provisioning platform not because of architectural elegance but due to resource and skills limitations.

(And let’s not forget that the data contained in all these operational systems was rarely documented, whereas data in the warehouse was often supported by robust metadata.)

If everyone’s needs are homogeneous and well-defined, using the data warehouse for data provisioning is just fine. The flaw of hub-and-spoke is that it doesn’t address issues of timeliness and latency. After all, if it could, why are programmers still writing custom code for data provisioning?

When an airline wants to adjust the cost of seats, it can’t formulate new pricing based on old data—it needs up-to-the-minute pricing details. Large distribution networks, like retailing and shipping, have learned that hub-and-spoke systems are not the most efficient or cost-effective models.

Nowadays most cutting-edge analytic tools are focused on allowing the business to quickly respond to events and circumstances. And most companies have adopted packaged applications for their core financial and operational functions. Unlike the proprietary systems of the past, these applications are in fact well-documented, and many come with utilities and standard extracts as part of initial delivery. What’s changed in the last 15 years is that operational applications are now built to share data. And most differentiating business processes require direct source system access.

Many high-value business needs require fine-grained, non-enterprise data. To move this specialized, business function-centric content through a hub-and-spoke network designed to support large-volume, generalized data is not only inefficient but more costly. Analytic users don’t always need the same data. Moreover, these users now know where the data is, so time-sensitive information can be available on-demand.

The logistics and shipping industries learned that you can start with a hub-and-spoke design, but when volume reaches critical mass, direct source-to-destination links are more efficient and more profitable. (If this weren’t the case, there would be no such thing as the non-stop flight.) When business requirements are specialized and high-value (e.g., low-latency, limited content), provisioning data directly from the source system is not only justified, it’s probably the most efficient solution.

February 17, 2009

SOA Mandates Data Management

By Evan Levy

I've always said that the focus on SOA is too much on the "A" for "architecture." The whole idea of SOA is to define application functions and services that need to be accessible to other systems. Prior to SOA, it was always hairy to move data from one system to another.

But everyone thinks that SOA is an integration framework. In fact it's a means of remotely accessing other systems and their related information without having to know the details. For instance, I don't need to know how my cell phone number was assigned; I just need to remember that number so I can share it with my friends.

As I've said before, SOA is the evolutionary result of all the middleware companies trying to convince us to buy their hardware-independent products. SOA's ability to deliver business flexibility today is just as remote as that of the code objects of a decade ago, which promised to make business more nimble. SOA isn't a business term. It's a technical term for technical people to focus on re-use, standard parts, and standardized processes.

Companies turning to SOA are looking for the holy grail. Consider the emergence of the term "SOA governance" to address the conundrum of SOA development planning. The core issue is how to simplify developers' work in building applications without having to understand the technical details and obstacles in between. Just because I have advanced features and functions doesn't mean I don't still have to focus on software development fundamentals. Design reviews, code re-use, and development standards still matter.

The real challenge with implementing any kind of web service, or defining services that can be re-used, is in ensuring that the data they are dependent on is well-defined. Unfortunately there is no such thing as a business process that is data-independent. Until you've standardized your data, which means implementing data management and maintaining data in a sustainable way, you can't have re-usable services. Period.

February 04, 2009

Operational BI From the Trenches

By Evan Levy

Operational BI is getting a lot of attention. The idea is a reasonable one: using recent data to make timely decisions. However, as with any other current buzzword, the world seems to be piling on and the meaning of operational BI seems to be evolving (or eroding).

BI has been around a while now. The idea is to leverage technology to allow a business person to utilize detailed data to answer timely business questions. The most well-known BI tools come from established vendors: IBM, Microsoft, Business Objects, MicroStrategy. Most tools use relational databases and rely on the SQL language to navigate and manipulate the data. Most data warehouses that provide data to BI tools have been built to support query flexibility and performance, and to maintain a large volume of historical data. The trade-off is often that there are delays in getting data loaded. Most high-value data warehouses rely on regular monthly, weekly, or daily updates. They were never built to support “operational” functionality.

The fuzzy part is what we mean by "operational." Rather than engaging in a semantic debate, I thought I'd share what we see at clients as the three common requirements for truly operational BI:

  1. Load the data fast – usually right after it's created.
  2. Run a query fast. For instance, look up the customer’s billing history while he's waiting on the phone.
  3. Identify a specific business circumstance when it happens. For instance, tell the customer when she's exhausted her cell phone minutes.
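The third requirement is the one that looks least like a traditional query, so here is a minimal Python sketch of it. It assumes a hypothetical stream of (customer, minutes used) events and a hypothetical table of plan allowances; it processes each event as it arrives and raises a single alert at the moment a customer's running total reaches the plan limit.

```python
def exhausted_alerts(usage_events, plan_minutes):
    """Process events in arrival order; report each customer once, when the plan is used up."""
    used = {}        # running minutes per customer
    alerted = set()  # customers already notified, so nobody is alerted twice
    alerts = []
    for customer, minutes in usage_events:
        used[customer] = used.get(customer, 0) + minutes
        if used[customer] >= plan_minutes[customer] and customer not in alerted:
            alerted.add(customer)
            alerts.append(customer)
    return alerts


# Made-up sample data: plan allowances and a short stream of call events.
plans = {"ann": 500, "bob": 400}
events = [("ann", 200), ("bob", 50), ("ann", 300), ("ann", 10), ("bob", 100)]

alerts = exhausted_alerts(events, plans)
print(alerts)
```

Notice there is no SQL and no history table here, just a running total per customer. The decision is made as the data arrives rather than after it has been loaded somewhere and queried.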

As you can imagine, any one of these individual capabilities is likely to require specialized development work. When you combine these functions, it becomes pretty clear that traditional data warehouses or business intelligence tools can struggle to support Operational BI. When a legitimate need for Operational BI arises, most IT departments simply build a separate reporting data mart or a reporting platform. Why? Because the timeliness of loading and query processing makes it impractical to add on to an existing platform, unless of course they happen to have a large-scale data warehouse with unused processing capacity just lying around.

The truth is, you may not need to limit your operational BI solution to a relational database, or even to a BI tool! (I made this point on a recent broadcast of DM Radio and it invited a lot of post-show dialog.) The fact is that relational databases and SQL aren't the best (or even the most efficient) technologies to support operational BI. Indeed, there are other technologies that can support some of the Operational BI activities in a simpler and more efficient manner. We'll talk about those in another blog posting, after you've had a chance to consider this one.

About This Blog

Evan Levy, partner and co-founder of Baseline Consulting, offers his real-world insights into data integration, data delivery, and why data should be baked into every development lifecycle, every time.
