While startups and established data warehousing vendors such as Sybase and Teradata are embracing Hadoop and its Google-developed progenitor, MapReduce, Microsoft is resisting it.
"We'd never bring Hadoop code into one of our products," said Microsoft technical fellow and University of Wisconsin-Madison professor David J. DeWitt.
DeWitt's lack of interest is not surprising. DeWitt is an academic expert in parallel SQL databases, having co-invented three of them. He co-authored a paper this spring that argued that SQL databases still beat MapReduce at most tasks. He hasn't changed his mind.
"Every database vendor wants to claim that they're doing Hadoop because it's the popular thing," he said. "There's too much FUD. SQL databases still work pretty well." DeWitt leads a database research lab at Madison that is helping Microsoft with R&D for its upcoming Parallel Data Warehousing version of SQL Server 2008 R2, formerly known as Project Madison.
DeWitt said the new edition of SQL Server will add some analytic functions that roughly mimic some of the features of MapReduce/Hadoop. Those additions are the result of incorporating technology from DATAllegro, which Microsoft acquired, not from Hadoop, he said.
He does acknowledge, however, that MapReduce/Hadoop is better than SQL at keeping long-running queries from crashing. Because of that, Microsoft may eventually try to incorporate those capabilities into future data warehousing-oriented versions of SQL Server, he said.
That would likely be a Microsoft-led effort, rather than a licensing of Hadoop's open source code, which is managed by the Apache Software Foundation.
IBM is the leading corporate supporter of Apache. Perhaps unsurprisingly, it is also "very bullish on Hadoop," said Anant Jhingran, CTO of IBM's information management division in the software group. "I'm not saying that mind-melding Hadoop with a database is the answer for everything," Jhingran said. "But in the end, I think every enterprise will want Hadoop. I'm just not sure in what form."
Questions remain about whether enterprises want Hadoop integrated into their SQL databases, as a separate data warehousing appliance, or as a web-only service where Hadoop is hidden underneath, as with IBM's experimental M2 service.
To determine this, IBM is running pilots with a dozen enterprise customers, as well as doing R&D work in the lab, Jhingran said. He declined to comment on the likelihood of Hadoop functionality making it into the next version of DB2 or Informix.
One thing is for certain, says Jhingran: Hadoop is best used to solve emerging problems such as web analytics, fraud, and analysis of unstructured and semi-structured data, rather than the problems that relational databases have already proven to excel at.
"For those vendors who simply want to use Hadoop to build a database replacement, I think they will fall flat on their faces," he said. SQL technology "supports a $300 billion ecosystem. It's extremely robust. I'm not that young [at 46], but I'll be retired before SQL is retired."
Oracle Database stands to lose the most if MapReduce/Hadoop takes off, critics say.
That's not just because of Oracle's longtime lead in the relational database market, but also because of its database's poor reputation for scale-out, a MapReduce/Hadoop strength.
Oracle did not respond to a request for comment. But in October, it published a blog post that argued, in the words of independent analyst Curt Monash, that "actually, we've been doing MapReduce all along."
A senior product manager at Oracle, Jean-Pierre Dijcks, said parallel processing of large data sets has been possible with Oracle Database using features first introduced with Oracle 9i back in 2001. He describes in detail how to implement it in a blog post.
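The general idea that a SQL engine can express MapReduce-style work can be illustrated with a small sketch. This is not the Oracle implementation described in that blog post; it is a toy comparison in Python, assuming an invented list of page views and using the standard library's sqlite3 module as a stand-in for a SQL database. The function names are hypothetical. It shows a map step that emits key/value pairs, a grouping step, and a reduce step that sums them, next to a SQL GROUP BY that yields the same counts.

```python
import sqlite3
from collections import defaultdict

# Invented sample data for illustration: one entry per page view.
page_views = ["/home", "/pricing", "/home", "/docs", "/home"]

def map_phase(records):
    """Map step: emit a (page, 1) pair for every page view."""
    for page in records:
        yield page, 1

def shuffle(pairs):
    """Group the emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def count_with_mapreduce(records):
    return reduce_phase(shuffle(map_phase(records)))

def count_with_sql(records):
    """Same aggregation expressed as a SQL GROUP BY (SQLite as a stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE views (page TEXT)")
    conn.executemany("INSERT INTO views VALUES (?)", [(r,) for r in records])
    rows = conn.execute(
        "SELECT page, COUNT(*) FROM views GROUP BY page"
    ).fetchall()
    conn.close()
    return dict(rows)

mapreduce_counts = count_with_mapreduce(page_views)
sql_counts = count_with_sql(page_views)
```

Both paths produce the same per-page counts here. The sketch runs on a single machine and says nothing about fault tolerance or scale-out, which is where the vendors quoted in this article disagree.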
"MapReduce in the end is a programming construct... SQL will allow for massive parallel processing as well. It is all a matter of looking beyond hype and finding a solution you are comfortable with," Dijcks wrote.