Showing posts with label hadoop.

06 July 2010

Open Source: It's all LinkedIn

As I noted in my post “Why No Billion-Dollar Open Source Companies?”, one of the reasons there are no large pure-play open source companies is that their business model is based on giving back to customers most of the costs they have traditionally paid to software houses.

On Open Enterprise blog.

04 June 2009

This is the Future: the Grid Meets the Grid

Wow, this is cool:


At first glance it’s hard to see how the open-source software framework Hadoop, which was developed for analyzing large data sets generated by web sites, would be useful for the power grid — open-source tools and utilities don’t often mix. But that was before the smart grid and its IT tools started to squeeze their way into the energy industry. Hadoop is in fact now being used by the Tennessee Valley Authority (TVA) and the North American Electric Reliability Corp. (NERC) to aggregate and process data about the health of the power grid, according to this blog post from Cloudera, a startup that’s commercializing Hadoop.

The TVA is collecting data about the reliability of electricity on the power grid using phasor measurement unit (PMU) devices. NERC has designated the TVA system as the national repository of such electrical data; it subsequently aggregates info from more than 100 PMU devices, including voltage, current, frequency and location, using GPS, several thousand times a second. Talk about information overload.

But TVA says Hadoop is a low-cost way to manage this massive amount of data so that it can be accessed all the time. Why? Because Hadoop has been designed to run on a lot of cheap commodity computers and uses two distributed features that make the system more reliable and easier to use to run processes on large sets of data.
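The "two distributed features" the quote alludes to are Hadoop's distributed file system (HDFS) and its MapReduce execution model. For readers who haven't met the model, here is a toy single-machine sketch of the MapReduce idea in Python — not the actual Hadoop API, just an illustration of the map/shuffle/reduce phases that Hadoop parallelises across cheap commodity machines:

```python
from collections import defaultdict

# Toy illustration of the MapReduce model that Hadoop implements:
# map emits (key, value) pairs; the framework groups pairs by key
# ("shuffle"); reduce combines each group. Hadoop runs these phases
# in parallel across a cluster; here everything runs in-process.

def map_phase(record):
    # Emit (word, 1) for every word in one line of input.
    for word in record.split():
        yield (word, 1)

def shuffle(pairs):
    # Group intermediate pairs by key, as the framework would.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped.items()

def reduce_phase(key, values):
    # Sum the counts for one word.
    return (key, sum(values))

def mapreduce(records):
    intermediate = [pair for r in records for pair in map_phase(r)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate))

counts = mapreduce(["the grid meets the grid"])
print(counts["the"])  # 2
```

The same pattern scales because map and reduce calls are independent of one another: that independence is what lets Hadoop spread sensor streams like the TVA's PMU data across thousands of commodity nodes.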

What's interesting about this - aside from seeing yet more open source deployed in novel ways - is that it presages a day when the physical grid of electricity and its users are plugged into the digital grid, allowing massive real-time analysis of vast swathes of the modern world, and equally real-time control of it across the grid. Let's hope they get the security sorted out before then...

Follow me @glynmoody on Twitter or identi.ca.

17 March 2009

Open Enterprise Interview: Mike Olson, Cloudera

Yesterday, I wrote about the launch of the open source company Cloudera. It's always hard to tell whether startups will flourish, but among the most critical factors for survival are the skills of the management team. The fact that less than three hours after I sent out some questions about Cloudera to Mike Olson, one of the company's founders, I had the answers back would seem to augur well in this respect.

Olson explains the background to the company, and to Hadoop, the software it is based on: what it does, and why business might want to use it; he talks about his company's services and business model, and why he thinks cloud computing is neither a threat nor an opportunity for open source.

On Open Enterprise blog.

16 March 2009

Open Source Cloud Computing Made Easy

Creating a business around free software is hardly a new idea: Cygnus Solutions, based around Stallman's GCC, was set up in 1989. But here's one with a trendy twist: a company based on the open source *cloud computing* app Hadoop, an Apache Project...

On Open Enterprise blog.

08 January 2009

Open Cloud Conundrum, Open Cloud Consortium

One of the hot areas in 2008 was cloud computing, and 2009 looks likely to be a year that is equally occupied with the subject. But cloud computing represents something of a conundrum for the open source world.

On Open Enterprise blog.

17 October 2008

What Comes After “Embrace, Extend”?

Here are two small, but significant moves by Microsoft....

On Open Enterprise blog.

21 February 2008

Hip-hip-Hadoop!

Just one more reason why the Microsoft-Yahoo merger, if it happens, will be hell:


Yahoo is following in Google’s footsteps again in search. Today, it is shifting a crucial part of its search engine to Hadoop, software that handles large-scale distributed computing tasks particularly well. Hadoop is an open-source implementation of Google’s MapReduce software and file system.

...

Yahoo is replacing its own software with Hadoop and running it on a Linux server cluster with 10,000 core processors.

Got that? 10,000 core processors running GNU/Linux at the heart of Yahoo. Microsoft is damned if it does (rip and replace) and damned if it doesn't. Go on, make our day, Steve....

16 November 2007

Proprietary Software Does Not Scale

It used to be said that open source software does not scale - a reflection of both its immaturity at the time, and of the pious hopes of the proprietary world. Today, the reverse is true: it is proprietary software that does not scale, but in a slightly different sense.

This was brought home to me by IBM's fashionable Blue Cloud announcement:


Blue Cloud – based on IBM’s Almaden Research Center cloud infrastructure -- will include Xen and PowerVM virtualized Linux operating system images and Hadoop parallel workload scheduling. Blue Cloud is supported by IBM Tivoli software that manages servers to ensure optimal performance based on demand. This includes software that is capable of instantly provisioning resources across multiple servers to provide users with a seamless experience that speeds performance and ensures reliability even under the most demanding situations. Tivoli monitoring checks the health of the provisioned servers and makes sure they meet service level agreements.

The whole point about cloud computing is that it has to be effectively infinite - the more people want, the more they get. You can't do that with software that requires some kind of licensing payment, unless it's flat-fee. You either have to write the software yourself, or - much easier - you use free software (or, as with Google and now IBM, you do both.)

If cloud computing takes off, Microsoft is going to be faced with a difficult choice: see everyone migrate to open source, or offer its operating systems for a flat fee. Given its recent behaviour in places like China and Russia, where it has effectively given away its software just to stop open source, I think it will opt for the latter.

14 November 2007

Yahoo! Goes Whoop! About Hadoop! (and Pig!)

Now why on earth would Yahoo be doing this?

Yahoo! Inc., a leading global Internet company, today announced that it will be the first in the industry to launch an open source program aimed at advancing the research and development of systems software for distributed computing. Yahoo!'s program is intended to leverage its leadership in Hadoop, an open source distributed computing sub-project of the Apache Software Foundation, to enable researchers to modify and evaluate the systems software running on a 4,000 processor supercomputer provided by Yahoo!. Unlike other companies and traditional supercomputing centers, which focus on providing users with computers for running applications and for coursework, Yahoo!'s program focuses on pushing the boundaries of large-scale systems software research.

Currently, academic researchers lack the hardware and software infrastructure to support Internet-scale systems software research. To date, Yahoo! has been the primary contributor to Hadoop, an open source distributed file system and parallel execution environment that enables its users to process massive amounts of data. Hadoop has been adopted by many groups and is the software of choice for supporting university coursework in Internet-scale computing. Researchers have been eager to collaborate with Yahoo! and tap the company's technical leadership in Hadoop-related systems software research and development.

As a key part of the program, Yahoo! intends to make Hadoop available in a supercomputing-class data center to the academic community for systems software research. Called the M45, Yahoo!'s supercomputing cluster, named after one of the best known open star clusters, has approximately 4,000 processors, three terabytes of memory, 1.5 petabytes of disks, and a peak performance of more than 27 trillion calculations per second (27 teraflops), placing it among the top 50 fastest supercomputers in the world.

M45 is expected to run the latest version of Hadoop and other state-of-the-art, Yahoo!-supported, open-source distributed computing software such as the Pig parallel programming language developed by Yahoo! Research, the central advanced research organization of Yahoo! Inc.

It's cool that Yahoo's backing the open source Hadoop, and doubly cool that one of the projects is called Pig. But it's also shrewd. It's becoming abundantly clear that open beats closed; Google, for all its use of open source software, is remarkably closed at its core. Enter Hadoop, running on a 4,000 processor supercomputer provided by Yahoo, with the real possibility of spawning a truly open rival to Google.... (Via Matt Asay.)