Doug Cutting talks about Hadoop, and open source

Doug Cutting, Architect at Cloudera

Doug Cutting has changed the way that IT does Big Data. Hadoop, the Open Source project he started, has made it so that any company with access to a rack of commodity PCs and a reasonable amount of programming skill can do the type of large scale data analysis work that was previously done only on supercomputers. Organizations ranging from Amazon, eBay, Facebook and IBM all the way to the Federal Reserve Board of Governors are taking advantage of the tremendous value offered by this Open Source hit. Hadoop is a game changer.

A lot has been written about Hadoop's technology. We wanted to go one step further, to learn about the man who's made large scale data analytics an everyday part of the IT experience.

ITworld: How did a guy with a linguistics degree from Stanford create an Open Source hit?

Cutting: The linguistics degree is a little deceptive. There was no Computer Science undergraduate degree offered at Stanford when I was there. You could go into electrical engineering. Other options were math, philosophy or linguistics, all of which involved studying computation. So I ended up taking a lot of Computer Science courses as well as Linguistics courses. I think that my sub-major was something like Computational Linguistics.

ITworld: How did you get to Open Source?

Cutting: After close to 15 years in the software business, I had a piece of software that I'd written on my own time, figuring that I would commercialize it. That was Lucene. I wasn't very interested in building a business. Negotiating license fees and paperwork around that was stuff that I didn't enjoy. What I really wanted was for people to use the software, which was a theme I found through my career.

I had been involved with Excite in the 90s. I'd gotten to the point where I spent many years writing software there and the software was gone from the Earth for all practical purposes. The company went bankrupt and all the software was swallowed into some intellectual property black hole.

Open Source seemed to offer the option to have the software that I'd written, this particular one, Lucene, live on and have the opportunity for people to use it. Maybe somehow there would be some revenue for me, although frankly when I first got into it, that wasn't at all an interest. I had no business aspirations around Lucene at all. I just wanted to see this software written and not go to waste.

That was my start with Open Source in 2000, with Lucene, putting it up on SourceForge under the GPL.

ITworld: When you were doing Lucene, were you working by yourself or were you collaborating with others?

Cutting: Before I made Lucene an Open Source project, it was something that I did entirely by myself. But at all the jobs I had, I always collaborated heavily on software projects. A lot of times I'd go off and start something by myself and get it to the point where other people could really evaluate it and say “Hey it does something and this could be interesting”, and then get more people to work on it and adopt it. That's more or less the pattern I've followed in Open Source as well: build the proof of concept, evangelize it, and get other people to adopt it as a platform.

ITworld: When did you start coding?

Cutting: I started coding in college. I took my first programming course in 1982 or 1983.

ITworld: What language did you write under?

Cutting: (Laughs) The first language I studied was Pascal. By the time I'd graduated from college I was predominantly a LISP programmer. I'd fallen in with this research institute at Stanford, CSLI, the Center for the Study of Language and Information. They had a bunch of these Xerox LISP machines, so I spent a lot of time working on those. My programming language of choice was Interlisp.

ITworld: When did Java come on the horizon?

Cutting: In the late 90s I noticed it. But, I had a friend, my freshman roommate actually, who was part of the original Java team at Sun. So I knew about it from the outset.

At Excite we were doing dynamic web sites, one of the first to do a personalized home page at the time, free email, calendaring and all that kind of stuff. Programming in languages like C and C++ was very fragile and slow. Scripting languages were just too slow. Java seemed to be a pretty good compromise: performance with a lot of safety checks.

I was an early proponent of using Java at Excite. I first learned Java writing Lucene on my own time. Then I went about working on a framework that Excite could use.

ITworld: What about other object-oriented languages?

Cutting: LISP became object-oriented in the years that I was using it. I did a lot of object-oriented programming in LISP in my days at PARC. I worked at Apple for a few years, and learned to develop in C++. I wrote a full text search engine there. I think that it was used in Spotlight, eventually.

ITworld: Did you get OOP right off the bat?

Cutting: There wasn't a lot of object-oriented programming that I ran into as an undergrad. We had an assignment in a LISP class to build a system that was object-oriented and I was totally baffled by it. I remember in retrospect being totally at sea in that assignment, and I find it humorous years later that I totally failed to get it.

Working on Interlisp there were object-oriented patterns. I became very comfortable with the patterns. By the time I ran into it in C++ and Java, it seemed very natural. There wasn't a big leap to be made. But, the first time I saw [OOP], it was definitely very strange to me, and I did not get it.

ITworld: I've come across SQL programmers that have trouble with the Java end of Hadoop, and report feeling a bit "not good enough" as a software developer. Do you find this self-deprecation to be a fallacy?

Cutting: It's definitely a fallacy. The way I've always thought about the object-oriented [paradigm] is that it's a pattern, that even if you don't have support for it in the language, it's a way of structuring software. Object-oriented languages make it easier.

ITworld: When you were doing Hadoop, did you have any idea that it would become a hit?

Cutting: No. I saw this marvelous technology, but you couldn't take advantage of it unless you worked at Google, which is a pretty small portion of the software development community. The papers were hits. But to a large degree they were irrelevant because nobody had [the framework to use it], except for these guys. So I thought, I know how to solve that. You write an Open Source implementation and everyone will have it.

I was interested in providing the world with great tools to make search engines. I was excited to be able to try to bring this technology to the world through Open Source. Google was able to come up with the ideas behind [Hadoop]. But, because of the way they were structured, they weren't able to give it to the world, except as ideas, which is great. They could have kept it as their secret sauce, but they chose to promulgate it.

ITworld: What were your boyhood ambitions?

Cutting: My grandfather was a professor at the University of Hawaii. We would visit every year, and I fell in love with skin diving. So I would tell anybody that asked that I wanted to be an oceanographer. I loved the fact that other kids would say fireman or something like that, and I had “oceanography” which sounded very mysterious to me. I wanted to be a scientist of some sort, that was my ambition overall.

ITworld: So you fulfilled that in a way.

Cutting: I don't think of myself as a scientist, I think of myself as an engineer. What I enjoy doing is building things that people use.

ITworld: If a thirteen-year-old kid came up to you and asked, “Hey Mr. Cutting, what's the one thing that I have to do to become a really good software developer?” how would you answer?

Cutting: I'd say two things (laughs). Having some good academic background in data science and software is really valuable. Some people try to come to software without that, and you gotta pick it up. You gotta read a book, you gotta learn that stuff.

The other half is learning how to read and write good code by reading other people's code, and collaborating with people to build code. Open Source is really great for that. The social aspect of developing software is critical to developing good software. It's not a solo endeavor.

ITworld: This has been a great interview. Thanks!
