Not getting performance with MapReduce

PriyankaShinde

I am using Hadoop MapReduce to get a performance benefit, but my program takes about 37 minutes to run on Hadoop, whereas a simple C++ program takes only about 5 minutes to do the same task.

Topic: Software

Answers

3 total
jimlynch
Vote Up (9)

Here's some additional info.

Map/Reduce Tutorial
http://hadoop.apache.org/docs/r0.20.2/mapred_tutorial.html

"This document comprehensively describes all user-facing facets of the Hadoop Map/Reduce framework and serves as a tutorial."

tswayne
Vote Up (7)

I'm not sure what the issue is or how familiar you are with MapReduce. Yahoo has a pretty good Hadoop tutorial with two modules on MapReduce that may be of some help. Good luck!

http://developer.yahoo.com/hadoop/tutorial/module4.html

http://developer.yahoo.com/hadoop/tutorial/module5.html

PriyankaShinde

I am new to MapReduce and I am using Hadoop Pipes. I have an input file that contains a number of records, one per line. I have written a simple program to print the lines that have three words in common. In the map function I emit each word as the key and the record as the value, and in the reduce function I compare those records. I compared Hadoop's performance with a simple C++ program that reads the records from the file, splits each one into words, and loads the data into a map with the word as the key and the record as the value; after loading all the data, it compares the records. I found that, for the same task, Hadoop MapReduce takes a long time compared with the plain C++ program.
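For readers trying to picture the plain C++ version described above, here is a minimal sketch of that approach. It is not the poster's actual code. It assumes that "three words are common" means two records share at least three distinct whitespace-separated words, and the names distinctWords and recordsSharingWords are hypothetical. It stores record indices rather than whole records as the map values to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// "Map" step: split one record into its distinct whitespace-separated words.
std::set<std::string> distinctWords(const std::string& record) {
    std::istringstream in(record);
    std::set<std::string> words;
    std::string word;
    while (in >> word) words.insert(word);
    return words;
}

// Returns index pairs (i < j) of records that share at least minShared
// distinct words. Assumption: this is what "three words are common" means.
std::vector<std::pair<std::size_t, std::size_t>>
recordsSharingWords(const std::vector<std::string>& records,
                    std::size_t minShared) {
    assert(minShared > 0);

    // word -> indices of the records containing it. This mirrors the
    // grouping by key that a reducer would receive in MapReduce.
    std::map<std::string, std::vector<std::size_t>> byWord;
    for (std::size_t i = 0; i < records.size(); ++i)
        for (const auto& word : distinctWords(records[i]))
            byWord[word].push_back(i);

    // "Reduce" step: every word group adds one shared word to each pair
    // of records that appears in the group.
    std::map<std::pair<std::size_t, std::size_t>, std::size_t> sharedCount;
    for (const auto& entry : byWord) {
        const std::vector<std::size_t>& ids = entry.second;
        for (std::size_t a = 0; a < ids.size(); ++a)
            for (std::size_t b = a + 1; b < ids.size(); ++b)
                ++sharedCount[std::make_pair(ids[a], ids[b])];
    }

    // Keep only the pairs that reach the threshold.
    std::vector<std::pair<std::size_t, std::size_t>> result;
    for (const auto& entry : sharedCount)
        if (entry.second >= minShared) result.push_back(entry.first);
    return result;
}
```

Calling recordsSharingWords(records, 3) on the lines of the input file returns the index pairs of matching records, which can then be printed. In a Hadoop Pipes job the same grouping by word would be done by the framework between the map and reduce phases instead of by the in-memory std::map.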
