
Hadoop ships with a benchmark, TestDFSIO, that measures the read and write throughput of HDFS on a cluster. The steps to benchmark the file system of a Cloudera (CDH) cluster are below.

Set HADOOP_HOME:

export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop/

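Run TestDFSIO, first the write test and then the read test; the read test reads back the files that the write test created. -nrFiles sets the number of files (one map task per file) and -fileSize sets the size of each file in MB, so each run below processes 10 x 1000 MB = 10,000 MB.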

hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-cdh4.3.0-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 1000

hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-cdh4.3.0-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
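The write, read, and cleanup steps can be collected into one shell function. This is a minimal sketch, assuming the jobclient tests jar sits in the parcel's hadoop-mapreduce directory as in the commands above; run_dfsio, MR_LIB, and DRY_RUN are hypothetical names introduced here, not part of Hadoop or CDH. The jar is picked by glob so the CDH version is not hard-coded, and TestDFSIO -clean is run at the end to delete the benchmark files from HDFS.

```shell
# run_dfsio is a hypothetical helper, not part of Hadoop or CDH.
# Usage: run_dfsio [nrFiles] [fileSizeMB]
run_dfsio() {
  # Directory holding the MapReduce jars (parcel layout assumed as above).
  local mr_lib="${MR_LIB:-/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce}"
  local nr_files="${1:-10}"
  local file_size_mb="${2:-1000}"
  local jar run

  # Pick the jobclient tests jar by glob so the CDH version is not hard-coded.
  jar=$(ls "$mr_lib"/hadoop-mapreduce-client-jobclient-*-tests.jar 2>/dev/null | head -n 1)
  if [ -z "$jar" ]; then
    echo "run_dfsio: no jobclient tests jar found in $mr_lib" >&2
    return 1
  fi

  # With DRY_RUN=1 the commands are printed instead of executed.
  run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi

  # The write test must run first: the read test reads the files it creates.
  $run hadoop jar "$jar" TestDFSIO -write -nrFiles "$nr_files" -fileSize "$file_size_mb" || return 1
  $run hadoop jar "$jar" TestDFSIO -read -nrFiles "$nr_files" -fileSize "$file_size_mb" || return 1
  # Delete the benchmark data from HDFS once both passes are done.
  $run hadoop jar "$jar" TestDFSIO -clean
}
```

For example, run_dfsio 10 1000 runs the same two tests as above, and DRY_RUN=1 run_dfsio only prints the commands it would run.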

Once the tests finish, the results are appended to TestDFSIO_results.log in the local directory from which you ran the commands. Its contents look like this:

----- TestDFSIO ----- : write
Date & time: Wed Oct 09 14:56:14 PDT 2013
Number of files: 10
Total MBytes processed: 10000.0
Throughput mb/sec: 5.382930941302368
Average IO rate mb/sec: 5.390388488769531
IO rate std deviation: 0.20763769922620628
Test exec time sec: 211.457

----- TestDFSIO ----- : read
Date & time: Wed Oct 09 14:57:47 PDT 2013
Number of files: 10
Total MBytes processed: 10000.0
Throughput mb/sec: 48.88230607167124
Average IO rate mb/sec: 49.50707244873047
IO rate std deviation: 5.8465670196729596
Test exec time sec: 39.954

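Based on the numbers above, the aggregate read and write throughput across the cluster is:
Total read throughput across the cluster (Number of files * Throughput mb/sec) = 10 * 48.88 = 488.8 MB/sec
Total write throughput across the cluster (Number of files * Throughput mb/sec) = 10 * 5.383 = 53.83 MB/sec
TestDFSIO computes Throughput as the total data processed divided by the sum of the individual map-task times, so it is a per-task rate. Multiplying it by the number of files gives a cluster-wide figure only if all map tasks ran concurrently.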
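The same arithmetic can be applied to every run recorded in the results file with awk. This is a minimal sketch, assuming the log format shown above (a header line ending in the test type, followed by "Number of files:" and "Throughput mb/sec:" lines, possibly indented); dfsio_cluster_throughput is a hypothetical helper name, not part of Hadoop. Multiplying by the number of files assumes all map tasks ran at the same time, so treat the result as an estimate.

```shell
# dfsio_cluster_throughput is a hypothetical helper, not part of Hadoop.
# It prints "<test type> <nrFiles * Throughput mb/sec>" for each run in the log.
dfsio_cluster_throughput() {
  awk '
    # Header line, e.g. "----- TestDFSIO ----- : write": remember the test type.
    /TestDFSIO/ { mode = $NF }
    # Labels are matched anywhere on the line because they may be indented.
    /Number of files:/ { files = $NF }
    /Throughput mb\/sec:/ { printf "%s %.2f\n", mode, files * $NF }
  ' "${1:-TestDFSIO_results.log}"
}
```

Run against the log above, dfsio_cluster_throughput TestDFSIO_results.log prints write 53.83 and read 488.82, matching the hand calculation.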