So What? – Monitoring Hadoop Beyond Ganglia: Over the last couple of months I have been talking to more and more customers who are either bringing their Hadoop clusters into production or have already done so and are now getting serious about operations. This leads to some interesting discussions about how to monitor Hadoop properly, and one question pops up quite often: do they need anything beyond Ganglia? If so, what should they do beyond it?
As in every other system, monitoring in a Hadoop environment starts with the basics: system metrics such as CPU, disk, and memory; you know the drill. Of special importance in a Hadoop system is a well-balanced cluster; you don't want some nodes to be much more (or less) utilized than others. Besides CPU and memory utilization, disk utilization and of course I/O throughput are of high importance. After all, the most likely bottleneck in a Big Data system is I/O: ingress (network and disk), moving data around (e.g., the MapReduce shuffle on the network), or straightforward reads and writes to disk.
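To make the "well-balanced cluster" idea concrete, here is a minimal sketch of a balance check. It assumes you have already collected one utilization value per node (as a fraction between 0 and 1) from whatever monitoring source you use; the function name `find_unbalanced_nodes`, the node names, the sample numbers, and the 20-point tolerance are all hypothetical choices for illustration, not something Ganglia or Hadoop provides.

```python
from statistics import mean


def find_unbalanced_nodes(utilization, tolerance=0.20):
    """Return nodes whose utilization differs from the cluster average
    by more than `tolerance` (an absolute fraction, e.g. 0.20 = 20 points).

    `utilization` maps a node name to a value between 0.0 and 1.0.
    The result maps each outlier node to its deviation from the average
    (positive = busier than average, negative = idler than average).
    """
    avg = mean(utilization.values())
    return {
        node: round(value - avg, 3)
        for node, value in utilization.items()
        if abs(value - avg) > tolerance
    }


# Hypothetical disk utilization per node; the values are made up.
sample = {"node1": 0.55, "node2": 0.60, "node3": 0.95, "node4": 0.50}

for node, deviation in find_unbalanced_nodes(sample).items():
    print(f"{node} deviates from the cluster average by {deviation:+.0%}")
```

The same check could be run separately for CPU, memory, disk, and I/O throughput. A fixed tolerance is the simplest possible rule; a real setup would likely want thresholds tuned per metric.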