Expert.Hadoop.Administration.Managing.Tuning.and.Securing.Spark.YARN.and.HDFS

Uploaded by: u657779766 Views: 35 Recommendations: 0 File type: pdf Size: 16.97MB Upload time: 2019-04-17 12:47:02
In Expert Hadoop® Administration, leading Hadoop administrator Sam R. Alapati brings together authoritative knowledge for creating, configuring, securing, managing, and optimizing production Hadoop clusters in any environment. Drawing on his experience with large-scale Hadoop administration, Alapati integrates action-oriented advice with carefully researched explanations of both problems and solutions. He covers an unmatched range of topics and offers an unparalleled collection of realistic examples.

Alapati demystifies complex Hadoop environments, helping you understand exactly what happens behind the scenes when you administer your cluster. You’ll gain unprecedented insight as you walk through building clusters from scratch and configuring high availability, performance, security, encryption, and other key attributes. The high-value administration skills you learn here will be indispensable no matter what Hadoop distribution you use or what Hadoop applications you run.

Understand Hadoop’s architecture from an administrator’s standpoint
Create simple and fully distributed clusters
Run MapReduce and Spark applications in a Hadoop cluster
Manage and protect Hadoop data and high availability
Work with HDFS commands, file permissions, and storage management
Move data, and use YARN to allocate resources and schedule jobs
Manage job workflows with Oozie and Hue
Secure, monitor, log, and optimize Hadoop
Benchmark and troubleshoot Hadoop

Table of Contents

Part I: Introduction to Hadoop—Architecture and Hadoop Clusters
Chapter 1 Introduction to Hadoop and Its Environment
Chapter 2 An Introduction to the Architecture of Hadoop
Chapter 3 Creating and Configuring a Simple Hadoop Cluster
Chapter 4 Planning for and Creating a Fully Distributed Cluster

Part II: Hadoop Application Frameworks
Chapter 5 Running Applications in a Cluster—The MapReduce Framework (and Hive and Pig)
Chapter 6 Running Applications in a Cluster—The Spark Framework
Chapter 7 Running Spark Applications

Part III: Managing and Protecting Hadoop Data and High Availability
Chapter 8 The Role of the NameNode and How HDFS Works
Chapter 9 HDFS Commands, HDFS Permissions and HDFS Storage
Chapter 10 Data Protection, File Formats and Accessing HDFS
Chapter 11 NameNode Operations, High Availability and Federation

Part IV: Moving Data, Allocating Resources, Scheduling Jobs and Security
Chapter 12 Moving Data Into and Out of Hadoop
Chapter 13 Resource Allocation in a Hadoop Cluster
Chapter 14 Working with Oozie to Manage Job Workflows
Chapter 15 Securing Hadoop

Part V: Monitoring, Optimization and Troubleshooting
Chapter 16 Managing Jobs, Using Hue and Performing Routine Tasks
Chapter 17 Monitoring, Metrics and Hadoop Logging
Chapter 18 Tuning the Cluster Resources, Optimizing MapReduce Jobs and Benchmarking
Chapter 19 Configuring and Tuning Apache Spark on YARN
Chapter 20 Optimizing Spark Applications
Chapter 21 Troubleshooting Hadoop—A Sampler
Chapter 22 Installing VirtualBox and Linux and Cloning the Virtual Machines