Environment Setup Based on Cloudera CDH5

This guide builds a Linux development cluster from hadoop-2.3.0-cdh5.0.0, hbase-0.96.1.1-cdh5.0.0 and zookeeper-3.4.5. Three machines that can reach each other over the network form a small LAN: 192.168.17.129, 192.168.17.128 and 192.168.17.131. 192.168.17.129 is the master; the other two are slave1 and slave2. You also need apache-maven-3.0.5, because Hadoop has to be compiled to produce the native libhadoop.a, libhadoop.so and libhadoop.so.1.0.0 files under $HADOOP_HOME/lib/native.
Tools / Materials
1

Download hadoop-2.3.0-cdh5.0.0, hbase-0.96.1.1-cdh5.0.0 and zookeeper-3.4.5 from: http://archive-primary.cloudera.com/cdh5/cdh/5/ (a sample download sketch follows this list)

2

apache-maven-3.0.5, download from: http://maven.apache.org/download.cgi

3

JDK 1.7.0_51 (jdk1.7.0_51)
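
A minimal download sketch, assuming wget is available; the exact tarball file names under the Cloudera archive and the Maven mirror URL are assumptions, so verify them against the directory listing first:

    cd /usr/cdh
    wget http://archive-primary.cloudera.com/cdh5/cdh/5/hadoop-2.3.0-cdh5.0.0-src.tar.gz
    wget http://archive-primary.cloudera.com/cdh5/cdh/5/hbase-0.96.1.1-cdh5.0.0.tar.gz
    wget http://archive-primary.cloudera.com/cdh5/cdh/5/zookeeper-3.4.5-cdh5.0.0.tar.gz
    # pick a real mirror for Maven from http://maven.apache.org/download.cgi
    wget <maven-mirror>/apache-maven-3.0.5-bin.tar.gz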

Method / Steps
1

Configure the hosts mapping on all three machines and disable iptables. Use 192.168.17.129 as master, and 192.168.17.128 and 192.168.17.131 as slave1 and slave2; a sample /etc/hosts is sketched below. Comment out any IPv6 entries that may be present. Change the hostname of each machine to master, slave1 or slave2 accordingly: vi /etc/hostname. Run service iptables status on all three machines and make sure it reports "unrecognized service".
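
A minimal sketch of the /etc/hosts entries, based on the IPs and hostnames above (the article's original screenshot is not reproduced); add them on all three machines:

    # /etc/hosts on master, slave1 and slave2
    192.168.17.129   master
    192.168.17.128   slave1
    192.168.17.131   slave2
    # comment out IPv6 entries, for example:
    # ::1   localhost ip6-localhost ip6-loopback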

2

Install the JDK, preferably version 1.7 or later, because versions below 1.7 cause problems when running maven 3.0.5.
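
A minimal sketch of the environment variables to add to /etc/profile, assuming the JDK is unpacked to /usr/java/jdk1.7.0_51 (the same path used later in hbase-env.sh):

    export JAVA_HOME=/usr/java/jdk1.7.0_51
    export PATH=$JAVA_HOME/bin:$PATH
    # reload the profile and verify
    source /etc/profile
    java -version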

3

Configure the ssh service on all three machines so that passwordless login works.
1. Add a group and a user for running and accessing hadoop:
   root@ubuntu:~# sudo addgroup hadoop
   root@ubuntu:~# sudo adduser --ingroup hadoop hadoop
   Note: it is recommended to add the hadoop user to the sudoers list: vi /etc/sudoers (hadoop ALL=(ALL) ALL)
2. Set up the keys:
   I).   Log in to master as hadoop, cd to the home directory (e.g. /home/hadoop/) and run ssh-keygen -t rsa (just press Enter three times).
   II).  Copy the public key to the other machines:
         scp ~/.ssh/id_rsa.pub hadoop@slave1:~/temp_key
         scp ~/.ssh/id_rsa.pub hadoop@slave2:~/temp_key
   III). Log in to each server, create the .ssh directory and set its permissions: chmod 700 ~/.ssh
   IV).  Append the key and fix its permissions:
         cat ~/temp_key >> ~/.ssh/authorized_keys
         chmod 600 ~/.ssh/authorized_keys
   V).   Verify: from master, ssh to slave1 and slave2; if you can log in without being asked for a password, the configuration succeeded. A short verification loop is sketched after this step.
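
A minimal verification sketch, run as the hadoop user on master once the keys above are in place:

    # each command should print the remote hostname without prompting for a password
    for h in slave1 slave2; do ssh hadoop@$h hostname; done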

4

Configure hadoop. Perform the following steps on master.
1. Unpack hadoop-2.3.0-cdh5.0.0-src.tar.gz under /usr/cdh and set up HADOOP_HOME: edit /etc/profile, add export HADOOP_HOME=/usr/cdh/hadoop-2.3.0-cdh5.0.0, and append $HADOOP_HOME/bin:$HADOOP_HOME/sbin to the export PATH line.
2. Edit the four configuration files core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml under $HADOOP_HOME/etc/hadoop.

core-site.xml:
<configuration>
  <property>
    <name>io.native.lib.available</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
    <description>The name of the default file system. Either the literal string 'local' or a host:port for NDFS.</description>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop/hadoop-hadoop</value>
  </property>
</configuration>

hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/cdh/hadoop/dfs/name</value>
    <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy.</description>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/cdh/hadoop/dfs/data</value>
    <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.job.tracker</name>
    <value>hdfs://master:9001</value>
    <final>true</final>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1024M</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>3072</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx1024M</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>512</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>100</value>
  </property>
  <property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>50</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/tmp/hadoop/mapred/system</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/tmp/hadoop/mapred/local</value>
    <final>true</final>
  </property>
</configuration>

yarn-site.xml:
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8080</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8081</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8082</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

3. Pre-run steps:
a) Under $HADOOP_HOME/bin run hdfs namenode -format
b) Under $HADOOP_HOME/sbin run ./start-all.sh
If the namenode or a datanode fails to start, check the logs to see whether stale namenode/datanode cache files are the cause; the locations of those files are configured in $HADOOP_HOME/etc/hadoop/hdfs-site.xml.
If everything runs correctly on this machine, jps shows NameNode, SecondaryNameNode, ResourceManager, NodeManager and DataNode. Once it does, continue with the following steps.
1. Edit $HADOOP_HOME/etc/hadoop/slaves so that its content is:
slave1
slave2
2. Copy to slave1 and slave2. First create the /usr/cdh directory on slave1 and slave2, then run:
scp -r /usr/cdh/hadoop-2.3.0-cdh5.0.0 hadoop@slave1:/usr/cdh
scp -r /usr/cdh/hadoop-2.3.0-cdh5.0.0 hadoop@slave2:/usr/cdh
3. On slave1 and slave2, change the owner of /usr/cdh to the hadoop user and set its permissions to 700.
Note: when running hadoop fs -ls to browse the file system you may see the warning
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
The cause is a missing libhadoop.so file. Rebuild it in the src directory or in the hadoop-common subproject with:
mvn package -DskipTests -Pdist,native,docs -Dtar
If the build then fails with
[ERROR] class file for org.mortbay.component.AbstractLifeCycle not found
you have hit a known bug. Following the official note at https://issues.apache.org/jira/browse/HADOOP-10110, add the following dependency to hadoop-common-project/hadoop-auth/pom.xml:
<dependency>
  <groupId>org.mortbay.jetty</groupId>
  <artifactId>jetty-util</artifactId>
  <scope>test</scope>
</dependency>
If the next build fails with
Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project hadoop-common:
then zlib1g-dev is not installed; install it with apt-get.
Finally, copy all of the generated .so files into lib/native/; after that, hadoop fs -ls runs without the warning.
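
A minimal sketch of formatting and checking the cluster from master (paths as configured above; the process names are the ones the text says jps should show):

    # as the hadoop user on master
    $HADOOP_HOME/bin/hdfs namenode -format
    $HADOOP_HOME/sbin/start-all.sh
    jps                                      # NameNode, SecondaryNameNode, ResourceManager on master; DataNode, NodeManager on the slaves
    $HADOOP_HOME/bin/hdfs dfsadmin -report   # should list the live datanodes
    $HADOOP_HOME/bin/hadoop fs -ls /         # no NativeCodeLoader warning once libhadoop.so is in place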

5

Configure zookeeper. All of the following is done on master.
1. Unpack zookeeper-3.4.5.tar.gz under /usr/cdh. Edit $ZOOKEEPER_HOME/conf/zoo.cfg so that it contains:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/usr/cdh/zookeeper-3.4.5/data
dataLogDir=/usr/cdh/zookeeper-3.4.5/log
# the port at which the clients will connect
clientPort=2181
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to '0' to disable auto purge feature
#autopurge.purgeInterval=1
server.1=master:2881:3881
server.2=slave1:2882:3882
server.3=slave2:2883:3883

Create the dataDir and dataLogDir directories, and in the dataDir directory create a file named myid whose content is the number after server.1/2/3 (1, 2 or 3). On master, create myid under $ZOOKEEPER_HOME/data with the value 1.
2. Copy to slave1 and slave2. First create the /usr/cdh directory on slave1 and slave2, then run:
scp -r /usr/cdh/zookeeper-3.4.5 hadoop@slave1:/usr/cdh
scp -r /usr/cdh/zookeeper-3.4.5 hadoop@slave2:/usr/cdh
3. On slave1 and slave2, change the owner of /usr/cdh to the hadoop user and set its permissions to 700; also set the myid value to 2 on slave1 and 3 on slave2 to match the server.N entries.
Run $ZOOKEEPER_HOME/bin/zkServer.sh start on master, slave1 and slave2 to start zookeeper. If QuorumPeerMain appears on all three machines, zookeeper is running correctly. A short status check is sketched below.
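
A minimal status-check sketch, assuming the three servers are started as described (zkServer.sh status and the ruok four-letter command are standard in ZooKeeper 3.4.x):

    # run on each machine; one node should report "Mode: leader", the others "Mode: follower"
    $ZOOKEEPER_HOME/bin/zkServer.sh status
    # quick liveness probe from any machine (expects the answer "imok")
    echo ruok | nc master 2181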

6

Configure hbase. All of the following is done on master.
1. Set up the hbase environment. Edit /etc/profile, add export HBASE_HOME=/usr/cdh/hbase-0.96.1.1-cdh5.0.0, and append $HBASE_HOME/bin after export PATH=.
2. Edit $HBASE_HOME/conf/hbase-site.xml as follows:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
    <description>The directory shared by RegionServers.</description>
  </property>
  <property>
    <name>hbase.master</name>
    <value>master:60000</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave1,slave2</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/usr/cdh/zookeeper-3.4.5/data</value>
  </property>
</configuration>

3. Edit $HBASE_HOME/conf/hbase-env.sh:
export JAVA_HOME=/usr/java/jdk1.7.0_51
export HBASE_CLASSPATH=/usr/cdh/hadoop-2.3.0-cdh5.0.0/etc/hadoop
export HBASE_MANAGES_ZK=false
4. Edit $HBASE_HOME/conf/regionservers so that it contains:
master
slave1
slave2
5. Copy to slave1 and slave2. First create the /usr/cdh directory on slave1 and slave2, then run:
scp -r /usr/cdh/hbase-0.96.1.1-cdh5.0.0 hadoop@slave1:/usr/cdh
scp -r /usr/cdh/hbase-0.96.1.1-cdh5.0.0 hadoop@slave2:/usr/cdh
6. On slave1 and slave2, change the owner of /usr/cdh to the hadoop user and set its permissions to 700.
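
A minimal smoke test using the hbase shell, run on master after HBase has been started in the next step (the table and column family names are just examples):

    $HBASE_HOME/bin/hbase shell
    # inside the shell:
    status                          # should report the number of live region servers
    create 't1', 'cf1'
    put 't1', 'row1', 'cf1:a', 'value1'
    scan 't1'
    list
    exit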

7

If everything above is configured correctly, the next step is to test the whole stack. The startup order is hadoop > zookeeper > hbase. Start hadoop: run start-all.sh on master. Start zookeeper: run /usr/cdh/zookeeper-3.4.5/bin/zkServer.sh start on each of the three machines. Start hbase: run start-hbase.sh on master. To avoid a single point of failure for the hbase master, you can additionally run /usr/cdh/hbase-0.96.1.1-cdh5.0.0/bin/hbase-daemon.sh start master on slave1 or slave2. A combined startup sketch follows.
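
A minimal sketch of the full startup sequence, assuming passwordless ssh from master to the slaves as configured in step 3; the web UI ports listed in the comment are the defaults for these versions and may differ if you changed them:

    # on master, as the hadoop user
    $HADOOP_HOME/sbin/start-all.sh
    /usr/cdh/zookeeper-3.4.5/bin/zkServer.sh start
    for h in slave1 slave2; do ssh hadoop@$h /usr/cdh/zookeeper-3.4.5/bin/zkServer.sh start; done
    $HBASE_HOME/bin/start-hbase.sh
    # optional backup HBase master on slave1
    ssh hadoop@slave1 /usr/cdh/hbase-0.96.1.1-cdh5.0.0/bin/hbase-daemon.sh start master
    # default web UIs: NameNode http://master:50070, HBase master http://master:60010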

Notes

My environment is Ubuntu 12.04 Server, and quite a few extra system packages and tools have to be installed along the way, such as protobuf-2.5.0, findbugs-2.0.3 and cmake-2.8.9-Linux-i386. A sketch of the typical build prerequisites follows.
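
A minimal prerequisites sketch for the native build on Ubuntu 12.04, assuming protobuf 2.5.0 is built from source because the distro package is older (the apt package names are the standard Ubuntu ones):

    sudo apt-get install build-essential zlib1g-dev cmake   # or use the cmake-2.8.9-Linux-i386 binary tarball mentioned above
    # protobuf 2.5.0 from source (protoc 2.5.0 is what the Hadoop 2.3 native build expects)
    tar -xzf protobuf-2.5.0.tar.gz && cd protobuf-2.5.0
    ./configure && make && sudo make install && sudo ldconfig
    protoc --version                                        # should print libprotoc 2.5.0
    # findbugs-2.0.3 is only needed for the docs profile; unpack it and export FINDBUGS_HOME accordingly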
