# Compiling Hadoop from Source

## Why recompile Hadoop

A recompiled Hadoop is better matched to the specific operating system, and recompiling also lets you enable features to suit your needs, such as Snappy compression support. Starting with version 2.5, the Hadoop website ships pre-built 64-bit binaries.
## Checking the build shipped on the Hadoop website

Download the tarball from the website, upload it to a Linux server, extract it, change into $HADOOP_HOME/lib/native, and inspect the native library with the `file` command. Here we use hadoop-2.4.0.tar.gz and hadoop-2.9.2.tar.gz as examples.
```shell
# hadoop-2.4.0
[root@hadoop01 tools]# tar -zxvf hadoop-2.4.0.tar.gz -C /opt/
[root@hadoop01 native]# pwd
/opt/hadoop-2.4.0/lib/native
[root@hadoop01 native]# file libhadoop.so.1.0.0
libhadoop.so.1.0.0: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, BuildID[sha1]=c85266df5aef5060a7072c125dc51fffe4fad456, not stripped

# hadoop-2.9.2
[root@hadoop01 native]# tar -zxvf hadoop-2.9.2.tar.gz -C /opt/
[root@hadoop01 native]# pwd
/opt/hadoop-2.9.2/lib/native
[root@hadoop01 native]# file libhadoop.so.1.0.0
libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=29b9c1d91d6663329facd486ded93c77c5341442, not stripped
```
## Building from source

The steps below compile Hadoop from source on CentOS 7.
### Download the source tarball

Archive: https://archive.apache.org/dist/hadoop/common/

Pick the tarball that suits your situation; here we choose hadoop-2.9.2-src.tar.gz.
### Extract and read BUILDING.txt

Upload the tarball to the Linux server, extract it, and read the BUILDING.txt file.
```shell
tar -xvf hadoop-2.9.2-src.tar.gz
```
According to BUILDING.txt, we first set up the build environment:
Requirements:
Unix System
JDK 1.7 or 1.8
Maven 3.0 or later
Findbugs 1.3.9 (if running findbugs)
ProtocolBuffer 2.5.0
CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac
Zlib devel (if compiling native code)
openssl devel (if compiling native hadoop-pipes and to get the best HDFS encryption performance)
Linux FUSE (Filesystem in Userspace) version 2.6 or above (if compiling fuse_dfs)
Internet connection for first build (to fetch all Maven and Hadoop dependencies)
python (for releasedocs)
Node.js / bower / Ember-cli (for YARN UI v2 building)
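Once the tools above are installed (the sections below walk through each one), a quick script can confirm everything is on the PATH before starting the long Maven build. This is a minimal sketch; the tool list is our assumption based on the requirements, not part of BUILDING.txt.

```shell
# Sketch: verify the build prerequisites are on PATH (tool list assumed
# from the requirements above; findbugs and protoc come from the steps below).
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "OK:      $tool"
    else
      echo "MISSING: $tool"
      missing=1
    fi
  done
  return "$missing"
}

check_tools java mvn findbugs protoc cmake || echo "install the missing tools first"
```

The function's exit status makes it usable as a gate in a larger provisioning script.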
### Install the JDK

Upload the JDK tarball to the chosen path, extract it, and configure the environment variables.
```shell
# extract
tar -zxvf jdk-8u162-linux-x64.tar.gz -C /opt/

# append to /etc/profile
export JAVA_HOME=/opt/jdk1.8.0_162
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

# reload and verify
[root@hadoop01 jdk1.8.0_162]# source /etc/profile
[root@hadoop01 jdk1.8.0_162]# java -version
java version "1.8.0_162"
Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)
```
### Install Maven

Upload the Maven tarball to the chosen path, extract it, and configure the environment variables.
```shell
# extract
tar -zxvf apache-maven-3.6.0-bin.tar.gz -C /opt/

# append to /etc/profile
export JAVA_HOME=/opt/jdk1.8.0_162
export MAVEN_HOME=/opt/apache-maven-3.6.0
export PATH=$MAVEN_HOME/bin:$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

# reload and verify
[root@hadoop01 apache-maven-3.6.0]# source /etc/profile
[root@hadoop01 apache-maven-3.6.0]# mvn -version
Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3; 2018-10-25T02:41:47+08:00)
Maven home: /opt/apache-maven-3.6.0
Java version: 1.8.0_162, vendor: Oracle Corporation, runtime: /opt/jdk1.8.0_162/jre
Default locale: zh_CN, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-693.el7.x86_64", arch: "amd64", family: "unix"
```
Edit settings.xml to point the remote repository at the Aliyun mirror in China, which speeds up dependency downloads considerably.
```xml
<mirror>
  <id>nexus-aliyun</id>
  <mirrorOf>central</mirrorOf>
  <name>Nexus aliyun</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
```
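After editing settings.xml, a quick grep confirms the mirror entry is actually in place. A minimal sketch; the helper name is ours, and the path assumes the default $MAVEN_HOME/conf/settings.xml of the install above.

```shell
# Hypothetical helper: report whether a settings.xml already contains the
# aliyun mirror URL configured above.
has_aliyun_mirror() {
  grep -q "maven.aliyun.com" "$1" 2>/dev/null
}

if has_aliyun_mirror /opt/apache-maven-3.6.0/conf/settings.xml; then
  echo "aliyun mirror configured"
fi
```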
### Install Findbugs 1.3.9

Note: BUILDING.txt pins this exact version, so download it specifically.

Download: https://sourceforge.net/projects/findbugs/files/findbugs/1.3.9/
```shell
# extract
tar -zxvf findbugs-1.3.9.tar.gz -C /opt/

# append to /etc/profile
export JAVA_HOME=/opt/jdk1.8.0_162
export MAVEN_HOME=/opt/apache-maven-3.6.0
export FINDBUGS_HOME=/opt/findbugs-1.3.9
export PATH=$FINDBUGS_HOME/bin:$MAVEN_HOME/bin:$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

# reload and verify
[root@hadoop01 findbugs-1.3.9]# source /etc/profile
[root@hadoop01 findbugs-1.3.9]# findbugs -version
1.3.9
```
### Install ProtocolBuffer 2.5.0

Download: https://github.com/protocolbuffers/protobuf/releases/tag/v2.5.0
```shell
[root@hadoop01 tools]# tar -zxvf protobuf-2.5.0.tar.gz
```
#### Install the build dependencies first

Note: gcc and related dependencies must be installed beforehand, otherwise the protobuf build fails.
```shell
[root@hadoop01 protobuf-2.5.0]# yum -y install gcc gcc-c++
[root@hadoop01 protobuf-2.5.0]# yum install -y autoconf automake libtool curl
```
If the compiler is missing, configure fails with:
```
checking for gcc... no
checking for cc... no
checking for cl.exe... no
configure: error: in `/opt/tools/protobuf-2.5.0':
configure: error: no acceptable C compiler found in $PATH
See `config.log' for more details
```
With the dependencies installed, continue:
```shell
# build and install
[root@hadoop01 protobuf-2.5.0]# ./configure --prefix=/opt/protobuf-2.5.0
[root@hadoop01 protobuf-2.5.0]# make
[root@hadoop01 protobuf-2.5.0]# make install

# check the install prefix
[root@hadoop01 ~]# ls /opt/protobuf-2.5.0/
bin  include  lib

# append to /etc/profile
export JAVA_HOME=/opt/jdk1.8.0_162
export MAVEN_HOME=/opt/apache-maven-3.6.0
export FINDBUGS_HOME=/opt/findbugs-1.3.9
export PROTOC_HOME=/opt/protobuf-2.5.0
export PATH=$PROTOC_HOME/bin:$FINDBUGS_HOME/bin:$MAVEN_HOME/bin:$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

# reload and verify
[root@hadoop01 protobuf-2.5.0]# source /etc/profile
[root@hadoop01 protobuf-2.5.0]# protoc --version
libprotoc 2.5.0
```
### Install cmake, zlib-devel, openssl and openssl-devel

According to BUILDING.txt, these native-build dependencies are also required.
```shell
[root@hadoop01 opt]# yum -y install cmake zlib-devel openssl openssl-devel
```
### Compile the Hadoop source

```shell
[root@hadoop01 conf]# cd /opt/hadoop-2.9.2-src/

# build the distribution with native libraries, skipping tests
[root@hadoop01 hadoop-2.9.2-src]# mvn clean package -Pdist,native -DskipTests -Dtar

# on success
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16:09 min
[INFO] Finished at: 2019-08-09T19:03:59+08:00
[INFO] ------------------------------------------------------------------------
```
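The single mvn invocation above runs for a long time and produces a lot of output. A hedged convenience sketch (run_build is our own name, not a Hadoop script) tees the output to a timestamped log and succeeds only when Maven printed its BUILD SUCCESS marker:

```shell
# Sketch: run the build, keep a timestamped log in the current directory,
# and return success only when Maven reports BUILD SUCCESS.
run_build() {
  log="build-$(date +%Y%m%d-%H%M%S).log"
  mvn clean package -Pdist,native -DskipTests -Dtar 2>&1 | tee "$log"
  grep -q "BUILD SUCCESS" "$log"
}

# Usage, from the source tree:
#   cd /opt/hadoop-2.9.2-src && run_build
```

The exit status then lets a wrapper script decide whether to proceed with packaging or deployment.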
### Inspect the build output

After the build succeeds, the compiled Hadoop distribution can be found under the hadoop-dist/target directory inside hadoop-2.9.2-src.
```shell
[root@hadoop01 target]# cd /opt/hadoop-2.9.2-src/hadoop-dist/target/
[root@hadoop01 target]# ll -h
total 898M
drwxr-xr-x. 2 root root   28 Aug  9 19:02 antrun
drwxr-xr-x. 3 root root   22 Aug  9 19:02 classes
-rw-r--r--. 1 root root 2.2K Aug  9 19:02 dist-layout-stitching.sh
-rw-r--r--. 1 root root  634 Aug  9 19:03 dist-tar-stitching.sh
drwxr-xr-x. 9 root root  149 Aug  9 19:03 hadoop-2.9.2
-rw-r--r--. 1 root root 299M Aug  9 19:03 hadoop-2.9.2.tar.gz
-rw-r--r--. 1 root root  31K Aug  9 19:03 hadoop-dist-2.9.2.jar
-rw-r--r--. 1 root root 599M Aug  9 19:03 hadoop-dist-2.9.2-javadoc.jar
-rw-r--r--. 1 root root  29K Aug  9 19:03 hadoop-dist-2.9.2-sources.jar
-rw-r--r--. 1 root root  29K Aug  9 19:03 hadoop-dist-2.9.2-test-sources.jar
drwxr-xr-x. 2 root root   51 Aug  9 19:03 javadoc-bundle-options
drwxr-xr-x. 2 root root   28 Aug  9 19:03 maven-archiver
drwxr-xr-x. 3 root root   22 Aug  9 19:02 maven-shared-archive-resources
drwxr-xr-x. 3 root root   22 Aug  9 19:02 test-classes
drwxr-xr-x. 2 root root    6 Aug  9 19:02 test-dir
```
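To confirm the rebuild achieved its goal, the same `file` check from the beginning of the article can be applied to the freshly built native library. A small sketch; the helper name is ours and the path assumes the target tree listed above:

```shell
# Returns success when `file` reports a 64-bit ELF object.
is_64bit() {
  file "$1" 2>/dev/null | grep -q "ELF 64-bit"
}

if is_64bit /opt/hadoop-2.9.2-src/hadoop-dist/target/hadoop-2.9.2/lib/native/libhadoop.so.1.0.0; then
  echo "native library is 64-bit"
fi
```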
### Problems during the build

The most common failure mode is the build hanging while downloading a jar or pom file; in that case, run `mvn clean` and start the build again.
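When a download was interrupted, Maven can leave `*.lastUpdated` marker files in the local repository that make later runs skip the retry. A hedged cleanup sketch to run before rebuilding (the repository path defaults to ~/.m2/repository, the Maven default; the function name is ours):

```shell
# Remove Maven's stale *.lastUpdated markers so the next build retries
# the failed downloads (pass a repository path, default ~/.m2/repository).
clean_stale_downloads() {
  repo="${1:-$HOME/.m2/repository}"
  find "$repo" -type f -name '*.lastUpdated' -delete
}

# Usage:
#   clean_stale_downloads && mvn clean package -Pdist,native -DskipTests -Dtar
```

Alternatively, `mvn -U` forces Maven to re-check remote repositories for artifacts it previously failed to fetch.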