Below is a simplified Dockerfile for building an Ubuntu-based Hadoop environment:
# Use the official Ubuntu image as the base image
FROM ubuntu:20.04
# Install OpenJDK 8 (DEBIAN_FRONTEND=noninteractive avoids the interactive tzdata prompt)
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y openjdk-8-jdk && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
# Set the Java environment variables (this is the install path used by openjdk-8-jdk on Ubuntu 20.04 amd64)
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64
ENV PATH $PATH:$JAVA_HOME/bin
# Install Hadoop (3.2.2 is an archived release, so it is fetched from archive.apache.org;
# openssh-server is required later by the Hadoop start-dfs.sh/start-yarn.sh scripts)
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y wget openssh-server && \
    wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz && \
    tar -xzf hadoop-3.2.2.tar.gz && \
    mv hadoop-3.2.2 /usr/local/hadoop && \
    rm hadoop-3.2.2.tar.gz && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
# Configure the Hadoop environment variables
ENV HADOOP_HOME /usr/local/hadoop
ENV PATH $PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Hadoop 3.x refuses to start daemons as root unless the daemon users are declared
ENV HDFS_NAMENODE_USER=root HDFS_DATANODE_USER=root HDFS_SECONDARYNAMENODE_USER=root
ENV YARN_RESOURCEMANAGER_USER=root YARN_NODEMANAGER_USER=root
# Set up passwordless SSH to localhost, which the Hadoop start scripts rely on
# (Docker manages /etc/hosts at container start, so it is not modified at build time)
RUN mkdir -p ~/.ssh && \
    ssh-keygen -t rsa -f ~/.ssh/id_rsa -P '' && \
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && \
    chmod 0600 ~/.ssh/authorized_keys && \
    echo "StrictHostKeyChecking no" >> /etc/ssh/ssh_config
# Copy the Hadoop configuration files into the image
COPY hadoop-env.sh /usr/local/hadoop/etc/hadoop/hadoop-env.sh
COPY core-site.xml /usr/local/hadoop/etc/hadoop/core-site.xml
COPY hdfs-site.xml /usr/local/hadoop/etc/hadoop/hdfs-site.xml
COPY mapred-site.xml /usr/local/hadoop/etc/hadoop/mapred-site.xml
COPY yarn-site.xml /usr/local/hadoop/etc/hadoop/yarn-site.xml
# Format HDFS (the daemons are not started here: processes launched during
# docker build do not survive into the final image)
RUN /usr/local/hadoop/bin/hdfs namenode -format
# Expose the Hadoop ports (in Hadoop 3.x the NameNode web UI moved from 50070 to 9870)
EXPOSE 9870 8020 8088 19888
# On container start, launch sshd and the HDFS/YARN daemons, then keep the container in the foreground
# (the start scripts only launch background daemons and exit, so tail keeps PID 1 alive)
CMD ["bash", "-c", "service ssh start && /usr/local/hadoop/sbin/start-dfs.sh && /usr/local/hadoop/sbin/start-yarn.sh && tail -f /dev/null"]
This Dockerfile shows how to install OpenJDK 8 and Hadoop on Ubuntu, set up passwordless SSH, and copy the Hadoop configuration files into the image. It then exposes the relevant ports and starts the HDFS and YARN daemons when the container launches, giving learners a concise reference for a single-node (pseudo-distributed) Hadoop environment. A sketch of the configuration files that the COPY instructions expect is given below.
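The contents of the copied configuration files are not shown above. The following is a minimal sketch of what they might contain for a pseudo-distributed setup: the property names come from the standard Hadoop single-node configuration, while the concrete values (for example the localhost:8020 address, chosen here to match the EXPOSE line) are assumptions rather than the only valid choices.

core-site.xml:
<configuration>
  <property>
    <!-- Default filesystem URI; port 8020 matches the EXPOSE line in the Dockerfile -->
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>

hdfs-site.xml:
<configuration>
  <property>
    <!-- A single-node cluster can only keep one replica of each block -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

mapred-site.xml:
<configuration>
  <property>
    <!-- Run MapReduce jobs on YARN -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

yarn-site.xml:
<configuration>
  <property>
    <!-- Auxiliary shuffle service required by MapReduce on YARN -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

hadoop-env.sh only needs to export JAVA_HOME (export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64). With these files placed next to the Dockerfile, the image can be built with a command such as docker build -t hadoop-demo . and started with docker run -d -p 9870:9870 -p 8088:8088 hadoop-demo, where hadoop-demo is just an example image name.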