Initial runtime docker image files #707

Open · wants to merge 9 commits into main

Changes from 1 commit
updated for dotnet-spark version 1.0.0
indy committed Oct 21, 2020
commit 3eff302c71b1c9c7af8e8328cd870c0d7e717491
13 changes: 6 additions & 7 deletions docker/images/runtime/README.md
@@ -12,7 +12,7 @@ By using these images, you can run and debug your .NET for Apache Spark projects

If you do not want to build those images yourself, you can get our pre-built images directly from docker hub at [https://hub.docker.com/r/3rdman/dotnet-spark](https://hub.docker.com/r/3rdman/dotnet-spark).
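If you would rather pull than build, a minimal sketch (the tag matches the run examples further below):

```bash
# pull the pre-built runtime image from Docker Hub
docker pull 3rdman/dotnet-spark:latest
```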

Additional information on how to use the images can be found under [3rdman.de](https://3rdman.de/tag/net-for-apache-spark/), or the docker hub page mentioned above.
Additional information on how to use the images can be found at [3rdman.de](https://3rdman.de/tag/net-for-apache-spark/), or the docker hub page mentioned above.

## Building

@@ -31,7 +31,6 @@ For more details please run
```bash
build.sh -h
```


Please note, however, that not all version combinations are supported (the supported Apache Spark and .NET for Apache Spark versions are listed in the arrays at the top of build.sh).

## The image build stages
@@ -45,7 +44,7 @@ The three stages used in the build process are:

Downloads and installs the specified .NET Core SDK into a base Ubuntu 18.04 image along with some other tools that might be required by later stages or for debugging. The resulting image is tagged with the .NET Core version number.

- ### **dotnet-spark-runtime-base**
- ### **dotnet-spark-base (runtime)**

Adds the specified .NET for Apache Spark version to the dotnet-sdk image and also copies/builds the HelloSpark example into the image. HelloSpark is also used to install the correct microsoft-spark-*.jar version that is required when [debugging .NET for Apache Spark](https://docs.microsoft.com/en-us/dotnet/spark/how-to-guides/debug) via Visual Studio or Visual Studio Code. A sketch of how the stage builds chain together is shown below.
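To make the chain concrete, here is a rough sketch of the three builds that build.sh performs, with image tags inferred from the script's defaults (build.sh also copies the HelloSpark and script templates into place before the second and third builds):

```bash
# chained stage builds, assuming the defaults in build.sh
# (.NET Core 3.1, dotnet-spark 1.0.0, Apache Spark 3.0.1)
docker build -t dotnet-sdk:3.1 ./dotnet-sdk
docker build -t dotnet-spark-base-runtime:1.0.0 ./dotnet-spark-base
docker build -t 3rdman/dotnet-spark:1.0.0-3.0.1 ./dotnet-spark
```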

@@ -63,25 +62,25 @@ As mentioned earlier, the dotnet-spark runtime image can be used in multiple ways
- ### master and one slave in a single container

```bash
docker run -d --name dotnet-spark -p 8080:8080 -p 8081:8081 -e SPARK_DEBUG_DISABLED=true mcr.microsoft.com/dotnet-spark:runtime-latest
docker run -d --name dotnet-spark -p 8080:8080 -p 8081:8081 -e SPARK_DEBUG_DISABLED=true 3rdman/dotnet-spark:latest
```

- ### master and two slaves in a single container

```bash
docker run -d --name dotnet-spark -p 8080:8080 -p 8081:8081 -p 8081:8081 -e SPARK_DEBUG_DISABLED=true -e SPARK_WORKER_INSTANCES=2 mcr.microsoft.com/dotnet-spark:runtime-latest
docker run -d --name dotnet-spark -p 8080:8080 -p 8081:8081 -p 8082:8082 -e SPARK_DEBUG_DISABLED=true -e SPARK_WORKER_INSTANCES=2 3rdman/dotnet-spark:latest
```

- ### master only

```bash
docker run -d --name dotnet-spark-master -p 8080:8080 -p 7077:7077 -e SPARK_DEBUG_DISABLED=true -e SPARK_WORKER_INSTANCES=0 mcr.microsoft.com/dotnet-spark:runtime-latest
docker run -d --name dotnet-spark-master -p 8080:8080 -p 7077:7077 -e SPARK_DEBUG_DISABLED=true -e SPARK_WORKER_INSTANCES=0 3rdman/dotnet-spark:latest
```

- ### slave only, connecting to external master

```bash
docker run -d --name dotnet-spark-slave -p 8080:8080 -e SPARK_DEBUG_DISABLED=true -e SPARK_MASTER_DISABLED=true -e SPARK_MASTER_URL="spark://master-hostname:7077" mcr.microsoft.com/dotnet-spark:runtime-latest
docker run -d --name dotnet-spark-slave -p 8080:8080 -e SPARK_DEBUG_DISABLED=true -e SPARK_MASTER_DISABLED=true -e SPARK_MASTER_URL="spark://master-hostname:7077" 3rdman/dotnet-spark:latest
```
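For example, to run the master and a slave in separate containers, a minimal sketch using a user-defined bridge network (the container and network names are illustrative):

```bash
# master and slave in separate containers on one docker network
docker network create spark-net
docker run -d --name spark-master --network spark-net \
    -p 8080:8080 -p 7077:7077 \
    -e SPARK_DEBUG_DISABLED=true -e SPARK_WORKER_INSTANCES=0 \
    3rdman/dotnet-spark:latest
docker run -d --name spark-slave --network spark-net \
    -e SPARK_DEBUG_DISABLED=true -e SPARK_MASTER_DISABLED=true \
    -e SPARK_MASTER_URL="spark://spark-master:7077" \
    3rdman/dotnet-spark:latest
```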

For details about how to use the image for .NET for Apache Spark debugging, please have a look at one of the following posts:
37 changes: 0 additions & 37 deletions docker/images/runtime/apache-spark/Dockerfile

This file was deleted.

58 changes: 37 additions & 21 deletions docker/images/runtime/build.sh
@@ -7,14 +7,19 @@ set -o errexit # abort on nonzero exit status
set -o nounset # abort on unbound variable
set -o pipefail # don't hide errors within pipes

readonly image_repository='mcr.microsoft.com'
readonly supported_apache_spark_versions=("2.3.3" "2.3.4" "2.4.0" "2.4.1" "2.4.3" "2.4.4" "2.4.5" "2.4.6")
readonly supported_dotnet_spark_versions=("0.9.0" "0.10.0" "0.11.0" "0.12.1")
readonly image_repository='3rdman'
readonly supported_apache_spark_versions=(
"2.3.0" "2.3.1" "2.3.2" "2.3.3" "2.3.4"
"2.4.0" "2.4.1" "2.4.3" "2.4.4" "2.4.5" "2.4.6" "2.4.7"
"3.0.0" "3.0.1"
)
readonly supported_dotnet_spark_versions=("1.0.0")
readonly dotnet_core_version=3.1

dotnet_spark_version=0.12.1
apache_spark_version=2.4.6
dotnet_spark_version=1.0.0
apache_spark_version=3.0.1
apache_spark_short_version="${apache_spark_version:0:3}"
scala_version=2.11

main() {
# Parse the options and set the related variables
@@ -34,8 +39,9 @@ main() {
# execute the different build stages
cleanup

set_scala_version
build_dotnet_sdk
build_dotnet_spark_runtime_base
build_dotnet_spark_base_runtime
build_dotnet_spark_runtime

trap finish EXIT ERR
@@ -113,6 +119,16 @@ replace_text_in_file() {
sh -c 'sed -i.bak "s/$1/$2/g" "$3" && rm "$3.bak"' _ "${search_string}" "${replacement_string}" "${filename}"
}

#######################################
# Sets the Scala version depending on the Apache Spark version
#######################################
set_scala_version() {
case "${apache_spark_version:0:1}" in
2) scala_version=2.11 ;;
3) scala_version=2.12 ;;
esac
}

#######################################
# Runs the docker build command with the related build arguments
# Arguments:
@@ -144,43 +160,45 @@ build_dotnet_sdk() {
}

#######################################
# Use the Dockerfile in the sub-folder dotnet-spark to build the image of the second stage
# Use the Dockerfile in the sub-folder dotnet-spark-base to build the image of the second stage
# The image contains the specified .NET for Apache Spark version plus the HelloSpark example
# for the correct TargetFramework and Microsoft.Spark package version
# Result:
# A dotnet-spark-runtime-base docker image tagged with the .NET for Apache Spark version
# A dotnet-spark-base-runtime docker image tagged with the .NET for Apache Spark version
#######################################
build_dotnet_spark_runtime_base() {
local image_name="dotnet-spark-runtime-base:${dotnet_spark_version}"
build_dotnet_spark_base_runtime() {
local image_name="dotnet-spark-base-runtime:${dotnet_spark_version}"
local msspark_short_string=${apache_spark_short_version//./-}

cd dotnet-spark
cd dotnet-spark-base
cp --recursive templates/HelloSpark ./HelloSpark

replace_text_in_file HelloSpark/HelloSpark.csproj "<TargetFramework><\/TargetFramework>" "<TargetFramework>netcoreapp${dotnet_core_version}<\/TargetFramework>"
replace_text_in_file HelloSpark/HelloSpark.csproj "PackageReference Include=\"Microsoft.Spark\" Version=\"\"" "PackageReference Include=\"Microsoft.Spark\" Version=\"${dotnet_spark_version}\""

replace_text_in_file HelloSpark/README.txt "netcoreappX.X" "netcoreapp${dotnet_core_version}"
replace_text_in_file HelloSpark/README.txt "spark-X.X.X" "spark-${apache_spark_short_version}.x"
replace_text_in_file HelloSpark/README.txt "spark-${apache_spark_short_version}.x-X.X.X.jar" "spark-${apache_spark_short_version}.x-${dotnet_spark_version}.jar"
replace_text_in_file HelloSpark/README.txt "microsoft-spark-${apache_spark_short_version}.x-X.X.X.jar" "microsoft-spark-${msspark_short_string}_${scala_version}-${dotnet_spark_version}.jar"

build_image "${image_name}"
cd ~-

}

#######################################
# Use the Dockerfile in the sub-folder apache-spark to build the image of the last stage
# Use the Dockerfile in the sub-folder dotnet-spark to build the image of the last stage
# The image contains the specified Apache Spark version
# Result:
# A dotnet-spark docker image tagged with the Apache Spark version, .NET for Apache Spark version and the suffix -runtime
# A dotnet-spark docker image tagged with the .NET for Apache Spark version and the Apache Spark version.
#######################################
build_dotnet_spark_runtime() {
local image_name="${image_repository}/dotnet-spark:${apache_spark_version}-${dotnet_spark_version}-runtime"
local image_name="${image_repository}/dotnet-spark:${dotnet_spark_version}-${apache_spark_version}"
local msspark_short_string=${apache_spark_short_version//./-}

cd apache-spark
cd dotnet-spark
cp --recursive templates/scripts ./bin

replace_text_in_file bin/start-spark-debug.sh "microsoft-spark-X.X.X" "microsoft-spark-${apache_spark_short_version}.x"
replace_text_in_file bin/start-spark-debug.sh "microsoft-spark-X.X.X" "microsoft-spark-${msspark_short_string}_${scala_version}"

build_image "${image_name}"
cd ~-
@@ -192,12 +210,12 @@ build_dotnet_spark_runtime() {
cleanup()
{
(
cd apache-spark
cd dotnet-spark
rm --recursive --force bin
)

(
cd dotnet-spark
cd dotnet-spark-base
rm --recursive --force HelloSpark
)
}
@@ -209,8 +227,6 @@ finish()
exit ${result}
}



#######################################
# Display the help text
#######################################
15 changes: 11 additions & 4 deletions docker/images/runtime/dotnet-sdk/Dockerfile
@@ -6,14 +6,21 @@ ENV DOTNET_CORE_VERSION=$DOTNET_CORE_VERSION
ENV DOTNET_CLI_TELEMETRY_OPTOUT=1

RUN apt-get update \
&& apt-get install -y dialog apt-utils wget ca-certificates openjdk-8-jdk bash software-properties-common supervisor unzip socat net-tools vim \
&& apt-get install -y --no-install-recommends \
apt-utils \
ca-certificates \
dialog \
openjdk-8-jdk \
socat \
software-properties-common \
supervisor \
unzip \
wget \
&& wget -q --show-progress --progress=bar:force:noscroll https://packages.microsoft.com/config/ubuntu/18.04/packages-microsoft-prod.deb -O packages-microsoft-prod.deb \
&& dpkg -i packages-microsoft-prod.deb \
&& add-apt-repository universe \
&& apt-get install -y apt-transport-https \
&& apt-get update \
&& apt-get install -y dotnet-sdk-$DOTNET_CORE_VERSION \
&& apt-get autoremove -y --purge \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean && rm -rf /var/lib/apt/lists/* \
&& rm -rf packages-microsoft-prod.deb
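As a quick sanity check, this stage can also be built on its own; a sketch assuming the tag convention the later stages' FROM lines expect:

```bash
cd docker/images/runtime/dotnet-sdk
# the tag must match FROM dotnet-sdk:$DOTNET_CORE_VERSION in the next stage
docker build --build-arg DOTNET_CORE_VERSION=3.1 -t dotnet-sdk:3.1 .
```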
24 changes: 24 additions & 0 deletions docker/images/runtime/dotnet-spark-base/Dockerfile
@@ -0,0 +1,24 @@
ARG DOTNET_CORE_VERSION=3.1
FROM dotnet-sdk:$DOTNET_CORE_VERSION

ARG DOTNET_SPARK_VERSION=1.0.0
ENV DOTNET_SPARK_VERSION=$DOTNET_SPARK_VERSION
ENV DOTNET_WORKER_DIR=/dotnet/Microsoft.Spark.Worker-${DOTNET_SPARK_VERSION}

RUN mkdir -p /dotnet/HelloSpark \
&& mkdir -p /dotnet/Debug/netcoreapp${DOTNET_CORE_VERSION} \
&& mkdir /tempdata \
&& chmod 777 /tempdata \
&& wget -q --show-progress --progress=bar:force:noscroll https://github.com/dotnet/spark/releases/download/v${DOTNET_SPARK_VERSION}/Microsoft.Spark.Worker.netcoreapp${DOTNET_CORE_VERSION}.linux-x64-${DOTNET_SPARK_VERSION}.tar.gz \
&& tar -xvzf Microsoft.Spark.Worker.netcoreapp${DOTNET_CORE_VERSION}.linux-x64-${DOTNET_SPARK_VERSION}.tar.gz \
&& mv Microsoft.Spark.Worker-${DOTNET_SPARK_VERSION} /dotnet/ \
&& cp /dotnet/Microsoft.Spark.Worker-${DOTNET_SPARK_VERSION}/Microsoft.Spark.Worker /dotnet/Microsoft.Spark.Worker-${DOTNET_SPARK_VERSION}/Microsoft.Spark.Worker.exe \
&& chmod 755 /dotnet/Microsoft.Spark.Worker-${DOTNET_SPARK_VERSION}/Microsoft.Spark.Worker \
&& chmod 755 /dotnet/Microsoft.Spark.Worker-${DOTNET_SPARK_VERSION}/Microsoft.Spark.Worker.exe \
&& rm Microsoft.Spark.Worker.netcoreapp${DOTNET_CORE_VERSION}.linux-x64-${DOTNET_SPARK_VERSION}.tar.gz

COPY HelloSpark /dotnet/HelloSpark

RUN cd /dotnet/HelloSpark \
&& dotnet build \
&& cp /dotnet/HelloSpark/bin/Debug/netcoreapp${DOTNET_CORE_VERSION}/microsoft-spark-*.jar /dotnet/Debug/netcoreapp${DOTNET_CORE_VERSION}/
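Once built via build.sh, a quick way to verify that the worker landed where DOTNET_WORKER_DIR points (tag assumed from the script's defaults):

```bash
# list the worker directory baked into the base runtime image
docker run --rm dotnet-spark-base-runtime:1.0.0 ls /dotnet/Microsoft.Spark.Worker-1.0.0
```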
51 changes: 31 additions & 20 deletions docker/images/runtime/dotnet-spark/Dockerfile
@@ -1,26 +1,37 @@
ARG DOTNET_CORE_VERSION=3.1
FROM dotnet-sdk:$DOTNET_CORE_VERSION
ARG DOTNET_SPARK_VERSION=1.0.0
FROM dotnet-spark-base-runtime:$DOTNET_SPARK_VERSION

ARG DOTNET_SPARK_VERSION=0.12.1
ENV DOTNET_SPARK_VERSION=$DOTNET_SPARK_VERSION
ENV DOTNET_WORKER_DIR=/dotnet/Microsoft.Spark.Worker-${DOTNET_SPARK_VERSION}
ARG SPARK_VERSION=3.0.1

RUN mkdir -p /dotnet/HelloSpark \
&& mkdir -p /dotnet/Debug/netcoreapp${DOTNET_CORE_VERSION} \
&& mkdir /tempdata \
&& chmod 777 /tempdata \
&& wget -q --show-progress --progress=bar:force:noscroll https://github.com/dotnet/spark/releases/download/v${DOTNET_SPARK_VERSION}/Microsoft.Spark.Worker.netcoreapp${DOTNET_CORE_VERSION}.linux-x64-${DOTNET_SPARK_VERSION}.tar.gz \
&& tar -xvzf Microsoft.Spark.Worker.netcoreapp${DOTNET_CORE_VERSION}.linux-x64-${DOTNET_SPARK_VERSION}.tar.gz \
&& mv Microsoft.Spark.Worker-${DOTNET_SPARK_VERSION} /dotnet/ \
&& cp /dotnet/Microsoft.Spark.Worker-${DOTNET_SPARK_VERSION}/Microsoft.Spark.Worker /dotnet/Microsoft.Spark.Worker-${DOTNET_SPARK_VERSION}/Microsoft.Spark.Worker.exe \
&& chmod 755 /dotnet/Microsoft.Spark.Worker-${DOTNET_SPARK_VERSION}/Microsoft.Spark.Worker \
&& chmod 755 /dotnet/Microsoft.Spark.Worker-${DOTNET_SPARK_VERSION}/Microsoft.Spark.Worker.exe \
&& rm Microsoft.Spark.Worker.netcoreapp${DOTNET_CORE_VERSION}.linux-x64-${DOTNET_SPARK_VERSION}.tar.gz
ENV DAEMON_RUN=true
ENV SPARK_VERSION=$SPARK_VERSION
ENV SPARK_HOME=/spark

COPY HelloSpark /dotnet/HelloSpark
ENV SPARK_MASTER_PORT 7077
ENV SPARK_MASTER_WEBUI_PORT 8080
ENV SPARK_MASTER_DISABLED=""
ENV SPARK_MASTER_URL=""

RUN cd /dotnet/HelloSpark \
&& dotnet build \
&& cp /dotnet/HelloSpark/bin/Debug/netcoreapp${DOTNET_CORE_VERSION}/microsoft-spark-*.jar /dotnet/Debug/netcoreapp${DOTNET_CORE_VERSION}/
ENV SPARK_WORKER_PORT 7078
ENV SPARK_WORKER_WEBUI_PORT 8081
ENV SPARK_WORKER_INSTANCES=1

ENV SPARK_SUBMIT_PACKAGES=""
ENV SPARK_DEBUG_DISABLED=""
ENV HADOOP_VERSION=2.7
ENV PATH="${SPARK_HOME}/bin:${DOTNET_WORKER_DIR}:${PATH}"

COPY bin/* /usr/local/bin/
COPY supervisor.conf /etc/supervisor.conf

RUN wget -q --show-progress --progress=bar:force:noscroll https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
&& tar -xvzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
&& mv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} spark \
&& rm spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
&& chmod 755 /usr/local/bin/start-spark-slave.sh \
&& chmod 755 /usr/local/bin/start-spark-master.sh \
&& chmod 755 /usr/local/bin/start-spark-debug.sh

EXPOSE 8080 8081 8082 7077 6066 5567 4040

CMD ["supervisord", "-c", "/etc/supervisor.conf"]