Previously I talked about my overall goal of installing a cluster. I started with a simple mesos installation, which is mostly a recap of a standard DCOS cluster installation with some minor differences:
* underlying distribution is Ubuntu-14.04 (but I will install mesos via deb packages from mesosphere, thanks to Mesosphere guys for these properly working packages)
* 3 nodes used as both mesos-master and mesos-slave
* mesos-dns is running via marathon in a docker
* I'm trying not to depend on any non-free parts, like DCOS' awesome dcos package feature to install software packages from the mesosphere repository.
Planning
Since I have only 6 machines at home, I'll use 3 of them as both mesos-master and mesos-slave. This is not ideal, but they can work together.
Hosts with mesos-master (only 3) will run these components:
* mesos-master (from Mesosphere deb package)
* zookeeper (from Mesosphere deb package)
* marathon (from Mesosphere deb package)
* chronos (from Mesosphere deb package)
* mesos-dns (via Marathon+docker)
And hosts with mesos-slave (all 6 hosts) will run these:
* mesos-slave (from Mesosphere deb package, and enable docker)
* docker (installed as suggested by docker)
* haproxy-marathon-bridge (manual install from Marathon github)
Somewhere in cluster:
* docker-registry (via Marathon+docker)
Cluster will look like this:
Preparing hosts
Here are the commands to set up this cluster.
I will have these variables defined before running any command below:
MASTER1_IP=192.168.0.23
MASTER2_IP=192.168.0.15
MASTER3_IP=192.168.0.25
Preparation needed for every host
I'm using the ubuntu user on my hosts and adding it to the docker group. You may need to change it to your username.
I will also use domains ending with '.mesos'. Please don't mind those domains for now; I'll explain them under the mesos-dns section.
# Install docker
wget -qO- https://get.docker.com/ | sh
sudo adduser ubuntu docker
sudo sed -i 's/#DOCKER_OPTS.*/DOCKER_OPTS="--insecure-registry docker-registry.marathon.mesos:5000"/' /etc/default/docker
sudo service docker restart
# Install mesos
echo "deb http://repos.mesosphere.io/ubuntu trusty main" | sudo tee /etc/apt/sources.list.d/mesosphere.list
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E56151BF
sudo apt-get update
sudo apt-get install mesos
echo "zk://$MASTER1_IP:2181,$MASTER2_IP:2181,$MASTER3_IP:2181/mesos" | sudo tee /etc/mesos/zk
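The ZooKeeper URL written to /etc/mesos/zk is just the master IPs joined with port 2181. Here is a small sketch that builds it from a list instead of typing it by hand. `zk_url` is a hypothetical helper name of mine, not something shipped with the Mesosphere packages, and it assumes the default ZooKeeper client port and the `/mesos` path used in this post.

```shell
# zk_url: hypothetical helper, builds zk://ip1:2181,ip2:2181,.../mesos
# Assumes every master listens on the default ZooKeeper client port 2181.
zk_url() {
  hosts=""
  for ip in "$@"; do
    # Append a comma only when something is already in the list
    hosts="${hosts:+$hosts,}$ip:2181"
  done
  printf 'zk://%s/mesos\n' "$hosts"
}

# Example with the three master IPs used in this post
zk_url 192.168.0.23 192.168.0.15 192.168.0.25
```

The output could then be piped to `sudo tee /etc/mesos/zk` just like the echo above.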
Packages from Mesosphere use a mesos-init-wrapper to allow easy configuration of the services. All I need to do is create some files under /etc/mesos, /etc/mesos-master or /etc/mesos-slave; documentation about how to use these is already in that file.
# Prepare mesos-slave
hostname -i | sudo tee /etc/mesos-slave/hostname
echo docker,mesos | sudo tee /etc/mesos-slave/containerizers
echo 10mins | sudo tee /etc/mesos-slave/executor_registration_timeout
# Preparation for mesos-dns (which I don't have yet)
echo -e "nameserver $MASTER1_IP\nnameserver $MASTER2_IP" | sudo tee -a /etc/resolvconf/resolv.conf.d/head
sudo service resolvconf restart
# Prepare zookeeper
sudo sed -i \
-e "s/^#server.1.*/server.1=$MASTER1_IP:2888:3888/g" \
-e "s/^#server.2.*/server.2=$MASTER2_IP:2888:3888/g" \
-e "s/^#server.3.*/server.3=$MASTER3_IP:2888:3888/g" \
/etc/zookeeper/conf/zoo.cfg
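Before touching the real file, the sed expressions can be tried against a throwaway copy. This is only a dry-run sketch: the three commented `#server.N=...` lines below are my assumption of what the packaged zoo.cfg looks like, and the temp file names are made up for the example.

```shell
# Dry run of the zoo.cfg edit on a temporary file, nothing under /etc is touched
MASTER1_IP=192.168.0.23
MASTER2_IP=192.168.0.15
MASTER3_IP=192.168.0.25

sample_cfg=$(mktemp)
# Assumed shape of the commented-out server entries in the packaged zoo.cfg
cat > "$sample_cfg" <<'EOF'
#server.1=zookeeper1:2888:3888
#server.2=zookeeper2:2888:3888
#server.3=zookeeper3:2888:3888
EOF

# Same expressions as above, but writing to a second file instead of editing in place
sed \
  -e "s/^#server.1.*/server.1=$MASTER1_IP:2888:3888/" \
  -e "s/^#server.2.*/server.2=$MASTER2_IP:2888:3888/" \
  -e "s/^#server.3.*/server.3=$MASTER3_IP:2888:3888/" \
  "$sample_cfg" > "$sample_cfg.new"

cat "$sample_cfg.new"
```

If the printed lines look right, the same expressions can be applied to the real file.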
haproxy-marathon-bridge will soon be replaced by servicerouter.py, but I'll go on using haproxy-marathon-bridge for now since it's easier to install.
# Install haproxy-marathon-bridge, ignore `leader.mesos` part for now
wget https://github.com/mesosphere/marathon/raw/a53a43583cf1f674ca33f6b2f2c3a1d4ff0b9b21/bin/haproxy-marathon-bridge
chmod +x haproxy-marathon-bridge
sudo ./haproxy-marathon-bridge install_haproxy_system leader.mesos:8080
And let's install jq; I'll use it later for dealing with JSON responses from API calls:
sudo apt-get install jq
Preparation needed for hosts with mesos-master:
I installed mesos from Mesosphere's mesos deb package, and that package includes both the mesos-master and mesos-slave services, so only chronos and marathon are left to install.
sudo apt-get install chronos marathon
# Since I have 3 mesos-master hosts, the quorum size should be 2, which tolerates the loss of a single node
echo 2 | sudo tee /etc/mesos-master/quorum
# This number must be different on each master node; I used 1, 2 and 3
echo 1 | sudo tee /etc/zookeeper/conf/myid
# I'm setting a mesos-slave attribute on nodes that also run the mesos-master stack
echo true | sudo tee /etc/mesos-slave/attributes/master
# (Re)Start services
sudo service zookeeper restart
sudo service mesos-master restart
sudo service mesos-slave restart
sudo service marathon restart
sudo service chronos restart
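The quorum value of 2 comes from the usual majority rule: more than half of the masters. As a quick sketch of that arithmetic (`quorum_size` is a hypothetical helper name, not a mesos command):

```shell
# quorum_size: hypothetical helper, majority of N masters = floor(N / 2) + 1
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Show the quorum and how many master failures each cluster size can survive
for n in 1 3 5; do
  q=$(quorum_size "$n")
  echo "$n masters -> quorum $q, tolerates $(( $n - $q )) failure(s)"
done
```

With 3 masters this gives a quorum of 2 and a tolerance of one failed node, matching the value written to /etc/mesos-master/quorum.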
Preparation needed for mesos-slave only hosts:
For the slave nodes, I need to explicitly disable the mesos-master service, since Mesosphere's mesos deb package enables it by default. The zookeeper package is also installed on slaves because it's recommended by the mesos package, so I also need to disable its service.
# Prevent start of mesos-master and zookeeper services on these hosts
echo manual | sudo tee /etc/init/mesos-master.override
sudo service mesos-master stop
echo manual | sudo tee /etc/init/zookeeper.override
sudo service zookeeper stop
And now I have a running cluster. I can reach both mesos and marathon using any of the master nodes' IP addresses, since they all run both:
* mesos: http://$MASTER1_IP:5050
* marathon: http://$MASTER1_IP:8080
mesos-dns
mesos-dns and haproxy are marathon's suggested methods for service discovery.
Mesosphere already prepared a mesos-dns docker image, but the current image forces me to create a config file first, upload it somewhere and let marathon pull that file before running the container. This introduces another problem: if I need to update some parameters in the mesos-dns config, I need to update the remotely hosted file and restart the service. This adds an extra step to managing a simple service.
Mesos-dns does not need many configuration parameters, so all of them can easily be passed as environment variables. I found that the mesos-dns docker image is generated by a different repository called mesos-dns-pkg, which unifies mesos-dns package creation for various environments. I tried to hack around this repo to add this feature upstream so that I'd get free updates for later versions, but after looking around the build process in that repo I gave up, created my own mesos-dns image instead and created an issue for the upstream repo.
Now I can just start the mesos-dns service on marathon using some environment variables, without the config file hassle.
cat <<EOF > mesos-dns-marathon.json
{
"cpus": 0.1,
"mem": 30,
"id": "/mesos-dns",
"instances": 3,
"constraints": [
["master", "CLUSTER", "true"],
["hostname", "UNIQUE"]
],
"env": {
"MESOS_DNS_ZK": "zk://$MASTER1_IP:2181,$MASTER2_IP:2181,$MASTER3_IP:2181/mesos",
"MESOS_DNS_RESOLVERS": "8.8.8.8",
"MESOS_DNS_REFRESHSECONDS": "10",
"MESOS_DNS_TTL": "10"
},
"container": {
"docker": { "image": "bergerx/mesos-dns" }
},
"healthChecks": [{
"protocol": "COMMAND",
"command": { "value": "dig A leader.mesos @\$HOST | grep 'status: NOERROR'" }
}],
"upgradeStrategy": {
"minimumHealthCapacity": 0.5,
"maximumOverCapacity": 0
}
}
EOF
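One detail about writing a JSON file through an unquoted heredoc like this: the shell expands every `$VAR` before the file is written. That is what fills in the master IPs, but a variable that marathon should resolve later (such as `$HOST` in a health check command) has to be escaped as `\$HOST` to survive. A minimal sketch of the difference, using made-up two-line content rather than the real app definition:

```shell
MASTER1_IP=192.168.0.23

# Unquoted heredoc: $MASTER1_IP is expanded now, \$HOST is kept literally
rendered=$(cat <<EOF
zk: zk://$MASTER1_IP:2181/mesos
check: dig A leader.mesos @\$HOST
EOF
)

printf '%s\n' "$rendered"
```

The first line comes out with the IP filled in, while the second still contains the literal text `$HOST` for the task environment to resolve.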
There are several tricks to uncover here, but to keep this post sane I'll leave them for a later blog post. For now, in this example the combination of instances and constraints will allow me to run 3 instances of mesos-dns on my master nodes. And let's post this json to marathon:
curl \
-X POST \
-H 'Content-type: application/json' \
-d@mesos-dns-marathon.json \
$MASTER1_IP:8080/v2/apps | jq .
And let's see the service in marathon interface:
http://$MASTER1_IP:8080
In my case I can reach the host using its IP address. If you are trying this in a cloud environment, you may need to map a public IP to the host and allow port 8080.
And test if it's running:
$ dig +short leader.mesos @$MASTER1_IP
192.168.0.27
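Now prepare each host to use these DNS servers (this is the same resolvconf step as in the host preparation above, so skip it if you already ran it there):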
echo -e "nameserver $MASTER1_IP\nnameserver $MASTER2_IP" | sudo tee -a /etc/resolvconf/resolv.conf.d/head
sudo service resolvconf restart
From now on I don't need to use IP addresses for anything; I can use *.mesos domain names to reach services in my cluster. mesos-dns also supports SRV records, which also give you the port number of each instance (3rd field) and hence can be used as a service discovery mechanism. You can see the DNS name mapping rules in the mesos-dns naming docs.
For example, let's check where the mesos-dns instances are running:
$ dig +short mesos-dns.marathon.mesos
192.168.0.23
192.168.0.15
192.168.0.25
$ dig +short SRV _mesos-dns._tcp.marathon.mesos
0 0 31551 mesos-dns-3305-s0.marathon.mesos.
0 0 31574 mesos-dns-26005-s0.marathon.mesos.
0 0 31281 mesos-dns-47049-s3.marathon.mesos.
$
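Since the port sits in the 3rd field and the target host in the 4th, the short SRV output is easy to turn into host:port pairs. A rough sketch with awk follows; `srv_to_endpoints` is a hypothetical helper name, and the sample records are copied from the output above instead of querying a live DNS server.

```shell
# srv_to_endpoints: hypothetical helper, reads "prio weight port target." lines
# (the `dig +short SRV` format) and prints "target:port" without the trailing dot
srv_to_endpoints() {
  awk '{ host = $4; sub(/\.$/, "", host); print host ":" $3 }'
}

# Sample records taken from the dig output above
printf '%s\n' \
  '0 0 31551 mesos-dns-3305-s0.marathon.mesos.' \
  '0 0 31574 mesos-dns-26005-s0.marathon.mesos.' | srv_to_endpoints
```

On a host that already uses mesos-dns, the same function could be fed directly: `dig +short SRV _mesos-dns._tcp.marathon.mesos | srv_to_endpoints`.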
From now on there's no more need for IP addresses; I can use DNS names like:
* leader.mesos
* <marathon-id>.marathon.mesos
docker-registry
And now I can create my own docker-registry service using DNS names instead of IP addresses.
Since I don't have a persistent storage solution myself, for now I'll use the S3 back-end for the docker registry.
First I created an S3 bucket called bdgn-docker-registry and created an IAM user to reach it. You need to replace the S3 parameters below with your own bucket and IAM user. I will replace this S3 back-end with a local one once I have it.
$ cat docker-registry-aws-marathon.json
{
"cpus": 0.01,
"mem": 30,
"id": "/docker-registry",
"instances": 1,
"env": {
"REGISTRY_STORAGE": "s3",
"REGISTRY_STORAGE_S3_ACCESSKEY": "AKIA***",
"REGISTRY_STORAGE_S3_SECRETKEY": "***",
"REGISTRY_STORAGE_S3_BUCKET": "bdgn-docker-registry",
"REGISTRY_STORAGE_S3_REGION": "us-east-1",
"REGISTRY_STORAGE_S3_ROOTDIRECTORY": "registry"
},
"container": {
"docker": {
"image": "registry:2",
"network": "BRIDGE",
"portMappings": [{"containerPort": 5000, "servicePort": 5000}]
}
},
"healthChecks": [{
"protocol": "COMMAND",
"command": {"value": "curl -f http://$HOST:$PORT0/v2/ | grep '{}'"}
}]
}
$
And post this to marathon:
curl \
-X POST \
-H 'Content-type: application/json' \
-d@docker-registry-aws-marathon.json \
leader.mesos:8080/v2/apps | jq .
Since the service-id is /docker-registry, the DNS name for this service is docker-registry.marathon.mesos. Let's check the DNS records:
$ dig +short docker-registry.marathon.mesos
192.168.0.27
$ dig +short SRV _docker-registry._tcp.marathon.mesos
0 0 31793 docker-registry-55352-s3.marathon.mesos.
$ dig SRV _docker-registry._tcp.marathon.mesos
; <<>> DiG 9.9.5-3ubuntu0.4-Ubuntu <<>> SRV _docker-registry._tcp.marathon.mesos
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42447
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; QUESTION SECTION:
;_docker-registry._tcp.marathon.mesos. IN SRV
;; ANSWER SECTION:
_docker-registry._tcp.marathon.mesos. 10 IN SRV 0 0 31793 docker-registry-55352-s3.marathon.mesos.
;; ADDITIONAL SECTION:
docker-registry-55352-s3.marathon.mesos. 10 IN A 192.168.0.27
;; Query time: 4 msec
;; SERVER: 192.168.0.23#53(192.168.0.23)
;; WHEN: Tue Aug 11 05:04:39 IST 2015
;; MSG SIZE rcvd: 172
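The mapping from a marathon service-id to the names queried above can be written down as a tiny function. This is a sketch for top-level ids only (like /docker-registry or /mesos-dns); `marathon_dns_names` is a hypothetical helper name, and ids nested in groups follow the rules in the mesos-dns naming docs, which this sketch does not try to cover.

```shell
# marathon_dns_names: hypothetical helper, prints the A and SRV names
# for a top-level marathon app id such as /docker-registry
marathon_dns_names() {
  id=${1#/}                                   # drop the leading slash
  printf '%s.marathon.mesos\n' "$id"          # A record name
  printf '_%s._tcp.marathon.mesos\n' "$id"    # SRV record name
}

marathon_dns_names /docker-registry
```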
Now I have docker-registry running, and here is my first push to my registry:
ubuntu@ubuntu1:~$ docker pull hello-world
latest: Pulling from hello-world
535020c3e8ad: Pull complete
af340544ed62: Already exists
hello-world:latest: The image you are pulling has been verified. Important: image verification is a tech preview feature and should not be relied on to provide security.
Digest: sha256:d5fbd996e6562438f7ea5389d7da867fe58e04d581810e230df4cc073271ea52
Status: Downloaded newer image for hello-world:latest
ubuntu@ubuntu1:~$ docker tag hello-world docker-registry.marathon.mesos:5000/hello-world
ubuntu@ubuntu1:~$ docker push docker-registry.marathon.mesos:5000/hello-world
The push refers to a repository [docker-registry.marathon.mesos:5000/hello-world] (len: 1)
af340544ed62: Image already exists
535020c3e8ad: Image successfully pushed
Digest: sha256:a08871794872b68a429231d4c53a894881974ea58dc75a5479b9104478a4ae6f
ubuntu@ubuntu1:~$
And try running this container on another host:
ubuntu@ubuntu4:~$ docker run docker-registry.marathon.mesos:5000/hello-world
Unable to find image 'docker-registry.marathon.mesos:5000/hello-world:latest' locally
latest: Pulling from docker-registry.marathon.mesos:5000/hello-world
535020c3e8ad: Pull complete
af340544ed62: Already exists
Digest: sha256:a08871794872b68a429231d4c53a894881974ea58dc75a5479b9104478a4ae6f
Status: Downloaded newer image for docker-registry.marathon.mesos:5000/hello-world:latest
Hello from Docker.
This message shows that your installation appears to be working correctly.
...
In this example we also need to uncover the port mappings between the inside of the container, the host and the service level on haproxy. But I'll leave this to another blog post as well.
And I did try killing the container several times, and every time a new container showed up on another random host.
ubuntu@ubuntu4:~$ dig +short docker-registry.marathon.mesos
192.168.0.27
ubuntu@ubuntu4:~$
bekir@laptop:~$ ssh ubuntu@192.168.0.27
ubuntu@ubuntu5:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e5eae4eaa21c registry:2 "registry cmd/regist 20 hours ago Up 20 hours 0.0.0.0:31793->5000/tcp mesos-20150630-030502-436250816-5050-18122-S3.adecda4e-f3c5-4f79-9d77-6a1c5b3b7a74
ubuntu@ubuntu5:~$ docker kill e5eae4eaa21c
e5eae4eaa21c
ubuntu@ubuntu5:~$
Next, I'll explain how I'm running OpenVPN on this cluster.