
Jenkins Part 4.1: Functional Java Tests via JUnit


2016-11-30-18_19_38

So you also think that functional tests are among the most important ingredients for delivering high quality software? And you share my opinion that we should help developers automate this task in order to get comparable results and receive meaningful trend reports?

I will cover functional tests here. Instructions on how to perform code quality tests and performance tests are in draft status and will be covered in the next two blog posts.

Any questions and/or comments are highly welcome.

Introduction

As a developer you try hard to deliver high quality software.

You hate searching for this nasty bug that had been introduced unnoticed days ago. Or was it weeks ago? By whom? In which code?

Manual functional and performance testing after each committed code change quickly becomes a no-go as the number of features constantly rises. In this blog post, we will show how Jenkins can help you with both: delivering high quality software and minimizing the time needed to find the cause of a bug.

How about …

  1. creating automated functional and performance tests at different levels (end-to-end tests and unit tests)
  2. running the automated tests after each code change
  3. keeping track of the test results

… in order to avoid any bad surprises late in the game?

Okay: for 1., the developer needs to create automated functional and performance tests; I guess there is no way around this. Better do this even before writing the actual code. For 2. and 3., however, automation tools like Jenkins step in and can be of great help. The developer checks in the code, and Jenkins can do the rest for you.

In the current blog post, we will show how to integrate automated JUnit functional tests into a Jenkins build pipeline. We will see that JUnit tests can be invoked easily via Gradle (okay, Maven is more popular than Gradle, I guess, but I like Gradle because of some advantages I have discussed here; however, just give me a hint in a comment to this blog and I will prioritize the creation of a Maven version of this blog post). The Jenkins JUnit plug-in will be used to

  1. display reports on single build runs as well as
  2. display trend analysis graphs like the following one I have borrowed from here:
2016-12-30-18_41_45-jenkins-junit-project-home-jpg-826x707
Source: http://nelsonwells.net/2012/09/how-jenkins-ci-parses-and-displays-junit-output/

In this and the next two blog posts, we plan to cover the following quality gate measures:

  • Part 4.1: Functional Tests (this blog post): we will use Java JUnit tests performed before building the executable JAR; Jenkins will report the test trend
  • Part 4.2: Code Quality Tests (coming soon): we will use the Checkstyle Gradle plugin for reporting to which degree the code adheres to the Apache Foundation’s formal rules
  • Part 4.3: Performance Tests (planned): we will test and report the performance trend after the Java build, using external performance testers like JMeter

Older blogs of this series:

This blog post series about Jenkins build pipelines is divided into the following parts:

    • Part 1: Installation and Configuration of Jenkins, loading Plugins
    • Part 2: Creating our first Jenkins job: GitHub download and Software build
    • Part 3: Periodic and automatically triggered Builds

What is Jenkins?

Jenkins is the leading open source automation server mostly used in continuous integration and continuous deployment pipelines. Jenkins provides hundreds of plugins to support building, deploying and automating any project.

2016-12-30-21_04_46

A typical workflow is visualized above: a developer checks in the code changes into the repository. Jenkins will detect the change, build (compile) the software, test it and prepare to deploy it on a system. Depending on the configuration, the deployment is triggered by a human, or performed automatically by Jenkins. After each step, the developer is informed, depending on the priorities defined.

For more information, see the introduction found in part 1 of this blog series.

Automated Functional Testing based on JUnit

In this blog post, we will show how we need to configure Gradle and Jenkins for automated JUnit testing and reporting. In order to build a quality gate, we will reverse the original order and perform the JUnit tests before we build the executable JAR file (we do not want to create JAR files that are not functional):

2016-12-28-12_50_23
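As a rough sketch of this quality gate, the Gradle invocation we are going to configure in Jenkins lists the test task before the jar task (assuming a standard Gradle Java project, where the jar task does not depend on the test task):

(basesystem)$ gradle clean test jar
# "test" runs before "jar": if a JUnit test fails, the build aborts
# before an executable JAR file is produced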

Tools used

      • Vagrant 1.8.6
      • Virtualbox 5.0.20
      • Docker 1.12.1
      • Jenkins 2.19.3
        • JUnit Plug-in 1.19

Prerequisites:

      • Free DRAM for the Docker Host VM: >~ 4 GB
      • Docker Host is available, Jenkins is installed and a build process is configured. For that, perform all steps in part 1 to part 3 of this blog series
      • Tested with 2 vCPU (1 vCPU might work as well)

Step 1: Start Jenkins in interactive Terminal Mode

Make sure that port 8080 is unused on the Docker host. If you were following all steps in part 1 of the series, you might need to stop cadvisor:

(dockerhost)$ sudo docker stop cadvisor

I assume that jenkins_home is already created, all popular plugins are installed and an Admin user has been created as shown in part 1 of the blog series. We start the Jenkins container with the jenkins_home Docker host volume mapped to /var/jenkins_home:

(dockerhost)$ cd <path_to_jenkins_home> # in my case: cd /vagrant/jenkins_home/
(dockerhost:jenkins_home)$ sudo docker run -it --rm --name jenkins -p8080:8080 -p50000:50000 -v`pwd`:/var/jenkins_home jenkins
Running from: /usr/share/jenkins/jenkins.war
...
--> setting agent port for jnlp
--> setting agent port for jnlp... done

Step 2: Open Jenkins in a Browser

Now we want to connect to the Jenkins portal. For that, open a browser and open the URL

<your_jenkins_host>:8080

In our case, Jenkins is running in a container and we have mapped the container port 8080 to the local port 8080 of the Docker host. On the Docker host, we can open the URL.

localhost:8080

Note: In case of Vagrant with VirtualBox, there is only a NAT-based interface by default, and you need to create port forwarding for any port you want to reach from outside (the local machine you are working on also counts as outside). In this case, we need to add an entry in the port forwarding list of VirtualBox:
2016-11-30-19_22_22-regel-fur-port-weiterleitung

We have created this entry in part 1 already, but I have seen that the entries were gone again, which seems to be a VirtualBox bug. I have added it again now.
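Note: if the rule disappears again, you can also re-create it from the command line while the VM is running. A sketch, assuming your VM is named “default” (look up the real name via VBoxManage list vms):

(basesystem)$ VBoxManage controlvm "default" natpf1 "jenkins,tcp,,8080,,8080"
# adds a NAT port-forwarding rule named "jenkins" that maps host port 8080 to guest port 8080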

Log in with the admin account we have created in the last session:

2016-12-09-10_24_00-jenkins

Step 3: Pre-Build JUnit Tests invoked by Gradle

In this step, we will invoke Gradle Tests before building the JAR. For that, we should verify locally that the Gradle tests are successful and then define a test Gradle task in the build process.

Step 3.1 (optional): Verify that Gradle Tests are successful

You can skip this test and directly let Jenkins do this for you. This may come in handy if you have not installed Git and/or Gradle locally.

Prerequisites

  • Your Java project has successful JUnit tests defined
  • Git is installed
  • The Project is cloned to a local directory
  • Gradle is installed

In order to test whether the JUnit tests are successful, we can run them on a system with the project cloned (git, java and gradle must be installed):

(basesystem)$ gradle test
Starting a Gradle Daemon (subsequent builds will be faster)
Parallel execution is an incubating feature.
:compileJava UP-TO-DATE
:processResources UP-TO-DATE
:classes UP-TO-DATE
:compileTestJava
warning: [options] bootstrap class path not set in conjunction with -source 1.6
1 warning
:processTestResources
:testClasses
:test

BUILD SUCCESSFUL

Total time: 29.9 secs

With that, we have verified that the command “gradle test” succeeds.

Note that the JUnit tests must be designed so that they are independent of whether or not the executable JAR file is run in parallel. No simple way of running the executable JAR file in parallel to the execution of the JUnit tests seems to exist. In my case, I had to alter the JUnit tests to fulfill this prerequisite.

Step 3.2: Add Gradle test Task to Jenkins

As long as JUnit tests are defined in src/test of the project, adding the Gradle tests to Jenkins is as simple as adding “test” to the list of Jenkins Build Gradle Tasks as follows:

On Dashboard -> Click on Project name -> Configure -> Build, add the “test” task before the jar task:

2016-12-28-08_50_11-github-triggered-build-config-jenkins

Click Save.
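For reference, the Gradle “Tasks” field of the build step should now read something like the following sketch (in Step 4.3, we will additionally prepend “clean”):

test jar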

If you have made local code changes on the project, now is the best time to commit and push them to the Git repository. If you have followed the steps in part 3, then this will automatically trigger a build process, so you do not need to click on “Build now” in that case. Otherwise, click on “Build now” on the Jenkins project page (e.g. Dashboard -> click on project name -> “Build now”).

Now we observe the result by clicking on the build process, then -> “Console Output”:

2016-12-28-09_45_34-github-triggered-build-724-console-jenkins

Don’t be confused by the blinking red ball on the upper left of the Console Output page: we see a BUILD SUCCESSFUL message, and if we re-enter the same page, the ball turns static blue, indicating a successful build.

Step 4: Add JUnit Test Result Reporting to Jenkins

Now we will show how to add the JUnit test reports to the Jenkins build process.

Step 4.1: Install Jenkins JUnit Plugin

For Jenkins JUnit reporting, we need to install the JUnit plug-in. For that, go to -> Jenkins Dashboard -> Manage Jenkins -> Manage Plugins -> Available -> Enter “JUnit Plugin” into the Find field -> Install

Note: If you do not find the plugin on the Available tab, search for it in the “Installed” tab.

You can install the plugin without reloading Jenkins.

Step 4.2: Configure Jenkins to collect and display the JUnit Test Results

In this step, we will configure Jenkins so that it displays the test results for individual builds as well as trend reporting. For that, navigate to:

Jenkins -> (choose Project) -> Configure -> Post-build Actions -> Publish JUnit test results report

2016-12-30-14_15_25-github-triggered-build-config-jenkins

Add

**/build/test-results/test/TEST-*.xml

to the “Test report XMLs” field, since this is the path where Gradle places its JUnit test result reports (I have found the info here).

2016-12-30-14_18_51-github-triggered-build-config-jenkins

Now click Save.
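If you want to double-check this path, you can list the generated reports within the job’s workspace after a build that has run the test task. A sketch; the XML file name is derived from the fully qualified test class name, so yours will differ:

$ ls build/test-results/test/
TEST-de.oveits.simplerestfulfilestorage.SimpleRestfulFileStorageTests.xml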

Step 4.3: Verify JUnit individual Test Reporting

To test the Jenkins JUnit reporting feature, we trigger a clean build by adding “clean” to the Gradle tasks on Project -> Configure -> Build:

2016-12-30-17_43_59-github-triggered-build-config-jenkins

and clicking Save.

Then trigger a new build by clicking on Project -> Build now.

Then click on the Build Process, and then on Console output:

2016-12-30-17_48_38-github-triggered-build-731-console-jenkins

…scrolling down…

2016-12-30-17_50_01-github-triggered-build-731-console-jenkins

Do not be confused if the build process never seems to finish. Just click the Back to Project link:

Back to Project

On the Status page, we see that there were no failed tests:

2016-12-30-17_55_34-github-triggered-build-731-jenkins-v2

When we click on the Test Result link on the left (or on the lower middle part of the Status page), we will see more details:

2016-12-30-17_58_25-github-triggered-build-731-test-results-jenkins-v2

We can see that we have had four tests (Create/Read/Update/Delete a file) and 100% of them were successful.

Step 4.4: Verify JUnit Test Trend Reporting

On the project’s Status page, a Test Trend graph is automatically added as soon as test results of two or more builds are available. For that, click on “Build Now” on the left a second time and click ENABLE AUTO REFRESH on the upper right. After the second build is complete, the (hopefully) blue Test Result Trend graph shows up on the project status page:

2016-12-30-18_12_21-github-triggered-build-jenkins

The new blue graph shows that we had 4 successful tests in the last two builds.

Note: disregard the red Checkstyle Trend graph for now. This is something we will cover in the next blog post.

Step 5: Verify failed Test Reporting

By default, the Gradle build will fail if one of the JUnit tests fails, so it is forming a strict quality gate. Will the test results be collected and reported nevertheless?

Let us test this now by breaking one of the JUnit tests on purpose. We have added an assertion that is expected to fail in one of the tests:

2016-12-30-19_27_46-java-ee-simple-restful-file-storage_src_test_java_de_oveits_simplerestfulfiles-v2

Now we commit and push the change to the SW repository:

$ git clone <Repository-URL>
$ cd <Repository Dir>
<perform the code changes here...>
$ git diff src/test/java/de/oveits/simplerestfulfilestorage/SimpleRestfulFileStorageTests.java
diff --git a/src/test/java/de/oveits/simplerestfulfilestorage/SimpleRestfulFileStorageTests.java b/src/test/java/de/oveits/simplerestfulfilestorage/SimpleRestfulFileStorageTests.java
index 684d30f..10200d5 100644
--- a/src/test/java/de/oveits/simplerestfulfilestorage/SimpleRestfulFileStorageTests.java
+++ b/src/test/java/de/oveits/simplerestfulfilestorage/SimpleRestfulFileStorageTests.java
@@ -115,6 +115,9 @@ public class SimpleRestfulFileStorageTests extends CamelSpringTestSupport {
 // mock expectations need to be specified before sending the message:
 mock.expectedBodiesReceived("File ttt created: href=http://localhost:2005/files/ttt");
 mock.expectedMessageCount(1);
+ ^M
+ // In order to break this test for Jenkins test reporting, we temporarily add a requirement that will fail:^M
+ mock.expectedMessageCount(2);^M

 template.sendBodyAndHeaders("direct:recipientList", body, headers);

$ git add src/test/java/de/oveits/simplerestfulfilestorage/SimpleRestfulFileStorageTests.java
$ git commit -m "Breaking a JUnit test by purpose for Jenkins reporting tests"
[jenkinstest 33655b9] Breaking a JUnit test by purpose for Jenkins reporting tests
 1 file changed, 4 insertions(+), 1 deletion(-)

olive@LAPTOP-P5GHOHB7 /d/veits/eclipseWorkspaceRecent/simple-restful-file-storage (jenkinstest)
$ git push
Counting objects: 9, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (9/9), 744 bytes | 0 bytes/s, done.
Total 9 (delta 4), reused 0 (delta 0)
remote: Resolving deltas: 100% (4/4), completed with 4 local objects.
To https://github.com/oveits/simple-restful-file-storage.git
 edb49f7..33655b9 jenkinstest -> jenkinstest

This will automatically trigger a new build (if you have followed part 3 of this series; otherwise just press “Build Now” on the Jenkins project page).

We can see on the dashboard, that the build has failed:

2016-12-30-19_36_12-dashboard-jenkins

This was expected. Now let us click on the project name, and we will see what happened:

2016-12-30-19_37_40-github-triggered-build-jenkins

Perfect, that is exactly what I wanted to achieve: on the Test Result Trend, we can see that we have performed 4 tests, one of which has failed.

Let us fix the failed test by commenting out (or removing) the wrong code again:

2016-12-30-19_40_01-java-ee-simple-restful-file-storage_src_test_java_de_oveits_simplerestfulfiles-v2

After

$ git add <file>
$ git commit -m "Fixed JUnit test again to test Jenkins JUnit trend report"
$ git push

The next build should be successful again and we can see in the trends graph that the failed test is fixed again:

2016-12-30-19_47_48-github-triggered-build-jenkins

thumps_up_3

Summary

In this blog post, we have shown

  1. How to add Java functional tests to the Jenkins build pipeline based on Gradle-run JUnit tests
  2. How to install the JUnit plug-in to Jenkins for report collection
  3. How to display JUnit test results for individual builds on the Jenkins portal
  4. How to display JUnit trend analysis on the Jenkins portal

The only challenge I have encountered is that I had to rewrite my JUnit tests so that they are successful when run stand-alone. Before, they were successful only if the executable JAR file was started manually before running the JUnit tests. This was resolved in a way specific to the framework used (Apache Camel in this case).

Coming Soon: Code Quality Trend Analysis via the Jenkins Checkstyle plugin

Further Reading


Jenkins Part 3.1: periodic vs triggered Builds


2016-11-30-18_19_38

Today, we will make sure that Jenkins will detect a code change in the software repository without manual intervention. We will show two methods to do so:

  1. Periodic Builds via Schedulers: Jenkins periodically asks the software repository for any code changes
  2. Triggered Builds via Webhooks: Jenkins is triggered by the software repository to perform the build task

We will see that triggered builds are more challenging to set up, but have quite some advantages in terms of economics and handling once set up properly. See also the Summary at the end of this post.

This blog post series is divided into following parts:

    • Part 1: Installation and Configuration of Jenkins, loading Plugins
    • Part 2: Creating our first Jenkins job: GitHub download and Software build
    • Part 3 (this blog): Periodic and automatically triggered Builds
    • Part 4 (planned): running automated tests

What is Jenkins?

Jenkins is the leading open source automation server mostly used in continuous integration and continuous deployment pipelines. Jenkins provides hundreds of plugins to support building, deploying and automating any project.

 

Jenkins build, test and deployment pipeline

A typical workflow is visualized above: a developer checks in the code changes into the repository. Jenkins will detect the change, build (compile) the software, test it and prepare to deploy it on a system. Depending on the configuration, the deployment is triggered by a human, or performed automatically by Jenkins.

For more information, see the introduction found in part 1 of this blog series.

Automatic Jenkins Workflow: Periodic Polling

In this chapter, we will show how we need to configure Jenkins for automatic polling of the software repository, starting the build process if code changes are detected.

2016-12-09-10_12_31

Tools used

      • Vagrant 1.8.6
      • Virtualbox 5.0.20
      • Docker 1.12.1
      • Jenkins 2.19.3

Prerequisites:

      • Free DRAM for the Docker Host VM: >~ 4 GB
      • Docker Host is available, Jenkins is installed and a build process is configured. For that, perform all steps in part 1 and part 2 of this blog series
      • Tested with 2 vCPU (1 vCPU might work as well)

Step 1: Start Jenkins in interactive Terminal Mode

Make sure that port 8080 is unused on the Docker host. If you were following all steps in part 1 of the series, you might need to stop cadvisor:

(dockerhost)$ sudo docker stop cadvisor

I assume that jenkins_home is already created, all popular plugins are installed and an Admin user has been created as shown in part 1 of the blog series. We start the Jenkins container with the jenkins_home Docker host volume mapped to /var/jenkins_home:

(dockerhost)$ cd <path_to_jenkins_home> # in my case: cd /vagrant/jenkins_home/
(dockerhost:jenkins_home)$ sudo docker run -it --rm --name jenkins -p8080:8080 -p50000:50000 -v`pwd`:/var/jenkins_home jenkins
Running from: /usr/share/jenkins/jenkins.war
...
--> setting agent port for jnlp
--> setting agent port for jnlp... done

Step 2: Open Jenkins in a Browser

Now we want to connect to the Jenkins portal. For that, open a browser and open the URL

<your_jenkins_host>:8080

In our case, Jenkins is running in a container and we have mapped the container port 8080 to the local port 8080 of the Docker host. On the Docker host, we can open the URL.

localhost:8080

Note: In case of Vagrant with VirtualBox, there is only a NAT-based interface by default, and you need to create port forwarding for any port you want to reach from outside (the local machine you are working on also counts as outside). In this case, we need to add an entry in the port forwarding list of VirtualBox:
2016-11-30-19_22_22-regel-fur-port-weiterleitung

We have created this entry in part 1 already, but I have seen that the entries were gone again, which seems to be a VirtualBox bug. I have added it again now.

Log in with the admin account we have created in the last session:

2016-12-09-10_24_00-jenkins

Step 3: Configure Project for periodic Polling of SW Repository

Step 3.1: Go to the Build Trigger Configuration

On the Jenkins Dashboard, find the hidden triangle right of the project name,

2016-12-09-18_17_35-dashboard-jenkins

In the drop-down list, choose “Configure”

2016-12-09-18_18_06-dashboard-jenkins

(also possible: on the Dashboard, click on the project name and then “Configure”).

Step 3.2: Configure a Schedule

We scroll down to “Build Triggers”, check “Build periodically” and specify that the build shall run every 10 minutes (H/10 * * * *). I do not recommend using lower values than that, since I have seen that even my monster notebook with i7-6700HQ and 64GB RAM is quite a bit stressed by those many build processes.

2016-12-22-23_54_06-github-triggered-build-config-jenkins

Note that this is a very short polling period for our test purposes only; we do not want to wait long until a code change is detected.

Note also: you can click the blue question mark right of the Schedule text box to get help with the scheduler syntax.
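For reference, here are a few schedule examples in Jenkins’ cron-like syntax (a sketch based on the built-in help; the H token hashes the job name in order to spread the load):

H/10 * * * *      # every 10 minutes (what we use here)
H * * * *         # once per hour, at a hashed minute
H 2 * * *         # once per day, around 02:00
H H(0-7) * * 1-5  # once per workday, early in the morning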

Step 3.3: Save

Click Save

Step 4: Change the content of the Software Repository

Now we expect that a change of the SW repository is detected at the latest 10 minutes after new code is checked in. Let us do so now: in this case, I have changed the content of README.md and committed the change:

(local repository)$ git add README.md
(local repository)$ git commit -m "changed README"
(local repository)$ git push

Within 2 minutes, I see a new job #24 running on the lower left:

2016-12-09-18_35_13-dashboard-jenkins

It seems that the page needs to be reloaded by refreshing the browser, so that the dashboard displays the #24 build process as “Last Success”.

The build process was very quick, since we have not changed any relevant source code. The console log can be reached via the Jenkins -> Project Link -> Build History -> click on build number -> Console:

2016-12-11-21_55_22-github-triggered-build-687-console-jenkins

As you can see, after some hours, the git repository is downloaded even if there was no code change at all. However, Gradle will detect that the JAR file is up-to-date and it will not re-build the JAR file, unless there is a code change.

The disadvantage of a scheduled build process with high frequency is that the number of builds in the build history is increasing quickly:

2016-12-11-22_02_24-github-triggered-build-jenkins

Note: The build history is spammed by many successful builds with no code change, and it is not easy to find the interesting build among all those many unnecessary builds. Let us try to improve the situation by replacing periodic, scheduled builds with triggered builds:

Step 5: Triggered Builds

In Step 4, we have seen that periodic builds should not be performed in a very short timeframe, because:

  1. the Jenkins server is stressed quite a bit if the build frequency is configured too high
  2. the build history is polluted by information on many irrelevant build processes with no changed code.

Therefore, it is much better to create a triggered build. The target is to trigger a build process every time the developer is checking in new code to the software repository:

2016-12-21-15_12_25

In this way, a periodic build is not necessary, or can be done much less frequently.

What do we need to do?

  1. Make sure that the Jenkins server is reachable from the SW repository
  2. Configure the SW repository with a web hook for informing Jenkins upon each code change
  3. Configure Jenkins for triggered build

Let us start:

Step 5.1 Configure Jenkins for triggered Build

On the Jenkins Dashboard, click on the project:

2016-12-22-18_56_44-dashboard-jenkins

and then “Configure” on the left pane:

2016-12-22-18_58_28-github-triggered-build-jenkins

Scroll down to Build Triggers, check the “Trigger builds remotely (e.g., from scripts)” checkbox and choose an individual secret token (do not use the one you see here):

2016-12-22-19_03_16-github-triggered-build-config-jenkins

You will be provided with the build trigger URL, which is in my case:

JENKINS_URL/job/GitHub%20Triggered%20Build/build?token=TOKEN_NAME

Here, JENKINS_URL is the URL under which your Jenkins server can be reached by the Git repository. Save the URL above for later use.

Now click Save.

Step 5.2 Test Trigger URL locally

Now we can test the trigger URL locally on the Docker Host as follows (as found on this StackOverflow Q&A):

We need to retrieve a so-called Jenkins-Crumb:

(dockerhost)$ CRUMB=$(curl -s 'http://admin:your_admin_password@localhost:8080/crumbIssuer/api/xml?xpath=concat(//crumbRequestField,":",//crumb)')
(dockerhost)$ echo $CRUMB
Jenkins-Crumb:CCCCCCCCCCCCCCCCCCCCCCCCCC

Please make a note of the returned Jenkins-Crumb, since we will need this value in the next step.

Then we can use the Jenkins-Crumb as header in the build trigger request:

(dockerhost)$ curl -H $CRUMB 'http://admin:your_admin_password@localhost:8080/job/GitHub%20Triggered%20Build/build?token=hdsghewriohwziowrhwsn'

This should trigger a new build on Jenkins:

2016-12-22-21_56_44-dashboard-jenkins-v2

By clicking on the build and then the “Console Output”, we see a successful build with no changed data:

2016-12-22-22_01_36-github-triggered-build-712-console-jenkins

Step 5.3 Make sure that the Jenkins Server is reachable from the SW repository

We are running the Jenkins server as a Docker container within a Vagrant VM as host. In step 2, we have made sure that the Docker container is reachable from the local network by exposing the Docker ports and by configuring port forwarding in VirtualBox. However, the Docker container is not yet reachable from the Git repository, since the router will block all requests as long as no port forwarding is configured on the router:

2016-12-21-16_02_27

Let us fix that now:

In my case, the (sorry, German) input mask of the router looks as follows:

2016-12-22-19_30_22-fritzbox-7490

I am mapping outside port 8080 to the internal machine running the Docker Host VM.

Now, the routing should work. We will test this in the next step.

2016-12-22-23_59_49

Step 5.4: Add Webhook to Git SW Repository

Now we need to add a webhook to the Git repository. In my case, the repository is located at https://github.com/oveits/simple-restful-file-storage. On that page, go to

Settings -> Webhooks-> Add webhook -> Confirm password

Then copy & paste the URL of Step 5.1 into the Payload URL field with the following changes:

  • Replace JENKINS_URL with the IP address or DNS name under which your router is reachable from the Internet.
  • Choose a port that you intend to open for this service (e.g. 8080) in the next step.
  • Add admin:your_admin_password@ before the JENKINS_URL; use your own username and password here
  • append &Jenkins-Crumb=CCCCCCCCCCCCCCCC to the URL with the value of the Jenkins-Crumb we have retrieved in the previous step

Example with the items to change in red:

http://admin:your_admin_password@your_public_ip_or_name:8080/job/GitHub%20Triggered%20Build/build?token=TTTTTTTTTTTTTTTTT&Jenkins-Crumb=CCCCCCCCCCCCCCCCCCCC

2016-12-22-22_08_39-webhook-http___veits-no-ip-biz_8080_job_github%20triggered%20build_build
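Before relying on GitHub, you can simulate the webhook call from any machine on the Internet. A sketch; replace the credentials, host, token and crumb with your own values:

$ curl -X POST "http://admin:your_admin_password@your_public_ip_or_name:8080/job/GitHub%20Triggered%20Build/build?token=TTTTTTTTTTTTTTTTT&Jenkins-Crumb=CCCCCCCCCCCCCCCCCCCC"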

 

For the other fields, keep the defaults and click Add Webhook.

If everything works fine, we should already see a successful delivery of the trigger at the lower end of the GitHub page:

2016-12-22-22_17_57-webhook-http___veits-no-ip-biz_8080_job_github%20triggered%20build_build-v2

If it was not successful, you can see more details by clicking on the request:

2016-12-22-22_20_52-webhook-http___veits-no-ip-biz_8080_job_github%20triggered%20build_build

Step 6: Test triggered Build upon Code Push

This is the final step of this tutorial: we now will test that a build is triggered each time a user pushes new code to the repository.

Step 6.1: Install Git locally

If Git is not installed locally, do it now.

Step 6.2: Download the Project Repository

We now clone the project by issuing the command

$ git clone https://github.com/oveits/simple-restful-file-storage

Step 6.3: Change Code

You can perform a minor change to the content of the README.md in order to test the triggered build.
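A sketch of such a minimal change:

$ echo "Trigger a Jenkins build $(date)" >> README.md
# appends a line to README.md, so there is something to commit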

Step 6.4: Push Code to the Repository

With the commands

$ git commit -am "Minor change of README.md to trigger a Jenkins build"
$ git push

we push the changed code to the SW repository.

If everything works correctly, we will see immediately after reloading the Jenkins Dashboard that GitHub has triggered Jenkins to perform a build (32 sec ago in this screenshot):

2016-12-22-22_50_14-dashboard-jenkins-v2

We can check the build by clicking on the Last Success build and then “Console Output”:

2016-12-22-22_53_08-github-triggered-build-713-console-jenkins

Gradle was clever enough to detect that no relevant code had been changed, so everything is still up to date.

With this procedure we have made sure that the Software repository will trigger a new build process on each and every code change. Moreover, the Jenkins server is not polluted with unnecessary builds anymore, since we have switched off periodic builds.
thumps_up_3

Summary

In this blog post we have performed following tasks:

  1. Started Jenkins in a Docker container
  2. Configured and tested periodic builds
  3. Configured and tested triggered builds
  4. Made sure that the Git Software repository is triggering such a build at every code change

As in the other parts of this series, we have run Jenkins interactively in a Docker container. See below a discussion of the advantages of periodic and triggered builds:

Periodic Builds vs Triggered Builds

When we compare periodic builds with triggered builds, we see following advantages/disadvantages:

Complexity of Setup: periodic builds are much easier to set up. They only need to be configured on Jenkins. Triggered builds require setup steps on the Jenkins server, the software repository and intermediate firewalls, if the Jenkins server is located in a private network.

Economics: Triggered builds are more economic in terms of Jenkins server load. The build processes run only when needed.

Handling: Triggered builds have important handling advantages compared to periodic builds: firstly, each and every code change can be tested, helping the programmer to get near-immediate feedback for every code change. Secondly, the build log is not polluted by hundreds of irrelevant builds.

In my opinion, a clear winner is: triggered builds. Those may be combined with periodic clean builds at certain milestones.

2016-12-22-23_39_41

 

References

 


Getting Started with Mesos Resource Reservation & Marathon Watchdog – A “Hello World” Example


2016-12-14-01_22_32

Today, we will introduce Apache Mesos, an open source distributed computing system with the goal of allowing applications to run on a computer cluster as if they were running on a single computer. On top of a Mesos cluster, we will run Mesosphere Marathon, an open source container orchestration platform. Similar to a watchdog, Marathon helps running and maintaining long-running applications. However, unlike a mere watchdog, Marathon runs the applications in containers, and it provides a modern web portal and a modern RESTful API.

With the help of Marathon, we will

  • run several instances of a simple “Hello World” script on the cluster (within and outside of Docker containers);
  • see what happens if an application dies unexpectedly;
  • see what happens if an application’s reservation request exceeds the available resources.

For simplicity and quick installation purposes, all components of the Mesos architecture will be run within Docker containers.
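To give you an idea of where we are heading: once Marathon is up (we will start its HTTP endpoint on port 7070 below), such a “Hello World” application can be submitted via Marathon’s REST API. A sketch; the app ID, command and resource values are arbitrary examples:

$ cat > hello.json <<'EOF'
{
  "id": "hello-world",
  "cmd": "while true; do echo 'Hello World'; sleep 5; done",
  "cpus": 0.1,
  "mem": 32,
  "instances": 2
}
EOF
$ curl -X POST -H "Content-Type: application/json" http://localhost:7070/v2/apps -d @hello.json
# Marathon will reserve the requested resources and keep 2 instances of the script running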

What is Mesos?

Mesos is an open source framework that provides a distributed computing system. Mesos provides applications (e.g. Hadoop, Spark, Kafka, Elasticsearch) with APIs for resource management and scheduling across entire datacenters and cloud environments.

 

2016-12-14-02_28_36-mesos-marathon-architecture-google-slides

The Mesos agents advertise the available resources (CPU, DRAM, …) to the master, which will relay those offers to frameworks like Marathon, Hadoop, Jenkins and many more. The frameworks may reserve all or part of the offered resources and run their applications on the Mesos agents (slaves).

What is Marathon?

Mesosphere, the owner of Marathon, calls it “a production-grade container orchestration platform for Mesosphere’s Datacenter Operating System (DC/OS) and Apache Mesos.”

Among others, it offers:

  • active-standby redundancy for increased availability
  • starting containers on the Mesos agents; both Mesos containers (using cgroups) and Docker containers are supported
  • a powerful GUI
  • a REST API for easier integration

A more complete feature list can be found here.

Compared to other schedulers like Apache Aurora used by Twitter, Marathon seems to be much easier to handle. On the other hand, Aurora offers elaborate prioritization and preemption features. Those may be important if the same resources are shared between production and development: if a production workload does not find any resources on a Mesos slave, Aurora will kill off less important applications in order to free up resources.

A good comparison between Mesosphere Marathon and Apache Aurora can be found on this Hootsuite Development’s web page.

Target Configuration for this Blog Post

In this “Hello World” example, we will create a simple Mesos configuration with a Marathon framework, a single Zookeeper, a single master and a single agent (slave):

2016-12-14-02_14_43-mesos-marathon-architecture-google-slides

We will also show what happens if we kill a Marathon app.

Versions & Tools used

Prerequisites:

  • >~4 GB RAM: after starting a Mesos master, a Mesos agent (slave), a ZooKeeper and a Marathon Docker container, I have observed a DRAM usage of ~3.8 GB

Step 1: Install a Docker Host via Vagrant and Connect to the Host via SSH

If you are using an existing docker host, make sure that your host has enough free memory.

We will run the applications in Docker containers in order to allow for maximum interoperability. This way, we can always use the latest versions without the need to control the Java version used.

If you are new to Docker, you might want to read this blog post.

Installing Docker on Windows and Mac can be a real challenge, but no worries: we will show an easy way here that is much quicker than the one described in Docker’s official documentation:

Prerequisites of this step:

  • I recommend having direct access to the Internet: via firewall, but without HTTP proxy. However, if you cannot get rid of your HTTP proxy, read this blog post.
  • Administration rights on your computer.

Steps to install a Docker Host VirtualBox VM:

Download and install VirtualBox (if the installation fails with the error message “<to be completed>”, see Appendix A of this blog post: VirtualBox Installation Workaround)

1. Download and Install Vagrant (requires a reboot)

2. Download Vagrant Box containing an Ubuntu-based Docker Host and create a VirtualBox VM like follows:

basesystem# mkdir ubuntu-trusty64-docker ; cd ubuntu-trusty64-docker
basesystem# vagrant init williamyeh/ubuntu-trusty64-docker
basesystem# vagrant up
basesystem# vagrant ssh

Now you are logged into the Docker host and we are ready for the next step: to create the Docker images.

Note: I have experienced problems with the vi editor when running vagrant ssh in a Windows terminal. In case of Windows, consider to follow Appendix C of this blog post and to use putty instead.

Step 2 (recommended): Download Docker Images

This step is optional, since the download will be done automatically with each docker run command if the image is not available on the Docker host. However, I recommend downloading the images in advance, so when you run the applications, you can observe the logs and other feedback (syntax errors) immediately.

Step 2.1 Download Zookeeper

By looking at the Mesosphere GitHub documentation, it seems like a ZooKeeper (Exhibitor) is always needed. The only tag available as of today is 1.5.2. Let us download it first:

(dockerhost)$ sudo docker pull netflixoss/exhibitor:1.5.2
1.5.2: Pulling from netflixoss/exhibitor

a3ed95caeb02: Pull complete
831a6feb5ab2: Pull complete
b32559aac4de: Pull complete
5e99535a7b44: Pull complete
aa076096cff1: Pull complete
423664404a49: Pull complete
929c1efe4d14: Pull complete
387bf8857f2e: Pull complete
5efe9ea3de0d: Pull complete
a53f74fd9d17: Pull complete
78b42a885be7: Pull complete
684d8691844e: Pull complete
Digest: sha256:9b384a431d2e231f0bd3fcda5eff20d5eabd5ba1e3361764a4834d3401fbc4d4
Status: Downloaded newer image for netflixoss/exhibitor:1.5.2

Step 2.2 Download Mesos Master

It is quite confusing how many distributions of Mesos exist. The Docker Hub distribution from Mesoscloud seems to have the most stars:

$ sudo docker search mesos
NAME                                DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
mesoscloud/mesos-master             Mesos Master                                    50                   [OK]
mesoscloud/mesos-slave              Mesos Slave                                     31                   [OK]
...

However, if I search for mesos-master, I find another image with even more stars and downloads:

$ sudo docker search mesos-master
NAME                                         DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
mesosphere/mesos-master                      Mesos-Master in Docker                          71
mesoscloud/mesos-master                      Mesos Master                                    50                   [OK]
...

Mesosphere also offers a two-in-one solution with master and slave combined in a single container: mesosphere/mesos. However, the recommended way of running Mesos is to run a master on one Docker host and slaves on other Docker hosts.

Let us download the images mesosphere/mesos-master and mesosphere/mesos-slave. Note that the “latest” tag does not exist, so we need to specify the tag explicitly. Let us try the latest version I have found to exist for both master and slave, i.e. 1.1.01.1.0-2.0.107.ubuntu1404.

(dockerhost)$ docker pull mesosphere/mesos-master:1.1.01.1.0-2.0.107.ubuntu1404
1.1.01.1.0-2.0.107.ubuntu1404: Pulling from mesosphere/mesos-master

bf5d46315322: Already exists
9f13e0ac480c: Already exists
e8988b5b3097: Already exists
40af181810e7: Already exists
e6f7c7e5c03e: Already exists
a3ed95caeb02: Already exists
01a862c74d96: Already exists
651b06ceb77e: Already exists
Digest: sha256:a011e002d641c6ba8361c542bd9429af721b7c7434598a9615cbd5b05511af7f
Status: Downloaded newer image for mesosphere/mesos-master:1.1.01.1.0-2.0.107.ubuntu1404

The version of the downloaded Mesos Master image can be checked with following command:

(dockerhost)$ sudo docker run -it --rm --name mesos-master mesosphere/mesos-master:1.1.01.1.0-2.0.107.ubuntu1404 --version
mesos 1.1.0

We are currently using version 1.1.0.

Step 2.3 Download Mesos Slave

(dockerhost)$ docker pull mesosphere/mesos-slave:1.1.01.1.0-2.0.107.ubuntu1404
1.1.01.1.0-2.0.107.ubuntu1404: Pulling from mesosphere/mesos-slave

bf5d46315322: Already exists
9f13e0ac480c: Already exists
e8988b5b3097: Already exists
40af181810e7: Already exists
e6f7c7e5c03e: Already exists
a3ed95caeb02: Already exists
01a862c74d96: Already exists
651b06ceb77e: Already exists
Digest: sha256:bb75cc78c6880a2faa5307e3d8caa806105c673e9002429e60e3ae858d162999
Status: Downloaded newer image for mesosphere/mesos-slave:1.1.01.1.0-2.0.107.ubuntu1404

Step 2.4 Download Marathon

While Mesos will offer computing resources, Marathon is a framework that will ask for those resources.

(dockerhost)$ sudo docker pull mesosphere/marathon
Using default tag: latest
latest: Pulling from mesosphere/marathon

43c265008fae: Pull complete
af36d2c7a148: Pull complete
143e9d501644: Pull complete
bfc4cdbc8d81: Pull complete
38c6fc3e9968: Pull complete
0bfa8d5153bb: Pull complete
05bc8d0fffca: Pull complete
f1266a2a7ecb: Pull complete
f505e7ed4b7e: Pull complete
219f8c7fc022: Pull complete
Digest: sha256:9c881ff6f46a0da69f622a19a1677f1424a12ef37d076ec439854f15b97179fa
Status: Downloaded newer image for mesosphere/marathon:latest

Marathon does not offer a --version option or the like. The Marathon version can only be seen in the log when running Marathon:

(dockerhost)$ sudo docker run -it --net=host -v `pwd`:/work_dir --entrypoint=bash mesosphere/marathon
root@openshift-installer:/marathon# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070
MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib.
MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so'
No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script.
[2016-12-12 21:01:46,268] INFO Starting Marathon 1.3.6/unknown with --master local --zk zk://localhost:2181/marathon --http_port=7070 (mesosphere.marathon.Main$:main)

Step 3: Run Mesos

In this step, we will run the Mesos master interactively (with the -it switch instead of the -d switch) to better see what is happening. In a productive environment, you would use the detached mode -d instead of the interactive terminal mode -it.

We have found out by analyzing the Dockerfiles of mesos-master and mesos-slave images that both are based on mesosphere/mesos with different entrypoints and commands:

  • mesosphere/mesos-master: entrypoint: mesos-master with default option --registry=in_memory
  • mesosphere/mesos-slave: entrypoint: mesos-slave with no default options

Interestingly, the Dockerfiles have no exposed ports specified. The reason is that the Docker images are supposed to run in host network mode, thereby sharing the network interface, including IP address(es) and ports, with the Docker host.
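You can verify the network mode of a running container like this (a sketch, e.g. for the mesos-master container we will start in Step 3.2):

(dockerhost)$ sudo docker inspect --format '{{ .HostConfig.NetworkMode }}' mesos-master
host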

The usage of the Docker images is documented on GitHub. It seems like a ZooKeeper is needed:

Step 3.1: Run Zookeeper (Exhibitor) interactively in a Container

In order to better see what is happening, we will run the ZooKeeper in interactive terminal (-it) mode instead of detached mode (-d), as described in the documentation. With the --net=host option, we share the network with the Docker host, so we do not need to explicitly expose the used TCP ports:

(dockerhost)$ sudo docker run -it --net=host --name=zookeeper netflixoss/exhibitor:1.5.2
v1.0
INFO com.netflix.exhibitor.core.activity.ActivityLog Exhibitor started [main]
Dec 12, 2016 5:05:28 PM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.
INFO org.mortbay.log Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog [main]
INFO org.mortbay.log jetty-1.0 [main]
Dec 12, 2016 5:05:29 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9.1 09/14/2011 02:36 PM'
INFO org.mortbay.log Started SocketConnector@0.0.0.0:8080 [main]
INFO com.netflix.exhibitor.core.activity.ActivityLog State: down [ActivityQueue-0]
Dec 12, 2016 5:05:30 PM java.util.prefs.FileSystemPreferences$6 run
WARNING: Prefs file removed in background /root/.java/.userPrefs/prefs.xml
INFO com.netflix.exhibitor.core.activity.ActivityLog Attempting to stop instance [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Attempting to start/restart ZooKeeper [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog jps didn't find instance - assuming ZK is not running [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Starting in standalone mode [ActivityQueue-0]
ERROR com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: JMX enabled by default [pool-2-thread-1]
INFO com.netflix.exhibitor.core.activity.ActivityLog Process started via: /zookeeper/bin/zkServer.sh [ActivityQueue-0]
ERROR com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: Using config: /zookeeper/bin/../conf/zoo.cfg [pool-2-thread-1]
INFO com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: Starting zookeeper ... STARTED [pool-2-thread-2]
INFO com.netflix.exhibitor.core.activity.ActivityLog State: serving [ActivityQueue-0]

In the log, we see that the ZooKeeper is using port 8080. Therefore, we will see in Step 3.3 that the TCP port for Marathon needs to be changed in order to avoid a port resource collision.
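This is also why we had started Marathon with --http_port=7070 when checking its version in Step 2.4. As a sketch (assuming the image’s default entrypoint is the start script we have used above), a detached Marathon run would look roughly like this:

(dockerhost)$ sudo docker run -d --net=host --name marathon mesosphere/marathon \
  --master zk://127.0.0.1:2181/mesos --zk zk://127.0.0.1:2181/marathon --http_port=7070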

Step 3.2: Run Master interactively in a Container

Here again, we are following the documentation of the images, but we are starting the Mesos master in interactive mode in a different terminal in order to see all logs, and we use the latest version 1.1.0 instead of 0.28.0:

(dockerhost)$ sudo docker run -it --net=host \
  --name mesos-master \
  -e MESOS_PORT=5050 \
  -e MESOS_ZK=zk://127.0.0.1:2181/mesos \
  -e MESOS_QUORUM=1 \
  -e MESOS_REGISTRY=in_memory \
  -e MESOS_LOG_DIR=/var/log/mesos \
  -e MESOS_WORK_DIR=/var/tmp/mesos \
  -v "$(pwd)/log/mesos:/var/log/mesos" \
  -v "$(pwd)/tmp/mesos:/var/tmp/mesos" \
  mesosphere/mesos-master:1.1.01.1.0-2.0.107.ubuntu1404
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1212 17:10:44.898630 1 main.cpp:263] Build: 2016-11-16 01:30:23 by ubuntu
I1212 17:10:44.898900 1 main.cpp:264] Version: 1.1.0
I1212 17:10:44.898916 1 main.cpp:267] Git tag: 1.1.0
I1212 17:10:44.898927 1 main.cpp:271] Git SHA: a44b077ea0df54b77f05550979e1e97f39b15873
I1212 17:10:44.903816 1 logging.cpp:194] INFO level logging started!
I1212 17:10:44.904436 1 main.cpp:370] Using 'HierarchicalDRF' allocator
2016-12-12 17:10:44,905:1(0x7ff4a8164700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-12-12 17:10:44,905:1(0x7ff4a8164700):ZOO_INFO@log_env@730: Client environment:host.name=openshift-installer
2016-12-12 17:10:44,905:1(0x7ff4a8164700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-12-12 17:10:44,905:1(0x7ff4a8164700):ZOO_INFO@log_env@738: Client environment:os.arch=4.2.0-42-generic
2016-12-12 17:10:44,905:1(0x7ff4a8164700):ZOO_INFO@log_env@739: Client environment:os.version=#49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016
2016-12-12 17:10:44,907:1(0x7ff4a6961700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@730: Client environment:host.name=openshift-installer
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@738: Client environment:os.arch=4.2.0-42-generic
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@739: Client environment:os.version=#49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@755: Client environment:user.home=/root
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@767: Client environment:user.dir=/
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=127.0.0.1:2181 sessionTimeout=10000 watcher=0x7ff4b16c6200 sessionId=0 sessionPasswd=<null> context=0x7ff490000930 flags=0
I1212 17:10:44.909833 11 master.cpp:380] Master 917a95ab-7b77-4316-8e52-1431a8043af3 (openshift-installer-native-docker-compose) started on 127.0.0.1:5050
I1212 17:10:44.912077 11 master.cpp:382] Flags at startup: --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="false" --authenticate_frameworks="false" --authenticate_http_frameworks="false" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --port="5050" --quiet="false" --quorum="1" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="20secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/var/tmp/mesos" --zk="zk://127.0.0.1:2181/mesos" --zk_session_timeout="10secs"
2016-12-12 17:10:44,909:1(0x7ff4a8164700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
2016-12-12 17:10:44,913:1(0x7ff4a8164700):ZOO_INFO@log_env@755: Client environment:user.home=/root
2016-12-12 17:10:44,913:1(0x7ff4a8164700):ZOO_INFO@log_env@767: Client environment:user.dir=/
2016-12-12 17:10:44,913:1(0x7ff4a8164700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=127.0.0.1:2181 sessionTimeout=10000 watcher=0x7ff4b16c6200 sessionId=0 sessionPasswd=<null> context=0x7ff4980038a0 flags=0
W1212 17:10:44.913079 11 master.cpp:385]
**************************************************
Master bound to loopback interface! Cannot communicate with remote schedulers or agents. You might want to set '--ip' flag to a routable IP address.
**************************************************
I1212 17:10:44.914429 11 master.cpp:434] Master allowing unauthenticated frameworks to register
I1212 17:10:44.914448 11 master.cpp:448] Master allowing unauthenticated agents to register
I1212 17:10:44.914455 11 master.cpp:462] Master allowing HTTP frameworks to register without authentication
I1212 17:10:44.914474 11 master.cpp:504] Using default 'crammd5' authenticator
W1212 17:10:44.914487 11 authenticator.cpp:512] No credentials provided, authentication requests will be refused
I1212 17:10:44.914687 11 authenticator.cpp:519] Initializing server SASL
2016-12-12 17:10:44,922:1(0x7ff497fff700):ZOO_INFO@check_events@1728: initiated connection to server [127.0.0.1:2181]
2016-12-12 17:10:44,923:1(0x7ff497fff700):ZOO_INFO@check_events@1775: session establishment complete on server [127.0.0.1:2181], sessionId=0x158f400557d0003, negotiated timeout=10000
2016-12-12 17:10:44,923:1(0x7ff4977fe700):ZOO_INFO@check_events@1728: initiated connection to server [127.0.0.1:2181]
2016-12-12 17:10:44,924:1(0x7ff4977fe700):ZOO_INFO@check_events@1775: session establishment complete on server [127.0.0.1:2181], sessionId=0x158f400557d0002, negotiated timeout=10000
I1212 17:10:44.924991 8 group.cpp:340] Group process (zookeeper-group(2)@127.0.0.1:5050) connected to ZooKeeper
I1212 17:10:44.925303 8 group.cpp:828] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1212 17:10:44.925424 8 group.cpp:418] Trying to create path '/mesos' in ZooKeeper
I1212 17:10:44.925204 5 group.cpp:340] Group process (zookeeper-group(1)@127.0.0.1:5050) connected to ZooKeeper
I1212 17:10:44.925606 5 group.cpp:828] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1212 17:10:44.925617 5 group.cpp:418] Trying to create path '/mesos' in ZooKeeper
I1212 17:10:44.931725 8 contender.cpp:152] Joining the ZK group
I1212 17:10:44.932301 11 master.cpp:1951] Successfully attached file '/var/log/mesos/mesos-master.INFO'
I1212 17:10:44.936324 9 contender.cpp:268] New candidate (id='1') has entered the contest for leadership
I1212 17:10:44.937194 5 detector.cpp:152] Detected a new leader: (id='1')
I1212 17:10:44.937408 5 group.cpp:697] Trying to get '/mesos/json.info_0000000001' in ZooKeeper
I1212 17:10:44.939241 5 zookeeper.cpp:259] A new leading master (UPID=master@127.0.0.1:5050) is detected
I1212 17:10:44.939414 5 master.cpp:2017] Elected as the leading master!
I1212 17:10:44.939437 5 master.cpp:1560] Recovering from registrar
I1212 17:10:44.941402 5 registrar.cpp:362] Successfully fetched the registry (0B) in 1.7152ms
I1212 17:10:44.941573 5 registrar.cpp:461] Applied 1 operations in 5462ns; attempting to update the registry
I1212 17:10:44.946907 5 registrar.cpp:506] Successfully updated the registry in 5.135104ms
I1212 17:10:44.947170 5 registrar.cpp:392] Successfully recovered registrar
I1212 17:10:44.947314 5 master.cpp:1676] Recovered 0 agents from the registry (184B); allowing 10mins for agents to re-register
2016-12-12 17:11:11,640:1(0x7ff497fff700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 14ms
2016-12-12 17:11:11,641:1(0x7ff4977fe700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 15ms
2016-12-12 17:11:35,045:1(0x7ff497fff700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 50ms
2016-12-12 17:11:35,046:1(0x7ff4977fe700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 51ms

Step 3.3: Run Slave interactively in a Container

Here again, we are following the documentation of the images, but we are starting the Mesos slave in interactive mode (-it) in a third terminal in order to see all logs, and we use the latest version 1.1.0 instead of 0.28.0:

(dockerhost)$ sudo docker run -it --net=host --privileged \
  -e MESOS_PORT=5051 \
  -e MESOS_MASTER=zk://127.0.0.1:2181/mesos \
  -e MESOS_SWITCH_USER=0 \
  -e MESOS_CONTAINERIZERS=docker,mesos \
  -e MESOS_LOG_DIR=/var/log/mesos \
  -e MESOS_WORK_DIR=/var/tmp/mesos \
  -v "$(pwd)/log/mesos:/var/log/mesos" \
  -v "$(pwd)/tmp/mesos:/var/tmp/mesos" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /cgroup:/cgroup \
  -v /sys:/sys \
  -v /usr/local/bin/docker:/usr/local/bin/docker \
  mesosphere/mesos-slave:1.1.01.1.0-2.0.107.ubuntu1404
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1212 17:18:39.260704 1 main.cpp:243] Build: 2016-11-16 01:30:23 by ubuntu
I1212 17:18:39.261031 1 main.cpp:244] Version: 1.1.0
I1212 17:18:39.261075 1 main.cpp:247] Git tag: 1.1.0
I1212 17:18:39.261108 1 main.cpp:251] Git SHA: a44b077ea0df54b77f05550979e1e97f39b15873
I1212 17:18:39.265000 1 logging.cpp:194] INFO level logging started!
I1212 17:18:39.400902 1 containerizer.cpp:200] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
I1212 17:18:39.429229 1 linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
2016-12-12 17:18:39,438:1(0x7ff8f7601700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-12-12 17:18:39,439:1(0x7ff8f7601700):ZOO_INFO@log_env@730: Client environment:host.name=openshift-installer
2016-12-12 17:18:39,439:1(0x7ff8f7601700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-12-12 17:18:39,439:1(0x7ff8f7601700):ZOO_INFO@log_env@738: Client environment:os.arch=4.2.0-42-generic
2016-12-12 17:18:39,439:1(0x7ff8f7601700):ZOO_INFO@log_env@739: Client environment:os.version=#49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016
I1212 17:18:39.438886 1 slave.cpp:208] Mesos agent started on (1)@127.0.0.1:5051
2016-12-12 17:18:39,439:1(0x7ff8f7601700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
I1212 17:18:39.439324 1 slave.cpp:209] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="docker,mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="linux" --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --master="zk://127.0.0.1:2181/mesos" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/var/run/mesos" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="false" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/tmp/mesos"
2016-12-12 17:18:39,441:1(0x7ff8f7601700):ZOO_INFO@log_env@755: Client environment:user.home=/root
2016-12-12 17:18:39,442:1(0x7ff8f7601700):ZOO_INFO@log_env@767: Client environment:user.dir=/
2016-12-12 17:18:39,442:1(0x7ff8f7601700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=127.0.0.1:2181 sessionTimeout=10000 watcher=0x7ff9023a0200 sessionId=0 sessionPasswd=<null> context=0x7ff8d4001e00 flags=0
2016-12-12 17:18:39,445:1(0x7ff8f4dfc700):ZOO_INFO@check_events@1728: initiated connection to server [127.0.0.1:2181]
2016-12-12 17:18:39,446:1(0x7ff8f4dfc700):ZOO_INFO@check_events@1775: session establishment complete on server [127.0.0.1:2181], sessionId=0x158f400557d0004, negotiated timeout=10000
W1212 17:18:39.440688 1 slave.cpp:212]
**************************************************
Agent bound to loopback interface! Cannot communicate with remote master(s). You might want to set '--ip' flag to a routable IP address.
**************************************************
I1212 17:18:39.448009 8 group.cpp:340] Group process (zookeeper-group(1)@127.0.0.1:5051) connected to ZooKeeper
I1212 17:18:39.448274 8 group.cpp:828] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1212 17:18:39.448418 8 group.cpp:418] Trying to create path '/mesos' in ZooKeeper
I1212 17:18:39.450460 1 slave.cpp:533] Agent resources: cpus(*):2; mem(*):2928; disk(*):1902607; ports(*):[31000-32000]
I1212 17:18:39.452038 8 detector.cpp:152] Detected a new leader: (id='1')
I1212 17:18:39.452385 8 group.cpp:697] Trying to get '/mesos/json.info_0000000001' in ZooKeeper
I1212 17:18:39.452198 1 slave.cpp:541] Agent attributes: [ ]
I1212 17:18:39.454398 1 slave.cpp:546] Agent hostname: openshift-installer-native-docker-compose
I1212 17:18:39.459702 8 zookeeper.cpp:259] A new leading master (UPID=master@127.0.0.1:5050) is detected
I1212 17:18:39.462762 9 state.cpp:57] Recovering state from '/var/tmp/mesos/meta'
I1212 17:18:39.464331 9 status_update_manager.cpp:203] Recovering status update manager
I1212 17:18:39.465109 9 docker.cpp:764] Recovering Docker containers
I1212 17:18:39.465148 11 containerizer.cpp:555] Recovering containerizer
I1212 17:18:39.472216 6 provisioner.cpp:253] Provisioner recovery complete
I1212 17:18:39.917292 8 slave.cpp:5281] Finished recovery
I1212 17:18:39.938516 8 slave.cpp:915] New master detected at master@127.0.0.1:5050
I1212 17:18:39.939077 8 slave.cpp:936] No credentials provided. Attempting to register without authentication
I1212 17:18:39.939728 8 slave.cpp:947] Detecting new master
I1212 17:18:39.938575 9 status_update_manager.cpp:177] Pausing sending status updates
E1212 17:18:40.622269 8 process.cpp:2154] Failed to shutdown socket with fd 12: Transport endpoint is not connected
I1212 17:18:40.631978 8 slave.cpp:4179] Got exited event for master@127.0.0.1:5050
W1212 17:18:40.634527 8 slave.cpp:4184] Master disconnected! Waiting for a new master to be elected
E1212 17:18:40.632072 13 process.cpp:2154] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E1212 17:18:42.052165 13 process.cpp:2154] Failed to shutdown socket with fd 12: Transport endpoint is not connected
I1212 17:18:42.052312 7 slave.cpp:4179] Got exited event for master@127.0.0.1:5050
W1212 17:18:42.057737 7 slave.cpp:4184] Master disconnected! Waiting for a new master to be elected
I1212 17:18:45.093793 6 slave.cpp:4179] Got exited event for master@127.0.0.1:5050
W1212 17:18:45.094339 6 slave.cpp:4184] Master disconnected! Waiting for a new master to be elected
E1212 17:18:45.093952 13 process.cpp:2154] Failed to shutdown socket with fd 12: Transport endpoint is not connected
I1212 17:18:48.748410 9 slave.cpp:4179] Got exited event for master@127.0.0.1:5050
W1212 17:18:48.748883 9 slave.cpp:4184] Master disconnected! Waiting for a new master to be elected
E1212 17:18:48.748569 13 process.cpp:2154] Failed to shutdown socket with fd 12: Transport endpoint is not connected
2016-12-12 17:18:49,470:1(0x7ff8f4dfc700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 12ms
I1212 17:18:50.030248 8 detector.cpp:152] Detected a new leader: None
I1212 17:18:50.030894 8 slave.cpp:908] Lost leading master
I1212 17:18:50.031404 8 slave.cpp:947] Detecting new master
I1212 17:18:50.031327 7 status_update_manager.cpp:177] Pausing sending status updates
2016-12-12 17:18:53,405:1(0x7ff8f4dfc700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 43ms
I1212 17:19:02.637687 7 slave.cpp:1324] Skipping registration because no master present
I1212 17:19:22.959288 8 detector.cpp:152] Detected a new leader: (id='2')
I1212 17:19:22.961132 8 group.cpp:697] Trying to get '/mesos/json.info_0000000002' in ZooKeeper
I1212 17:19:22.965332 8 zookeeper.cpp:259] A new leading master (UPID=master@127.0.0.1:5050) is detected
I1212 17:19:22.971281 8 slave.cpp:915] New master detected at master@127.0.0.1:5050
I1212 17:19:22.972019 8 slave.cpp:936] No credentials provided. Attempting to register without authentication
I1212 17:19:22.975116 8 slave.cpp:947] Detecting new master
I1212 17:19:22.971410 9 status_update_manager.cpp:177] Pausing sending status updates
I1212 17:19:23.927111 11 slave.cpp:1115] Registered with master master@127.0.0.1:5050; given agent ID a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0
I1212 17:19:23.931748 12 status_update_manager.cpp:184] Resuming sending status updates
I1212 17:19:24.781162 11 slave.cpp:1175] Forwarding total oversubscribed resources {}
I1212 17:19:39.456789 10 slave.cpp:5044] Current disk usage 62.36%. Max allowed age: 1.934608332272824days
2016-12-12 17:19:46,345:1(0x7ff8f4dfc700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 20ms
2016-12-12 17:20:19,750:1(0x7ff8f4dfc700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 41ms

Step 4: Connect to Mesos Web Portal

Mesos’ web portal runs on port 5050 by default. In my case, I am running Mesos in Docker containers, which are hosted by a VirtualBox Docker host VM. Since the VM only has a NATed virtual Ethernet interface, I need to specify a port forwarding rule in VirtualBox so I can reach port 5050 of the Docker host:

2016-12-12-18_41_57-regel-fur-port-weiterleitung
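
For reference, the same port forwarding rule can also be created on the command line with VBoxManage instead of the VirtualBox GUI. This is only a sketch; the VM name “openshift-installer” is an assumption and needs to be replaced by the actual name of your Docker host VM:

# note: the VM name below is an assumption; adjust it to your Docker host VM
(basesystem)$ VBoxManage controlvm "openshift-installer" natpf1 "mesos,tcp,127.0.0.1,5050,,5050"

The analogous rule for Marathon’s port 7070, which we will need in Step 5, can be created the same way.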

After that, we can access the Mesos master portal from a browser on the base system:

2016-12-12-18_48_01-mesos

Note: if you see the error “Failed to connect to 127.0.0.1:5050!”, try reloading (instead of retrying) the page via the browser reload function. See Appendix A.1 for details.

We can see the single Mesos slave (i.e. agent) connected to the master on the left:

2016-12-12-19_02_19-mesos

The details of the agents (slaves) can be seen by clicking on the Agents link in the menu:

2016-12-16-20_34_44-mesos

So, what is next? It looks like we need a framework like Marathon in order to start applications on Mesos. Let us start Marathon now.

Step 5: Start Marathon

Let us start Marathon in a container as described here. However, we will use port 7070 instead of 8080, because 8080 collides with a port used by ZooKeeper. In addition, we will change the master URI to point to the external Mesos master we have started in step 3. Moreover, we need to set the environment variable MESOS_WORK_DIR because of a Mesos bug. See Appendix B for details.

(dockerhost)$ sudo docker run -it --name marathon --rm --net=host -e MESOS_WORK_DIR=/var/lib/mesos --entrypoint=bash mesosphere/marathon
(container)# ./bin/start --master zk://127.0.0.1:2181/mesos --zk zk://localhost:2181/marathon --http_port=7070
MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib.
MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so'
No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script.
[2016-12-13 14:46:09,798] INFO Starting Marathon 1.3.6/unknown with --master zk://127.0.0.1:2181/mesos --zk zk://localhost:2181/marathon --http_port=7070 (mesosphere.marathon.Main$:main)
[2016-12-13 14:46:10,148] WARN Method [public javax.ws.rs.core.Response mesosphere.marathon.api.MarathonExceptionMapper.toResponse(java.lang.Throwable)] is synthetic and is being intercepted by [mesosphere.marathon.DebugModule$MetricsBehavior@985696]. This could indicate a bug. The method may be intercepted twice, or may not be intercepted at all. (com.google.inject.internal.ProxyFactory:main)
[2016-12-13 14:46:10,336] INFO Logging initialized @1841ms (org.eclipse.jetty.util.log:main)
[2016-12-13 14:46:10,849] INFO Slf4jLogger started (akka.event.slf4j.Slf4jLogger:marathon-akka.actor.default-dispatcher-2)
[2016-12-13 14:46:11,060] INFO Started TaskTrackerUpdateStepsProcessorImpl with steps:
* continueOnError(notifyHealthCheckManager)
* continueOnError(notifyRateLimiter)
* continueOnError(notifyLaunchQueue)
* continueOnError(emitUpdate)
* continueOnError(postTaskStatusEvent)
* continueOnError(scaleApp) (mesosphere.marathon.core.task.tracker.impl.TaskTrackerUpdateStepProcessorImpl:main)
[2016-12-13 14:46:11,128] INFO Calling reviveOffers is enabled. Use --disable_revive_offers_for_new_apps to disable. (mesosphere.marathon.core.flow.FlowModule:main)
[2016-12-13 14:46:11,195] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authenticator' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-13 14:46:11,200] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-13 14:46:11,202] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authorizer' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-13 14:46:11,203] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-13 14:46:11,205] INFO Started status update processor (mesosphere.marathon.core.task.update.impl.TaskStatusUpdateProcessorImpl$$EnhancerByGuice$$ca7c928f:main)
[2016-12-13 14:46:11,321] INFO All actors suspended:
* Actor[akka://marathon/user/groupManager#1830533307]
* Actor[akka://marathon/user/launchQueue#-485770292]
* Actor[akka://marathon/user/killOverdueStagedTasks#540935184]
* Actor[akka://marathon/user/offersWantedForReconciliation#-876030651]
* Actor[akka://marathon/user/taskKillServiceActor#1912899513]
* Actor[akka://marathon/user/rateLimiter#1490654495]
* Actor[akka://marathon/user/offerMatcherLaunchTokens#-1930600924]
* Actor[akka://marathon/user/expungeOverdueLostTasks#-405721147]
* Actor[akka://marathon/user/reviveOffersWhenWanted#1467955826]
* Actor[akka://marathon/user/taskTracker#623538008]
* Actor[akka://marathon/user/offerMatcherManager#1760835888] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-13 14:46:11,417] INFO Adding HTTP support. (mesosphere.chaos.http.HttpModule:main)
[2016-12-13 14:46:11,418] INFO No HTTPS support configured. (mesosphere.chaos.http.HttpModule:main)
[2016-12-13 14:46:11,497] INFO Starting up (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-13 14:46:11,497] INFO Beginning run (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-13 14:46:11,498] INFO Will offer leadership after 500 milliseconds backoff (mesosphere.marathon.core.election.impl.CuratorElectionService:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-13 14:46:11,504] INFO jetty-9.3.z-SNAPSHOT (org.eclipse.jetty.server.Server:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,535] INFO Now standing by. Closing existing handles and rejecting new. (mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-13 14:46:11,704] INFO Registering com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,705] INFO Registering mesosphere.marathon.api.MarathonExceptionMapper as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.AppsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.TasksResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.EventSubscriptionsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.QueueResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.GroupsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.InfoResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.LeaderResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.DeploymentsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.ArtifactsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.SchemaResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.PluginsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,710] INFO Initiating Jersey application, version 'Jersey: 1.18.1 02/19/2014 03:28 AM' (com.sun.jersey.server.impl.application.WebApplicationImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,765] INFO Binding com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,782] INFO Binding mesosphere.marathon.api.MarathonExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,027] INFO Using HA and therefore offering leadership (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,028] INFO Will do leader election through localhost:2181 (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,073] WARN session timeout [10000] is less than connection timeout [15000] (org.apache.curator.CuratorZookeeperClient:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,096] INFO Starting (org.apache.curator.framework.imps.CuratorFrameworkImpl:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:zookeeper.version=3.5.0-alpha-1615249, built on 08/01/2014 22:13 GMT (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:host.name=openshift-installer-native-docker-compose (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.version=1.8.0_102 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.class.path=./bin/../target/marathon-assembly-1.3.6.jar (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:os.version=4.2.0-42-generic (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:user.dir=/marathon (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:os.memory.free=110MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:os.memory.max=880MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:os.memory.total=158MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=org.apache.curator.ConnectionState@3411e275 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,194] INFO Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-13 14:46:12,201] INFO Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-13 14:46:12,212] INFO Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x158f4c9e1f90014, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-13 14:46:12,217] INFO State change: CONNECTED (org.apache.curator.framework.state.ConnectionStateManager:ForkJoinPool-2-worker-13-EventThread)
[2016-12-13 14:46:12,279] INFO Elected (LeaderLatchListener Interface) (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-1-thread-1)
[2016-12-13 14:46:12,284] INFO As new leader running the driver (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-13 14:46:12,302] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=com.twitter.zk.EventBroker@5301f666 (org.apache.zookeeper.ZooKeeper:pool-1-thread-1)
[2016-12-13 14:46:12,310] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-13 14:46:12,310] INFO Socket connection established to localhost/127.0.0.1:2181, initiating session (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-13 14:46:12,323] INFO Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x158f4c9e1f90015, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-13 14:46:12,367] INFO Binding mesosphere.marathon.api.v2.AppsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,391] INFO Binding mesosphere.marathon.api.v2.TasksResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,398] INFO Binding mesosphere.marathon.api.v2.EventSubscriptionsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,398] INFO Event notification disabled. (mesosphere.marathon.core.event.EventModule:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,404] INFO Binding mesosphere.marathon.api.v2.QueueResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,425] INFO Binding mesosphere.marathon.api.v2.GroupsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,429] INFO Binding mesosphere.marathon.api.v2.InfoResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,431] INFO Binding mesosphere.marathon.api.v2.LeaderResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,433] INFO Binding mesosphere.marathon.api.v2.DeploymentsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,441] INFO Binding mesosphere.marathon.api.v2.ArtifactsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,443] INFO Binding mesosphere.marathon.api.v2.SchemaResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,445] INFO Binding mesosphere.marathon.api.v2.PluginsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,445] INFO Loading plugins implementing 'mesosphere.marathon.plugin.http.HttpRequestHandler' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,446] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,449] INFO Migration successfully applied for version Version(1, 3, 6) (mesosphere.marathon.state.Migration:ForkJoinPool-2-worker-5)
[2016-12-13 14:46:12,451] INFO Call preDriverStarts callbacks on EntityStoreCache(MarathonStore(app:)), EntityStoreCache(MarathonStore(group:)), EntityStoreCache(MarathonStore(deployment:)), EntityStoreCache(MarathonStore(framework:)), EntityStoreCache(MarathonStore(taskFailure:)), EntityStoreCache(MarathonStore(events:)) (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-13 14:46:12,468] INFO Finished preDriverStarts callbacks (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-13 14:46:12,469] INFO Started o.e.j.s.ServletContextHandler@65817241{/,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,484] INFO ExpungeOverdueLostTasksActor has started (mesosphere.marathon.core.task.jobs.impl.ExpungeOverdueLostTasksActor:marathon-akka.actor.default-dispatcher-4)
[2016-12-13 14:46:12,487] INFO TaskTrackerActor is starting. Task loading initiated. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 14:46:12,489] INFO started RateLimiterActor (mesosphere.marathon.core.launchqueue.impl.RateLimiterActor:marathon-akka.actor.default-dispatcher-4)
[2016-12-13 14:46:12,492] INFO no interest in offers for reservation reconciliation anymore. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-13 14:46:12,494] INFO Started. Will remain interested in offer reconciliation for 17500 milliseconds when needed. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-13 14:46:12,514] INFO All actors active:
* Actor[akka://marathon/user/groupManager#1830533307]
* Actor[akka://marathon/user/launchQueue#-485770292]
* Actor[akka://marathon/user/killOverdueStagedTasks#540935184]
* Actor[akka://marathon/user/offersWantedForReconciliation#-876030651]
* Actor[akka://marathon/user/taskKillServiceActor#1912899513]
* Actor[akka://marathon/user/rateLimiter#1490654495]
* Actor[akka://marathon/user/offerMatcherLaunchTokens#-1930600924]
* Actor[akka://marathon/user/expungeOverdueLostTasks#-405721147]
* Actor[akka://marathon/user/reviveOffersWhenWanted#1467955826]
* Actor[akka://marathon/user/taskTracker#623538008]
* Actor[akka://marathon/user/offerMatcherManager#1760835888] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-13 14:46:12,520] INFO Started ServerConnector@7fc3545e{HTTP/1.1,[http/1.1]}{0.0.0.0:7070} (org.eclipse.jetty.server.ServerConnector:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,520] INFO Started @4026ms (org.eclipse.jetty.server.Server:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,521] INFO All services up and running. (mesosphere.marathon.Main$:main)
[2016-12-13 14:46:12,524] INFO About to load 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-5)
[2016-12-13 14:46:12,536] INFO Loaded 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-5)
[2016-12-13 14:46:12,545] INFO Task loading complete. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-13 14:46:12,555] INFO interested in offers for reservation reconciliation because of becoming leader (until 2016-12-13T14:46:29.995Z) (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-13 14:46:12,588] INFO Received offers WANTED notification (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 14:46:12,589] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 14:46:12,589] INFO 2 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 14:46:12,589] INFO => Schedule next revive at 2016-12-13T14:46:17.586Z in 4998 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 14:46:12,638] INFO Create new Scheduler Driver with frameworkId: Some(value: "44a35e16-dc32-4f91-afac-33dfff498944-0000"
) and scheduler mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor$$EnhancerByGuice$$c730b9b8@15d42ccb (mesosphere.marathon.MarathonSchedulerDriver$:pool-1-thread-1)
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1213 14:46:12.777942 192 sched.cpp:1697]
**************************************************
Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address.
**************************************************
I1213 14:46:12.791020 196 sched.cpp:226] Version: 1.0.1
2016-12-13 14:46:12,791:134(0x7f351b0c7700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-12-13 14:46:12,791:134(0x7f351b0c7700):ZOO_INFO@log_env@730: Client environment:host.name=openshift-installer
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@log_env@738: Client environment:os.arch=4.2.0-42-generic
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@log_env@739: Client environment:os.version=#49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@log_env@755: Client environment:user.home=/root
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@log_env@767: Client environment:user.dir=/marathon
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=127.0.0.1:2181 sessionTimeout=10000 watcher=0x7f3522fb1d80 sessionId=0 sessionPasswd=<null> context=0x7f352403b5c0 flags=0
[2016-12-13 14:46:12,799] INFO Reset offerLeadership backoff (mesosphere.marathon.core.election.impl.ExponentialBackoff:pool-1-thread-1)
[2016-12-13 14:46:12,799] INFO Became active. Accepting event streaming requests. (mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 14:46:12,800] INFO Starting scheduler actor (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-8)
2016-12-13 14:46:12,803:134(0x7f35198c4700):ZOO_INFO@check_events@1728: initiated connection to server [127.0.0.1:2181]
[2016-12-13 14:46:12,811] INFO Scheduler actor ready (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-6)
2016-12-13 14:46:12,826:134(0x7f35198c4700):ZOO_INFO@check_events@1775: session establishment complete on server [127.0.0.1:2181], sessionId=0x158f4c9e1f90016, negotiated timeout=10000
I1213 14:46:12.828322 205 group.cpp:349] Group process (group(1)@127.0.0.1:35047) connected to ZooKeeper
I1213 14:46:12.829164 205 group.cpp:837] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1213 14:46:12.829583 205 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
I1213 14:46:12.833498 205 detector.cpp:152] Detected a new leader: (id='1')
I1213 14:46:12.834187 205 group.cpp:706] Trying to get '/mesos/json.info_0000000001' in ZooKeeper
I1213 14:46:12.836730 205 zookeeper.cpp:259] A new leading master (UPID=master@127.0.0.1:5050) is detected
I1213 14:46:12.837736 205 sched.cpp:330] New master detected at master@127.0.0.1:5050
I1213 14:46:12.838312 205 sched.cpp:341] No credentials provided. Attempting to register without authentication
I1213 14:46:12.843353 205 sched.cpp:743] Framework registered with 44a35e16-dc32-4f91-afac-33dfff498944-0000
[2016-12-13 14:46:12,846] INFO Creating tombstone for old twitter commons leader election (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-1-thread-1)
[2016-12-13 14:46:12,872] INFO Registered as 44a35e16-dc32-4f91-afac-33dfff498944-0000 to master '03f81417-51c9-4055-adbe-c1fb74fc8ab4' (mesosphere.marathon.MarathonScheduler$$EnhancerByGuice$$1ef061b0:Thread-14)
[2016-12-13 14:46:12,872] INFO Store framework id: value: "44a35e16-dc32-4f91-afac-33dfff498944-0000"
 (mesosphere.util.state.FrameworkIdUtil:Thread-14)
[2016-12-13 14:46:12,895] INFO Received reviveOffers notification: SchedulerRegisteredEvent (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-3)
[2016-12-13 14:46:16,658] INFO 10.0.2.2 - - [13/Dec/2016:14:46:16 +0000] "GET //localhost:7070/v2/deployments HTTP/1.1" 200 22 "http://localhost:7070/ui/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36" 14 (mesosphere.chaos.http.ChaosRequestLog$$EnhancerByGuice$$c1e74978:qtp1755811644-35)
[2016-12-13 14:46:16,660] INFO 10.0.2.2 - - [13/Dec/2016:14:46:16 +0000] "GET //localhost:7070/v2/groups?embed=group.groups&embed=group.apps&embed=group.apps.deployments&embed=group.apps.counts&embed=group.apps.readiness HTTP/1.1" 200 95 "http://localhost:7070/ui/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36" 15 (mesosphere.chaos.http.ChaosRequestLog$$EnhancerByGuice$$c1e74978:qtp1755811644-36)
[2016-12-13 14:46:16,666] INFO 10.0.2.2 - - [13/Dec/2016:14:46:16 +0000] "GET //localhost:7070/v2/queue HTTP/1.1" 200 32 "http://localhost:7070/ui/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36" 15 (mesosphere.chaos.http.ChaosRequestLog$$EnhancerByGuice$$c1e74978:qtp1755811644-34)
[2016-12-13 14:46:17,603] INFO Received TimedCheck (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-13 14:46:17,604] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-13 14:46:17,609] INFO 2 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-13 14:46:17,612] INFO => Schedule next revive at 2016-12-13T14:46:22.602Z in 4998 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-8)

Let us connect to the Marathon portal:

We have chosen Marathon to run on port 7070. In my case, the Docker host VirtualBox VM has only a NATed interface, so I need to open the Marathon port on VirtualBox:

2016-12-13-14_59_36-regel-fur-port-weiterleitung

Now we can access the Marathon dashboard:

2016-12-13-15_17_35-marathon

Let us see whether Marathon is visible on Mesos:

2016-12-13-15_44_39-mesos

Yes, the Marathon framework can be seen on the Mesos master; perfect.

thumps_up_3

Our topology now looks as follows:
2016-12-14-02_14_43-mesos-marathon-architecture-google-slides

However, we have not started any Marathon application yet. Let us do so now.

Step 6: Start a “Hello World” Application via Marathon Web Portal in a Mesos Container

Let us create an application running in a Mesos container on the Marathon web portal by clicking the Create Application button.

We choose

  • ID: while-loop-hello-world
    • ID needs to consist of lowercase letters, digits, hyphens, “.”, “..”
  • CPUs: 0.1
    • If you keep the default of 1, you might hit resource problems if you do not have 4 CPUs available for the 4 instances. In this case, one or more instances may be stuck in the “Waiting” state.
  • Memory: 32 MiB
    • 32 MiB is the minimum value supported
  • Disk Space: 0
  • Instances: 4
  • Command: while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
    • we have redirected the output of the script, because STDERR/STDOUT retrieval currently does not work on the Marathon portal (see Appendix A.5).

2016-12-18-12_46_08-marathon
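
Alternatively, the same application can be created without the web form via Marathon’s REST API. The following is only a sketch, assuming Marathon is reachable on localhost:7070 as configured in Step 5:

# sketch: create the same application via a POST to Marathon's v2 API
$ curl -X POST http://localhost:7070/v2/apps \
    -H "Content-Type: application/json" \
    -d '{
          "id": "while-loop-hello-world",
          "cpus": 0.1,
          "mem": 32,
          "disk": 0,
          "instances": 4,
          "cmd": "while true; do echo \"I am a Hello World script\"; sleep 1; done 2>&1 1> this_is_my_output.log"
        }'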

After less than 5 seconds, we see that all four instances are up and running:

2016-12-18-12_46_16-marathon

On the Docker host, we can see the four instances running:

$ ps -ef | grep Hello | grep -v grep
root 14137 14095 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14138 14096 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14141 14116 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14142 14127 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log

We can also find the log files on the Docker host:

$ sudo find / -name this_is_my_output.log
/vagrant/jenkins_home/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.927a5843-c517-11e6-ad85-02422b10e522/runs/0f079dcc-25e7-4ad6-b263-4761370a6173/this_is_my_output.log
/vagrant/jenkins_home/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.9285a2e4-c517-11e6-ad85-02422b10e522/runs/44a5c7dd-59f3-4b59-8d94-33754a3cbdc3/this_is_my_output.log
/vagrant/jenkins_home/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.92861815-c517-11e6-ad85-02422b10e522/runs/5400a1d9-582f-4d59-9b81-a719d21c60a1/this_is_my_output.log
/vagrant/jenkins_home/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.9286db66-c517-11e6-ad85-02422b10e522/runs/f3145a9f-2c88-4f99-a8de-618a4157b9e8/this_is_my_output.log

And we can see the output:

$ sudo tail -F /vagrant/jenkins_home/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.927a5843-c517-11e6-ad85-02422b10e522/runs/0f079dcc-25e7-4ad6-b263-4761370a6173/this_is_my_output.log
I am a Hello World script
I am a Hello World script
I am a Hello World script
I am a Hello World script
I am a Hello World script
...

thumps_up_3

Step 7: Test of Application Resiliency

Step 7.1: Killing a Process and Testing whether it was restarted automatically

Now, let us see what happens if one of the processes dies. For that, we will simply kill one of the “Hello World” processes found on the Docker host:

(dockerhost)$ ps -ef | grep Hello | grep -v grep
root 14137 14095 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14138 14096 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14141 14116 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14142 14127 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
(dockerhost)$ sudo kill -9 14137
(dockerhost)$ ps -ef | grep Hello | grep -v grep
root 14138 14096 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14141 14116 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14142 14127 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 19818 19807 1 12:04 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log

We can clearly see that the process was killed successfully, and a new process was started automatically.

thumps_up_3

Step 7.2: Find Traces of the Killed Process in the Logs

The output log of the old process can still be retrieved:

$ cat /vagrant/jenkins_home/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.92861815-c517-11e6-ad85-02422b10e522/runs/5400a1d9-582f-4d59-9b81-a719d21c60a1/this_is_my_output.log
...
I am a Hello World script
I am a Hello World script
I am a Hello World script
I am a Hello World script
(EOF)

However, we cannot see any traces that something went wrong. So, how can we troubleshoot if a process dies unexpectedly?

Here, the Marathon portal helps us out: it provides information on what happened:

2016-12-18-13_20_47-marathon

thumps_up_3

Step 8: Test: Resource Over-subscription not supported by Default?

In this test, we observe what happens if an application instance exceeds the offered resource limits. For that, we will try to start four instances with 1 CPU core each, while the slave provides only 2 CPUs instead of the required 4 CPUs:

2016-12-16-20_52_42-marathon

We press Create Application again and see that 2 of the 4 instances are started soon:

2016-12-16-20_54_49-marathon

Two of the task instances were started successfully, while the other two are waiting for resources and remain “Unscheduled”. This is the expected behavior:

2016-12-16-21_03_00-marathon

thumps_up_3

On the Mesos Dashboard, we can see that the offered 2 CPUs are used by two instances already, leaving no CPU resources for the other two instances:

2016-12-17-16_27_55-mesos
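
The resources the agent offers, and how much of them is in use, can also be queried from the master’s state endpoint. A sketch, assuming the port forwarding from Step 4 and that jq is installed on the base system:

# sketch: query offered vs. used resources per agent from the Mesos master
(basesystem)$ curl -s http://localhost:5050/state | jq '.slaves[] | {resources, used_resources}'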

Aurora vs. Marathon: Consider a situation where you want to run high-priority production applications alongside low-priority development applications on the same hardware. Mesosphere Marathon does not provide a good answer for such situations. In case you hit one, consider using Apache Aurora instead of Mesosphere Marathon: Apache Aurora allows high-priority applications to preempt low-priority applications.

Mesos: about hard limits of CPU resources: Mesos’ CPU reservation feels similar to a hard assignment of resources (since over-subscription is not supported by default, as we have seen above), but under the hood, Mesos does not apply hard limits on CPU usage unless the Mesos slave has --cgroups_enable_cfs (CFS = Completely Fair Scheduler) enabled. See also the second answer on this StackOverflow question. For more information on over-subscription by Mesos, see this Apache Mesos Documentation page.
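
If hard CPU limits are desired, the flag could be appended to the mesos-slave start command from Step 3. Only a sketch; the elided docker run options are the same as in Step 3:

# sketch: all other docker run options as in the mesos-slave start in Step 3
(dockerhost)$ sudo docker run ... mesosphere/mesos-slave:1.1.0-2.0.107.ubuntu1404 --cgroups_enable_cfs=true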

Conclusion: By default, resource over-subscription is not allowed by Mesos. See this Apache Mesos Documentation page for more information about over-subscription.

Step 9: Start a “Hello World” Application via Marathon Web Portal in a Docker Container

In this step, we learn how to run an application in a Docker container instead of a Mesos container. We will perform steps similar to those in Step 6.

We choose

  • ID: while-loop-hello-world-container-small
    • ID needs to consist of lowercase letters, digits, hyphens, “.”, “..”
  • CPUs: 0.1
    • If you keep the default of 1, you might hit resource problems if you do not have 4 CPUs available for the 4 instances. In this case, one or more instances may be stuck in the “Waiting” state.
  • Memory: 32 MiB
    • 32 MiB is the minimum value supported
  • Disk Space: 0
  • Instances: 4
  • Command: while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log
    • we have redirected the output of the script, because STDERR/STDOUT retrieval currently does not work on the Marathon portal (see Appendix A.5).
  • Docker: ubuntu:latest
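
As in Step 6, the application could alternatively be created via Marathon’s REST API; the Docker container is specified in a "container" section. Again only a sketch, assuming Marathon is reachable on localhost:7070:

# sketch: the same app as a Docker container via Marathon's v2 API
$ curl -X POST http://localhost:7070/v2/apps \
    -H "Content-Type: application/json" \
    -d '{
          "id": "while-loop-hello-world-container-small",
          "cpus": 0.1,
          "mem": 32,
          "instances": 4,
          "cmd": "while true; do echo \"I am a Hello World script in a Docker container\"; sleep 1; done 2>&1 1> this_is_my_container_output.log",
          "container": {
            "type": "DOCKER",
            "docker": { "image": "ubuntu:latest", "network": "HOST" }
          }
        }'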

We click “Create Application”:

2016-12-17-16_40_36-marathon

Choose a low number of CPUs (0.1 CPUs in my case), the lowest supported memory value (32 MiB), and four instances, together with a while loop as the command:

2016-12-18-15_39_21-marathon

In the Docker Container tab, we choose an Ubuntu image.

2016-12-17-16_51_26-marathon

First, we see “waiting”,

2016-12-17-16_51_48-marathon

then we see that all four instances are “running”:

2016-12-18-09_17_33-marathon

2016-12-18-15_43_26-marathon

On the Docker host, we can see the four Docker containers that have been started:

$ docker ps
CONTAINER ID        IMAGE                                                   COMMAND                  CREATED             STATUS              PORTS               NAMES
78e346ddb3eb        ubuntu:latest                                           "/bin/sh -c 'while tr"   About an hour ago   Up About an hour                        mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.13aeacaa-20d5-4628-8e65-f82a0fc71724
809ad6513236        ubuntu:latest                                           "/bin/sh -c 'while tr"   About an hour ago   Up About an hour                        mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.c42d9e0c-f66e-42be-ac0e-46bc8f44e47b
de7523c00d41        ubuntu:latest                                           "/bin/sh -c 'while tr"   About an hour ago   Up About an hour                        mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.e7fcebfb-f301-4369-83eb-2782832d83a9
d3729dad1bed        ubuntu:latest                                           "/bin/sh -c 'while tr"   About an hour ago   Up About an hour                        mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.e924e32b-521b-4708-b538-bc2973094718

The processes are:

$ ps -ef | grep Hello | grep -v grep
root     19306 19258  0 14:40 ?        00:00:00 docker -H unix:///var/run/docker.sock run --cpu-shares 102 --memory 33554432 -e MARATHON_APP_VERSION=2016-12-18T14:40:40.292Z -e HOST=openshift-installer-native-docker-compose -e MARATHON_APP_RESOURCE_CPUS=0.1 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_DOCKER_IMAGE=ubuntu:latest -e PORT_10000=31730 -e MESOS_TASK_ID=while-loop-hello-world-container-small.f27ecd8a-c52f-11e6-ad85-02422b10e522 -e PORT=31730 -e MARATHON_APP_RESOURCE_MEM=32.0 -e PORTS=31730 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_LABELS= -e MARATHON_APP_ID=/while-loop-hello-world-container-small -e PORT0=31730 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.e924e32b-521b-4708-b538-bc2973094718 -v /var/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world-container-small.f27ecd8a-c52f-11e6-ad85-02422b10e522/runs/e924e32b-521b-4708-b538-bc2973094718:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.e924e32b-521b-4708-b538-bc2973094718 ubuntu:latest -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log
root     19319 19280  0 14:40 ?        00:00:00 docker -H unix:///var/run/docker.sock run --cpu-shares 102 --memory 33554432 -e MARATHON_APP_VERSION=2016-12-18T14:40:40.292Z -e HOST=openshift-installer-native-docker-compose -e MARATHON_APP_RESOURCE_CPUS=0.1 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_DOCKER_IMAGE=ubuntu:latest -e PORT_10000=31873 -e MESOS_TASK_ID=while-loop-hello-world-container-small.f27e5859-c52f-11e6-ad85-02422b10e522 -e PORT=31873 -e MARATHON_APP_RESOURCE_MEM=32.0 -e PORTS=31873 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_LABELS= -e MARATHON_APP_ID=/while-loop-hello-world-container-small -e PORT0=31873 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.c42d9e0c-f66e-42be-ac0e-46bc8f44e47b -v /var/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world-container-small.f27e5859-c52f-11e6-ad85-02422b10e522/runs/c42d9e0c-f66e-42be-ac0e-46bc8f44e47b:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.c42d9e0c-f66e-42be-ac0e-46bc8f44e47b ubuntu:latest -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log
root     19320 19281  0 14:40 ?        00:00:00 docker -H unix:///var/run/docker.sock run --cpu-shares 102 --memory 33554432 -e MARATHON_APP_VERSION=2016-12-18T14:40:40.292Z -e HOST=openshift-installer-native-docker-compose -e MARATHON_APP_RESOURCE_CPUS=0.1 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_DOCKER_IMAGE=ubuntu:latest -e PORT_10000=31245 -e MESOS_TASK_ID=while-loop-hello-world-container-small.f2792838-c52f-11e6-ad85-02422b10e522 -e PORT=31245 -e MARATHON_APP_RESOURCE_MEM=32.0 -e PORTS=31245 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_LABELS= -e MARATHON_APP_ID=/while-loop-hello-world-container-small -e PORT0=31245 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.e7fcebfb-f301-4369-83eb-2782832d83a9 -v /var/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world-container-small.f2792838-c52f-11e6-ad85-02422b10e522/runs/e7fcebfb-f301-4369-83eb-2782832d83a9:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.e7fcebfb-f301-4369-83eb-2782832d83a9 ubuntu:latest -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log
root     19368 19352  0 14:40 ?        00:00:01 /bin/sh -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log
root     19379 19334  0 14:40 ?        00:00:00 docker -H unix:///var/run/docker.sock run --cpu-shares 102 --memory 33554432 -e MARATHON_APP_VERSION=2016-12-18T14:40:40.292Z -e HOST=openshift-installer-native-docker-compose -e MARATHON_APP_RESOURCE_CPUS=0.1 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_DOCKER_IMAGE=ubuntu:latest -e PORT_10000=31928 -e MESOS_TASK_ID=while-loop-hello-world-container-small.f27f90db-c52f-11e6-ad85-02422b10e522 -e PORT=31928 -e MARATHON_APP_RESOURCE_MEM=32.0 -e PORTS=31928 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_LABELS= -e MARATHON_APP_ID=/while-loop-hello-world-container-small -e PORT0=31928 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.13aeacaa-20d5-4628-8e65-f82a0fc71724 -v /var/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world-container-small.f27f90db-c52f-11e6-ad85-02422b10e522/runs/13aeacaa-20d5-4628-8e65-f82a0fc71724:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.13aeacaa-20d5-4628-8e65-f82a0fc71724 ubuntu:latest -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log
root     19422 19406  0 14:40 ?        00:00:01 /bin/sh -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log
root     19443 19427  0 14:40 ?        00:00:01 /bin/sh -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log
root     19473 19457  0 14:40 ?        00:00:01 /bin/sh -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log

We can see that there are four docker run commands and four while loops running. We had assigned 0.1 CPUs to each application instance. This has been translated into a CPU share of 1024/10 = ~102.
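
The configured CPU share can be verified by inspecting one of the containers, using a container ID from the docker ps output above:

# read the CPU shares Mesos passed to Docker for the first container
$ docker inspect --format '{{.HostConfig.CpuShares}}' 78e346ddb3eb
102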

Note that Docker does not hard-limit the CPU usage of a container by default: unless --cgroups_enable_cfs (CFS) is enabled on the Mesos slave, only relative CPU shares are configured for the Docker container.

Appendix A: Errors & Caveats

Appendix A.1: Mesos Portal Error: Failed to connect to 127.0.0.1:5050!

Problem:

After starting the Mesos portal in a browser, we see the following symptom:

2016-12-12-18_49_20-mesos

After clicking “Try now”, the problem shows up again immediately. After clicking the browser’s page reload button, it looks better, but the problem will soon show up again.

Status: Open

I have not found any solution yet. I have not yet tested any older version.

Appendix A.2: Critical Error: Mesos Master: Lost leadership… committing suicide!

  • Mesos master 1.1.0 running in a Docker container (image mesosphere/mesos-master:1.1.0-2.0.107.ubuntu1404)
  • ZooKeeper running in a Docker container (image netflixoss/exhibitor:1.5.2)

Symptoms:

From time to time, we get the following critical error in the log of the Mesos master:

...
2016-12-18 08:15:59,593:22(0x7f18037fe700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 31198753ms
2016-12-18 08:15:59,595:22(0x7f1820ef8700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 31198755ms
2016-12-18 08:15:59,597:22(0x7f1820ef8700):ZOO_ERROR@handle_socket_error_msg@1735: Socket [127.0.0.1:2181] zk retcode=-4, errno=32(Broken pipe): failed while flushing send queue
2016-12-18 08:15:59,597:22(0x7f18037fe700):ZOO_ERROR@handle_socket_error_msg@1746: Socket [127.0.0.1:2181] zk retcode=-4, errno=112(Host is down): failed while receiving a server response
2016-12-18 08:15:59,599:22(0x7f1820ef8700):ZOO_INFO@check_events@1728: initiated connection to server [127.0.0.1:2181]
I1218 08:15:59.601364 25 group.cpp:451] Lost connection to ZooKeeper, attempting to reconnect ...
I1218 08:15:59.602128 26 group.cpp:451] Lost connection to ZooKeeper, attempting to reconnect ...
2016-12-18 08:15:59,604:22(0x7f1820ef8700):ZOO_ERROR@handle_socket_error_msg@1764: Socket [127.0.0.1:2181] zk retcode=-112, errno=116(Stale file handle): sessionId=0x158f4c9e1f90035 has expired.
I1218 08:15:59.605609 28 group.cpp:510] ZooKeeper session expired
I1218 08:15:59.606210 29 contender.cpp:217] Membership cancelled: 4
2016-12-18 08:15:59,606:22(0x7f1823737700):ZOO_INFO@zookeeper_close@2543: Freeing zookeeper resources for sessionId=0x158f4c9e1f90035

Lost leadership... committing suicide!
2016-12-18 08:15:59,607:22(0x7f182573b700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-12-18 08:15:59,608:22(0x7f182573b700):ZOO_INFO@log_env@730: Client environment:host.name=openshift-installer
2016-12-18 08:15:59,608:22(0x7f182573b700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-12-18 08:15:59,608:22(0x7f182573b700):ZOO_INFO@log_env@738: Client environment:os.arch=4.2.0-42-generic
2016-12-18 08:15:59,608:22(0x7f182573b700):ZOO_INFO@log_env@739: Client environment:os.version=#49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016
2016-12-18 08:15:59,608:22(0x7f182573b700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
2016-12-18 08:15:59,608:22(0x7f182573b700):ZOO_INFO@log_env@755: Client environment:user.home=/root
2016-12-18 08:15:59,609:22(0x7f182573b700):ZOO_INFO@log_env@767: Client environment:user.dir=/
2016-12-18 08:15:59,609:22(0x7f182573b700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=127.0.0.1:2181 sessionTimeout=10000 watcher=0x7f182e49c200 sessionId=0 sessionPasswd=<null> context=0x7f1810008470 flags=0
2016-12-18 08:15:59,609:22(0x7f1803fff700):ZOO_INFO@check_events@1728: initiated connection to server [127.0.0.1:2181]
(container)#

Resolution: None

I have found a similar issue for Marathon here. Both problems seem to be caused by ZooKeeper problems, but it is not clear how to resolve the issue.

Appendix A.3: ZooKeeper: continuous warnings ‘Exceeded deadline’ and ‘Current disk usage’

After starting ZooKeeper, I continuously see log messages like the following:

2016-12-18 10:24:14,664:1(0x7fb7abfff700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 77ms
I1218 10:24:58.446185 15 slave.cpp:5044] Current disk usage 62.33%. Max allowed age: 1.936647380411088days
I1218 10:25:58.450482 14 slave.cpp:5044] Current disk usage 62.33%. Max allowed age: 1.936647380411088days
2016-12-18 10:26:21,523:1(0x7fb7abfff700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 26ms
I1218 10:26:58.454617 9 slave.cpp:5044] Current disk usage 62.33%. Max allowed age: 1.936647380411088days
2016-12-18 10:27:58,326:1(0x7fb7abfff700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 15ms
I1218 10:27:58.458858 12 slave.cpp:5044] Current disk usage 62.33%. Max allowed age: 1.936647380411088days
2016-12-18 10:28:01,681:1(0x7fb7abfff700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 20ms

Resolution: None

Since the warnings do not seem to be critical, I have not yet dug into the problem.
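Note that the ‘Current disk usage’ messages come from the Mesos slave (slave.cpp) and refer to the file system holding its work directory (/var/tmp/mesos in our setup, cf. the sandbox paths in Appendix A.4). If you want to see what is consuming the space, a quick sketch on the Docker host:

(dockerhost)$ df -h /var/tmp
(dockerhost)$ sudo du -sh /var/tmp/mesos/slaves/*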

Appendix A.4: Marathon Portal Error: no Reaction upon Restart Request

Symptoms:

If we try to restart an application, there is no reaction whatsoever:

2016-12-17-16_38_48-marathon

2016-12-16-21_07_20-marathon

2016-12-16-21_08_33-marathon

Running ps -ef on the Docker host yields the same process numbers before and after pressing the Restart button:

(dockerhost)$ ps -ef | grep while | grep -v grep
root 13142 29300 0 19:54 ? 00:00:00 mesos-containerizer launch --command={"arguments":["mesos-executor","--launcher_dir=\/usr\/libexec\/mesos"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-executor"} --environment={"HOST":"openshift-installer-native-docker-compose","LIBPROCESS_PORT":"0","MARATHON_APP_ID":"\/while-loop-hello-world","MARATHON_APP_LABELS":"","MARATHON_APP_RESOURCE_CPUS":"1.0","MARATHON_APP_RESOURCE_DISK":"0.0","MARATHON_APP_RESOURCE_GPUS":"0","MARATHON_APP_RESOURCE_MEM":"32.0","MARATHON_APP_VERSION":"2016-12-16T19:54:30.939Z","MESOS_AGENT_ENDPOINT":"127.0.0.1:5051","MESOS_CHECKPOINT":"1","MESOS_DIRECTORY":"\/var\/tmp\/mesos\/slaves\/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0\/frameworks\/44a35e16-dc32-4f91-afac-33dfff498944-0000\/executors\/while-loop-hello-world.75d64c28-c3c9-11e6-9f91-02422b10e522\/runs\/a67b861b-c20c-4d94-8db5-e956b3dea8ab","MESOS_EXECUTOR_ID":"while-loop-hello-world.75d64c28-c3c9-11e6-9f91-02422b10e522","MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD":"5secs","MESOS_FRAMEWORK_ID":"44a35e16-dc32-4f91-afac-33dfff498944-0000","MESOS_HTTP_COMMAND_EXECUTOR":"0","MESOS_NATIVE_JAVA_LIBRARY":"\/usr\/lib\/libmesos-1.1.0.so","MESOS_NATIVE_LIBRARY":"\/usr\/lib\/libmesos-1.1.0.so","MESOS_RECOVERY_TIMEOUT":"15mins","MESOS_SANDBOX":"\/var\/tmp\/mesos\/slaves\/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0\/frameworks\/44a35e16-dc32-4f91-afac-33dfff498944-0000\/executors\/while-loop-hello-world.75d64c28-c3c9-11e6-9f91-02422b10e522\/runs\/a67b861b-c20c-4d94-8db5-e956b3dea8ab","MESOS_SLAVE_ID":"a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0","MESOS_SLAVE_PID":"slave(1)@127.0.0.1:5051","MESOS_SUBSCRIPTION_BACKOFF_MAX":"2secs","MESOS_TASK_ID":"while-loop-hello-world.75d64c28-c3c9-11e6-9f91-02422b10e522","PATH":"\/usr\/local\/sbin:\/usr\/local\/bin:\/usr\/sbin:\/usr\/bin:\/sbin:\/bin","PORT":"31690","PORT0":"31690","PORTS":"31690","PORT_10000":"31690"} --help=false --pipe_read=15 --pipe_write=16 --pre_exec_commands=[] --runtime_directory=/var/run/mesos/containers/a67b861b-c20c-4d94-8db5-e956b3dea8ab --unshare_namespace_mnt=false --working_directory=/var/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.75d64c28-c3c9-11e6-9f91-02422b10e522/runs/a67b861b-c20c-4d94-8db5-e956b3dea8ab
root 13144 29300 0 19:54 ? 00:00:00 mesos-containerizer launch --command={"arguments":["mesos-executor","--launcher_dir=\/usr\/libexec\/mesos"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-executor"} --environment={"HOST":"openshift-installer-native-docker-compose","LIBPROCESS_PORT":"0","MARATHON_APP_ID":"\/while-loop-hello-world","MARATHON_APP_LABELS":"","MARATHON_APP_RESOURCE_CPUS":"1.0","MARATHON_APP_RESOURCE_DISK":"0.0","MARATHON_APP_RESOURCE_GPUS":"0","MARATHON_APP_RESOURCE_MEM":"32.0","MARATHON_APP_VERSION":"2016-12-16T19:54:30.939Z","MESOS_AGENT_ENDPOINT":"127.0.0.1:5051","MESOS_CHECKPOINT":"1","MESOS_DIRECTORY":"\/var\/tmp\/mesos\/slaves\/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0\/frameworks\/44a35e16-dc32-4f91-afac-33dfff498944-0000\/executors\/while-loop-hello-world.75cab367-c3c9-11e6-9f91-02422b10e522\/runs\/cb6ec83f-a228-47f5-ad06-ee6675134bb0","MESOS_EXECUTOR_ID":"while-loop-hello-world.75cab367-c3c9-11e6-9f91-02422b10e522","MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD":"5secs","MESOS_FRAMEWORK_ID":"44a35e16-dc32-4f91-afac-33dfff498944-0000","MESOS_HTTP_COMMAND_EXECUTOR":"0","MESOS_NATIVE_JAVA_LIBRARY":"\/usr\/lib\/libmesos-1.1.0.so","MESOS_NATIVE_LIBRARY":"\/usr\/lib\/libmesos-1.1.0.so","MESOS_RECOVERY_TIMEOUT":"15mins","MESOS_SANDBOX":"\/var\/tmp\/mesos\/slaves\/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0\/frameworks\/44a35e16-dc32-4f91-afac-33dfff498944-0000\/executors\/while-loop-hello-world.75cab367-c3c9-11e6-9f91-02422b10e522\/runs\/cb6ec83f-a228-47f5-ad06-ee6675134bb0","MESOS_SLAVE_ID":"a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0","MESOS_SLAVE_PID":"slave(1)@127.0.0.1:5051","MESOS_SUBSCRIPTION_BACKOFF_MAX":"2secs","MESOS_TASK_ID":"while-loop-hello-world.75cab367-c3c9-11e6-9f91-02422b10e522","PATH":"\/usr\/local\/sbin:\/usr\/local\/bin:\/usr\/sbin:\/usr\/bin:\/sbin:\/bin","PORT":"31996","PORT0":"31996","PORTS":"31996","PORT_10000":"31996"} --help=false --pipe_read=15 --pipe_write=16 --pre_exec_commands=[] --runtime_directory=/var/run/mesos/containers/cb6ec83f-a228-47f5-ad06-ee6675134bb0 --unshare_namespace_mnt=false --working_directory=/var/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.75cab367-c3c9-11e6-9f91-02422b10e522/runs/cb6ec83f-a228-47f5-ad06-ee6675134bb0
root 13164 13143 0 19:54 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done
root 13165 13154 0 19:54 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done

Status: Workaround given

The workaround is to destroy the application and run a new application with the exact same parameters, e.g. via the Marathon REST API as sketched below.
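A hedged sketch of this workaround using Marathon’s REST API (assuming Marathon is reachable on localhost:7070 as in the start commands of Appendix A.6; the app definition values mirror the MARATHON_APP_* variables visible in the ps output above):

(dockerhost)$ # destroy the unresponsive application
(dockerhost)$ curl -X DELETE http://localhost:7070/v2/apps/while-loop-hello-world
(dockerhost)$ # re-create it with the exact same parameters
(dockerhost)$ curl -X POST -H "Content-Type: application/json" http://localhost:7070/v2/apps -d '{
    "id": "/while-loop-hello-world",
    "cmd": "while true; do echo \"I am a Hello World script\"; sleep 1; done",
    "cpus": 1.0,
    "mem": 32.0,
    "instances": 1
  }'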

Appendix A.5: Marathon Portal Error: Retrieval of STDERR and STDOUT fails

Seen on Marathon 1.3.6, both with a shell script in a Mesos container and with a shell script in a Docker container.

Symptoms:

If we try to retrieve the error or output log of a hello world application on the Marathon portal, we get the error message “Sorry, there was a problem retrieving file. Click to retry.” Retrying does not help.

2016-12-16-21_04_29-marathon

Status: Open (idea for a workaround given below)

A possible workaround is to run the script with its output redirected into a file that you can retrieve later. E.g., we could define a script like

while true; do echo "I am a Hello World script"; sleep 1; done > this_is_my_output.log 2>&1

Once you have found the working directory of the Mesos-started script, you can retrieve the log file from the slave’s file system, as sketched below.
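A minimal sketch for locating the file on the Docker host (the base path /var/tmp/mesos matches the sandbox paths in the ps output of Appendix A.4; the placeholders stand for the IDs that find returns):

(dockerhost)$ sudo find /var/tmp/mesos/slaves -name this_is_my_output.log
(dockerhost)$ sudo tail -f /var/tmp/mesos/slaves/<slave-id>/frameworks/<framework-id>/executors/<executor-id>/runs/<run-id>/this_is_my_output.log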

Appendix A.6: Critical Marathon Error: Flag 'work_dir' is required, but it was not provided

Marathon 1.3.6 running as a Docker container (image mesosphere/marathon:latest; image ID 9d03a8fd0fdd)

Symptoms:

When we try to start Marathon as seen below, the attempt fails with the exception:

Failed to start a local cluster while loading agent flags from the environment: Flag 'work_dir' is required, but it was not provided

Full log:

(dockerhost)$ sudo docker run -it --net=host -v `pwd`:/work_dir --entrypoint=bash mesosphere/marathon
root@openshift-installer:/marathon# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070
MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib.
MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so'
No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script.
[2016-12-12 21:01:46,268] INFO Starting Marathon 1.3.6/unknown with --master local --zk zk://localhost:2181/marathon --http_port=7070 (mesosphere.marathon.Main$:main)
[2016-12-12 21:01:46,588] WARN Method [public javax.ws.rs.core.Response mesosphere.marathon.api.MarathonExceptionMapper.toResponse(java.lang.Throwable)] is synthetic and is being intercepted by [mesosphere.marathon.DebugModule$MetricsBehavior@985696]. This could indicate a bug. The method may be intercepted twice, or may not be intercepted at all. (com.google.inject.internal.ProxyFactory:main)
[2016-12-12 21:01:46,899] INFO Logging initialized @1979ms (org.eclipse.jetty.util.log:main)
[2016-12-12 21:01:47,322] INFO Slf4jLogger started (akka.event.slf4j.Slf4jLogger:marathon-akka.actor.default-dispatcher-3)
[2016-12-12 21:01:47,517] INFO Started TaskTrackerUpdateStepsProcessorImpl with steps:
* continueOnError(notifyHealthCheckManager)
* continueOnError(notifyRateLimiter)
* continueOnError(notifyLaunchQueue)
* continueOnError(emitUpdate)
* continueOnError(postTaskStatusEvent)
* continueOnError(scaleApp) (mesosphere.marathon.core.task.tracker.impl.TaskTrackerUpdateStepProcessorImpl:main)
[2016-12-12 21:01:47,579] INFO Calling reviveOffers is enabled. Use --disable_revive_offers_for_new_apps to disable. (mesosphere.marathon.core.flow.FlowModule:main)
[2016-12-12 21:01:47,657] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authenticator' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-12 21:01:47,662] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-12 21:01:47,665] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authorizer' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-12 21:01:47,666] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-12 21:01:47,668] INFO Started status update processor (mesosphere.marathon.core.task.update.impl.TaskStatusUpdateProcessorImpl$$EnhancerByGuice$$ca7c928f:main)
[2016-12-12 21:01:47,779] INFO All actors suspended:
* Actor[akka://marathon/user/expungeOverdueLostTasks#1718670650]
* Actor[akka://marathon/user/rateLimiter#1278378489]
* Actor[akka://marathon/user/groupManager#890512610]
* Actor[akka://marathon/user/taskTracker#-11699813]
* Actor[akka://marathon/user/launchQueue#1496971565]
* Actor[akka://marathon/user/offerMatcherLaunchTokens#-557182315]
* Actor[akka://marathon/user/killOverdueStagedTasks#-1648400379]
* Actor[akka://marathon/user/reviveOffersWhenWanted#1003103868]
* Actor[akka://marathon/user/offerMatcherManager#219115497]
* Actor[akka://marathon/user/offersWantedForReconciliation#-1104494480]
* Actor[akka://marathon/user/taskKillServiceActor#-521724399] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-12 21:01:47,891] INFO Adding HTTP support. (mesosphere.chaos.http.HttpModule:main)
[2016-12-12 21:01:47,892] INFO No HTTPS support configured. (mesosphere.chaos.http.HttpModule:main)
[2016-12-12 21:01:47,971] INFO Starting up (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-12 21:01:47,972] INFO Beginning run (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-12 21:01:47,974] INFO Will offer leadership after 500 milliseconds backoff (mesosphere.marathon.core.election.impl.CuratorElectionService:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-12 21:01:47,983] INFO jetty-9.3.z-SNAPSHOT (org.eclipse.jetty.server.Server:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,018] INFO Now standing by. Closing existing handles and rejecting new. (mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-12 21:01:48,229] INFO Registering com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,233] INFO Registering mesosphere.marathon.api.MarathonExceptionMapper as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,233] INFO Registering mesosphere.marathon.api.v2.AppsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,233] INFO Registering mesosphere.marathon.api.v2.TasksResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,233] INFO Registering mesosphere.marathon.api.v2.EventSubscriptionsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,233] INFO Registering mesosphere.marathon.api.v2.QueueResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.GroupsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.InfoResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.LeaderResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.DeploymentsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.ArtifactsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.SchemaResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.PluginsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,238] INFO Initiating Jersey application, version 'Jersey: 1.18.1 02/19/2014 03:28 AM' (com.sun.jersey.server.impl.application.WebApplicationImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,311] INFO Binding com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,328] INFO Binding mesosphere.marathon.api.MarathonExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,527] INFO Using HA and therefore offering leadership (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,528] INFO Will do leader election through localhost:2181 (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,553] WARN session timeout [10000] is less than connection timeout [15000] (org.apache.curator.CuratorZookeeperClient:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,565] INFO Starting (org.apache.curator.framework.imps.CuratorFrameworkImpl:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,576] INFO Client environment:zookeeper.version=3.5.0-alpha-1615249, built on 08/01/2014 22:13 GMT (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,576] INFO Client environment:host.name=openshift-installer-native-docker-compose (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,576] INFO Client environment:java.version=1.8.0_102 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,576] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,576] INFO Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,576] INFO Client environment:java.class.path=./bin/../target/marathon-assembly-1.3.6.jar (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,576] INFO Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,576] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,576] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,576] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,577] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,577] INFO Client environment:os.version=4.2.0-42-generic (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,577] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,577] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,577] INFO Client environment:user.dir=/marathon (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,578] INFO Client environment:os.memory.free=121MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,578] INFO Client environment:os.memory.max=880MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,578] INFO Client environment:os.memory.total=157MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,578] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=org.apache.curator.ConnectionState@7531678e (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,623] INFO Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-12 21:01:48,633] INFO Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-12 21:01:48,653] INFO Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x158f4c9e1f90004, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-12 21:01:48,661] INFO State change: CONNECTED (org.apache.curator.framework.state.ConnectionStateManager:ForkJoinPool-2-worker-13-EventThread)
[2016-12-12 21:01:48,722] INFO Elected (LeaderLatchListener Interface) (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-1-thread-1)
[2016-12-12 21:01:48,723] INFO As new leader running the driver (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-12 21:01:48,734] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=com.twitter.zk.EventBroker@5ca52bc0 (org.apache.zookeeper.ZooKeeper:pool-1-thread-1)
[2016-12-12 21:01:48,748] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-12 21:01:48,749] INFO Socket connection established to localhost/127.0.0.1:2181, initiating session (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-12 21:01:48,755] INFO Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x158f4c9e1f90005, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-12 21:01:48,854] INFO Migration successfully applied for version Version(1, 3, 6) (mesosphere.marathon.state.Migration:ForkJoinPool-2-worker-7)
[2016-12-12 21:01:48,854] INFO Call preDriverStarts callbacks on EntityStoreCache(MarathonStore(app:)), EntityStoreCache(MarathonStore(group:)), EntityStoreCache(MarathonStore(deployment:)), EntityStoreCache(MarathonStore(framework:)), EntityStoreCache(MarathonStore(taskFailure:)), EntityStoreCache(MarathonStore(events:)) (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-12 21:01:48,868] INFO Finished preDriverStarts callbacks (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-12 21:01:48,885] INFO TaskTrackerActor is starting. Task loading initiated. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-12 21:01:48,893] INFO no interest in offers for reservation reconciliation anymore. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-12 21:01:48,897] INFO ExpungeOverdueLostTasksActor has started (mesosphere.marathon.core.task.jobs.impl.ExpungeOverdueLostTasksActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-12 21:01:48,900] INFO About to load 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-3)
[2016-12-12 21:01:48,901] INFO Loaded 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-3)
[2016-12-12 21:01:48,902] INFO Started. Will remain interested in offer reconciliation for 17500 milliseconds when needed. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-12 21:01:48,911] INFO Task loading complete. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-7)
[2016-12-12 21:01:48,914] INFO Create new Scheduler Driver with frameworkId: None and scheduler mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor$$EnhancerByGuice$$c730b9b8@15d42ccb (mesosphere.marathon.MarathonSchedulerDriver$:pool-1-thread-1)
[2016-12-12 21:01:48,924] INFO started RateLimiterActor (mesosphere.marathon.core.launchqueue.impl.RateLimiterActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-12 21:01:48,925] INFO All actors active:
* Actor[akka://marathon/user/expungeOverdueLostTasks#1718670650]
* Actor[akka://marathon/user/rateLimiter#1278378489]
* Actor[akka://marathon/user/groupManager#890512610]
* Actor[akka://marathon/user/taskTracker#-11699813]
* Actor[akka://marathon/user/launchQueue#1496971565]
* Actor[akka://marathon/user/offerMatcherLaunchTokens#-557182315]
* Actor[akka://marathon/user/killOverdueStagedTasks#-1648400379]
* Actor[akka://marathon/user/reviveOffersWhenWanted#1003103868]
* Actor[akka://marathon/user/offerMatcherManager#219115497]
* Actor[akka://marathon/user/offersWantedForReconciliation#-1104494480]
* Actor[akka://marathon/user/taskKillServiceActor#-521724399] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-12 21:01:48,994] INFO Received offers WANTED notification (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-7)
[2016-12-12 21:01:48,994] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-7)
[2016-12-12 21:01:48,995] INFO interested in offers for reservation reconciliation because of becoming leader (until 2016-12-12T21:02:06.407Z) (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-12 21:01:49,008] INFO 2 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-12 21:01:49,008] INFO => Schedule next revive at 2016-12-12T21:01:53.993Z in 4986 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-5)
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1212 21:01:49.134382 63 sched.cpp:1697]
**************************************************
Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address.
**************************************************
I1212 21:01:49.179211 63 leveldb.cpp:174] Opened db in 40.639535ms
I1212 21:01:49.188115 63 leveldb.cpp:181] Compacted db in 8.801601ms
I1212 21:01:49.188176 63 leveldb.cpp:196] Created db iterator in 11072ns
I1212 21:01:49.188189 63 leveldb.cpp:202] Seeked to beginning of db in 758ns
I1212 21:01:49.188194 63 leveldb.cpp:271] Iterated through 0 keys in the db in 202ns
I1212 21:01:49.188232 63 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I1212 21:01:49.189082 70 recover.cpp:451] Starting replica recovery
I1212 21:01:49.189720 70 recover.cpp:477] Replica is in EMPTY status
I1212 21:01:49.189486 73 master.cpp:375] Master 8be08601-c962-42fa-9f78-7beda337b644 (openshift-installer-native-docker-compose) started on 127.0.0.1:35322
I1212 21:01:49.190399 73 master.cpp:377] Flags at startup: --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="false" --authenticate_frameworks="false" --authenticate_http_frameworks="false" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="20secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/tmp/mesos/local/9Y1S6q" --zk_session_timeout="10secs"
W1212 21:01:49.190558 73 master.cpp:380]
**************************************************
Master bound to loopback interface! Cannot communicate with remote schedulers or agents. You might want to set '--ip' flag to a routable IP address.
**************************************************
I1212 21:01:49.190712 73 master.cpp:429] Master allowing unauthenticated frameworks to register
I1212 21:01:49.191285 73 master.cpp:443] Master allowing unauthenticated agents to register
I1212 21:01:49.191337 73 master.cpp:457] Master allowing HTTP frameworks to register without authentication
I1212 21:01:49.191366 73 master.cpp:499] Using default 'crammd5' authenticator
Failed to start a local cluster while loading agent flags from the environment: Flag 'work_dir' is required, but it was not provided
W1212 21:01:49.192327 73 authenticator.cpp:512] No credentials provided, authentication requests will be refused
I1212 21:01:49.192360 73 authenticator.cpp:519] Initializing server SASL
root@openshift-installer:/marathon# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070 --work_dir
MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib.
MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so'
No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script.
[scallop] Error: Unknown option 'work_dir'
root@openshift-installer:/marathon# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070 -work_dir=/work_dir
MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib.
MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so'
No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script.
[scallop] Error: Unknown option 'w'
root@openshift-installer:/marathon# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070 --work_dir=/work_dir
MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib.
MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so'
No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script.
[scallop] Error: Unknown option 'work_dir=/work_dir'
root@openshift-installer:/marathon# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070
MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib.
MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so'
No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script.
[2016-12-12 21:05:35,546] INFO Starting Marathon 1.3.6/unknown with --master local --zk zk://localhost:2181/marathon --http_port=7070 (mesosphere.marathon.Main$:main)
[2016-12-12 21:05:35,823] WARN Method [public javax.ws.rs.core.Response mesosphere.marathon.api.MarathonExceptionMapper.toResponse(java.lang.Throwable)] is synthetic and is being intercepted by [mesosphere.marathon.DebugModule$MetricsBehavior@985696]. This could indicate a bug. The method may be intercepted twice, or may not be intercepted at all. (com.google.inject.internal.ProxyFactory:main)
[2016-12-12 21:05:36,115] INFO Logging initialized @1856ms (org.eclipse.jetty.util.log:main)
[2016-12-12 21:05:36,621] INFO Slf4jLogger started (akka.event.slf4j.Slf4jLogger:marathon-akka.actor.default-dispatcher-4)
[2016-12-12 21:05:36,800] INFO Started TaskTrackerUpdateStepsProcessorImpl with steps:
* continueOnError(notifyHealthCheckManager)
* continueOnError(notifyRateLimiter)
* continueOnError(notifyLaunchQueue)
* continueOnError(emitUpdate)
* continueOnError(postTaskStatusEvent)
* continueOnError(scaleApp) (mesosphere.marathon.core.task.tracker.impl.TaskTrackerUpdateStepProcessorImpl:main)
[2016-12-12 21:05:36,864] INFO Calling reviveOffers is enabled. Use --disable_revive_offers_for_new_apps to disable. (mesosphere.marathon.core.flow.FlowModule:main)
[2016-12-12 21:05:36,930] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authenticator' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-12 21:05:36,936] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-12 21:05:36,940] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authorizer' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-12 21:05:36,940] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-12 21:05:36,942] INFO Started status update processor (mesosphere.marathon.core.task.update.impl.TaskStatusUpdateProcessorImpl$$EnhancerByGuice$$ca7c928f:main)
[2016-12-12 21:05:37,047] INFO All actors suspended:
* Actor[akka://marathon/user/taskKillServiceActor#-984511042]
* Actor[akka://marathon/user/offerMatcherLaunchTokens#671912224]
* Actor[akka://marathon/user/killOverdueStagedTasks#-698941106]
* Actor[akka://marathon/user/groupManager#-958454684]
* Actor[akka://marathon/user/reviveOffersWhenWanted#1917290502]
* Actor[akka://marathon/user/offerMatcherManager#2056148696]
* Actor[akka://marathon/user/expungeOverdueLostTasks#-308686880]
* Actor[akka://marathon/user/rateLimiter#-1808696027]
* Actor[akka://marathon/user/offersWantedForReconciliation#362759878]
* Actor[akka://marathon/user/launchQueue#-1787605975]
* Actor[akka://marathon/user/taskTracker#1265821735] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-12 21:05:37,115] INFO Adding HTTP support. (mesosphere.chaos.http.HttpModule:main)
[2016-12-12 21:05:37,116] INFO No HTTPS support configured. (mesosphere.chaos.http.HttpModule:main)
[2016-12-12 21:05:37,174] INFO Starting up (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-12 21:05:37,181] INFO jetty-9.3.z-SNAPSHOT (org.eclipse.jetty.server.Server:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,185] INFO Beginning run (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-12 21:05:37,188] INFO Will offer leadership after 500 milliseconds backoff (mesosphere.marathon.core.election.impl.CuratorElectionService:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-12 21:05:37,219] INFO Now standing by. Closing existing handles and rejecting new. (mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-12 21:05:37,416] INFO Registering com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,425] INFO Registering mesosphere.marathon.api.MarathonExceptionMapper as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,425] INFO Registering mesosphere.marathon.api.v2.AppsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,426] INFO Registering mesosphere.marathon.api.v2.TasksResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,428] INFO Registering mesosphere.marathon.api.v2.EventSubscriptionsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,429] INFO Registering mesosphere.marathon.api.v2.QueueResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,430] INFO Registering mesosphere.marathon.api.v2.GroupsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,431] INFO Registering mesosphere.marathon.api.v2.InfoResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,432] INFO Registering mesosphere.marathon.api.v2.LeaderResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,432] INFO Registering mesosphere.marathon.api.v2.DeploymentsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,433] INFO Registering mesosphere.marathon.api.v2.ArtifactsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,433] INFO Registering mesosphere.marathon.api.v2.SchemaResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,434] INFO Registering mesosphere.marathon.api.v2.PluginsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,437] INFO Initiating Jersey application, version 'Jersey: 1.18.1 02/19/2014 03:28 AM' (com.sun.jersey.server.impl.application.WebApplicationImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,491] INFO Binding com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,508] INFO Binding mesosphere.marathon.api.MarathonExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,717] INFO Using HA and therefore offering leadership (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,719] INFO Will do leader election through localhost:2181 (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,775] WARN session timeout [10000] is less than connection timeout [15000] (org.apache.curator.CuratorZookeeperClient:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,840] INFO Starting (org.apache.curator.framework.imps.CuratorFrameworkImpl:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,857] INFO Client environment:zookeeper.version=3.5.0-alpha-1615249, built on 08/01/2014 22:13 GMT (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,863] INFO Client environment:host.name=openshift-installer-native-docker-compose (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,864] INFO Client environment:java.version=1.8.0_102 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,864] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,865] INFO Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,866] INFO Client environment:java.class.path=./bin/../target/marathon-assembly-1.3.6.jar (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,866] INFO Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,867] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,867] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,868] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,868] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,869] INFO Client environment:os.version=4.2.0-42-generic (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,869] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,870] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,870] INFO Client environment:user.dir=/marathon (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,870] INFO Client environment:os.memory.free=128MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,871] INFO Client environment:os.memory.max=880MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,871] INFO Client environment:os.memory.total=165MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,872] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=org.apache.curator.ConnectionState@2fba82d1 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,922] INFO Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-12 21:05:37,939] INFO Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-12 21:05:37,971] INFO Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x158f4c9e1f90006, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-12 21:05:37,978] INFO State change: CONNECTED (org.apache.curator.framework.state.ConnectionStateManager:ForkJoinPool-2-worker-13-EventThread)
[2016-12-12 21:05:38,049] INFO Elected (LeaderLatchListener Interface) (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-1-thread-1)
[2016-12-12 21:05:38,052] INFO As new leader running the driver (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-12 21:05:38,075] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=com.twitter.zk.EventBroker@2c6c37f7 (org.apache.zookeeper.ZooKeeper:pool-1-thread-1)
[2016-12-12 21:05:38,082] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-12 21:05:38,084] INFO Socket connection established to localhost/127.0.0.1:2181, initiating session (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-12 21:05:38,088] INFO Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x158f4c9e1f90007, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-12 21:05:38,174] INFO Migration successfully applied for version Version(1, 3, 6) (mesosphere.marathon.state.Migration:ForkJoinPool-2-worker-7)
[2016-12-12 21:05:38,176] INFO Call preDriverStarts callbacks on EntityStoreCache(MarathonStore(app:)), EntityStoreCache(MarathonStore(group:)), EntityStoreCache(MarathonStore(deployment:)), EntityStoreCache(MarathonStore(framework:)), EntityStoreCache(MarathonStore(taskFailure:)), EntityStoreCache(MarathonStore(events:)) (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-12 21:05:38,188] INFO Finished preDriverStarts callbacks (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-12 21:05:38,195] INFO no interest in offers for reservation reconciliation anymore. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-12 21:05:38,200] INFO Started. Will remain interested in offer reconciliation for 17500 milliseconds when needed. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-12 21:05:38,208] INFO started RateLimiterActor (mesosphere.marathon.core.launchqueue.impl.RateLimiterActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-12 21:05:38,209] INFO ExpungeOverdueLostTasksActor has started (mesosphere.marathon.core.task.jobs.impl.ExpungeOverdueLostTasksActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-12 21:05:38,216] INFO TaskTrackerActor is starting. Task loading initiated. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-3)
[2016-12-12 21:05:38,222] INFO Create new Scheduler Driver with frameworkId: None and scheduler mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor$$EnhancerByGuice$$c730b9b8@46383a78 (mesosphere.marathon.MarathonSchedulerDriver$:pool-1-thread-1)
[2016-12-12 21:05:38,224] INFO All actors active:
* Actor[akka://marathon/user/taskKillServiceActor#-984511042]
* Actor[akka://marathon/user/offerMatcherLaunchTokens#671912224]
* Actor[akka://marathon/user/killOverdueStagedTasks#-698941106]
* Actor[akka://marathon/user/groupManager#-958454684]
* Actor[akka://marathon/user/reviveOffersWhenWanted#1917290502]
* Actor[akka://marathon/user/offerMatcherManager#2056148696]
* Actor[akka://marathon/user/expungeOverdueLostTasks#-308686880]
* Actor[akka://marathon/user/rateLimiter#-1808696027]
* Actor[akka://marathon/user/offersWantedForReconciliation#362759878]
* Actor[akka://marathon/user/launchQueue#-1787605975]
* Actor[akka://marathon/user/taskTracker#1265821735] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-4)
[2016-12-12 21:05:38,230] INFO About to load 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-5)
[2016-12-12 21:05:38,239] INFO Loaded 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-5)
[2016-12-12 21:05:38,257] INFO Task loading complete. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-3)
[2016-12-12 21:05:38,284] INFO interested in offers for reservation reconciliation because of becoming leader (until 2016-12-12T21:05:55.712Z) (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-12 21:05:38,322] INFO Binding mesosphere.marathon.api.v2.AppsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,333] INFO Received offers WANTED notification (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-3)
[2016-12-12 21:05:38,335] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-3)
[2016-12-12 21:05:38,338] INFO 2 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-3)
[2016-12-12 21:05:38,339] INFO => Schedule next revive at 2016-12-12T21:05:43.333Z in 4995 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-3)
[2016-12-12 21:05:38,341] INFO Received offers WANTED notification (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-3)
[2016-12-12 21:05:38,356] INFO Binding mesosphere.marathon.api.v2.TasksResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,359] INFO Binding mesosphere.marathon.api.v2.EventSubscriptionsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,361] INFO Event notification disabled. (mesosphere.marathon.core.event.EventModule:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,375] INFO Binding mesosphere.marathon.api.v2.QueueResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,385] INFO Binding mesosphere.marathon.api.v2.GroupsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,390] INFO Binding mesosphere.marathon.api.v2.InfoResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,396] INFO Binding mesosphere.marathon.api.v2.LeaderResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,400] INFO Binding mesosphere.marathon.api.v2.DeploymentsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,418] INFO Binding mesosphere.marathon.api.v2.ArtifactsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,423] INFO Binding mesosphere.marathon.api.v2.SchemaResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,428] INFO Binding mesosphere.marathon.api.v2.PluginsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,429] INFO Loading plugins implementing 'mesosphere.marathon.plugin.http.HttpRequestHandler' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,431] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,452] INFO Started o.e.j.s.ServletContextHandler@33ffb91e{/,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1212 21:05:38.488497 223 sched.cpp:1697]
**************************************************
Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address.
**************************************************
[2016-12-12 21:05:38,494] INFO Started ServerConnector@23580056{HTTP/1.1,[http/1.1]}{0.0.0.0:7070} (org.eclipse.jetty.server.ServerConnector:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,496] INFO Started @4237ms (org.eclipse.jetty.server.Server:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,497] INFO All services up and running. (mesosphere.marathon.Main$:main)
I1212 21:05:38.524034 223 leveldb.cpp:174] Opened db in 33.764878ms
I1212 21:05:38.525073 223 leveldb.cpp:181] Compacted db in 985356ns
I1212 21:05:38.525146 223 leveldb.cpp:196] Created db iterator in 9599ns
I1212 21:05:38.525156 223 leveldb.cpp:202] Seeked to beginning of db in 501ns
I1212 21:05:38.525161 223 leveldb.cpp:271] Iterated through 0 keys in the db in 202ns
I1212 21:05:38.525203 223 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I1212 21:05:38.526582 229 recover.cpp:451] Starting replica recovery
I1212 21:05:38.527250 229 recover.cpp:477] Replica is in EMPTY status
I1212 21:05:38.528532 229 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (4)@127.0.0.1:33899
I1212 21:05:38.528669 229 recover.cpp:197] Received a recover response from a replica in EMPTY status
I1212 21:05:38.528774 229 recover.cpp:568] Updating replica status to STARTING
I1212 21:05:38.529911 232 master.cpp:375] Master e999af40-d59e-4d95-a77a-05403042ca4f (openshift-installer-native-docker-compose) started on 127.0.0.1:33899
I1212 21:05:38.530508 232 master.cpp:377] Flags at startup: --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="false" --authenticate_frameworks="false" --authenticate_http_frameworks="false" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="20secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/tmp/mesos/local/X38lNg" --zk_session_timeout="10secs"
W1212 21:05:38.531312 232 master.cpp:380]
**************************************************
Master bound to loopback interface! Cannot communicate with remote schedulers or agents. You might want to set '--ip' flag to a routable IP address.
**************************************************
I1212 21:05:38.532498 232 master.cpp:429] Master allowing unauthenticated frameworks to register
I1212 21:05:38.532708 232 master.cpp:443] Master allowing unauthenticated agents to register
I1212 21:05:38.532913 232 master.cpp:457] Master allowing HTTP frameworks to register without authentication
I1212 21:05:38.533169 232 master.cpp:499] Using default 'crammd5' authenticator
W1212 21:05:38.533359 232 authenticator.cpp:512] No credentials provided, authentication requests will be refused
I1212 21:05:38.533457 232 authenticator.cpp:519] Initializing server SASL
I1212 21:05:38.532569 229 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 3.159782ms
I1212 21:05:38.534039 229 replica.cpp:320] Persisted replica status to STARTING
I1212 21:05:38.534257 229 recover.cpp:477] Replica is in STARTING status
I1212 21:05:38.534668 229 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (5)@127.0.0.1:33899
I1212 21:05:38.535151 229 recover.cpp:197] Received a recover response from a replica in STARTING status
I1212 21:05:38.540293 229 recover.cpp:568] Updating replica status to VOTING
Failed to start a local cluster while loading agent flags from the environment: Flag 'work_dir' is required, but it was not provided
(container):/marathon#

Note that work_dir is a flag of the embedded local Mesos cluster, not a Marathon option; this is why Marathon’s command line parser (scallop) rejects all --work_dir variants above, and why the flag has to be provided via the environment instead (see the workaround below).

This error message is described in the documentation of the Docker image we used as follows:

“Note: Currently the Docker container fails due to strange behavior from the latest Mesos version. There will be an error about work_dir that is still unresolved”

Unfortunately, they did not provide any workaround or solution for the problem.

Resolution (Workaround):

I have found a workaround here: “It seems like explicitly setting the Mesos work directory by adding ENV MESOS_WORK_DIR /var/lib/mesos to the Dockerfile resolves the issue.”
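If you prefer to bake the fix into the image instead of setting the variable at run time, a minimal Dockerfile along the lines of the quoted suggestion could look like this (an untested sketch derived from the quote above):

FROM mesosphere/marathon:latest
# Workaround for "Flag 'work_dir' is required": set the Mesos work directory explicitly
ENV MESOS_WORK_DIR /var/lib/mesos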

Alternatively, we can set the MESOS_WORK_DIR variable directly when starting the container:

(dockerhost)$ sudo docker run -it --name marathon --rm --net=host -e MESOS_WORK_DIR=/var/lib/mesos --entrypoint=bash mesosphere/marathon
(container)# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070
MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib.
MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so'
No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script.
[2016-12-13 13:55:36,831] INFO Starting Marathon 1.3.6/unknown with --master local --zk zk://localhost:2181/marathon --http_port=7070 (mesosphere.marathon.Main$:main)
[2016-12-13 13:55:37,118] WARN Method [public javax.ws.rs.core.Response mesosphere.marathon.api.MarathonExceptionMapper.toResponse(java.lang.Throwable)] is synthetic and is being intercepted by [mesosphere.marathon.DebugModule$MetricsBehavior@985696]. This could indicate a bug. The method may be intercepted twice, or may not be intercepted at all. (com.google.inject.internal.ProxyFactory:main)
[2016-12-13 13:55:37,371] INFO Logging initialized @1806ms (org.eclipse.jetty.util.log:main)
[2016-12-13 13:55:37,864] INFO Slf4jLogger started (akka.event.slf4j.Slf4jLogger:marathon-akka.actor.default-dispatcher-2)
[2016-12-13 13:55:38,059] INFO Started TaskTrackerUpdateStepsProcessorImpl with steps:
* continueOnError(notifyHealthCheckManager)
* continueOnError(notifyRateLimiter)
* continueOnError(notifyLaunchQueue)
* continueOnError(emitUpdate)
* continueOnError(postTaskStatusEvent)
* continueOnError(scaleApp) (mesosphere.marathon.core.task.tracker.impl.TaskTrackerUpdateStepProcessorImpl:main)
[2016-12-13 13:55:38,119] INFO Calling reviveOffers is enabled. Use --disable_revive_offers_for_new_apps to disable. (mesosphere.marathon.core.flow.FlowModule:main)
[2016-12-13 13:55:38,196] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authenticator' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-13 13:55:38,200] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-13 13:55:38,204] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authorizer' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-13 13:55:38,204] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-13 13:55:38,208] INFO Started status update processor (mesosphere.marathon.core.task.update.impl.TaskStatusUpdateProcessorImpl$$EnhancerByGuice$$ca7c928f:main)
[2016-12-13 13:55:38,361] INFO All actors suspended:
* Actor[akka://marathon/user/launchQueue#872625406]
* Actor[akka://marathon/user/offerMatcherManager#1742672554]
* Actor[akka://marathon/user/reviveOffersWhenWanted#-965567836]
* Actor[akka://marathon/user/expungeOverdueLostTasks#-1682888773]
* Actor[akka://marathon/user/offersWantedForReconciliation#-1611878367]
* Actor[akka://marathon/user/killOverdueStagedTasks#-1736387483]
* Actor[akka://marathon/user/taskTracker#-1718074132]
* Actor[akka://marathon/user/groupManager#1707955637]
* Actor[akka://marathon/user/taskKillServiceActor#-2061381059]
* Actor[akka://marathon/user/offerMatcherLaunchTokens#-786726035]
* Actor[akka://marathon/user/rateLimiter#-27927539] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 13:55:38,466] INFO Adding HTTP support. (mesosphere.chaos.http.HttpModule:main)
[2016-12-13 13:55:38,466] INFO No HTTPS support configured. (mesosphere.chaos.http.HttpModule:main)
[2016-12-13 13:55:38,518] INFO Starting up (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-13 13:55:38,519] INFO Beginning run (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-13 13:55:38,522] INFO Will offer leadership after 500 milliseconds backoff (mesosphere.marathon.core.election.impl.CuratorElectionService:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-13 13:55:38,525] INFO jetty-9.3.z-SNAPSHOT (org.eclipse.jetty.server.Server:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:38,555] INFO Now standing by. Closing existing handles and rejecting new. (mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-4)
[2016-12-13 13:55:38,727] INFO Registering com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:38,730] INFO Registering mesosphere.marathon.api.MarathonExceptionMapper as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:38,730] INFO Registering mesosphere.marathon.api.v2.AppsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:38,730] INFO Registering mesosphere.marathon.api.v2.TasksResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:38,730] INFO Registering mesosphere.marathon.api.v2.EventSubscriptionsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:38,730] INFO Registering mesosphere.marathon.api.v2.QueueResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:38,731] INFO Registering mesosphere.marathon.api.v2.GroupsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:38,731] INFO Registering mesosphere.marathon.api.v2.InfoResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:38,731] INFO Registering mesosphere.marathon.api.v2.LeaderResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:38,731] INFO Registering mesosphere.marathon.api.v2.DeploymentsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:38,731] INFO Registering mesosphere.marathon.api.v2.ArtifactsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:38,731] INFO Registering mesosphere.marathon.api.v2.SchemaResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:38,733] INFO Registering mesosphere.marathon.api.v2.PluginsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:38,737] INFO Initiating Jersey application, version 'Jersey: 1.18.1 02/19/2014 03:28 AM' (com.sun.jersey.server.impl.application.WebApplicationImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:38,785] INFO Binding com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:38,802] INFO Binding mesosphere.marathon.api.MarathonExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:39,057] INFO Using HA and therefore offering leadership (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,060] INFO Will do leader election through localhost:2181 (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,119] WARN session timeout [10000] is less than connection timeout [15000] (org.apache.curator.CuratorZookeeperClient:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,136] INFO Starting (org.apache.curator.framework.imps.CuratorFrameworkImpl:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,153] INFO Client environment:zookeeper.version=3.5.0-alpha-1615249, built on 08/01/2014 22:13 GMT (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,154] INFO Client environment:host.name=openshift-installer-native-docker-compose (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,154] INFO Client environment:java.version=1.8.0_102 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,154] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,154] INFO Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,154] INFO Client environment:java.class.path=./bin/../target/marathon-assembly-1.3.6.jar (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,154] INFO Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,154] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,154] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,154] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,154] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,154] INFO Client environment:os.version=4.2.0-42-generic (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,154] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,154] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,154] INFO Client environment:user.dir=/marathon (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,154] INFO Client environment:os.memory.free=118MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,154] INFO Client environment:os.memory.max=880MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,154] INFO Client environment:os.memory.total=159MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,154] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=org.apache.curator.ConnectionState@3da7f0be (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,183] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-13 13:55:39,195] INFO Socket connection established to localhost/127.0.0.1:2181, initiating session (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-13 13:55:39,230] INFO Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x158f4c9e1f9000d, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-13 13:55:39,236] INFO State change: CONNECTED (org.apache.curator.framework.state.ConnectionStateManager:ForkJoinPool-2-worker-13-EventThread)
[2016-12-13 13:55:39,309] INFO Binding mesosphere.marathon.api.v2.AppsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:39,328] INFO Binding mesosphere.marathon.api.v2.TasksResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:39,331] INFO Binding mesosphere.marathon.api.v2.EventSubscriptionsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:39,331] INFO Event notification disabled. (mesosphere.marathon.core.event.EventModule:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:39,334] INFO Elected (LeaderLatchListener Interface) (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-1-thread-1)
[2016-12-13 13:55:39,338] INFO As new leader running the driver (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-13 13:55:39,341] INFO Binding mesosphere.marathon.api.v2.QueueResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:39,351] INFO Binding mesosphere.marathon.api.v2.GroupsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:39,356] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=com.twitter.zk.EventBroker@44a67b18 (org.apache.zookeeper.ZooKeeper:pool-1-thread-1)
[2016-12-13 13:55:39,357] INFO Binding mesosphere.marathon.api.v2.InfoResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:39,360] INFO Binding mesosphere.marathon.api.v2.LeaderResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:39,364] INFO Binding mesosphere.marathon.api.v2.DeploymentsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:39,369] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-13 13:55:39,369] INFO Socket connection established to localhost/127.0.0.1:2181, initiating session (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-13 13:55:39,374] INFO Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x158f4c9e1f9000e, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-13 13:55:39,378] INFO Binding mesosphere.marathon.api.v2.ArtifactsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:39,380] INFO Binding mesosphere.marathon.api.v2.SchemaResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:39,391] INFO Binding mesosphere.marathon.api.v2.PluginsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:39,392] INFO Loading plugins implementing 'mesosphere.marathon.plugin.http.HttpRequestHandler' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:39,392] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:39,407] INFO Started o.e.j.s.ServletContextHandler@72f0fd27{/,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:39,419] INFO Started ServerConnector@3c2f8cd8{HTTP/1.1,[http/1.1]}{0.0.0.0:7070} (org.eclipse.jetty.server.ServerConnector:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:39,419] INFO Started @3855ms (org.eclipse.jetty.server.Server:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 13:55:39,420] INFO All services up and running. (mesosphere.marathon.Main$:main)
[2016-12-13 13:55:39,464] INFO Migration successfully applied for version Version(1, 3, 6) (mesosphere.marathon.state.Migration:ForkJoinPool-2-worker-5)
[2016-12-13 13:55:39,464] INFO Call preDriverStarts callbacks on EntityStoreCache(MarathonStore(app:)), EntityStoreCache(MarathonStore(group:)), EntityStoreCache(MarathonStore(deployment:)), EntityStoreCache(MarathonStore(framework:)), EntityStoreCache(MarathonStore(taskFailure:)), EntityStoreCache(MarathonStore(events:)) (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-13 13:55:39,482] INFO Finished preDriverStarts callbacks (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-13 13:55:39,485] INFO no interest in offers for reservation reconciliation anymore. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-13 13:55:39,487] INFO Started. Will remain interested in offer reconciliation for 17500 milliseconds when needed. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-13 13:55:39,498] INFO started RateLimiterActor (mesosphere.marathon.core.launchqueue.impl.RateLimiterActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-13 13:55:39,501] INFO ExpungeOverdueLostTasksActor has started (mesosphere.marathon.core.task.jobs.impl.ExpungeOverdueLostTasksActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-13 13:55:39,506] INFO All actors active:
* Actor[akka://marathon/user/launchQueue#872625406]
* Actor[akka://marathon/user/offerMatcherManager#1742672554]
* Actor[akka://marathon/user/reviveOffersWhenWanted#-965567836]
* Actor[akka://marathon/user/expungeOverdueLostTasks#-1682888773]
* Actor[akka://marathon/user/offersWantedForReconciliation#-1611878367]
* Actor[akka://marathon/user/killOverdueStagedTasks#-1736387483]
* Actor[akka://marathon/user/taskTracker#-1718074132]
* Actor[akka://marathon/user/groupManager#1707955637]
* Actor[akka://marathon/user/taskKillServiceActor#-2061381059]
* Actor[akka://marathon/user/offerMatcherLaunchTokens#-786726035]
* Actor[akka://marathon/user/rateLimiter#-27927539] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-13 13:55:39,509] INFO TaskTrackerActor is starting. Task loading initiated. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-13 13:55:39,522] INFO About to load 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-13)
[2016-12-13 13:55:39,528] INFO Loaded 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-5)
[2016-12-13 13:55:39,537] INFO Task loading complete. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-13 13:55:39,568] INFO interested in offers for reservation reconciliation because of becoming leader (until 2016-12-13T13:55:57.005Z) (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-13 13:55:39,589] INFO Received offers WANTED notification (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-13 13:55:39,589] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-13 13:55:39,596] INFO 2 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-13 13:55:39,597] INFO => Schedule next revive at 2016-12-13T13:55:44.589Z in 4993 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-13 13:55:39,655] INFO Create new Scheduler Driver with frameworkId: Some(value: "44a35e16-dc32-4f91-afac-33dfff498944-0000"
) and scheduler mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor$$EnhancerByGuice$$c730b9b8@4eb166a1 (mesosphere.marathon.MarathonSchedulerDriver$:pool-1-thread-1)
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1213 13:55:39.779234 64 sched.cpp:1697]
**************************************************
Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address.
**************************************************
I1213 13:55:39.809289 64 leveldb.cpp:174] Opened db in 28.534471ms
I1213 13:55:39.811184 64 leveldb.cpp:181] Compacted db in 1.79725ms
I1213 13:55:39.811257 64 leveldb.cpp:196] Created db iterator in 11125ns
I1213 13:55:39.811283 64 leveldb.cpp:202] Seeked to beginning of db in 670ns
I1213 13:55:39.811290 64 leveldb.cpp:271] Iterated through 0 keys in the db in 197ns
I1213 13:55:39.811357 64 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I1213 13:55:39.812511 77 recover.cpp:451] Starting replica recovery
I1213 13:55:39.812706 77 recover.cpp:477] Replica is in EMPTY status
I1213 13:55:39.813297 77 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (3)@127.0.0.1:40102
I1213 13:55:39.813491 77 recover.cpp:197] Received a recover response from a replica in EMPTY status
I1213 13:55:39.813727 77 recover.cpp:568] Updating replica status to STARTING
I1213 13:55:39.815317 73 master.cpp:375] Master d4536c66-4886-4c88-9a02-7dac4b5f2e6d (openshift-installer-native-docker-compose) started on 127.0.0.1:40102
I1213 13:55:39.815476 73 master.cpp:377] Flags at startup: --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="false" --authenticate_frameworks="false" --authenticate_http_frameworks="false" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="20secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/var/lib/mesos" --zk_session_timeout="10secs"
W1213 13:55:39.815767 73 master.cpp:380]
**************************************************
Master bound to loopback interface! Cannot communicate with remote schedulers or agents. You might want to set '--ip' flag to a routable IP address.
**************************************************
I1213 13:55:39.815810 73 master.cpp:429] Master allowing unauthenticated frameworks to register
I1213 13:55:39.815817 73 master.cpp:443] Master allowing unauthenticated agents to register
I1213 13:55:39.815824 73 master.cpp:457] Master allowing HTTP frameworks to register without authentication
I1213 13:55:39.815835 73 master.cpp:499] Using default 'crammd5' authenticator
W1213 13:55:39.815845 73 authenticator.cpp:512] No credentials provided, authentication requests will be refused
I1213 13:55:39.815851 73 authenticator.cpp:519] Initializing server SASL
I1213 13:55:39.817608 77 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 3.763381ms
I1213 13:55:39.817780 77 replica.cpp:320] Persisted replica status to STARTING
I1213 13:55:39.817886 77 recover.cpp:477] Replica is in STARTING status
I1213 13:55:39.818063 77 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (7)@127.0.0.1:40102
I1213 13:55:39.818200 77 recover.cpp:197] Received a recover response from a replica in STARTING status
I1213 13:55:39.818279 77 recover.cpp:568] Updating replica status to VOTING
I1213 13:55:39.819156 77 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 782170ns
I1213 13:55:39.819229 77 replica.cpp:320] Persisted replica status to VOTING
I1213 13:55:39.819284 77 recover.cpp:582] Successfully joined the Paxos group
I1213 13:55:39.819321 77 recover.cpp:466] Recover process terminated
I1213 13:55:39.820673 64 containerizer.cpp:196] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
I1213 13:55:39.823923 73 master.cpp:1847] The newly elected leader is master@127.0.0.1:40102 with id d4536c66-4886-4c88-9a02-7dac4b5f2e6d
I1213 13:55:39.823953 73 master.cpp:1860] Elected as the leading master!
I1213 13:55:39.823961 73 master.cpp:1547] Recovering from registrar
I1213 13:55:39.824046 73 registrar.cpp:332] Recovering registrar
I1213 13:55:39.824394 73 log.cpp:553] Attempting to start the writer
I1213 13:55:39.824522 73 slave.cpp:198] Agent started on 1)@127.0.0.1:40102
I1213 13:55:39.824532 73 slave.cpp:199] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="true" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/usr/libexec/mesos" --logbufsecs="0" --logging_level="INFO" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/lib/mesos/0"
W1213 13:55:39.824811 73 slave.cpp:202]
**************************************************
Agent bound to loopback interface! Cannot communicate with remote master(s). You might want to set '--ip' flag to a routable IP address.
**************************************************
I1213 13:55:39.825176 73 slave.cpp:519] Agent resources: cpus(*):2; mem(*):2928; disk(*):34068; ports(*):[31000-32000]
I1213 13:55:39.825285 73 slave.cpp:527] Agent attributes: [ ]
I1213 13:55:39.825294 73 slave.cpp:532] Agent hostname: openshift-installer-native-docker-compose
I1213 13:55:39.825721 73 state.cpp:57] Recovering state from '/var/lib/mesos/0/meta'
I1213 13:55:39.826089 73 status_update_manager.cpp:200] Recovering status update manager
I1213 13:55:39.826179 73 containerizer.cpp:522] Recovering containerizer
I1213 13:55:39.826676 72 replica.cpp:493] Replica received implicit promise request from (21)@127.0.0.1:40102 with proposal 1
I1213 13:55:39.826875 74 provisioner.cpp:253] Provisioner recovery complete
I1213 13:55:39.828560 74 slave.cpp:4782] Finished recovery
I1213 13:55:39.829459 74 slave.cpp:895] New master detected at master@127.0.0.1:40102
I1213 13:55:39.829998 74 slave.cpp:916] No credentials provided. Attempting to register without authentication
I1213 13:55:39.829576 72 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 1.862869ms
I1213 13:55:39.830559 72 replica.cpp:342] Persisted promised to 1
I1213 13:55:39.830961 72 coordinator.cpp:238] Coordinator attempting to fill missing positions
I1213 13:55:39.829550 75 status_update_manager.cpp:174] Pausing sending status updates
I1213 13:55:39.830510 74 slave.cpp:927] Detecting new master
I1213 13:55:39.831506 77 replica.cpp:388] Replica received explicit promise request from (23)@127.0.0.1:40102 for position 0 with proposal 2
I1213 13:55:39.832626 77 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 920616ns
I1213 13:55:39.832670 77 replica.cpp:712] Persisted action at 0
I1213 13:55:39.833004 75 replica.cpp:537] Replica received write request for position 0 from (24)@127.0.0.1:40102
I1213 13:55:39.834054 75 leveldb.cpp:436] Reading position from leveldb took 40734ns
I1213 13:55:39.835325 75 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 722995ns
I1213 13:55:39.835367 75 replica.cpp:712] Persisted action at 0
I1213 13:55:39.835605 75 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0
I1213 13:55:39.836382 75 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 699339ns
I1213 13:55:39.836410 75 replica.cpp:712] Persisted action at 0
I1213 13:55:39.836419 75 replica.cpp:697] Replica learned NOP action at position 0
I1213 13:55:39.836597 75 log.cpp:569] Writer started with ending position 0
I1213 13:55:39.836915 75 leveldb.cpp:436] Reading position from leveldb took 12450ns
I1213 13:55:39.838395 77 registrar.cpp:365] Successfully fetched the registry (0B) in 14.328064ms
I1213 13:55:39.838459 77 registrar.cpp:464] Applied 1 operations in 4626ns; attempting to update the 'registry'
I1213 13:55:39.840064 77 log.cpp:577] Attempting to append 226 bytes to the log
I1213 13:55:39.840198 77 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1
I1213 13:55:39.840348 77 replica.cpp:537] Replica received write request for position 1 from (25)@127.0.0.1:40102
I1213 13:55:39.842838 77 leveldb.cpp:341] Persisting action (245 bytes) to leveldb took 2.394634ms
I1213 13:55:39.848156 77 replica.cpp:712] Persisted action at 1
I1213 13:55:39.848462 77 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0
I1213 13:55:39.849424 77 leveldb.cpp:341] Persisting action (247 bytes) to leveldb took 911778ns
I1213 13:55:39.849508 77 replica.cpp:712] Persisted action at 1
I1213 13:55:39.849520 77 replica.cpp:697] Replica learned APPEND action at position 1
I1213 13:55:39.849964 77 registrar.cpp:509] Successfully updated the 'registry' in 11.484928ms
I1213 13:55:39.850054 77 registrar.cpp:395] Successfully recovered registrar
I1213 13:55:39.850160 77 master.cpp:1655] Recovered 0 agents from the Registry (187B) ; allowing 10mins for agents to re-register
I1213 13:55:39.850020 75 log.cpp:596] Attempting to truncate the log to 1
I1213 13:55:39.850306 75 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2
I1213 13:55:39.850533 75 replica.cpp:537] Replica received write request for position 2 from (26)@127.0.0.1:40102
I1213 13:55:39.851971 75 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 1.375167ms
I1213 13:55:39.852044 75 replica.cpp:712] Persisted action at 2
I1213 13:55:39.852219 75 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0
I1213 13:55:39.853096 75 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 825996ns
I1213 13:55:39.853198 75 leveldb.cpp:399] Deleting ~1 keys from leveldb took 55808ns
I1213 13:55:39.853235 75 replica.cpp:712] Persisted action at 2
I1213 13:55:39.853245 75 replica.cpp:697] Replica learned TRUNCATE action at position 2
I1213 13:55:39.855962 68 sched.cpp:226] Version: 1.0.1
I1213 13:55:39.857800 75 sched.cpp:330] New master detected at master@127.0.0.1:40102
I1213 13:55:39.857892 75 sched.cpp:341] No credentials provided. Attempting to register without authentication
I1213 13:55:39.858000 75 master.cpp:2424] Received SUBSCRIBE call for framework 'marathon' at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
I1213 13:55:39.858047 75 master.cpp:2500] Subscribing framework marathon with checkpointing enabled and capabilities [ ]
I1213 13:55:39.858240 75 hierarchical.cpp:271] Added framework 44a35e16-dc32-4f91-afac-33dfff498944-0000
I1213 13:55:39.858304 74 sched.cpp:743] Framework registered with 44a35e16-dc32-4f91-afac-33dfff498944-0000
[2016-12-13 13:55:39,867] INFO Starting scheduler actor (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-13 13:55:39,875] INFO Scheduler actor ready (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-13 13:55:39,876] INFO Became active. Accepting event streaming requests. (mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-13 13:55:39,876] INFO Reset offerLeadership backoff (mesosphere.marathon.core.election.impl.ExponentialBackoff:pool-1-thread-1)
[2016-12-13 13:55:39,901] INFO Creating tombstone for old twitter commons leader election (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-1-thread-1)
[2016-12-13 13:55:39,908] INFO Registered as 44a35e16-dc32-4f91-afac-33dfff498944-0000 to master 'd4536c66-4886-4c88-9a02-7dac4b5f2e6d' (mesosphere.marathon.MarathonScheduler$$EnhancerByGuice$$1ef061b0:Thread-14)
[2016-12-13 13:55:39,908] INFO Store framework id: value: "44a35e16-dc32-4f91-afac-33dfff498944-0000"
 (mesosphere.util.state.FrameworkIdUtil:Thread-14)
[2016-12-13 13:55:39,925] INFO Received reviveOffers notification: SchedulerRegisteredEvent (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-4)
I1213 13:55:39.948446 72 master.cpp:4550] Registering agent at slave(1)@127.0.0.1:40102 (openshift-installer-native-docker-compose) with id d4536c66-4886-4c88-9a02-7dac4b5f2e6d-S0
I1213 13:55:39.948563 72 registrar.cpp:464] Applied 1 operations in 10900ns; attempting to update the 'registry'
I1213 13:55:39.949964 77 log.cpp:577] Attempting to append 424 bytes to the log
I1213 13:55:39.950723 77 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 3
I1213 13:55:39.951776 76 replica.cpp:537] Replica received write request for position 3 from (27)@127.0.0.1:40102
I1213 13:55:39.952749 76 leveldb.cpp:341] Persisting action (443 bytes) to leveldb took 891144ns
I1213 13:55:39.952900 76 replica.cpp:712] Persisted action at 3
I1213 13:55:39.953106 76 replica.cpp:691] Replica received learned notice for position 3 from @0.0.0.0:0
I1213 13:55:39.953799 76 leveldb.cpp:341] Persisting action (445 bytes) to leveldb took 501557ns
I1213 13:55:39.953876 76 replica.cpp:712] Persisted action at 3
I1213 13:55:39.953902 76 replica.cpp:697] Replica learned APPEND action at position 3
I1213 13:55:39.954263 76 registrar.cpp:509] Successfully updated the 'registry' in 5.665024ms
I1213 13:55:39.954856 71 hierarchical.cpp:478] Added agent d4536c66-4886-4c88-9a02-7dac4b5f2e6d-S0 (openshift-installer-native-docker-compose) with cpus(*):2; mem(*):2928; disk(*):34068; ports(*):[31000-32000] (allocated: )
I1213 13:55:39.954433 75 log.cpp:596] Attempting to truncate the log to 3
I1213 13:55:39.955282 71 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 4
I1213 13:55:39.955612 71 replica.cpp:537] Replica received write request for position 4 from (28)@127.0.0.1:40102
I1213 13:55:39.955880 76 master.cpp:4619] Registered agent d4536c66-4886-4c88-9a02-7dac4b5f2e6d-S0 at slave(1)@127.0.0.1:40102 (openshift-installer-native-docker-compose) with cpus(*):2; mem(*):2928; disk(*):34068; ports(*):[31000-32000]
I1213 13:55:39.956233 76 master.cpp:5725] Sending 1 offers to framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
I1213 13:55:39.956048 75 slave.cpp:1095] Registered with master master@127.0.0.1:40102; given agent ID d4536c66-4886-4c88-9a02-7dac4b5f2e6d-S0
I1213 13:55:39.957394 73 status_update_manager.cpp:181] Resuming sending status updates
I1213 13:55:39.957774 75 slave.cpp:1155] Forwarding total oversubscribed resources
I1213 13:55:39.958096 75 master.cpp:5002] Received update of agent d4536c66-4886-4c88-9a02-7dac4b5f2e6d-S0 at slave(1)@127.0.0.1:40102 (openshift-installer-native-docker-compose) with total oversubscribed resources
I1213 13:55:39.958235 75 hierarchical.cpp:542] Agent d4536c66-4886-4c88-9a02-7dac4b5f2e6d-S0 (openshift-installer-native-docker-compose) updated with oversubscribed resources (total: cpus(*):2; mem(*):2928; disk(*):34068; ports(*):[31000-32000], allocated: cpus(*):2; mem(*):2928; disk(*):34068; ports(*):[31000-32000])
I1213 13:55:39.956370 71 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 663264ns
I1213 13:55:39.958457 71 replica.cpp:712] Persisted action at 4
I1213 13:55:39.958593 71 replica.cpp:691] Replica received learned notice for position 4 from @0.0.0.0:0
I1213 13:55:39.959283 71 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 622339ns
I1213 13:55:39.959475 71 leveldb.cpp:399] Deleting ~2 keys from leveldb took 25791ns
I1213 13:55:39.959547 71 replica.cpp:712] Persisted action at 4
I1213 13:55:39.959604 71 replica.cpp:697] Replica learned TRUNCATE action at position 4
I1213 13:55:40.008047 73 master.cpp:3951] Processing DECLINE call for offers: [ d4536c66-4886-4c88-9a02-7dac4b5f2e6d-O0 ] for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
[2016-12-13 13:55:44,622] INFO Received TimedCheck (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-13 13:55:44,622] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-2)
I1213 13:55:44.624449 73 master.cpp:4046] Processing REVIVE call for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
I1213 13:55:44.626070 73 hierarchical.cpp:1022] Removed offer filters for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000
I1213 13:55:44.626485 73 master.cpp:5725] Sending 1 offers to framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
[2016-12-13 13:55:44,627] INFO 2 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-7)
[2016-12-13 13:55:44,630] INFO => Schedule next revive at 2016-12-13T13:55:49.621Z in 4995 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-7)
I1213 13:55:44.638694 71 master.cpp:3951] Processing DECLINE call for offers: [ d4536c66-4886-4c88-9a02-7dac4b5f2e6d-O1 ] for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
I1213 13:55:49.643371 71 master.cpp:4046] Processing REVIVE call for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
[2016-12-13 13:55:49,641] INFO Received TimedCheck (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-13 13:55:49,642] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-13 13:55:49,642] INFO 1 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-13 13:55:49,643] INFO => Schedule next revive at 2016-12-13T13:55:54.640Z in 4999 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-9)
I1213 13:55:49.649920 71 hierarchical.cpp:1022] Removed offer filters for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000
I1213 13:55:49.652896 71 master.cpp:5725] Sending 1 offers to framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
I1213 13:55:49.664896 75 master.cpp:3951] Processing DECLINE call for offers: [ d4536c66-4886-4c88-9a02-7dac4b5f2e6d-O2 ] for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
I1213 13:55:54.672166 77 master.cpp:4046] Processing REVIVE call for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
I1213 13:55:54.672760 77 hierarchical.cpp:1022] Removed offer filters for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000
[2016-12-13 13:55:54,671] INFO Received TimedCheck (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-13 13:55:54,672] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-2)
I1213 13:55:54.675570 77 master.cpp:5725] Sending 1 offers to framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
I1213 13:55:54.688268 75 master.cpp:3951] Processing DECLINE call for offers: [ d4536c66-4886-4c88-9a02-7dac4b5f2e6d-O3 ] for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
[2016-12-13 13:55:54,848] INFO initiate task reconciliation (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-13 13:55:54,859] INFO Requesting task reconciliation with the Mesos master (mesosphere.marathon.SchedulerActions:ForkJoinPool-2-worker-7)
I1213 13:55:54.862715 77 master.cpp:5398] Performing implicit task state reconciliation for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
[2016-12-13 13:55:54,862] INFO task reconciliation has finished (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-13 13:55:54,871] INFO Message [mesosphere.marathon.MarathonSchedulerActor$TasksReconciled$] from Actor[akka://marathon/user/MarathonScheduler/$a#-694294900] to Actor[akka://marathon/deadLetters] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. (akka.actor.DeadLetterActorRef:marathon-akka.actor.default-dispatcher-5)
[2016-12-13 13:55:57,020] INFO no interest in offers for reservation reconciliation anymore. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-13 13:55:57,024] INFO Received offers NOT WANTED notification, canceling 0 revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 13:55:57,026] INFO => Suppress offers NOW (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-6)
I1213 13:55:57.036034 74 master.cpp:2882] Processing SUPPRESS call for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102
I1213 13:55:57.036320 74 hierarchical.cpp:1002] Suppressed offers for framework 44a35e16-dc32-4f91-afac-33dfff498944-0000
I1213 13:56:09.705217 73 master.cpp:5463] Performing explicit task state reconciliation for 1 tasks of framework 44a35e16-dc32-4f91-afac-33dfff498944-0000 (marathon) at scheduler-a6e2761a-1f40-4e86-bc73-63171f1435aa@127.0.0.1:40102

This has resolved the issue.

Summary

In this blog post, we have performed the following tasks:

  1. Started a Marathon + Mesos environment with ZooKeeper, Mesos Master, Mesos Agent (Slave) and Mesosphere Marathon.
  2. Started a “Hello World” application via Marathon in a default Mesos container.
  3. Tested the resiliency by killing a process managed by Marathon.
  4. Started a “Hello World” application via Marathon in a Docker container.
  5. Showed what happens if all CPU resources are reserved.
  6. Described the errors we observed in the Appendix.

Our observations were:

  • Contrary to our expectation that Mesos and Marathon are production-ready, the Mesos applications do not seem to be very robust: we have seen critical errors like “suicides” of the Mesos Master, ZooKeeper and Mesosphere Marathon. I.e., monitoring and watchdogs are key when using Mesos + Marathon in a production environment. Apart from the critical errors, there are minor problems like “connection lost” messages on the Mesos Web Portal and a non-working “Application Restart” function on Mesosphere Marathon. All problems are discussed in Appendix A.1 to A.6.
  • We have demonstrated in Step 8 that Mesos does not allow over-subscription of resources by default. I.e., if you have 2 CPU cores and you reserve 1 CPU core for each of 2 instances, all CPU resources are occupied and no further task instance can be started (it will remain in the “waiting” state). Mesos’ CPU reservation feels similar to a hard assignment of resources (since over-subscription is not supported by default, as we have seen above), but under the hood, Mesos does not apply hard limits on CPU usage unless the Mesos slave has --cgroups_enable_cfs (CFS = Completely Fair Scheduler) enabled; see the sketch after this list. See also the second answer to this StackOverflow question. For more information on over-subscription in Mesos, see this Apache Mesos Documentation page.
  • Mesos supports processes run in Mesos containers as well as processes run in Docker containers.
  • We could demonstrate that Mesosphere Marathon does what it is designed for: it makes sure that an application is restarted if it has crashed or has been killed.
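As a hedged illustration of the CFS remark above (this command was not part of our test session): if you were to start a Mesos agent yourself, CPU hard limits could be switched on with flags like the following. The master address and work directory are assumptions; check the Mesos documentation for the flags valid in your version:

(dockerhost)$ sudo mesos-slave --master=127.0.0.1:5050 \
    --work_dir=/var/lib/mesos \
    --isolation=cgroups/cpu,cgroups/mem \
    --cgroups_enable_cfs=true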

Jenkins Part 2: automated Code Download and Build (Gradle + Maven)


2016-11-30-18_19_38

NEW (2017-01-02): you can now immediately start with part 2 (this post) without going through the steps of part 1. A corresponding pre-installed Docker image is provided.

NEW (2017-01-05): I have added the Maven build path with a fat executable JAR file.

In this blog post, we will perform our first automated job within Jenkins, the most popular open source tool for Continuous Integration and Continuous Deployment. Like in part 1 of this blog series, we will start Jenkins in a Docker container and define and run a first task:

  • Download source code from GitHub
  • Create a lean executable JAR file via Gradle
  • Create a fat executable JAR file via Maven

At the end of this session, we will have learned how to download GitHub code and build a Java program (an executable JAR file) at the push of a button.

This blog post series is divided into the following parts:

    • Part 1: Installation and Configuration of Jenkins, loading Plugins
    • Part 2 (this blog): Creating our first Jenkins job: GitHub download and Software build
    • Part 3: Periodic and automatically triggered Builds
    • Part 4 (planned): running automated tests

What is Jenkins?

Jenkins is the leading open source automation server mostly used in continuous integration and continuous deployment pipelines. Jenkins provides hundreds of plugins to support building, deploying and automating any project.

 

Jenkins build, test and deployment pipeline

A typical workflow is visualized above: a developer checks the code changes into the repository. Jenkins will detect the change, build (compile) the software, test it and prepare to deploy it on a system. Depending on the configuration, the deployment is triggered manually by a human, or performed automatically by Jenkins.

For more information, see the introduction found in part 1 of this blog series.

Our first Jenkins Job

In this hello world example, we will perform the first part of the typical build pipeline shown in the previous chapter, applied to a Java hello world program:

  1. At the push of a button, download the code from the Git repository
  2. Build an executable JAR from the Java code

2016-12-09-15_29_08

Tools used

      • Vagrant 1.8.6
      • Virtualbox 5.0.20
      • Docker 1.12.1
      • Jenkins 2.32.1

Prerequisites:

      • Free DRAM overall: ~4 GB or more
      • A Docker Host is available. Perform Step 1 in Part 1 of this blog series, if you are in need of a Docker host.

Step 1: Start Jenkins in interactive Terminal Mode

Step 1.1: Make sure the TCP Port is unused

Make sure that port 8080 is unused on the Docker host. If you were following all steps in part 1 of the series, you might need to stop cadvisor:

(dockerhost)$ sudo docker stop cadvisor

Alternatively, you can change the port option below from -p8080:8080 to, e.g., -p9090:8080.
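To verify that port 8080 is free on the Docker host, you can check the listening sockets; a quick check, assuming netstat is available on your Docker host:

(dockerhost)$ sudo netstat -tulpn | grep 8080

No output means that nothing is listening on port 8080.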

Step 1.2: Alternative (A): If you have followed all steps in part 1 of this blog series, start Jenkins from the official Jenkins image

I assume that you have followed all steps in part 1 of the blog series. In this case, you have created a jenkins_home directory on your Docker host. All popular plugins are installed and an Admin user has been created. If the Jenkins container is not started already, we start it with the jenkins_home Docker host volume mapped to /var/jenkins_home (as we have done in part 1 of this blog series):

(dockerhost)$ cd <path_to_jenkins_home>
(dockerhost:jenkins_home)$ sudo docker run -it --rm --name jenkins -p8080:8080 -p50000:50000 -v`pwd`:/var/jenkins_home jenkins
Running from: /usr/share/jenkins/jenkins.war
...
--> setting agent port for jnlp
--> setting agent port for jnlp... done

Step 1.2: Alternative (B): You prefer to start from a pre-installed Jenkins Image

If you have not followed the steps in part 1, or if you prefer to start from a cleanly installed image, you also can start Jenkins from a cleanly installed Docker image like follows:

(dockerhost)$ sudo docker run -it --name jenkins -p8080:8080 -p50000:50000 oveits/jenkins_tutorial:part2_step1
Running from: /usr/share/jenkins/jenkins.war
...
--> setting agent port for jnlp
--> setting agent port for jnlp... done

Note that in alternative (B), all data is kept within the /var/jenkins_home_local directory inside the created container. In case you want to save your work afterwards, I have not used the docker run remove option --rm. This gives you the chance to stop and start the same container later. In addition, you can create your own image from the stopped container in order to retain the work (see the sketch below).
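A sketch of how such an image could be created; the image name and tag are hypothetical placeholders:

(dockerhost)$ sudo docker stop jenkins
(dockerhost)$ sudo docker commit jenkins <your_name>/jenkins_tutorial:part2_mywork

Afterwards, a new container can be started from <your_name>/jenkins_tutorial:part2_mywork with the same docker run syntax as above.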

Step 2: Open Jenkins in a Browser

Now we want to connect to the Jenkins portal. For that, open a browser and navigate to the URL

<your_jenkins_host>:8080

In our case, Jenkins is running in a container and we have mapped the container-port 8080 to the local port 8080 of the Docker host. On the Docker host, we can open the URL.

localhost:8080
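If the browser cannot reach the portal, you can first verify on the Docker host itself that Jenkins is responding; a quick check, assuming curl is installed (any HTTP status line indicates that Jenkins is up):

(dockerhost)$ curl -sI localhost:8080 | head -n 1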

Note: In case of Vagrant with VirtualBox, there is only a NAT-based interface by default, and you need to create a port forwarding rule for any port you want to reach from outside (the local machine you are working on also counts as “outside”). In this case, we need to add an entry in the port forwarding list of VirtualBox:
2016-11-30-19_22_22-regel-fur-port-weiterleitung

We had already created this entry in part 1, but the entries were gone again, which seems to be a VirtualBox bug. I have now added it again.
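As an alternative to the VirtualBox GUI, such a rule can also be added on the command line of the machine hosting VirtualBox; a sketch, where <your_vm_name> is a placeholder for the VM name shown by VBoxManage list vms:

$ VBoxManage controlvm "<your_vm_name>" natpf1 "jenkins,tcp,,8080,,8080"

For a powered-off VM, use VBoxManage modifyvm "<your_vm_name>" --natpf1 "jenkins,tcp,,8080,,8080" instead.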

Log in with the admin account we have created in the last session:

2016-12-09-10_24_00-jenkins

Step 3: Alternative (a): Prepare Gradle Usage

If you later prefer to use Gradle instead of (or in addition to) Maven, you need to prepare its first usage. For Maven preparation, see Step 3, Alternative (b) below.

On this wiki page about the Gradle plugin we find that we need to configure Gradle first:

Go to Jenkins -> Manage Jenkins -> Global Tool Configuration (available for Jenkins >2.0)

2016-12-09-11_34_55-manage-jenkins-jenkins

2016-12-09-11_35_26-global-tool-configuration-jenkins

Scroll down to Gradle -> Add Gradle

2017-01-02-14_27_26-global-tool-configuration-jenkins

-> choose Version (Gradle 3.2.1 in my case)

-> Add a name (“Gradle 3.2.1” in my case)

-> Save

Since we have checked “Install automatically” above, I expect that it will be installed automatically on first usage.

Step 3: Alternative (b): Prepare Maven Usage

If you later prefer to use Maven instead of (or in addition to) Gradle, you need to prepare its first usage. For Gradle preparation, see Step 3, Alternative (a) above.

Go to Jenkins -> Manage Jenkins -> Global Tool Configuration (available for Jenkins >2.0)

2016-12-09-11_34_55-manage-jenkins-jenkins

2016-12-09-11_35_26-global-tool-configuration-jenkins

Scroll down to Maven -> Add Maven

2017-01-02-14_32_46-global-tool-configuration-jenkins

-> choose Version (3.3.9 in my case)

-> Add a name (“Maven 3.3.9” in my case)

-> Save

Since we have checked “Install automatically” above, I expect that it will be installed automatically on first usage.

Step 4: Create a Job (Freestyle Project)

Step 4.1 Enter Name and Project Type

Either click on “create new jobs” or on New Item.

Now enter an Item name and click on Freestyle Project and OK:

2016-12-09-10_55_56-new-item-jenkins

Step 4.2: Specify GitHub Project

Check “GitHub project” and add the HTTPS GitHub URL. I have used a small Apache Camel project of mine that provides a simple RESTful file storage:

https://github.com/oveits/simple-restful-file-storage

2016-12-09-11_02_22-github-triggered-build-config-jenkins

Step 4.3 Configure Source Code Management

Under Source Code Management, we choose “Git” and specify the GitHub repository a second time. Since it is public, we do not need to enter any credentials for now:

2016-12-09-14_53_46-github-triggered-build-config-jenkins

-> Click Apply

Note also that I have chosen a branch different from the master branch (“jenkinstest”). I have created this new branch in order to keep the master branch clean from any changes that might be needed to test Jenkins.

Step 4.4 Configure Build Triggers (postponed to part 3 of this blog post series)

For now, we will test only manual “build now” triggers, so we do not need to specify any build triggers. Build triggers will be tested in the next blog post.

Step 4.5: Alternative (a): Configure Gradle Build

Prerequisite: For the creation of an executable JAR, the build.gradle file in the project root directory must be prepared accordingly; a minimal sketch is shown below.
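A minimal sketch of the relevant part of such a build.gradle, shown here as console output; the main class com.example.Main is an assumption, and your project will use its own main class:

(container)$ cat build.gradle
apply plugin: 'java'

jar {
    manifest {
        attributes 'Main-Class': 'com.example.Main'
    }
}

The Main-Class manifest entry is what makes the JAR executable via java -jar.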

Here, we show how to build the project via Gradle. If you prefer to build via Maven, see Step 4.5: Alternative (b): Configure Maven Build below.

Here, we scroll down to “Build” -> click 2017-01-02-14_54_07-github-triggered-build-config-jenkins -> “Invoke Gradle script”

2016-12-09-11_11_47-github-triggered-build-config-jenkins

Choose the Gradle version we have prepared in Step 3 and add the task “jar”:

2017-01-02-14_56_34-github-triggered-build-config-jenkins

Note that keeping the (Default) Gradle version will not work, as long as this Default has not been defined. See Appendix A for details.

The Gradle task “jar” will create our executable JAR file.

-> Click Save at the bottom left.

Step 4.5: Alternative (b): Configure Maven Build

Here, we show how to build the project via Maven. If you prefer to build via Gradle, see Step 4.5: Alternative (a): Configure Gradle Build above.

Here, we scroll down to “Build” -> click “Add build step” 2017-01-02-14_54_07-github-triggered-build-config-jenkins -> “Invoke top-level Maven targets”

2017-01-03-13_52_27-github-triggered-build-config-jenkins

Choose the Maven version we have prepared in Step 3 and specify the goal “package”:

2017-01-03-14_00_05-github-triggered-build-config-jenkins

Note that keeping the (Default) Maven version will not work, as long as this Default has not been defined.

The Maven goal “package” will build our JAR file.

-> Click Save at the bottom left.
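If you want to double-check the goal locally before configuring it in Jenkins, you can run the same build on any machine with Maven installed (a sketch, using the example project and the jenkinstest branch from above):

$ git clone https://github.com/oveits/simple-restful-file-storage
$ cd simple-restful-file-storage
$ git checkout jenkinstest
$ mvn clean package
$ ls target/*.jar

The JAR file is created in the target directory.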

Step 5: Test manually triggered Build

We can trigger a build manually via Jenkins -> drop-down right of “GitHub Triggered Build” -> Build Now.

2016-12-09-11_46_03-dashboard-jenkins

Click on #1 of the build history:

2017-01-02-14_59_15-github-triggered-build-jenkins-v2

then on Console Output:

2017-01-02-15_02_46-github-triggered-build-1-jenkins-v2


Output for step 5 in case of Gradle:

2017-01-02-15_04_25-github-triggered-build-1-console-jenkins

This may take a while (~11 min in my case with a 100Mbps Internet connection):

2017-01-02-15_48_17-github-triggered-build-1-console-jenkins

thumps_up_3

This was the first successful Jenkins triggered Git download and Gradle build.

Output for step 5 in case of Maven:

2017-01-03-14_05_03-github-triggered-build-4-console-jenkins

This may take a while (~8 min in my case with a 100Mbps Internet connection):

2017-01-03-14_42_22-github-triggered-build-4-console-jenkins

We can see in the output that the JAR file was placed at

/var/jenkins_home_local/workspace/GitHub Triggered Build/target/camel-spring4-0.0.1-SNAPSHOT.jar

thumps_up_3

This was the first successful Jenkins triggered Git download and Maven build.

Step 5.2 (Optional): Measure Time Consumption for Gradle clean Build

Let us now test whether the build is quicker the second time:

-> Back to Project

-> Configure

-> Add “clean” Gradle task before “jar” Gradle task:

2017-01-02-16_00_27-github-triggered-build-config-jenkins

-> Save

-> Build Now

-> Build History -> current build

-> Console Output

Clean Build - Console Output

This shows that a clean build takes only ~6.4 sec once all software has already been downloaded.

Step 6 (Gradle): Retrieve and start executable JAR File

For Maven, scroll down to Step 6 (Maven).

Let us see where the executable JAR file can be found.

For that, let us enter a bash session on the same Docker container:

(dockerhost)$ docker exec -it jenkins bash

In case you have started Jenkins with the jenkins image (Step 1.2, alternative (A)), the project will be found under

(container):$ cd /var/jenkins_home

In case you have started Jenkins with the oveits/jenkins_tutorial image (Step 1.2, alternative (B)), the project will be found under

(container):$ cd /var/jenkins_home_local

Then enter the project directory; in my case “GitHub Triggered Build”:

(container)$ cd 'GitHub Triggered Build'

The JAR is found at the path defined in the build.gradle file (default: build/libs):

(container)$ cd build/libs
(container)$ ls
GitHub Triggered Build-0.0.1-SNAPSHOT.jar   META-INF   lib   log4j.properties   properties   templates
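If you are ever unsure where a build has placed its JAR files, a quick search through the workspace helps (a sketch):

(container)$ find '/var/jenkins_home_local/workspace/GitHub Triggered Build' -name '*.jar'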

Now let us start the executable file:

$ java -jar 'GitHub Triggered Build-0.0.1-SNAPSHOT.jar'
[                          main] MainSupport                    INFO  Apache Camel 2.16.0 starting
0 [main] INFO org.apache.camel.main.MainSupport  - Apache Camel 2.16.0 starting
[                          main] DefaultTypeConverter           INFO  Loaded 196 type converters
1706 [main] INFO org.apache.camel.impl.converter.DefaultTypeConverter  - Loaded 196 type converters
...
2762 [main] INFO org.apache.camel.spring.SpringCamelContext  - Total 15 routes, of which 15 is started.
[                          main] SpringCamelContext             INFO  Apache Camel 2.16.0 (CamelContext: camel-1) started in 1.046 seconds
2765 [main] INFO org.apache.camel.spring.SpringCamelContext  - Apache Camel 2.16.0 (CamelContext: camel-1) started in 1.046 seconds

Yes, perfect, it seems to work.
thumps_up_3

You can stop the Apache Camel process by pressing <CTRL>-C in the console.

Step 6 (Maven): Retrieve and start executable JAR File

For Gradle, scroll up to Step 6 (Gradle).

In case of Maven, the location of the created JAR file can be seen at the end of the build console output:

[INFO] Building jar: /var/jenkins_home_local/workspace/GitHub Triggered Build/target/camel-spring4-0.0.1-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 08:15 min
[INFO] Finished at: 2017-01-03T13:13:06+00:00
[INFO] Final Memory: 37M/263M
[INFO] ------------------------------------------------------------------------
Finished: SUCCESS

Let us test the executable JAR:

(dockerhost)$ docker exec -it jenkins bash
jenkins@(container):/$ java -jar '/var/jenkins_home_local/workspace/GitHub Triggered Build/target/camel-spring4-0.0.1-SNAPSHOT.jar'
no main manifest attribute, in /var/jenkins_home_local/workspace/GitHub Triggered Build/target/camel-spring4-0.0.1-SNAPSHOT.jar

Okay, the jar is not executable yet. Let us change the POM file to create an executable fat JAR as described on Mkyong’s page:

$ git clone <repository-URL>
$ cd <repository-Dir>
$ vi pom.xml

Add the following snippet to the plugins part of pom.xml:

      <!-- Maven Assembly Plugin -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>2.4.1</version>
        <configuration>
          <!-- get all project dependencies -->
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
          <!-- the MainClass in the manifest makes an executable jar -->
          <archive>
            <manifest>
              <mainClass>de.oveits.simplerestfulfilestorage.MainApp</mainClass>
            </manifest>
          </archive>
        </configuration>
        <executions>
          <execution>
            <id>make-assembly</id>
            <!-- bind to the packaging phase -->
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>

For other projects, you will need to adapt the mainClass element above.

Then:

$ git add pom.xml
$ git commit -m "Maven creates fat executable JAR file now"
$ git push

Now again, let us build the project:

-> Build Now

-> 2017-01-05-01_15_51-github-triggered-build-jenkins

-> Console Output

Now there are many downloads, and it takes a while:

2017-01-05-01_18_33-github-triggered-build-6-console-jenkins

After ~2.5 minutes, it is ready:

2017-01-05-01_19_25-github-triggered-build-6-console-jenkins

And we can find and run the new fat JAR file on the Docker container:

(dockerhost)$ docker exec -it jenkins bash
(container) $ ls -ltr '/var/jenkins_home_local/workspace/GitHub Triggered Build/target'
total 57680
drwxr-xr-x 3 jenkins jenkins     4096 Jan  3 13:12 generated-sources
drwxr-xr-x 6 jenkins jenkins     4096 Jan  3 13:12 classes
drwxr-xr-x 3 jenkins jenkins     4096 Jan  3 13:12 generated-test-sources
drwxr-xr-x 3 jenkins jenkins     4096 Jan  3 13:12 test-classes
drwxr-xr-x 2 jenkins jenkins     4096 Jan  3 13:13 maven-archiver
-rw-r--r-- 1 jenkins jenkins    44657 Jan  4 23:58 camel-spring4-0.0.1-SNAPSHOT.jar
drwxr-xr-x 2 jenkins jenkins     4096 Jan  5 00:00 archive-tmp
-rw-r--r-- 1 jenkins jenkins 58988354 Jan  5 00:00 camel-spring4-0.0.1-SNAPSHOT-jar-with-dependencies.jar

Here we can see that a large JAR file with all dependencies has been created. Now let us try to execute it:

(container) $ java -jar '/var/jenkins_home_local/workspace/GitHub Triggered Build/target/camel-spring4-0.0.1-SNAPSHOT-jar-with-dependencies.jar'
17/01/05 00:07:50 INFO main.MainSupport: Apache Camel 2.16.0 starting
0 [main] INFO org.apache.camel.main.MainSupport  - Apache Camel 2.16.0 starting
...
17/01/05 00:07:52 INFO spring.SpringCamelContext: Total 15 routes, of which 15 is started.
2420 [main] INFO org.apache.camel.spring.SpringCamelContext  - Total 15 routes, of which 15 is started.
17/01/05 00:07:52 INFO spring.SpringCamelContext: Apache Camel 2.16.0 (CamelContext: camel-1) started in 0.876 seconds
2422 [main] INFO org.apache.camel.spring.SpringCamelContext  - Apache Camel 2.16.0 (CamelContext: camel-1) started in 0.876 seconds

Yes, perfect, it seems to work.

thumps_up_3

You can stop the Apache Camel process by pressing <CTRL>-C in the console.
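As an optional cross-check (a sketch; it assumes that unzip is available inside the container), you can verify that the Maven Assembly Plugin has written the Main-Class entry into the manifest of the fat JAR:

(container)$ unzip -p '/var/jenkins_home_local/workspace/GitHub Triggered Build/target/camel-spring4-0.0.1-SNAPSHOT-jar-with-dependencies.jar' META-INF/MANIFEST.MF

The output should contain a line starting with Main-Class:.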

Step 7: Alternative (A): Retrieve the JAR File in case you have mapped jenkins_home to the Docker Host

In case you have taken the alternative (A) way of starting Jenkins with your own jenkins_home directory on the Docker host in step 1.2, you can retrieve the JAR file from the project folder without copying it from the container to the Docker host. In my case, the project folder is located at

<jenkins_home>/workspace/GitHub Triggered Build

2016-12-09-15_49_11-github-triggered-build

And from there, the default location where Gradle places the created JAR file is ‘build/libs’, as discussed here:
2016-12-09-15_50_27-libs

Step 7: Alternative (B) Retrieve JAR File in case jenkins_home is located on the Container only

In case you have taken the alternative (B) way of starting Jenkins with the jenkins_home directory on the Docker container in step 1.2, you need to copy the JAR file from the container to another location. The easiest way to do so is to copy it via

(dockerhost) $ docker cp <containerId>:/file/path/within/container /host/path/target

The container ID can be seen via docker ps:

(dockerhost) $ docker ps | grep jenkins
9159bedefbee        oveits/jenkins_tutorial:part2_step1   "/start.sh"              9 hours ago         Up 9 hours          0.0.0.0:8080->8080/tcp, 0.0.0.0:50000->50000/tcp   jenkins

Now we can copy the jar file to the Docker host via:

(dockerhost) $ docker cp '9159bedefbee:/var/jenkins_home_local/workspace/GitHub Triggered Build/build/libs/GitHub Triggered Build-0.0.1-SNAPSHOT.jar' '/vagrant/GitHub Triggered Build-0.0.1-SNAPSHOT.jar'

Since our Docker host is a Vagrant virtual machine, we have chosen a destination in the /vagrant folder, because this folder is synchronized with the Vagrant host machine by default. This way, we can access the JAR file on the host machine without further ado:
2017-01-03-00_17_19-ubuntu-trusty64-docker_openshift-installer
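As a quick check (assuming the default synced-folder mapping of the Vagrant box), the copied JAR should now be visible both inside the VM and on the host machine:

(dockerhost)$ ls -l '/vagrant/GitHub Triggered Build-0.0.1-SNAPSHOT.jar'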

Appendix A: Error Message “Cannot run gradle”

Problem:

If the Gradle plugin is installed, but not configured according to step 3 above, you will get following build error:

2016-12-09-11_19_14-github-triggered-build-1-console-jenkins

Resolution:

Perform step 3: Prepare Gradle Usage (alternative (a), see above)

Appendix B: How I have created the Image for Part 2 (oveits/jenkins_tutorial:part2_step1)

Note: This appendix is for reference only and describes how I have created the Docker images for usage in Part 2 (Part 3, …). You do not need to follow those steps.

Images like jenkins_tutorial:part2_step1 are designed for users who want to skip the steps performed in part 1 and directly start with step 1 of part 2. The same holds for images like jenkins_tutorial:part3_step1, jenkins_tutorial:part4_step1, …

My target is to create an image that has the jenkins_home directory stored in the container itself. The directory /var/jenkins_home cannot be saved in the container, since it is defined as an external volume (if I run a docker commit, all changes in /var/jenkins_home are ignored, even if it is not mapped to a Docker host volume).

Since I do not know how to remove the VOLUME label from /var/jenkins_home in the official jenkins image, the only solution I see is to use a different Jenkins home directory within the container/image.
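For reference, the volume definition that causes this behavior can be displayed by inspecting the official image (a sketch; the --type image flag avoids a clash with a running container of the same name):

(dockerhost)$ docker inspect --type image --format '{{ .Config.Volumes }}' jenkins

This should print something like map[/var/jenkins_home:{}].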

Step B.1 Start Container from official Jenkins Image

sudo docker run -it --name jenkins_tutorial -p8080:8080 -p50000:50000 -eJENKINS_HOME="/var/jenkins_home_local" --entrypoint=bash jenkins

Step B.2: Create new home Directory

On another terminal determine the container ID (with docker ps).

Assuming that the running container has the ID 84ec9c83c1ce, I log into the container as root (see this Stackoverflow page), even though we do not know the root password:

(dockerhost)$ sudo docker exec -u 0 -it 84ec9c83c1ce bash

Inside the container, we can now create the new home directory and assign it to the user “jenkins”:

(container)# mkdir /var/jenkins_home_local
(container)# chown jenkins:jenkins /var/jenkins_home_local

Step B.3: Create new Entrypoint Startup Script

Now we can create a new entrypoint startup script as follows:

(container)# echo '#!/bin/bash' > /start.sh
(container)# echo 'export JENKINS_HOME=/var/jenkins_home_local' >> /start.sh
(container)# echo '/bin/tini -- /usr/local/bin/jenkins.sh' >> /start.sh
(container)# chmod +x /start.sh
(container)# exit

Step B.4: Save Container into an Image

With the following command, we can save the container as a Docker image:

(dockerhost)$ sudo docker commit e82d5277431d jenkins_local

Step B.5: Change Entrypoint

We now change the entrypoint to run the start.sh script we have created above. For that, we run a new container with the entrypoint and save it as image again.

(dockerhost)$ sudo docker run -it --name jenkins_tutorial -p8080:8080 -p50000:50000 -eJENKINS_HOME="/var/jenkins_home_local" --entrypoint="/start.sh" jenkins_local

Step B.6: Save Container as Docker Image

We now can save the container as a Docker image with the new entrypoint. For that, we can issue a docker ps command to find the container ID.

Assuming that the new container ID is 5e82d8360b3e, we can save the container as an image with the following command. With the docker push commands, we save the image to Docker Hub (here as version 0.4 and as an implicit latest tag):

(dockerhost)$ docker commit 5e82d8360b3e oveits/jenkins_tutorial:part1_step1_v0.4
(dockerhost)$ docker tag jenkins_local oveits/jenkins_tutorial:part1_step1
(dockerhost)$ docker push oveits/jenkins_tutorial:part1_step1_v0.4
(dockerhost)$ docker push oveits/jenkins_tutorial:part1_step1

Step B.7: Create Image for Part 2 (Part 3, …, Part X)

For creating the image oveits/jenkins_tutorial:part2_step1 for usage in part 2, we

  1. start image oveits/jenkins_tutorial:part1_step1
  2. perform all steps described on part 1
  3. save (commit) the changes to image oveits/jenkins_tutorial:part2_step1

TODO: the steps of Appendix B should be automated by using a Dockerfile and saving start.sh and the Dockerfile to GitHub (see the sketch below).
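A first, untested sketch of how such a Dockerfile could combine steps B.1 to B.6 (my own draft, not part of any repository yet):

(dockerhost)$ cat <<'EOF' > Dockerfile
# sketch: automate Appendix B (container-local Jenkins home plus start script)
FROM jenkins
USER root
# create the container-local Jenkins home and hand it over to the jenkins user
RUN mkdir /var/jenkins_home_local \
 && chown jenkins:jenkins /var/jenkins_home_local \
 && printf '#!/bin/bash\nexport JENKINS_HOME=/var/jenkins_home_local\n/bin/tini -- /usr/local/bin/jenkins.sh\n' > /start.sh \
 && chmod +x /start.sh
USER jenkins
ENTRYPOINT ["/start.sh"]
EOF
(dockerhost)$ docker build -t oveits/jenkins_tutorial:part1_step1 .

Since no docker commit is involved, the new /var/jenkins_home_local directory simply becomes part of the image layers.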

Appendix C: docker exec produces error “unable to find user jenkins: no matching entries in passwd file”

Symptoms:

The images I have created in Appendix B can be started and will run inside as the user “jenkins”. However, if the container is running and you try to create a new parallel bash session like so, you will get an error:

(dockerhost)$ docker exec -it 9159bedefbee bash
unable to find user jenkins: no matching entries in passwd file

Here, I was assuming that the Docker container ID is 9159bedefbee.

The error message is misleading, since I can show that the passwd file contains a jenkins user by entering the container as root:

(dockerhost)$ docker exec -u 0 -it 9159bedefbee bash
root@9159bedefbee:/# grep jenkins /etc/passwd
jenkins:x:1000:1000::/var/jenkins_home:/bin/bash

Workaround:

You can run an exec session as user “jenkins” by specifying the user ID 1000:

(dockerhost)$ docker exec -u 1000 -it 9159bedefbee bash
jenkins@9159bedefbee:/$

Summary

In this blog post we have performed the following tasks:

  1. Started Jenkins in a Docker container
  2. Created a project
  3. Configured the project with Git and Gradle information
  4. Manually triggered a build
    1. Jenkins has downloaded the Git repository,
    2. started the Gradle build and performed the tasks we have configured in the project configuration
  5. Searched for the executable JAR file on the Jenkins server and started the JAR file successfully from the command line

To avoid any compatibility issues with the Java version on the host, we have run Jenkins in a Docker container. To better see what happens under the hood, we have chosen to run the Docker container in interactive terminal mode. While it was a little bit confusing that the GitHub repository had to be specified twice, everything worked fine once the source code management was configured correctly.

Further Reading:


Cassandra “Hello World” Example


cassandra-logo

Today, we will introduce Cassandra, a distributed, resilient and highly scalable noSQL database. For simplicity, we will run the cluster within Docker containers and test the resiliency functions by killing one of two server containers and verifying that all data is retained.

What is Cassandra?

Apache Cassandra is a fast, distributed noSQL database that can be used for big data use cases. A short comparison of Apache Cassandra with Apache Hadoop can be found in this Cassandra vs Hadoop blog post:

  • Hadoop is a big data framework for storing and analyzing vast amounts of unstructured, historic data. Why ‘historic’? The reason is that the search capabilities of Hadoop rely on long-running, CPU-intensive MapReduce processes that run as batch jobs.
  • Cassandra is a distributed noSQL database that is ideally suited for structured, “hot” data, i.e. Cassandra is capable of processing online workloads of a transactional nature.

I have found the following figure that compares Cassandra with SOLR/Lucene and Apache Hadoop:

srchsolrintro
Source: https://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/srch/srchIntro.html

Target Configuration for this Blog Post

In this Hello World blog post, we will create two Cassandra server containers and a Cassandra client container. For the sake of this test, the Cassandra databases are stored within the containers (in a productive environment, you would most likely store the database outside the container). We will add data to the cluster and make sure the data is replicated to both servers. We can test this by shutting down one server container first, starting a new server container to restore the redundancy, shutting down the second container, and making sure that the data is still available.

2016-12-08-20_32_11-unbenannt-1-libreoffice-impress
2016-12-08-20_38_18
2016-12-08-20_49_39
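As a preview of the test data, the replication part of this plan boils down to a keyspace with a replication factor of 2; a sketch of the corresponding CQL could look like this (the keyspace and table names are made up for illustration):

cqlsh> CREATE KEYSPACE IF NOT EXISTS hello WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
cqlsh> CREATE TABLE IF NOT EXISTS hello.greetings (id int PRIMARY KEY, message text);
cqlsh> INSERT INTO hello.greetings (id, message) VALUES (1, 'Hello World');

With a replication factor of 2, every row is stored on both server nodes, which is what allows one node to be killed without data loss.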

Tools used

  • Vagrant 1.8.6
  • Virtualbox 5.0.20
  • Docker 1.12.1
  • Cassandra 3.9

Prerequisites:

  • > 3.3 GB DRAM (Docker host: ~0.3 GB, ~1.5 GB per Cassandra node, < ~0.1 GB for Cassandra client)

Step 1: Install a Docker Host via Vagrant and Connect to the Host via SSH

If you are using an existing Docker host, make sure that your host has enough memory (see the prerequisites above).

We will run Cassandra in Docker containers in order to allow for maximum interoperability. This way, we can always use the latest Cassandra version without the need to control the Java version used on the host.

If you are new to Docker, you might want to read this blog post.

Installing Docker on Windows and Mac can be a real challenge, but no worries: we will show an easy way here that is much quicker than the one described in Docker’s official documentation:

Prerequisites of this step:

  • I recommend having direct access to the Internet: via firewall, but without HTTP proxy. However, if you cannot get rid of your HTTP proxy, read this blog post.
  • Administration rights on your computer.

Steps to install a Docker Host VirtualBox VM:

1. Download and install Virtualbox (if the installation fails with the error message “<to be completed>”, see Appendix A of this blog post: Virtualbox Installation Workaround below)

2. Download and install Vagrant (requires a reboot)

3. Download the Vagrant box containing an Ubuntu-based Docker host and create a VirtualBox VM as follows:

basesystem# mkdir ubuntu-trusty64-docker ; cd ubuntu-trusty64-docker
basesystem# vagrant init williamyeh/ubuntu-trusty64-docker
basesystem# vagrant up
basesystem# vagrant ssh

Now you are logged into the Docker host, and we are ready for the next step: downloading the Cassandra Docker image.

Note: I have experienced problems with the vi editor when running vagrant ssh in a Windows terminal. On Windows, consider following Appendix C of this blog post and using putty instead.

Step 2 (optional): Download Cassandra Image

This extra download step is optional, since the Cassandra Docker image will be downloaded automatically in step 3 if it is not already found on the system:

(dockerhost)$ docker pull cassandra
Using default tag: latest
latest: Pulling from library/cassandra

386a066cd84a: Already exists
e4bd24d76b78: Pull complete
5ccb1c317672: Pull complete
a7ffd548f738: Pull complete
d6f6138be804: Pull complete
756363f453c9: Pull complete
26258521e648: Pull complete
fb207e348163: Pull complete
3f9a7ac16b1d: Pull complete
49e0632fe1f1: Pull complete
ba775b0b41f4: Pull complete
Digest: sha256:f5b1391b457ead432dc05d34797212f038bd9bd4f0b0260d90ce74e53cbe7ca9
Status: Downloaded newer image for cassandra:latest

The version of the downloaded Cassandra image can be checked with the following command:

(dockerhost)$ sudo docker run -it --rm --name cassandra cassandra -v
3.9

We are using version 3.9 currently. If you want to make sure that you use the exact same version as I have used in this blog, you can use the image name cassandra:3.9 in all docker commands instead of just cassandra.
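For example, the version check above would then read:

(dockerhost)$ sudo docker run -it --rm --name cassandra cassandra:3.9 -v
3.9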

Step 3: Run Cassandra in interactive Terminal Mode

In this step, we will run Cassandra interactively (with the -it switch instead of the -d switch) to better see what is happening. In a productive environment, you would use the detached mode -d instead of the interactive terminal mode -it.
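For reference, the detached equivalent of the interactive command below would look like this (a sketch; same name and port mappings, with docker logs as the way to follow the output):

(dockerhost)$ sudo docker run -d --name cassandra-node1 -p7000:7000 -p7001:7001 -p9042:9042 -p9160:9160 cassandra
(dockerhost)$ docker logs -f cassandra-node1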

By analyzing the Cassandra image via the online ImageLayers tool, we have found out that the default command is to run /docker-entrypoint.sh cassandra -f and that Cassandra uses the ports 7000/tcp, 7001/tcp, 7199/tcp, 9042/tcp and 9160/tcp. We keep the entrypoint and map the ports to the outside world:

(dockerhost)$ sudo docker run -it --rm --name cassandra-node1 -p7000:7000 -p7001:7001 -p9042:9042 -p9160:9160 cassandra

INFO  15:30:17 Configuration location: file:/etc/cassandra/cassandra.yaml
INFO  15:30:17 Node configuration:[allocate_tokens_for_keyspace=null; authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_bootstrap=true; auto_snapshot=true; batch_size_fail_threshold_in_kb=50; batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024; broadcast_address=172.17.0.4; broadcast_rpc_address=172.17.0.4; buffer_pool_use_heap_if_exhausted=true; cas_contention_timeout_in_ms=1000; cdc_enabled=false; cdc_free_space_check_interval_ms=250; cdc_raw_directory=null; cdc_total_space_in_mb=null; client_encryption_options=; cluster_name=Test Cluster; column_index_cache_size_in_kb=2; column_index_size_in_kb=64; commit_failure_policy=stop; commitlog_compression=null; commitlog_directory=/var/lib/cassandra/commitlog; commitlog_max_compression_buffers_in_pool=3; commitlog_periodic_queue_size=-1; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_batch_window_in_ms=null; commitlog_sync_period_in_ms=10000; commitlog_total_space_in_mb=null; compaction_large_partition_warning_threshold_mb=100; compaction_throughput_mb_per_sec=16; concurrent_compactors=null; concurrent_counter_writes=32; concurrent_materialized_view_writes=32; concurrent_reads=32; concurrent_replicates=null; concurrent_writes=32; counter_cache_keys_to_save=2147483647; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; credentials_cache_max_entries=1000; credentials_update_interval_in_ms=-1; credentials_validity_in_ms=2000; cross_node_timeout=false; data_file_directories=[Ljava.lang.String;@2928854b; disk_access_mode=auto; disk_failure_policy=stop; disk_optimization_estimate_percentile=0.95; disk_optimization_page_cross_chance=0.1; disk_optimization_strategy=ssd; dynamic_snitch=true; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=600000; dynamic_snitch_update_interval_in_ms=100; enable_scripted_user_defined_functions=false; enable_user_defined_functions=false; enable_user_defined_functions_threads=true; encryption_options=null; endpoint_snitch=SimpleSnitch; file_cache_size_in_mb=null; gc_log_threshold_in_ms=200; gc_warn_threshold_in_ms=1000; hinted_handoff_disabled_datacenters=[]; hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; hints_compression=null; hints_directory=null; hints_flush_period_in_ms=10000; incremental_backups=false; index_interval=null; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; initial_token=null; inter_dc_stream_throughput_outbound_megabits_per_sec=200; inter_dc_tcp_nodelay=false; internode_authenticator=null; internode_compression=dc; internode_recv_buff_size_in_bytes=null; internode_send_buff_size_in_bytes=null; key_cache_keys_to_save=2147483647; key_cache_save_period=14400; key_cache_size_in_mb=null; listen_address=172.17.0.4; listen_interface=null; listen_interface_prefer_ipv6=false; listen_on_broadcast_address=false; max_hint_window_in_ms=10800000; max_hints_delivery_threads=2; max_hints_file_size_in_mb=128; max_mutation_size_in_kb=null; max_streaming_retries=3; max_value_size_in_mb=256; memtable_allocation_type=heap_buffers; memtable_cleanup_threshold=null; memtable_flush_writers=1; memtable_heap_space_in_mb=null; memtable_offheap_space_in_mb=null; min_free_space_per_drive_in_mb=50; native_transport_max_concurrent_connections=-1; native_transport_max_concurrent_connections_per_ip=-1; native_transport_max_frame_size_in_mb=256; native_transport_max_threads=128; native_transport_port=9042; native_transport_port_ssl=null; 
num_tokens=256; otc_coalescing_strategy=TIMEHORIZON; otc_coalescing_window_us=200; partitioner=org.apache.cassandra.dht.Murmur3Partitioner; permissions_cache_max_entries=1000; permissions_update_interval_in_ms=-1; permissions_validity_in_ms=2000; phi_convict_threshold=8.0; prepared_statements_cache_size_mb=null; range_request_timeout_in_ms=10000; read_request_timeout_in_ms=5000; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_scheduler_id=null; request_scheduler_options=null; request_timeout_in_ms=10000; role_manager=CassandraRoleManager; roles_cache_max_entries=1000; roles_update_interval_in_ms=-1; roles_validity_in_ms=2000; row_cache_class_name=org.apache.cassandra.cache.OHCProvider; row_cache_keys_to_save=2147483647; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=0.0.0.0; rpc_interface=null; rpc_interface_prefer_ipv6=false; rpc_keepalive=true; rpc_listen_backlog=50; rpc_max_threads=2147483647; rpc_min_threads=16; rpc_port=9160; rpc_recv_buff_size_in_bytes=null; rpc_send_buff_size_in_bytes=null; rpc_server_type=sync; saved_caches_directory=/var/lib/cassandra/saved_caches; seed_provider=org.apache.cassandra.locator.SimpleSeedProvider{seeds=172.17.0.4}; server_encryption_options=; snapshot_before_compaction=false; ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50; start_native_transport=true; start_rpc=false; storage_port=7000; stream_throughput_outbound_megabits_per_sec=200; streaming_socket_timeout_in_ms=86400000; thrift_framed_transport_size_in_mb=15; thrift_max_message_length_in_mb=16; thrift_prepared_statements_cache_size_mb=null; tombstone_failure_threshold=100000; tombstone_warn_threshold=1000; tracetype_query_ttl=86400; tracetype_repair_ttl=604800; transparent_data_encryption_options=org.apache.cassandra.config.TransparentDataEncryptionOptions@27ae2fd0; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=60000; unlogged_batch_across_partitions_warn_threshold=10; user_defined_function_fail_timeout=1500; user_defined_function_warn_timeout=500; user_function_timeout_policy=die; windows_timer_interval=1; write_request_timeout_in_ms=2000]
INFO  15:30:17 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO  15:30:17 Global memtable on-heap threshold is enabled at 251MB
INFO  15:30:17 Global memtable off-heap threshold is enabled at 251MB
WARN  15:30:18 Only 22.856GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
INFO  15:30:18 Hostname: 4ba7699e4fc2
INFO  15:30:18 JVM vendor/version: OpenJDK 64-Bit Server VM/1.8.0_111
INFO  15:30:18 Heap size: 1004.000MiB/1004.000MiB
INFO  15:30:18 Code Cache Non-heap memory: init = 2555904(2496K) used = 3906816(3815K) committed = 3932160(3840K) max = 251658240(245760K)
INFO  15:30:18 Metaspace Non-heap memory: init = 0(0K) used = 15609080(15243K) committed = 16252928(15872K) max = -1(-1K)
INFO  15:30:18 Compressed Class Space Non-heap memory: init = 0(0K) used = 1909032(1864K) committed = 2097152(2048K) max = 1073741824(1048576K)
INFO  15:30:18 Par Eden Space Heap memory: init = 167772160(163840K) used = 73864848(72133K) committed = 167772160(163840K) max = 167772160(163840K)
INFO  15:30:18 Par Survivor Space Heap memory: init = 20971520(20480K) used = 0(0K) committed = 20971520(20480K) max = 20971520(20480K)
INFO  15:30:18 CMS Old Gen Heap memory: init = 864026624(843776K) used = 0(0K) committed = 864026624(843776K) max = 864026624(843776K)
INFO  15:30:18 Classpath: /etc/cassandra:/usr/share/cassandra/lib/HdrHistogram-2.1.9.jar:/usr/share/cassandra/lib/ST4-4.0.8.jar:/usr/share/cassandra/lib/airline-0.6.jar:/usr/share/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/cassandra/lib/asm-5.0.4.jar:/usr/share/cassandra/lib/caffeine-2.2.6.jar:/usr/share/cassandra/lib/cassandra-driver-core-3.0.1-shaded.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/commons-math3-3.2.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrent-trees-2.4.0.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/ecj-4.4.2.jar:/usr/share/cassandra/lib/guava-18.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.0.6.jar:/usr/share/cassandra/lib/hppc-0.5.4.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.3.0.jar:/usr/share/cassandra/lib/javax.inject.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jcl-over-slf4j-1.7.7.jar:/usr/share/cassandra/lib/jflex-1.6.0.jar:/usr/share/cassandra/lib/jna-4.0.0.jar:/usr/share/cassandra/lib/joda-time-2.4.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.2.jar:/usr/share/cassandra/lib/log4j-over-slf4j-1.7.7.jar:/usr/share/cassandra/lib/logback-classic-1.1.3.jar:/usr/share/cassandra/lib/logback-core-1.1.3.jar:/usr/share/cassandra/lib/lz4-1.3.0.jar:/usr/share/cassandra/lib/metrics-core-3.1.0.jar:/usr/share/cassandra/lib/metrics-jvm-3.1.0.jar:/usr/share/cassandra/lib/metrics-logback-3.1.0.jar:/usr/share/cassandra/lib/netty-all-4.0.39.Final.jar:/usr/share/cassandra/lib/ohc-core-0.4.3.jar:/usr/share/cassandra/lib/ohc-core-j8-0.4.3.jar:/usr/share/cassandra/lib/primitive-1.0.jar:/usr/share/cassandra/lib/reporter-config-base-3.0.0.jar:/usr/share/cassandra/lib/reporter-config3-3.0.0.jar:/usr/share/cassandra/lib/sigar-1.6.4.jar:/usr/share/cassandra/lib/slf4j-api-1.7.7.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.1.1.7.jar:/usr/share/cassandra/lib/snowball-stemmer-1.3.0.581.1.jar:/usr/share/cassandra/lib/stream-2.5.2.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/apache-cassandra-3.9.jar:/usr/share/cassandra/apache-cassandra-thrift-3.9.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar::/usr/share/cassandra/lib/jamm-0.3.0.jar
INFO  15:30:18 JVM Arguments: [-Xloggc:/var/log/cassandra/gc.log, -ea, -XX:+UseThreadPriorities, -XX:ThreadPriorityPolicy=42, -XX:+HeapDumpOnOutOfMemoryError, -Xss256k, -XX:StringTableSize=1000003, -XX:+AlwaysPreTouch, -XX:-UseBiasedLocking, -XX:+UseTLAB, -XX:+ResizeTLAB, -XX:+PerfDisableSharedMem, -Djava.net.preferIPv4Stack=true, -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, -XX:+CMSParallelRemarkEnabled, -XX:SurvivorRatio=8, -XX:MaxTenuringThreshold=1, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:CMSWaitDuration=10000, -XX:+CMSParallelInitialMarkEnabled, -XX:+CMSEdenChunksRecordAlways, -XX:+CMSClassUnloadingEnabled, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintHeapAtGC, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -XX:+PrintPromotionFailure, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=10, -XX:GCLogFileSize=10M, -Xms1024M, -Xmx1024M, -Xmn200M, -XX:CompileCommandFile=/etc/cassandra/hotspot_compiler, -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar, -Dcassandra.jmx.local.port=7199, -Dcom.sun.management.jmxremote.authenticate=false, -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password, -Djava.library.path=/usr/share/cassandra/lib/sigar-bin, -Dcassandra.libjemalloc=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1, -Dlogback.configurationFile=logback.xml, -Dcassandra.logdir=/var/log/cassandra, -Dcassandra.storagedir=/var/lib/cassandra, -Dcassandra-foreground=yes]
WARN  15:30:18 Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.
INFO  15:30:18 jemalloc seems to be preloaded from /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
WARN  15:30:18 JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.
WARN  15:30:18 OpenJDK is not recommended. Please upgrade to the newest Oracle Java release
INFO  15:30:18 Initializing SIGAR library
WARN  15:30:18 Cassandra server running in degraded mode. Is swap disabled? : false,  Address space adequate? : true,  nofile limit adequate? : true, nproc limit adequate? : true
WARN  15:30:18 Directory /var/lib/cassandra/data doesn't exist
WARN  15:30:18 Directory /var/lib/cassandra/commitlog doesn't exist
WARN  15:30:18 Directory /var/lib/cassandra/saved_caches doesn't exist
WARN  15:30:18 Directory /var/lib/cassandra/hints doesn't exist
INFO  15:30:18 Initialized prepared statement caches with 10 MB (native) and 10 MB (Thrift)
INFO  15:30:19 Initializing system.IndexInfo
INFO  15:30:20 Initializing system.batches
INFO  15:30:20 Initializing system.paxos
INFO  15:30:20 Initializing system.local
INFO  15:30:20 Initializing system.peers
INFO  15:30:20 Initializing system.peer_events
INFO  15:30:20 Initializing system.range_xfers
INFO  15:30:20 Initializing system.compaction_history
INFO  15:30:20 Initializing system.sstable_activity
INFO  15:30:20 Initializing system.size_estimates
INFO  15:30:20 Initializing system.available_ranges
INFO  15:30:20 Initializing system.views_builds_in_progress
INFO  15:30:20 Initializing system.built_views
INFO  15:30:20 Initializing system.hints
INFO  15:30:20 Initializing system.batchlog
INFO  15:30:20 Initializing system.schema_keyspaces
INFO  15:30:20 Initializing system.schema_columnfamilies
INFO  15:30:20 Initializing system.schema_columns
INFO  15:30:20 Initializing system.schema_triggers
INFO  15:30:20 Initializing system.schema_usertypes
INFO  15:30:20 Initializing system.schema_functions
INFO  15:30:20 Initializing system.schema_aggregates
INFO  15:30:20 Not submitting build tasks for views in keyspace system as storage service is not initialized
INFO  15:30:20 Configured JMX server at: service:jmx:rmi://127.0.0.1/jndi/rmi://127.0.0.1:7199/jmxrmi
INFO  15:30:21 Initializing key cache with capacity of 50 MBs.
INFO  15:30:21 Initializing row cache with capacity of 0 MBs
INFO  15:30:21 Initializing counter cache with capacity of 25 MBs
INFO  15:30:21 Scheduling counter cache save to every 7200 seconds (going to save all keys).
INFO  15:30:21 Global buffer pool is enabled, when pool is exhausted (max is 251.000MiB) it will allocate on heap
INFO  15:30:21 Populating token metadata from system tables
INFO  15:30:21 Token metadata:
INFO  15:30:21 Initializing system_schema.keyspaces
INFO  15:30:21 Initializing system_schema.tables
INFO  15:30:21 Initializing system_schema.columns
INFO  15:30:21 Initializing system_schema.triggers
INFO  15:30:21 Initializing system_schema.dropped_columns
INFO  15:30:21 Initializing system_schema.views
INFO  15:30:21 Initializing system_schema.types
INFO  15:30:21 Initializing system_schema.functions
INFO  15:30:21 Initializing system_schema.aggregates
INFO  15:30:21 Initializing system_schema.indexes
INFO  15:30:21 Not submitting build tasks for views in keyspace system_schema as storage service is not initialized
INFO  15:30:21 Completed loading (5 ms; 1 keys) KeyCache cache
INFO  15:30:21 No commitlog files found; skipping replay
INFO  15:30:21 Populating token metadata from system tables
INFO  15:30:21 Token metadata:
INFO  15:30:22 Cassandra version: 3.9
INFO  15:30:22 Thrift API version: 20.1.0
INFO  15:30:22 CQL supported versions: 3.4.2 (default: 3.4.2)
INFO  15:30:22 Initializing index summary manager with a memory pool size of 50 MB and a resize interval of 60 minutes
INFO  15:30:22 Starting Messaging Service on /172.17.0.4:7000 (eth0)
WARN  15:30:22 No host ID found, created 1c7f41f6-4513-4949-abc3-0335af298fc8 (Note: This should happen exactly once per node).
INFO  15:30:22 Loading persisted ring state
INFO  15:30:22 Starting up server gossip
INFO  15:30:22 This node will not auto bootstrap because it is configured to be a seed node.
INFO  15:30:22 Generated random tokens. tokens are [295917137465811607, -302115512024598814, -4810310810107556185, -7541303704934353556, -6783448374042524533, -304630524111773314, 1533898300851998947, 3941083117284553885, -6988940081331410223, -4534890749515718142, 4308460093122238220, 7581503118159726763, -6723684273635062672, 1884874259542417756, 7788232024365633386, 5915388149540707275, -6016738271397646318, 1614489580517171638, -3947302573022728739, 1923926062108950529, -9108925830760249950, -9060042955322751814, -2000084340205921073, -6132707306760563996, -6883241210703381902, 8740195267857701913, 8041076389604804564, -6303053730206256533, 598769270772963796, -2041316525404296230, -3860133938689713137, 4497202060050914136, 8955694409023320159, 3976567367560776009, -9165275604411410822, 1012738234769039757, 7642490246886963502, -3432866062799149095, 2519199740046251471, 2388427761311398841, -6886560953340875448, -4905186677634319463, -2365025404983910008, -8627402965240057817, -7397018902928602797, 1108859385650835241, 5281094978891453223, 6855360109813178939, -7807165856254133598, 1028026472880211244, 16660751466306624, 4072175354537615176, 2046113304521435920, -4044116572714093080, 98476484927120434, -5650328009808548456, -1384196055490211145, 8269412378242105431, -3207741555694791033, 8461694112286132273, 7684200390338837062, -3510258253288340132, 8994172498716928245, 5962685449452339479, -6226929694277901237, -3500613953333362662, -1492091879021245132, -947060640443297664, -6146648192398214417, -4464345544784150661, 6672100140478851757, -1340386687486760416, -3402143056639425167, -8508238454664137195, 964918476456248216, -7768463348829798026, 7756599010305739999, 1151692851232028639, 5052762745532257677, -6938353080763027108, 6683494536543705153, -6365889230484419309, 7384531241040909254, -4442336472114294091, -3750727149103368055, -8877412501490432986, 2647020543458892072, 6274135164775101483, -3649936680386055010, -7567305039792354763, -2172430473128016611, -2414729292194865719, 1408230014943390277, 4364677156300888178, 755861929549178049, 8235690776885053324, 8581387345513151684, 5002718336674882238, -870258856608853484, -1483711216472527900, -1255676054139266272, 331419834776310203, 1622392659577676198, 1187388789833685773, -5932747467864101101, 8122153151262337345, -2146380548913123401, 8197662599537401443, 8506067402867065505, 3090918727224804345, 3744225329829803414, 7619357059829297568, 556700409131325501, 5248429818045721574, 4765015544140971772, 2971482486644427028, -9173245872558505964, -2210735674653180475, 1181488914969268296, 2089494377150191570, 2047582435108024564, -6175545876351053551, -3298063022651817995, -1325629347910090158, 7488863875459007234, -5497017350454887793, -6756613781665488411, -6330009014934080380, 7681124670001326945, -7376366050502109636, 7992819870754351976, 3544290132427354974, -254827227758952719, -5704064381235954635, -836110888355241863, 7698549346624041319, 2301405470858849916, -481871362892611650, 6645744400280051944, -3818320263511106188, 1562581647772413229, -8160175779708883692, -2739834834172049430, 3510749139267324868, -5348896967283946783, 3527384472005761253, 4400799032050497147, -8651238311044541754, -5523410360681048732, 7071021940179800806, -5960796444158211925, 8420370346185308708, -2728886487595348029, 3105537168230717181, -1517621972941887996, 8452927690910375980, 6016490440494310456, -6889421189750345676, -2831286529760432055, -2189711506599834998, -7186866319154931067, -7009440320556973546, -2037384534764248619, 
-2220440490024002478, -377216044270617087, -1134884987470025768, 5192116499655548796, -3347230676841655272, -9130715416947308740, 9204760499816567337, -2439108211250476827, 1538934571518472975, -337514931320682527, -1674538086055718391, -3843322791290462622, 953749838960659962, 3330174901016008157, -2756012697370451081, -3602025464158100127, 3704439510960864841, 3752924436635734010, -5000939990480386852, 6154714044831917923, -8885087254969833946, 8407434459532892399, 4101903548525500975, 7904189481335560232, -2940053311648165067, 7494278666585169078, 1192828145405490948, 4470543315284180590, 654377960824023051, 2686967977432480840, 1411203428069491170, -8646993717939343792, 1159570865425141646, 8797484166341348183, 1079738560110059198, 1312350127490152747, -5189555431814227920, -2519118820283044758, 6059756840747708677, 5774693484122764099, -4349072189170425833, 4035740869403628813, 1511153166937753622, -3218856330350607949, 7304305360157341382, 3235867109258004764, -8951317005617098076, -162420324859555355, -793345512783903889, 5117521076123648029, -5882312494461926993, 8597264656412748201, -2877683839203210639, 9189818776605217015, 3313374825585251422, -7874810056424078419, 5674307591120690376, 9164898553319153477, 4358330615806437879, -5310359817210733626, 3113922769030946482, -659865237366019522, -9119890847611597075, -9205810881436744029, 8288333514535517283, 110170749276212955, 4325548561407427018, -1734212991042000302, -6873916426971903298, -7698545135503972364, -6954734571985878843, 1921094010145318263, 8877598562894529515, 241048672326064469, 900676715600069606, 5777523205257439109, 3010110724142136055, 8665660702093987211, 8608092300575511901, 7093280185971788300, 2944189561076742298, -2953386007626714319, -4900156269772444277, -5634850246813770872, 2948626453088923273, 2176870549249253374, -7387349836523930484, -9134092894261200380, 3875564163537339084, 6061299752516911114, -8854152481465861942, -9205171033569700009, 1363364174055650687]
INFO  15:30:22 Create new Keyspace: KeyspaceMetadata{name=system_traces, params=KeyspaceParams{durable_writes=true, replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=2}}, tables=[org.apache.cassandra.config.CFMetaData@1ef21588[cfId=c5e99f16-8677-3914-b17e-960613512345,ksName=system_traces,cfName=sessions,flags=[COMPOUND],params=TableParams{comment=tracing sessions, read_repair_chance=0.0, dclocal_read_repair_chance=0.0, bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=0, default_time_to_live=0, memtable_flush_period_in_ms=3600000, min_index_interval=128, max_index_interval=2048, speculative_retry=99PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, options={min_threshold=4, max_threshold=32}}, compression=org.apache.cassandra.schema.CompressionParams@cc79ec64, extensions={}, cdc=false},comparator=comparator(),partitionColumns=[[] | [client command coordinator duration request started_at parameters]],partitionKeyColumns=[session_id],clusteringColumns=[],keyValidator=org.apache.cassandra.db.marshal.UUIDType,columnMetadata=[client, command, session_id, coordinator, request, started_at, duration, parameters],droppedColumns={},triggers=[],indexes=[]], org.apache.cassandra.config.CFMetaData@43aa6aac[cfId=8826e8e9-e16a-3728-8753-3bc1fc713c25,ksName=system_traces,cfName=events,flags=[COMPOUND],params=TableParams{comment=tracing events, read_repair_chance=0.0, dclocal_read_repair_chance=0.0, bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=0, default_time_to_live=0, memtable_flush_period_in_ms=3600000, min_index_interval=128, max_index_interval=2048, speculative_retry=99PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, options={min_threshold=4, max_threshold=32}}, compression=org.apache.cassandra.schema.CompressionParams@cc79ec64, extensions={}, cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.TimeUUIDType),partitionColumns=[[] | [activity source source_elapsed thread]],partitionKeyColumns=[session_id],clusteringColumns=[event_id],keyValidator=org.apache.cassandra.db.marshal.UUIDType,columnMetadata=[activity, session_id, thread, event_id, source, source_elapsed],droppedColumns={},triggers=[],indexes=[]]], views=[], functions=[], types=[]}
INFO  15:30:22 Not submitting build tasks for views in keyspace system_traces as storage service is not initialized
INFO  15:30:22 Initializing system_traces.events
INFO  15:30:22 Initializing system_traces.sessions
INFO  15:30:22 Create new Keyspace: KeyspaceMetadata{name=system_distributed, params=KeyspaceParams{durable_writes=true, replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=3}}, tables=[org.apache.cassandra.config.CFMetaData@7a49fac6[cfId=759fffad-624b-3181-80ee-fa9a52d1f627,ksName=system_distributed,cfName=repair_history,flags=[COMPOUND],params=TableParams{comment=Repair history, read_repair_chance=0.0, dclocal_read_repair_chance=0.0, bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=0, default_time_to_live=0, memtable_flush_period_in_ms=3600000, min_index_interval=128, max_index_interval=2048, speculative_retry=99PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, options={min_threshold=4, max_threshold=32}}, compression=org.apache.cassandra.schema.CompressionParams@cc79ec64, extensions={}, cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.TimeUUIDType),partitionColumns=[[] | [coordinator exception_message exception_stacktrace finished_at parent_id range_begin range_end started_at status participants]],partitionKeyColumns=[keyspace_name, columnfamily_name],clusteringColumns=[id],keyValidator=org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type),columnMetadata=[status, id, coordinator, finished_at, participants, exception_stacktrace, parent_id, range_end, range_begin, exception_message, keyspace_name, started_at, columnfamily_name],droppedColumns={},triggers=[],indexes=[]], org.apache.cassandra.config.CFMetaData@19525fa0[cfId=deabd734-b99d-3b9c-92e5-fd92eb5abf14,ksName=system_distributed,cfName=parent_repair_history,flags=[COMPOUND],params=TableParams{comment=Repair history, read_repair_chance=0.0, dclocal_read_repair_chance=0.0, bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=0, default_time_to_live=0, memtable_flush_period_in_ms=3600000, min_index_interval=128, max_index_interval=2048, speculative_retry=99PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, options={min_threshold=4, max_threshold=32}}, compression=org.apache.cassandra.schema.CompressionParams@cc79ec64, extensions={}, cdc=false},comparator=comparator(),partitionColumns=[[] | [exception_message exception_stacktrace finished_at keyspace_name started_at columnfamily_names options requested_ranges successful_ranges]],partitionKeyColumns=[parent_id],clusteringColumns=[],keyValidator=org.apache.cassandra.db.marshal.TimeUUIDType,columnMetadata=[requested_ranges, exception_message, keyspace_name, successful_ranges, started_at, finished_at, options, exception_stacktrace, parent_id, columnfamily_names],droppedColumns={},triggers=[],indexes=[]], org.apache.cassandra.config.CFMetaData@59907bc9[cfId=5582b59f-8e4e-35e1-b913-3acada51eb04,ksName=system_distributed,cfName=view_build_status,flags=[COMPOUND],params=TableParams{comment=Materialized View build status, read_repair_chance=0.0, dclocal_read_repair_chance=0.0, bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=0, default_time_to_live=0, memtable_flush_period_in_ms=3600000, min_index_interval=128, max_index_interval=2048, speculative_retry=99PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, 
compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, options={min_threshold=4, max_threshold=32}}, compression=org.apache.cassandra.schema.CompressionParams@cc79ec64, extensions={}, cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UUIDType),partitionColumns=[[] | [status]],partitionKeyColumns=[keyspace_name, view_name],clusteringColumns=[host_id],keyValidator=org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type),columnMetadata=[status, keyspace_name, view_name, host_id],droppedColumns={},triggers=[],indexes=[]]], views=[], functions=[], types=[]}
INFO  15:30:22 Not submitting build tasks for views in keyspace system_distributed as storage service is not initialized
INFO  15:30:22 Initializing system_distributed.parent_repair_history
INFO  15:30:22 Initializing system_distributed.repair_history
INFO  15:30:22 Initializing system_distributed.view_build_status
INFO  15:30:22 Node /172.17.0.4 state jump to NORMAL
INFO  15:30:22 Create new Keyspace: KeyspaceMetadata{name=system_auth, params=KeyspaceParams{durable_writes=true, replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=1}}, tables=[org.apache.cassandra.config.CFMetaData@2bcd7a78[cfId=5bc52802-de25-35ed-aeab-188eecebb090,ksName=system_auth,cfName=roles,flags=[COMPOUND],params=TableParams{comment=role definitions, read_repair_chance=0.0, dclocal_read_repair_chance=0.0, bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=7776000, default_time_to_live=0, memtable_flush_period_in_ms=3600000, min_index_interval=128, max_index_interval=2048, speculative_retry=99PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, options={min_threshold=4, max_threshold=32}}, compression=org.apache.cassandra.schema.CompressionParams@cc79ec64, extensions={}, cdc=false},comparator=comparator(),partitionColumns=[[] | [can_login is_superuser salted_hash member_of]],partitionKeyColumns=[role],clusteringColumns=[],keyValidator=org.apache.cassandra.db.marshal.UTF8Type,columnMetadata=[role, salted_hash, member_of, can_login, is_superuser],droppedColumns={},triggers=[],indexes=[]], org.apache.cassandra.config.CFMetaData@14f7f6de[cfId=0ecdaa87-f8fb-3e60-88d1-74fb36fe5c0d,ksName=system_auth,cfName=role_members,flags=[COMPOUND],params=TableParams{comment=role memberships lookup table, read_repair_chance=0.0, dclocal_read_repair_chance=0.0, bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=7776000, default_time_to_live=0, memtable_flush_period_in_ms=3600000, min_index_interval=128, max_index_interval=2048, speculative_retry=99PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, options={min_threshold=4, max_threshold=32}}, compression=org.apache.cassandra.schema.CompressionParams@cc79ec64, extensions={}, cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[] | []],partitionKeyColumns=[role],clusteringColumns=[member],keyValidator=org.apache.cassandra.db.marshal.UTF8Type,columnMetadata=[role, member],droppedColumns={},triggers=[],indexes=[]], org.apache.cassandra.config.CFMetaData@53491684[cfId=3afbe79f-2194-31a7-add7-f5ab90d8ec9c,ksName=system_auth,cfName=role_permissions,flags=[COMPOUND],params=TableParams{comment=permissions granted to db roles, read_repair_chance=0.0, dclocal_read_repair_chance=0.0, bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=7776000, default_time_to_live=0, memtable_flush_period_in_ms=3600000, min_index_interval=128, max_index_interval=2048, speculative_retry=99PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, options={min_threshold=4, max_threshold=32}}, compression=org.apache.cassandra.schema.CompressionParams@cc79ec64, extensions={}, cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[] | [permissions]],partitionKeyColumns=[role],clusteringColumns=[resource],keyValidator=org.apache.cassandra.db.marshal.UTF8Type,columnMetadata=[resource, role, permissions],droppedColumns={},triggers=[],indexes=[]], 
org.apache.cassandra.config.CFMetaData@24fc2ad0[cfId=5f2fbdad-91f1-3946-bd25-d5da3a5c35ec,ksName=system_auth,cfName=resource_role_permissons_index,flags=[COMPOUND],params=TableParams{comment=index of db roles with permissions granted on a resource, read_repair_chance=0.0, dclocal_read_repair_chance=0.0, bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=7776000, default_time_to_live=0, memtable_flush_period_in_ms=3600000, min_index_interval=128, max_index_interval=2048, speculative_retry=99PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, options={min_threshold=4, max_threshold=32}}, compression=org.apache.cassandra.schema.CompressionParams@cc79ec64, extensions={}, cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[] | []],partitionKeyColumns=[resource],clusteringColumns=[role],keyValidator=org.apache.cassandra.db.marshal.UTF8Type,columnMetadata=[resource, role],droppedColumns={},triggers=[],indexes=[]]], views=[], functions=[], types=[]}
INFO  15:30:23 Not submitting build tasks for views in keyspace system_auth as storage service is not initialized
INFO  15:30:23 Initializing system_auth.resource_role_permissons_index
INFO  15:30:23 Initializing system_auth.role_members
INFO  15:30:23 Initializing system_auth.role_permissions
INFO  15:30:23 Initializing system_auth.roles
INFO  15:30:23 Waiting for gossip to settle before accepting client requests...
INFO  15:30:31 No gossip backlog; proceeding
INFO  15:30:31 Netty using native Epoll event loop
INFO  15:30:31 Using Netty Version: [netty-buffer=netty-buffer-4.0.39.Final.38bdf86, netty-codec=netty-codec-4.0.39.Final.38bdf86, netty-codec-haproxy=netty-codec-haproxy-4.0.39.Final.38bdf86, netty-codec-http=netty-codec-http-4.0.39.Final.38bdf86, netty-codec-socks=netty-codec-socks-4.0.39.Final.38bdf86, netty-common=netty-common-4.0.39.Final.38bdf86, netty-handler=netty-handler-4.0.39.Final.38bdf86, netty-tcnative=netty-tcnative-1.1.33.Fork19.fe4816e, netty-transport=netty-transport-4.0.39.Final.38bdf86, netty-transport-native-epoll=netty-transport-native-epoll-4.0.39.Final.38bdf86, netty-transport-rxtx=netty-transport-rxtx-4.0.39.Final.38bdf86, netty-transport-sctp=netty-transport-sctp-4.0.39.Final.38bdf86, netty-transport-udt=netty-transport-udt-4.0.39.Final.38bdf86]
INFO  15:30:31 Starting listening for CQL clients on /0.0.0.0:9042 (unencrypted)...
INFO  15:30:31 Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
INFO  15:30:33 Scheduling approximate time-check task with a precision of 10 milliseconds
INFO  15:30:33 Created default superuser role 'cassandra'

Step 3: Create a second Cassandra Node

2016-12-08-21_14_48

We want to start a second Cassandra container on the same Docker host for simple testing. We will connect to the Cassandra instance running in the first container via its IP address. For that, we need to find out the IP address as follows:

(dockerhost)$ sudo docker inspect --format='{{ .NetworkSettings.IPAddress }}' cassandra-node1
172.17.0.2

This IP address can be used in the next command to set the CASSANDRA_SEEDS variable accordingly.

Note also that we have changed the port mapping in order to avoid port conflicts with the first Cassandra node:

(dockerhost)$ sudo docker run -it --rm --entrypoint="bash" --name cassandra-node2 \
              -p27000:7000 -p27001:7001 -p29042:9042 -p29160:9160 \
              -e CASSANDRA_SEEDS="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' cassandra-node1)" \
              cassandra

Note that we have overridden the default entrypoint, so we get access to the terminal.

We now start Cassandra on the second node:

(container):/# /docker-entrypoint.sh cassandra -f
...
INFO 17:37:03 Starting listening for CQL clients on /0.0.0.0:9042 (unencrypted)...
INFO 17:37:03 Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
INFO 17:37:05 Created default superuser role 'cassandra'

On the first Cassandra node, we will see the following additional log lines:

INFO 17:36:21 Handshaking version with /172.17.0.3
INFO 17:36:21 Handshaking version with /172.17.0.3
INFO 17:36:22 InetAddress /172.17.0.3 is now DOWN
INFO 17:36:22 Handshaking version with /172.17.0.3
INFO 17:36:23 Handshaking version with /172.17.0.3
INFO 17:36:54 [Stream #beb912f0-bca3-11e6-a935-4b019c4b758d ID#0] Creating new streaming plan for Bootstrap
INFO 17:36:54 [Stream #beb912f0-bca3-11e6-a935-4b019c4b758d, ID#0] Received streaming plan for Bootstrap
INFO 17:36:54 [Stream #beb912f0-bca3-11e6-a935-4b019c4b758d, ID#0] Received streaming plan for Bootstrap
INFO 17:36:55 [Stream #beb912f0-bca3-11e6-a935-4b019c4b758d] Session with /172.17.0.3 is complete
INFO 17:36:55 [Stream #beb912f0-bca3-11e6-a935-4b019c4b758d] All sessions completed
INFO 17:36:55 Node /172.17.0.3 state jump to NORMAL
INFO 17:36:55 InetAddress /172.17.0.3 is now UP

Note: if you get the following error message:

Exception (java.lang.RuntimeException) encountered during startup: A node with address /172.17.0.3 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.

you need to start the service using the following line instead:

(container):/# /docker-entrypoint.sh cassandra -f -Dcassandra.replace_address=$(ip addr show | grep eth0 | grep -v '@' | awk '{print $2}' | awk -F"\/" '{print $1}')

This error shows up if you have connected a Cassandra node to the cluster, then destroyed the node (by stopping the container) and started a new container. The new container re-claims the now unused IP address of the destroyed container. However, this address is still marked as unreachable within the cluster. Since we want to re-use the IP address in the cluster, the -Dcassandra.replace_address option is required.

The command

(container):/# ip addr show | grep eth0 | grep -v '@' | awk '{print $2}' | awk -F"\/" '{print $1}'
172.17.0.3

will return the current IP address of eth0 inside the Docker container and helps feed the correct IP address to the -Dcassandra.replace_address option.

Step 4: Start a CQL Client Container

Now we want to add some data to the distributed NoSQL database. For that, we start a third container that can be used as a CQL client (CQL = Cassandra Query Language, similar to SQL). We can start a CQL shell as follows:

(dockerhost)$ sudo docker run -it --rm -e CQLSH_HOST=$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' cassandra-node1) --name cassandra-client --entrypoint=cqlsh cassandra
Connected to Test Cluster at 172.17.0.2:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh>

2016-12-08-20_34_48

Step 5: Create Keyspace

Now let us create a keyspace. A keyspace is the counterpart of a database in SQL databases:

cqlsh> create keyspace mykeyspace with replication = {'class':'SimpleStrategy','replication_factor' : 2};
cqlsh>

Upon successful creation, the prompt returns without an error message.
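
If you want to double-check, cqlsh can list all known keyspaces; the output below is abbreviated and may differ on your system:

cqlsh> describe keyspaces;

mykeyspace  system_auth  system  ...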

Step 6: Create Table

For adding data, we need to enter the keyspace and create a table:

cqlsh> use mykeyspace;
cqlsh:mykeyspace> create table usertable (userid int primary key, usergivenname varchar, userfamilyname varchar, userprofession varchar);
cqlsh:mykeyspace>

Step 7: Add Data

Now we can add our data:

cqlsh:mykeyspace> insert into usertable (userid, usergivenname, userfamilyname, userprofession) values (1, 'Oliver', 'Veits', 'Freelancer');
cqlsh:mykeyspace>

The CQL INSERT command has the same syntax as an SQL INSERT command.
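
If you like, you can add a second row the same way (hypothetical sample data, not used in the later steps):

cqlsh:mykeyspace> insert into usertable (userid, usergivenname, userfamilyname, userprofession) values (2, 'John', 'Doe', 'Engineer');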

Step 8 (optional): Update Data

We now can update a single column as well:

cqlsh:mykeyspace> update usertable set userprofession = 'IT Consultant' where userid = 1;

Now let us read the entry:

cqlsh:mykeyspace> select * from usertable where userid = 1;

 userid | userfamilyname | usergivenname | userprofession
--------+----------------+---------------+----------------
      1 |          Veits |        Oliver |  IT Consultant

(1 rows)
cqlsh:mykeyspace>

Step 9 (optional): Query on Data other than the primary Index

In Cassandra, you cannot filter on a column that has no index unless data filtering is explicitly allowed:

cqlsh:mykeyspace> select * from usertable where userprofession = 'IT Consultant';
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
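
As the error message suggests, you could append ALLOW FILTERING for an ad-hoc query; this forces a potentially expensive scan of the whole table, which is fine for our tiny test table, but not recommended for production use:

cqlsh:mykeyspace> select * from usertable where userprofession = 'IT Consultant' allow filtering;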

A better approach for recurring queries is a secondary index, which allows the query to be served without a full scan:

cqlsh:mykeyspace> create index idx_dept on usertable(userprofession);

Now the same query should be successful:

cqlsh:mykeyspace> select * from usertable where userprofession = 'IT Consultant';
 userid | userfamilyname | usergivenname | userprofession
--------+----------------+---------------+----------------
      1 |          Veits |        Oliver |  IT Consultant
(1 rows)

Yes, perfect.

Step 10: Test Resiliency

At the moment, we have the following topology (with all nodes and the client being Docker containers on the same Docker host):

2016-12-08-20_34_48

Now we will test whether the data is retained if the Cassandra application on node2 is stopped. For that, we stop the application on node2 by pressing Ctrl-C.

2016-12-08-20_38_18

On node1 we see:

INFO 18:06:50 InetAddress /172.17.0.5 is now DOWN
INFO 18:06:51 Handshaking version with /172.17.0.5

On the client we see that the data is still there:

cqlsh:mykeyspace> select * from usertable where userprofession = 'IT Consultant';
 userid | userfamilyname | usergivenname | userprofession
--------+----------------+---------------+----------------
      1 |          Veits |        Oliver |  IT Consultant
(1 rows)

Now let us start the Cassandra application on node 2 again and wait some time until the nodes are synchronized. On node1 we will get a log similar to:

INFO  17:36:35 Handshaking version with /172.17.0.4
INFO  17:38:58 Handshaking version with /172.17.0.4
INFO  17:38:58 Handshaking version with /172.17.0.4
INFO  17:39:00 Node /172.17.0.4 has restarted, now UP
INFO  17:39:00 Node /172.17.0.4 state jump to NORMAL
INFO  17:39:00 InetAddress /172.17.0.4 is now UP
INFO  17:39:00 Updating topology for /172.17.0.4
INFO  17:39:00 Updating topology for /172.17.0.4

Now we can stop Cassandra on node1 by pressing Ctrl-C on terminal 1. On node2, we will get a message similar to:

INFO  17:41:32 InetAddress /172.17.0.3 is now DOWN
INFO  17:41:32 Handshaking version with /172.17.0.3

At the same time, the node1 container is destroyed, since we have not changed the entrypoint for node1 and we have given the --rm option in the docker run command in step 2.

Now, we verify that the data is still retained:

cqlsh:mykeyspace> select * from usertable where userprofession = 'IT Consultant';
NoHostAvailable:

Oh, yes, that is to be expected: we have used node1’s IP address and port when we started the client.

2016-12-08-20_47_59

Let us now connect to node2 by entering “exit” and starting a new client container as follows:

(dockerhost)$ sudo docker run -it --rm -e CQLSH_HOST=$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' cassandra-node2) -e CQLSH_PORT=9042 --name cassandra-client --entrypoint=cqlsh cassandra
Connected to Test Cluster at 172.17.0.4:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh>

2016-12-08-20_49_39

To be honest, I was a little bit confused here and would have expected that I need to connect to port 29042 instead, since I have started node2 with a port mapping from 29042 (outside port) to 9042 (container port). But this is wrong: from the Docker host, we can directly access the node2 container IP address with all its ports, including port 9042. Only if we want to access the container from outside the Docker host do we need to access port 29042 of the Docker host IP address instead of port 9042 of the node2 container:

(dockerhost)$ netstat -an | grep 9042
tcp6       0      0 :::29042                :::*                    LISTEN
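
From a machine outside the Docker host, a cqlsh client would therefore use the Docker host’s IP address and the mapped port; a sketch, where <dockerhost_ip> is a placeholder for your Docker host’s address:

$ cqlsh <dockerhost_ip> 29042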

Now that we are connected to the second node, let us check that the data is retained:

cqlsh> use mykeyspace;
cqlsh:mykeyspace> select * from usertable where userid = 1;
 userid | userfamilyname | usergivenname | userprofession
--------+----------------+---------------+----------------
      1 |          Veits |        Oliver |  IT Consultant
(1 rows)

Perfect! The data is still there.

thumps_up_3

Appendix A: No keyspace has been specified.

If you get an error message like the following,

cqlsh> select * from usertable where userprofession = 'IT Consultant';
InvalidRequest: Error from server: code=2200 [Invalid query] message="No keyspace has been specified. USE a keyspace, or explicitly specify keyspace.tablename"

then you have forgotten to issue the “use mykeyspace;” command first:

cqlsh> use mykeyspace;
cqlsh:mykeyspace> select * from usertable where userid = 1;
 userid | userfamilyname | usergivenname | userprofession
--------+----------------+---------------+----------------
      1 |          Veits |        Oliver |  IT Consultant
(1 rows)

Summary

In this blog post we have performed the following tasks:

  1. Introduced Cassandra with a little comparison with Hadoop
  2. Started a Cassandra node in a Docker container
  3. Started a second Cassandra node and built a Cassandra cluster
  4. Started a Cassandra client in a container
  5. Added data with replication factor 2 and performed some CQL commands for a warm-up
  6. Shut down node2 and verified that the data is still available
  7. Started node2 again and waited some seconds
  8. Shut down node1 and verified that the data is still available on node2

With this test, we could verify that data replication between nodes in a Cassandra cluster works and that no data is lost if a node fails.

 


Jenkins Part 1: Installation the Docker Way


2016-11-30-18_19_38
In this blog post, we will deploy and get started with Jenkins, the most popular open source tool for Continuous Integration and Continuous Deployment. As a modern way of installing, we install a Docker host and deploy a Jenkins Docker container on this host. Then we will log in and install commonly used plugins, before we poke around and prepare for the next step, i.e. creating an automated build process for building, testing and deploying software.

This blog post series is divided into following parts:

    • Part 1 (this blog): Installation and Configuration of Jenkins, loading Plugins
    • Part 2: Creating our first Jenkins Job: GitHub download and Software build
    • Part 3: Periodic and automatically triggered Builds
    • Part 4 (planned): running automated Tests

What is Jenkins?

Jenkins is the leading open source automation server mostly used in continuous integration and continuous deployment pipelines. Jenkins provides hundreds of plugins to support building, deploying and automating any project.

Waterfall Process

In former times, a software version’s life cycle was often measured in months or years: a set of required features for a new version was defined, and then all of those features were first designed, then implemented, verified, delivered and maintained. In this software development process, called the waterfall process, each of the phases could take weeks or even months.

2016-12-01-20_37_59
Waterfall Process

Agile Process

Nowadays, most new software is created in an agile process. Unlike in the waterfall process, the time and other resources for a process cycle, the so-called sprint, are fixed: often to two or four weeks. A quite small set of features is implemented during each sprint, but all phases from design over implementation and test to delivery are accomplished within a single sprint. The great advantage of the agile methodology is that you can deliver a fully tested software version every two or four weeks. This gives your customer the chance to see the progress, and the developers get early feedback, which helps make sure they do not run in the wrong direction.

Agile Methodology
Source: https://crowdsourcedtesting.com/resources/wp-content/uploads/2016/07/agile-methodolody_695x260.jpg found in this article.

Jenkins for Build, Test and Deployment Automation

Since all processes, including software build, test and deployment, are performed every two or four weeks, this is an ideal playground for automation tools like Jenkins: it does not make sense to occupy a person with those tedious, repeated tasks if they can also be done by a computer.

This is where Jenkins comes into play: after the developer commits a code change to the repository, Jenkins will detect this change and trigger the build and test process. The results are immediately sent to the developer. Since the steps are automated now, the developer can trigger the process after each small, incremental change. The developer gets early feedback if a change has unintentionally broken the software, which makes troubleshooting much easier.

Jenkins build, test and deployment pipeline

If all tests were successful, Jenkins can automatically deliver and deploy the software to a staging system or even to a production system.

The whole process is called continuous integration (build and test automation with feedback), continuous delivery (deployment can be triggered with the push of a button) or continuous deployment (deployment is performed automatically with no manual intervention).

Installing Jenkins the Docker Way

2016-12-02-10_40_22

Tools used

      • Vagrant 1.8.6
      • Virtualbox 5.0.20
      • Docker 1.12.1
      • Jenkins 2.19.3

Prerequisites:

      • Free DRAM overall >~ 1 GB, better 2 GB (Vagrant, VirtualBox and Jenkins container)
      • If Docker host is available already: free DRAM of host before starting the Jenkins container >~ 600MB, better 1 GB

Step 1: Install a Docker Host via Vagrant and Connect to the Host via SSH

If you are using an existing docker host, you can skip this step. Make sure that your host has enough memory.

We will run Jenkins in a Docker container in order to allow for maximum interoperability. This way, we can always use the latest Jenkins version without the need to control the Java version installed on the host.

If you are new to Docker, you might want to read this blog post.

Installing Docker on Windows and Mac can be a real challenge, but no worries: we will show an easy way here that is much quicker than the one described in Docker’s official documentation:

Prerequisites of this step:

      • I recommend having direct access to the Internet: via firewall, but without an HTTP proxy. However, if you cannot get rid of your HTTP proxy, read this blog post.
      • Administration rights on your computer.

Steps to install a Docker Host VirtualBox VM:

1. Download and install Virtualbox (if the installation fails with error message “Setup Wizard ended prematurely”, see Appendix A of this blog post: Virtualbox Installation Workaround below)

2. Download and install Vagrant (requires a reboot)

3. Download a Vagrant box containing an Ubuntu-based Docker host and create a VirtualBox VM as follows:

basesystem# mkdir ubuntu-trusty64-docker ; cd ubuntu-trusty64-docker
basesystem# vagrant init williamyeh/ubuntu-trusty64-docker
basesystem# vagrant up
basesystem# vagrant ssh

Now you are logged into the Docker host and we are ready for the next step: downloading the Jenkins Docker image.

Note: I have experienced problems with the vi editor when running vagrant ssh in a Windows terminal. In case of Windows, consider following Appendix C of this blog post and using putty instead.

Step 2 (optional): Download Jenkins Image

This extra download step is optional, since the Docker image will be downloaded automatically in step 3, if it is not already found on the system:

(dockerhost)$ sudo docker pull jenkins
Using default tag: latest
latest: Pulling from library/jenkins
Digest: sha256:8820149b54bfc5d05146b82150b5fdab583eef3e0499fb4ed630f77647a42942
Status: Image is up to date for jenkins:latest

The version of the downloaded Jenkins image can be checked with following command:

(dockerhost)$ sudo docker run -it --rm jenkins --version
2.19.3

We are using version 2.19.3 currently. If you want to make sure that you use the exact same version as I have used in this blog, you can use the image name jenkins:2.19.3 in all docker commands instead of jenkins only.
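
For example, pinning the version for the pull would look like this:

(dockerhost)$ sudo docker pull jenkins:2.19.3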

Note: The content of the jenkins image can be reviewed on this link. There, we find that the image has an entrypoint /bin/tini -- /usr/local/bin/jenkins.sh, which we could override with the --entrypoint bash option, if we wanted to start a bash shell in the jenkins image. However, in Step 3, we will keep the entrypoint for now.

Step 3: Start Jenkins in interactive Terminal Mode

In this step, we will run Jenkins interactively (with the -it switch instead of the -d switch) to better see what is happening. But first, we check that the ports we will use are free:

(dockerhost)$ sudo docker ps
CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS              PORTS                                            NAMES
0ec82b4ca2fd        google/cadvisor:latest   "/usr/bin/cadvisor -l"   2 days ago          Up 2 days           0.0.0.0:8080->8080/tcp                           cadvisor
...

Since we see that one of the standard ports of Jenkins (8080, 50000) is already occupied and I do not want to confuse the readers of this blog post by mapping the port to another host port, I just stop the cadvisor container for this “hello world”:

(dockerhost)$ sudo docker stop cadvisor
cadvisor

Jenkins needs persistent storage. For that, we create a new folder on the Docker host:

(dockerhost)$ mkdir jenkins_home; cd jenkins_home

We start the Jenkins container with the jenkins_home Docker host volume mapped to /var/jenkins_home:

(dockerhost)$ sudo docker run -it --rm --name jenkins -p8080:8080 -p50000:50000 -v`pwd`:/var/jenkins_home jenkins
Running from: /usr/share/jenkins/jenkins.war
webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
Nov 30, 2016 6:12:14 PM Main deleteWinstoneTempContents
WARNING: Failed to delete the temporary Winstone file /tmp/winstone/jenkins.war
Nov 30, 2016 6:12:14 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Logging initialized @347ms
Nov 30, 2016 6:12:14 PM winstone.Logger logInternal
INFO: Beginning extraction from war file
Nov 30, 2016 6:12:14 PM org.eclipse.jetty.util.log.JavaUtilLog warn
WARNING: Empty contextPath
Nov 30, 2016 6:12:14 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: jetty-9.2.z-SNAPSHOT
Nov 30, 2016 6:12:16 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: NO JSP Support for /, did not find org.eclipse.jetty.jsp.JettyJspServlet
Jenkins home directory: /var/jenkins_home found at: EnvVars.masterEnvVars.get("JENKINS_HOME")
Nov 30, 2016 6:12:17 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Started w.@7674f035{/,file:/var/jenkins_home/war/,AVAILABLE}{/var/jenkins_home/war}
Nov 30, 2016 6:12:17 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Started ServerConnector@548d708a{HTTP/1.1}{0.0.0.0:8080}
Nov 30, 2016 6:12:17 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Started @3258ms
Nov 30, 2016 6:12:17 PM winstone.Logger logInternal
INFO: Winstone Servlet Engine v2.0 running: controlPort=disabled
Nov 30, 2016 6:12:17 PM jenkins.InitReactorRunner$1 onAttained
INFO: Started initialization
Nov 30, 2016 6:12:17 PM jenkins.InitReactorRunner$1 onAttained
INFO: Listed all plugins
Nov 30, 2016 6:12:19 PM jenkins.InitReactorRunner$1 onAttained
INFO: Prepared all plugins
Nov 30, 2016 6:12:19 PM jenkins.InitReactorRunner$1 onAttained
INFO: Started all plugins
Nov 30, 2016 6:12:19 PM jenkins.InitReactorRunner$1 onAttained
INFO: Augmented all extensions
Nov 30, 2016 6:12:20 PM jenkins.InitReactorRunner$1 onAttained
INFO: Loaded all jobs
Nov 30, 2016 6:12:20 PM hudson.model.AsyncPeriodicWork$1 run
INFO: Started Download metadata
Nov 30, 2016 6:12:20 PM hudson.model.AsyncPeriodicWork$1 run
INFO: Finished Download metadata. 97 ms
Nov 30, 2016 6:12:20 PM org.jenkinsci.main.modules.sshd.SSHD start
INFO: Started SSHD at port 44955
Nov 30, 2016 6:12:21 PM jenkins.util.groovy.GroovyHookScript execute
INFO: Executing /var/jenkins_home/init.groovy.d/tcp-slave-agent-port.groovy
Nov 30, 2016 6:12:22 PM jenkins.InitReactorRunner$1 onAttained
INFO: Completed initialization
Nov 30, 2016 6:12:22 PM org.springframework.context.support.AbstractApplicationContext prepareRefresh
INFO: Refreshing org.springframework.web.context.support.StaticWebApplicationContext@453fc3cf: display name [Root WebApplicationContext]; startup date [Wed Nov 30 18:12:22 UTC 2016]; root of context hierarchy
Nov 30, 2016 6:12:22 PM org.springframework.context.support.AbstractApplicationContext obtainFreshBeanFactory
INFO: Bean factory for application context [org.springframework.web.context.support.StaticWebApplicationContext@453fc3cf]: org.springframework.beans.factory.support.DefaultListableBeanFactory@79a53f4b
Nov 30, 2016 6:12:22 PM org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@79a53f4b: defining beans [authenticationManager]; root of factory hierarchy
Nov 30, 2016 6:12:22 PM org.springframework.context.support.AbstractApplicationContext prepareRefresh
INFO: Refreshing org.springframework.web.context.support.StaticWebApplicationContext@7ea44b7: display name [Root WebApplicationContext]; startup date [Wed Nov 30 18:12:22 UTC 2016]; root of context hierarchy
Nov 30, 2016 6:12:22 PM org.springframework.context.support.AbstractApplicationContext obtainFreshBeanFactory
INFO: Bean factory for application context [org.springframework.web.context.support.StaticWebApplicationContext@7ea44b7]: org.springframework.beans.factory.support.DefaultListableBeanFactory@12544046
Nov 30, 2016 6:12:22 PM org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@12544046: defining beans [filter,legacy]; root of factory hierarchy
Nov 30, 2016 6:12:22 PM jenkins.install.SetupWizard init
INFO:

*************************************************************
*************************************************************
*************************************************************

Jenkins initial setup is required. An admin user has been created and a password generated.
Please use the following password to proceed to installation:

0c4a8413a47943ac935a4902e3b8167e

This may also be found at: /var/jenkins_home/secrets/initialAdminPassword

*************************************************************
*************************************************************
*************************************************************

Nov 30, 2016 6:12:27 PM hudson.model.UpdateSite updateData
INFO: Obtained the latest update center data file for UpdateSource default
Nov 30, 2016 6:12:27 PM hudson.WebAppMain$3 run
INFO: Jenkins is fully up and running
--> setting agent port for jnlp
--> setting agent port for jnlp... done
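
Side note: once the setup works, you may prefer to run the container detached (with -d instead of -it and --rm); a sketch, reusing the same name, ports and volume as above, with the logs then available via docker logs jenkins:

(dockerhost)$ sudo docker run -d --name jenkins -p8080:8080 -p50000:50000 -v`pwd`:/var/jenkins_home jenkins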

Step 4: Open Jenkins in a Browser

Now we want to connect to the Jenkins portal. For that, open a browser and open the URL

<your_jenkins_host>:8080

In our case, Jenkins is running in a container and we have mapped the container port 8080 to the local port 8080 of the Docker host. On the Docker host, we can open the URL:

localhost:8080

Note: In case of Vagrant with VirtualBox, per default, there is only a NAT-based interface and you need to create port-forwarding for any port you want to reach from outside (also the local machine you are working on is to be considered as outside). In this case, we need to add an entry in the port forwarding list of VirtualBox:
2016-11-30-19_22_22-regel-fur-port-weiterleitung
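
Instead of clicking through the VirtualBox GUI, the forwarding can also be declared in the Vagrantfile; a sketch, assuming the default ports used in this post (takes effect after a vagrant reload):

config.vm.network "forwarded_port", guest: 8080, host: 8080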

The Jenkins login screen will open:

2016-11-30-19_36_42-jenkins-jenkins

The admin password can be retrieved from the startup log we have seen above (0c4a8413a47943ac935a4902e3b8167e), or we can find it by typing

(dockerhost: .../jenkins_home)$ cat secrets/initialAdminPassword
0c4a8413a47943ac935a4902e3b8167e

in the mapped jenkins_home folder on the Docker host.
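
Alternatively, as long as the container is running, the file can be read from inside the container via docker exec (assuming the container name jenkins we have used above):

(dockerhost)$ sudo docker exec jenkins cat /var/jenkins_home/secrets/initialAdminPassword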

Step 5: Install Plugins

Let us install the suggested plugins:

2016-11-30-19_37_43-setupwizard-jenkins

This may take a while to finish:

2016-11-30-19_39_38-setupwizard-jenkins

Step 6: Create an Admin User and log in

Then we reach a page, where we can create an Admin user:

2016-11-30-19_48_25-setupwizard-jenkins

Let us do so and save and finish.

2016-11-30-19_48_38-setupwizard-jenkins

Note: After this step, I have deleted the Jenkins container and started a new container attached to the same Jenkins home directory. After that, all configuration and plugins were still available, so we can delete containers after usage without losing relevant information.

I have had a dinner break at this point. Maybe this is the reason I got the following message when clicking the “Start using Jenkins” button?

2016-11-30-20_42_21-setupwizard-jenkins

Whatever. After clicking “Retry”, we reach the login page:

2016-11-30-20_44_53-jenkins

Coming Soon: Create a New Job

In the next, upcoming blog post, we will create our first Jenkins job. I plan to trigger the Maven and/or Gradle build of a Java executable file upon detection of a code change.

2016-11-30-20_47_12-dashboard-jenkins

Summary

In this blog post we have performed the following tasks:

      1. installed a Docker host using Vagrant and VirtualBox
      2. downloaded and installed Jenkins the Docker way
      3. installed often used Plugins
      4. created an Admin account

In order to avoid any compatibility issues with the Java version on the host, we have run Jenkins in a Docker container. In order to better see what happens under the hood, we have chosen to run the Docker container in interactive terminal mode. We had hit an intermediate error “Unable to connect to Jenkins”, but this error was not reproducible and was gone after the next click.

Further Reading:

 


Kibana “Hello World” Example – Part 3 of the ELK Stack Series


kibana-logo-color-v

Today, we will introduce Kibana, an open source data visualization tool. As part of Elastic’s ELK stack (now called Elastic Stack), Kibana is often used to visualize logging statistics and for management of the Elastic Stack. However, in this tutorial, we will analyze statistical data from Twitter by comparing the popularity of Trump vs. Obama vs. Clinton.

For that, we will attach Logstash to the Twitter API and feed the data into Elasticsearch. Kibana will visualize the Elasticsearch data in a pie chart and a date/time histogram. At the end, we will see that Trump wins this little Twitter tweet count contest by far: he is mentioned in Twitter tweets about 20 times as often as Obama and Clinton together:


2016-11-20-18_24_06-popularity-of-us-politicians-kibana

But now let us get back to the technology topics.

This is the third blog post of a series about the Elastic Stack (a.k.a. ELK stack):

What is Kibana?

Kibana is a tool for visualizing logging statistics stored in the Elasticsearch database. Statistical graphs like histograms, line graphs, pie charts and sunbursts are core capabilities of Kibana.

kibana-basics

In addition, Logstash’s and Elasticsearch’s capabilities allow Kibana to visualize statistical data on a geographical map. And with tools like Timelion and Graph, an administrator can analyze time series and relationships, respectively:

kibana-geo
kibana-time
kibana-graph

Kibana is often used in the so-called ELK pipeline for log file collection, analysis and visualization:

  • Elasticsearch is for searching, analyzing, and storing your data
  • Logstash (and Beats) is for collecting and transforming data, from any source, in any format
  • Kibana is a portal for visualizing the data and for navigating within the Elastic Stack

 

2016-11-17-18_31_39

 

Target Configuration for this Blog Post

In this Hello World blog post, we will use simple HTTP verbs against the RESTful API of Elasticsearch to create, read and destroy data entries:

2016-11-18-21_15_08

As a second step, we will attach Logstash to Twitter and Elasticsearch, whose data will be visualized in Kibana. Apart from the data source on the left, this is the same as the usual ELK pipeline:

2016-11-20-18_54_28

This will allow us to analyze the number of Twitter tweets with certain keywords in the text.

Tools used

  • Vagrant 1.8.6
  • Virtualbox 5.0.20
  • Docker 1.12.1
  • Logstash 5.0.1
  • Elasticsearch 5.0.1
  • Kibana 5.0.1

Prerequisites:

  • DRAM >~ 4GB
  • The max virtual memory areas setting vm.max_map_count must be at least 262144; see this note in the official documentation.
    See also Appendix B below for how to set the value on Linux temporarily, permanently, and for the next Vagrant-created Linux VM.

Step 1: Install a Docker Host via Vagrant and Connect to the Host via SSH

If you are using an existing docker host, you can skip this step. Make sure that your host has enough memory.

We will run Kibana, Elasticsearch and Logstash in Docker containers in order to allow for maximum interoperability. This way, we can always use the latest Logstash version without the need to control the Java version used: e.g. Logstash v1.4.x works with Java 7, while version 5.0.x currently works with Java 8 only.

If you are new to Docker, you might want to read this blog post.

Installing Docker on Windows and Mac can be a real challenge, but no worries: we will show an easy way here that is much quicker than the one described in Docker’s official documentation:

Prerequisites of this step:

  • I recommend having direct access to the Internet: via firewall, but without an HTTP proxy. However, if you cannot get rid of your HTTP proxy, read this blog post.
  • Administration rights on your computer.

Steps to install a Docker Host VirtualBox VM:

1. Download and install Virtualbox (if the installation fails with error message “Oracle VM Virtualbox x.x.x Setup Wizard ended prematurely”, see Appendix A of this blog post: Virtualbox Installation Workaround below)

2. Download and install Vagrant (requires a reboot)

3. Download a Vagrant box containing an Ubuntu-based Docker host and create a VirtualBox VM as follows:

basesystem# mkdir ubuntu-trusty64-docker ; cd ubuntu-trusty64-docker
basesystem# vagrant init williamyeh/ubuntu-trusty64-docker
basesystem# vagrant up
basesystem# vagrant ssh

Now you are logged into the Docker host and we are ready for the next step: downloading the Kibana Docker image.

Note: I have experienced problems with the vi editor when running vagrant ssh in a Windows terminal. In case of Windows, consider following Appendix C of this blog post and using putty instead.

Step 2 (optional): Download Kibana Image

This extra download step is optional, since the Kibana Docker image will be downloaded automatically in step 3, if it is not already found on the system:

(dockerhost)$ docker pull kibana
Using default tag: latest
latest: Pulling from library/kibana

386a066cd84a: Already exists
9ca92df3a376: Pull complete
c04752ac6b44: Pull complete
7bfecbcf70ff: Pull complete
f1338b2c8ead: Pull complete
bfe1da400856: Pull complete
cf0b2da1d7f9: Pull complete
aeaada72e01d: Pull complete
0162f4823d8e: Pull complete
Digest: sha256:c75dbca9c774887a3ab778c859208db638fde1a67cfa48aad703ac8cc94a793d
Status: Downloaded newer image for kibana:latest

The version of the downloaded Kibana image can be checked with following command:

(dockerhost)$ sudo docker run -it --rm kibana --version
5.0.1

We are using version 5.0.1 currently. If you want to make sure that you use the exact same version as I have used in this blog, you can use the image name kibana:5.0.1 in all docker commands instead of kibana only.

Step 3: Start Elasticsearch

Kibana relies on the data stored and analyzed in Elasticsearch, so let us start that one first. Like in the Elasticsearch blog post, we run Elasticsearch interactively:

(dockerhost)$ sudo docker run -it --rm --name elasticsearch -p9200:9200 -p9300:9300 --entrypoint bash elasticsearch
(elasticsearchcontainer)# /docker-entrypoint.sh elasticsearch

After successful start, Elasticsearch is waiting for data.

Step 4: Start Logstash and use Twitter as Data Source

For this demonstration, it is good to have a lot of data we can analyze. Why not use Twitter as a data source and look for tweets about Obama, Trump or Clinton? For that, let us create a file logstash_twitter.conf on the Docker host in the directory we will start the Logstash container from:

# logstash_twitter.conf
input {
  twitter {
    consumer_key => "consumer_key"
    consumer_secret => "consumer_secret"
    oauth_token => "oauth_token"
    oauth_token_secret => "oauth_token_secret"
    keywords => [ "Obama", "Trump", "Clinton" ]
    full_tweet => true
  }
}

output {
  stdout { codec => dots }
  elasticsearch {
    action => "index"
    index => "twitter"
    hosts => "elasticsearch"
    document_type => "tweet"
    template => "/app/twitter_template.json"
    template_name => "twitter"
    workers => 1
  }
}

But how do you find your personal consumer_key, etc.? For that, you need a Twitter account; log in and create a new app on https://apps.twitter.com/.

Note: this only works if you have registered your mobile phone with the Twitter account on Profile -> Settings -> Mobile Phone. The Website field must contain a valid URL, even if you add a dummy address there.

2016-11-19-20_02_58-create-an-application-_-twitter-application-management

The consumer key and secret can be accessed on the “Keys and Access Tokens” tab of the page you are redirected to. We do not want to send any tweets, so we can set the Access Level to “Read only”. Then, on the “Keys and Access Tokens” tab again, create an access token by clicking the button at the bottom of the page. Then copy and paste the keys into the configuration file logstash_twitter.conf we have created above.

Now we need to download the template file twitter_template.json that logstash_twitter.conf refers to (found in the elastic/examples Git repository):

(dockerhost)$ curl -JO https://raw.githubusercontent.com/elastic/examples/master/ElasticStack_twitter/twitter_template.json

The content of the file is:

{
  "template": "twitter_elastic_example",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": true
      },
      "properties": {
        "@timestamp": {
          "type": "date",
          "format": "dateOptionalTime"
        },
        "text": {
          "type": "text"
        },
        "user": {
          "type": "object",
          "properties": {
            "description": {
              "type": "text"
            }
          }
        },
        "coordinates": {
          "type": "object",
          "properties": {
            "coordinates": {
              "type": "geo_point"
            }
          }
        },
        "entities": {
          "type": "object",
          "properties": {
            "hashtags": {
              "type": "object",
              "properties": {
                "text": {
                  "type": "text",
                  "fielddata": true
                }
              }
            }
          }
        },
        "retweeted_status": {
          "type": "object",
          "properties": {
            "text": {
              "type": "text"
            }
          }
        }
      },
      "dynamic_templates": [
        {
          "string_template": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ]
    }
  }
}

With that, we are ready to start a Logstash Docker container with a link to the Elasticsearch container and start the Logstash process using the configuration file we have created above:

(dockerhost)$ sudo docker run -it --rm --name logstash --link elasticsearch -v "$PWD":/app --entrypoint bash logstash
(logstash-container)# logstash -f /app/logstash_twitter.conf
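
Tip: if you only want to validate the configuration syntax without starting the pipeline, Logstash 5.x can check the file and exit; a sketch, assuming the same file path as above:

(logstash-container)# logstash -f /app/logstash_twitter.conf --config.test_and_exit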

In my case, I get many warnings in the Logstash terminal like

...............................20:05:52.220 [[main]>worker0] WARN  logstash.outputs.elasticsearch - Failed action. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"twitter", :_type=>"tweet", :_routing=>nil}, 2016-11-19T20:05:51.000Z %{host} %{message}], :response=>{"index"=>{"_index"=>"twitter", "_type"=>"tweet", "_id"=>"AVh-MfEl713eVTPkwAuA", "status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"Limit of total fields [1000] in index [twitter] has been exceeded"}}}}

and in the Elasticsearch terminal like

[2016-11-19T20:49:43,874][WARN ][o.e.d.i.m.StringFieldMapper$TypeParser] The [string] field is deprecated, please use [text] or [keyword] instead on [id_str]
[2016-11-19T20:49:43,874][WARN ][o.e.d.i.m.StringFieldMapper$TypeParser] The [string] field is deprecated, please use [text] or [keyword] instead on [raw]

However, the many dots in the Logstash terminal show that a high number of tweets is being recorded continuously. Let us ignore the warnings for now (they also appear if Logstash is started without any template, so they look like bugs in the current version of the Elastic Stack) and check the number of recorded tweets from the Docker host:

(dockerhost)$ curl -XGET localhost:9200/twitter/_count
{"count":115,"_shards":{"total":1,"successful":1,"failed":0}}
(dockerhost)$ curl -XGET localhost:9200/twitter/_count
{"count":253,"_shards":{"total":1,"successful":1,"failed":0}}

The number of tweets is rising quickly. While performing the next steps, we keep Logstash and Elasticsearch running, so we have a good amount of data entries to work with.
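
Tip: appending the pretty parameter makes Elasticsearch format the JSON response for easier reading; this is purely cosmetic:

(dockerhost)$ curl -XGET 'localhost:9200/twitter/_count?pretty'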

Step 5: Run Kibana in interactive Terminal Mode

In this step, we will run Kibana interactively (with the -it switch instead of the -d switch) to better see what is happening (in the Elasticsearch blog post, I had some memory issues, which cannot be seen easily in detached mode).

Similar to Logstash, we start Kibana with a link to the Elasticsearch container:

(dockerhost)$ sudo docker run -it --rm --name kibana -p5601:5601 --link elasticsearch --entrypoint bash kibana

We have found out by analyzing the Kibana image via the online imagelayer tool that the default command is to run /docker-entrypoint.sh kibana. Let us do that now:

root@f13588d10379:/# /docker-entrypoint.sh kibana
[WARN  tini (5)] Tini is not running as PID 1 and isn't registered as a child subreaper.
        Zombie processes will not be re-parented to Tini, so zombie reaping won't work.
        To fix the problem, use -s or set the environment variable TINI_SUBREAPER to register Tini as a child subreaper, or run Tini as PID 1.
  log   [16:28:02.791] [info][status][plugin:kibana@5.0.1] Status changed from uninitialized to green - Ready
  log   [16:28:02.842] [info][status][plugin:elasticsearch@5.0.1] Status changed from uninitialized to yellow - Waiting for Elasticsearch
  log   [16:28:02.867] [info][status][plugin:console@5.0.1] Status changed from uninitialized to green - Ready
  log   [16:28:03.074] [info][status][plugin:timelion@5.0.1] Status changed from uninitialized to green - Ready
  log   [16:28:03.080] [info][listening] Server running at http://0.0.0.0:5601
  log   [16:28:03.085] [info][status][ui settings] Status changed from uninitialized to yellow - Elasticsearch plugin is yellow
  log   [16:28:08.118] [info][status][plugin:elasticsearch@5.0.1] Status changed from yellow to yellow - No existing Kibana index found
  log   [16:28:08.269] [info][status][plugin:elasticsearch@5.0.1] Status changed from yellow to green - Kibana index ready
  log   [16:28:08.270] [info][status][ui settings] Status changed from yellow to green - Ready

If you see errors at this point, refer to Appendix C.

Step 6: Open Kibana in a Browser

Now we want to connect to the Kibana portal. For that, open a browser and open the URL

<your_kibana_host>:5601

In our case, Kibana is running in a container and we have mapped the container port 5601 to the local port 5601 of the Docker host. On the Docker host, we can open the URL:

localhost:5601

Note: In case of Vagrant with VirtualBox, per default, there is only a NAT-based interface and you need to create port-forwarding for any port you want to reach from outside (also the local machine you are working on is to be considered as outside). In this case, we need to add an entry in the port forwarding list of VirtualBox:

2016-11-19-18_45_38-regel-fur-port-weiterleitung

The Kibana dashboard will open:

2016-11-19-18_34_15-kibana

We change the index name pattern from logstash-* to twitter and press Create.

After clicking Discover in the left pane, Kibana displays a time/date histogram of the total tweet count recorded, the fields received, and a list of the tweets:

2016-11-19-21_56_10-kibana

Now let us compare the popularity of Obama vs. Trump. On the Docker host, we can test a query as follows:

(dockerhost)$ curl -XGET localhost:9200/twitter/tweet/_count?q=text:Obama
{"count":2046,"_shards":{"total":1,"successful":1,"failed":0}}
(dockerhost)$ curl -XGET localhost:9200/twitter/tweet/_count?q=text:Trump
{"count":9357,"_shards":{"total":1,"successful":1,"failed":0}}
(dockerhost)$ curl -XGET localhost:9200/twitter/tweet/_count?q=text:Clinton
{"count":747,"_shards":{"total":1,"successful":1,"failed":0}}

Okay, we can already see who the winner of this little Twitter contest is: Trump. Let us analyze the data in a little more detail. For that, we can place the tested query into the query field:

2016-11-20-08_33_44-trump-or-obama-kibana

All matching entries are listed, and the matching strings are highlighted. Let us press Save and give it the name “Trump OR Obama OR Clinton”.

Step 7: Create a Pie Chart

Now let us visualize the data. Press the Visualize link in the left pane and choose the pie chart:

Pie Chart Icon

and choose the “Trump OR Obama OR Clinton” query from the Saved Searches on the right pane. We are shown a very simple pie chart:

2016-11-20-08_48_38-kibana

This is not so interesting yet. Let us now click Split Slices, choose the Filter aggregation and add the query text:Trump. Then Add Filter with text:Obama, and the same for Clinton. After that, press the white-on-blue triangle Apply button to apply the changes. That looks better now:

2016-11-20-08_55_13-kibana

Let us save this as “Trump vs. Obama vs. Clinton Pie Chart”.

Step 8: Create a Time Chart

Now we want to visualize how the popularity of the three politicians changes over time. For a single query, this can be done with a line chart: Visualize -> Line Chart -> choose the query -> choose the X-axis aggregation Date Histogram. However, this is not what we want to achieve:

2016-11-20-09_01_29-kibana

We would like to display all three queries in a single graph, and this requires the usage of the Timelion plugin.

So, click on the Timelion link in the left pane, then add the query .es('text:Trump'), .es('text:Obama'), .es('text:Clinton') at the top. This will create the chart we were looking for:

2016-11-20-09_05_28-timelion-kibana

Let us save this as a Kibana dashboard panel with the name “Trump vs. Obama vs. Clinton Time Chart”, so we can use it in the next step.

Step 9: Define a Dashboard

We now will create a dashboard. Click on Dashboard in the left pane and click Add in the upper menu. Click on Trump vs. Obama vs. Clinton Time Chart and then on Trump vs. Obama vs. Clinton Pie Chart.

2016-11-20-09_19_46-kibana

Clicking the white on black ^ icon will give you more space.

Resize the charts from the corner and move them, so they are aligned. We now see that the colors do not match yet:

2016-11-20-09_21_33-kibana

However, the colors of the Pie chart can easily be changed by clicking on the legends (it is not so easy on the Time chart, though).

2016-11-20-09_27_03-kibana

Even if the colors are not 100% the same, we are coming closer:

2016-11-20-09_31_25-kibana

Let us Save this as “Popularity of US Politicians” dashboard.

About 10 hours later, we can see the rise of tweets in the US, which is ~5 to 10 hours behind Germany’s time zone:

2016-11-20-18_24_06-popularity-of-us-politicians-kibana

Summary

With this Hello World or Tutorial, we have shown

  • how we can use Logstash to collect Twitter data,
  • save it on Elasticsearch and
  • use Kibana to visualize the Elasticsearch search queries.

We can see the rise and fall of the number of tweets during a day, and we also can see that the number of Trump tweets is outpacing those of Obama tweets and Clinton tweets by far.

DONE!

P.S.: the colors of the Timelion graphs can also be changed easily by adding a .color(...) directive after the .es(...). If we want to have Trump in red, Obama in blue and Clinton in green, we set:

 

.es('text:Trump').color('red'), .es('text:Obama').color('blue'), .es('text:Clinton').color('green')

The resulting graph is:

2016-11-23-16_14_35-timelion-kibana

We also can use hex color codes like .color('#ff0000') instead of color names.

Appendix A: Error: Cannot allocate memory

This error has been seen by running Elasticsearch as a Docker container on a Docker host with only 250 MB RAM left (as seen with top).

(dockerhost)$ sudo docker run -it --rm elasticsearch --version
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x000000008a660000, 1973026816, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 1973026816 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /tmp/hs_err_pid1.log

Resolution:

A temporary resolution is to

  1. Shut down the Vagrant Docker host via

vagrant halt

  2. Open the Virtualbox console

  3. Increase the memory by ~500 MB (right-click the VM on the left pane of the Virtualbox console -> change -> system -> increase memory)

  4. Start the Vagrant Docker host via

vagrant up

A permanent solution is to

  1. increase the value of vb.memory in the Vagrantfile, e.g. replace

vb.memory = "1536"

by

vb.memory = "4096"

With that, next time a Virtualbox VM is created by Vagrant, the new value will be used. Also I have seen that the reboot has freed up quite some resources…

Appendix B: vm.max_map_count too low

The Elasticsearch application requires a minimum vm.max_map_count of 262144. See the official documentation for details. If this minimum requirement is not met, we see the following log during startup of Elasticsearch:

$ sudo docker run -it --rm --name elasticsearch -p9200:9200 -p9300:9300 elasticsearch
[2016-11-18T13:29:35,124][INFO ][o.e.n.Node ] [] initializing ...
[2016-11-18T13:29:35,258][INFO ][o.e.e.NodeEnvironment ] [SfJmZdJ] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/dm-0)]], net usable_space [32.3gb], net total_space [38.2gb], spins? [possibly], types [ext4]
[2016-11-18T13:29:35,258][INFO ][o.e.e.NodeEnvironment ] [SfJmZdJ] heap size [1.9gb], compressed ordinary object pointers [true]
[2016-11-18T13:29:35,261][INFO ][o.e.n.Node ] [SfJmZdJ] node name [SfJmZdJ] derived from node ID; set [node.name] to override
[2016-11-18T13:29:35,267][INFO ][o.e.n.Node ] [SfJmZdJ] version[5.0.1], pid[1], build[080bb47/2016-11-11T22:08:49.812Z], OS[Linux/4.2.0-42-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_111/25.111-b14]
[2016-11-18T13:29:37,449][INFO ][o.e.p.PluginsService ] [SfJmZdJ] loaded module [aggs-matrix-stats]
[2016-11-18T13:29:37,450][INFO ][o.e.p.PluginsService ] [SfJmZdJ] loaded module [ingest-common]
[2016-11-18T13:29:37,451][INFO ][o.e.p.PluginsService ] [SfJmZdJ] loaded module [lang-expression]
[2016-11-18T13:29:37,452][INFO ][o.e.p.PluginsService ] [SfJmZdJ] loaded module [lang-groovy]
[2016-11-18T13:29:37,452][INFO ][o.e.p.PluginsService ] [SfJmZdJ] loaded module [lang-mustache]
[2016-11-18T13:29:37,453][INFO ][o.e.p.PluginsService ] [SfJmZdJ] loaded module [lang-painless]
[2016-11-18T13:29:37,455][INFO ][o.e.p.PluginsService ] [SfJmZdJ] loaded module [percolator]
[2016-11-18T13:29:37,455][INFO ][o.e.p.PluginsService ] [SfJmZdJ] loaded module [reindex]
[2016-11-18T13:29:37,456][INFO ][o.e.p.PluginsService ] [SfJmZdJ] loaded module [transport-netty3]
[2016-11-18T13:29:37,456][INFO ][o.e.p.PluginsService ] [SfJmZdJ] loaded module [transport-netty4]
[2016-11-18T13:29:37,457][INFO ][o.e.p.PluginsService ] [SfJmZdJ] no plugins loaded
[2016-11-18T13:29:37,807][WARN ][o.e.d.s.g.GroovyScriptEngineService] [groovy] scripts are deprecated, use [painless] scripts instead
[2016-11-18T13:29:43,310][INFO ][o.e.n.Node ] [SfJmZdJ] initialized
[2016-11-18T13:29:43,310][INFO ][o.e.n.Node ] [SfJmZdJ] starting ...
[2016-11-18T13:29:43,716][INFO ][o.e.t.TransportService ] [SfJmZdJ] publish_address {172.17.0.3:9300}, bound_addresses {[::]:9300}
[2016-11-18T13:29:43,725][INFO ][o.e.b.BootstrapCheck ] [SfJmZdJ] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
ERROR: bootstrap checks failed
max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2016-11-18T13:29:43,741][INFO ][o.e.n.Node ] [SfJmZdJ] stopping ...
[2016-11-18T13:29:43,763][INFO ][o.e.n.Node ] [SfJmZdJ] stopped
[2016-11-18T13:29:43,764][INFO ][o.e.n.Node ] [SfJmZdJ] closing ...
[2016-11-18T13:29:43,791][INFO ][o.e.n.Node ] [SfJmZdJ] closed

Resolution:

Temporary solution (takes effect immediately, but is lost after a reboot):

(dockerhost)$ sudo sysctl -w vm.max_map_count=262144

Permanent solution on LINUX hosts:

Update the vm.max_map_count setting to 262144 or more in /etc/sysctl.conf. To verify after rebooting, run sysctl vm.max_map_count.
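
That is, add a line like the following to /etc/sysctl.conf:

vm.max_map_count=262144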

Permanent solution for future Vagrant-created LINUX hosts:

In case we use Vagrant to create Linux VMs, we also need to make sure the next VM is created with the correct vm.max_map_count setting. For that, we can run a startup script as described here:

In the Vagrantfile we set:

config.vm.provision :file, :source => "elasticsearchpreparation.sh", :destination => "/tmp/elasticsearchpreparation.sh"  
config.vm.provision :shell, :inline => "sudo sed -i 's/\r//g' /tmp/elasticsearchpreparation.sh && chmod +x /tmp/elasticsearchpreparation.sh && /tmp/elasticsearchpreparation.sh", :privileged => true

with the file elasticsearchpreparation.sh:

#!/usr/bin/env bash
# file: elasticsearchpreparation.sh
sudo sysctl -w vm.max_map_count=262144
ulimit -n 65536

The sed and chmod commands make sense on Windows hosts in order to make sure the file has UNIX format and has the required rights. Also here, make sure to run sysctl vm.max_map_count in order to check that the configuration is active (might require a reboot).

Appendix C: Typical Kibana Startup Logs

Successful Log

Here we see a successful startup log:

root@f13588d10379:/# /docker-entrypoint.sh kibana
[WARN  tini (5)] Tini is not running as PID 1 and isn't registered as a child subreaper.
        Zombie processes will not be re-parented to Tini, so zombie reaping won't work.
        To fix the problem, use -s or set the environment variable TINI_SUBREAPER to register Tini as a child subreaper, or run Tini as PID 1.
  log   [16:28:02.791] [info][status][plugin:kibana@5.0.1] Status changed from uninitialized to green - Ready
  log   [16:28:02.842] [info][status][plugin:elasticsearch@5.0.1] Status changed from uninitialized to yellow - Waiting for Elasticsearch
  log   [16:28:02.867] [info][status][plugin:console@5.0.1] Status changed from uninitialized to green - Ready
  log   [16:28:03.074] [info][status][plugin:timelion@5.0.1] Status changed from uninitialized to green - Ready
  log   [16:28:03.080] [info][listening] Server running at http://0.0.0.0:5601
  log   [16:28:03.085] [info][status][ui settings] Status changed from uninitialized to yellow - Elasticsearch plugin is yellow
  log   [16:28:08.118] [info][status][plugin:elasticsearch@5.0.1] Status changed from yellow to yellow - No existing Kibana index found
  log   [16:28:08.269] [info][status][plugin:elasticsearch@5.0.1] Status changed from yellow to green - Kibana index ready
  log   [16:28:08.270] [info][status][ui settings] Status changed from yellow to green - Ready

At this point the system has connected successfully to Elasticsearch, as can be seen in the last three log lines above.

Logs if Elasticsearch is not reachable

If Kibana cannot connect to Elasticsearch on the IP layer (e.g. because the Docker container link is missing), the last four lines of the successful log are replaced by:

  log   [16:45:51.597] [info][status][ui settings] Status changed from uninitialized to yellow - Elasticsearch plugin is yellow
  log   [16:45:54.407] [error][status][plugin:elasticsearch@5.0.1] Status changed from yellow to red - Request Timeout after 3000ms
  log   [16:45:54.410] [error][status][ui settings] Status changed from yellow to red - Elasticsearch plugin is red
...

To correct the issue, make sure that the Elasticsearch server (or container) is reachable from the Kibana server (or container).

Logs if Elasticsearch is reachable but not started (TCP RST)

If Kibana can reach the Elasticsearch server, but the Elasticsearch process has not been started, the error messages appear even earlier in the log:

(kibanacontainer)# /docker-entrypoint.sh kibana
[WARN  tini (8)] Tini is not running as PID 1 and isn't registered as a child subreaper.
        Zombie processes will not be re-parented to Tini, so zombie reaping won't work.
        To fix the problem, use -s or set the environment variable TINI_SUBREAPER to register Tini as a child subreaper, or run Tini as PID 1.
  log   [17:06:57.714] [info][status][plugin:kibana@5.0.1] Status changed from uninitialized to green - Ready
  log   [17:06:57.763] [info][status][plugin:elasticsearch@5.0.1] Status changed from uninitialized to yellow - Waiting for Elasticsearch
  log   [17:06:57.780] [error][elasticsearch] Request error, retrying
HEAD http://elasticsearch:9200/ => connect ECONNREFUSED 172.17.0.3:9200
  log   [17:06:57.794] [warning][elasticsearch] Unable to revive connection: http://elasticsearch:9200/
  log   [17:06:57.795] [warning][elasticsearch] No living connections
  log   [17:06:57.798] [error][status][plugin:elasticsearch@5.0.1] Status changed from yellow to red - Unable to connect to Elasticsearch at http://elasticsearch:9200.
  log   [17:06:57.800] [info][status][plugin:console@5.0.1] Status changed from uninitialized to green - Ready
  log   [17:06:57.981] [info][status][plugin:timelion@5.0.1] Status changed from uninitialized to green - Ready
  log   [17:06:57.989] [info][listening] Server running at http://0.0.0.0:5601
  log   [17:06:57.992] [error][status][ui settings] Status changed from uninitialized to red - Elasticsearch plugin is red
  log   [17:07:00.309] [warning][elasticsearch] Unable to revive connection: http://elasticsearch:9200/
  log   [17:07:00.314] [warning][elasticsearch] No living connections

To correct the issue, make sure that the Elasticsearch server (or container) is reachable from the Kibana server (or container), that the Elasticsearch process is started, and that the port is reachable from outside. This may involve mapping TCP ports from inner networks to outer networks. In the example of this blog post, the container port is mapped to the Docker host with the docker run -p9200:9200 switch, and the Docker host port is then mapped via Virtualbox port forwarding from the Docker host VM to the local machine.
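
For a Vagrant-managed Docker host VM, such a forwarding can be declared in the Vagrantfile (a sketch of the standard Vagrant syntax; the port numbers match this setup):

config.vm.network "forwarded_port", guest: 9200, host: 9200   # Elasticsearch
config.vm.network "forwarded_port", guest: 5601, host: 5601   # Kibana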

Summary

In this blog post we have performed following tasks:

  1. attach Logstash to the Twitter API for retrieval of all tweets with the keywords “Obama”, “Trump” or “Clinton”
  2. feed Logstash’s data into Elasticsearch
  3. attach Kibana to Elasticsearch and visualize the statistics on how often the text patterns “Obama”, “Trump” and “Clinton” are found in the recorded tweets. The total number is shown in a Pie Chart and the date/time histogram is shown in a Line Chart with more than one search term in a single chart. The latter can be done by using the Timelion plugin.

In order to avoid any compatibility issues with the java version on the host, we have run Kibana, Elasticsearch and Logstash in Docker containers. In order to better see what happens under the hood, we have chosen Docker containers in interactive terminal mode. In the course of the Elasticsearch “Hello World” in the last blog post, we had hit two memory resource issues: too little memory and a too low number of mapped memory areas. Those issues and their workarounds/solutions are described in Appendix A and B here and in the last blog post.


Elasticsearch “Hello World” Example – Part 2 of the ELK Stack Series



elasticsearch_logo

In the last blog post, we have explored Logstash, a tool for collecting and transforming log data from many different input sources. Today, we will explore Elasticsearch, a schema-less noSQL database with a versatile (“elastic”) search engine. We will perform a little Elasticsearch “Hello World” by running Elasticsearch in a Docker container and manipulating database entries. After that we will use Logstash as a data source for populating the Elasticsearch database. This configuration is often seen in typical log processing pipelines.

This is the second blog post of a series about the Elastic Stack (a.k.a. ELK stack).

What is Elasticsearch?

Elasticsearch is a highly scalable, distributed, schema-less noSQL database with a versatile (“elastic”) search engine. It is an open source project created by Elastic. In this performance comparison it has been shown that Elasticsearch performs well even for millions of documents.

Elasticsearch is often used in the so-called ELK pipeline for log file collection, analysis and visualization:

  • Elasticsearch is for searching, analyzing, and storing your data
  • Logstash (and Beats) is for collecting and transforming data, from any source, in any format
  • Kibana is a portal for visualizing the data and for navigating within the Elastic Stack

 

2016-11-17-18_31_39

 

Target

In this post, we will perform a little Elasticsearch “Hello World” by running Elasticsearch in a Docker container and creating, reading, searching and deleting our first database entries. This is done by sending simple HTTP messages to the RESTful API of Elasticsearch:

 

2016-11-18-21_15_08

As a second step, we will attach Logstash as a data source for Elasticsearch in order to move one step closer towards the ELK pipeline shown above:

2016-11-18-20_21_44

Tools used

  • Vagrant 1.8.6
  • Virtualbox 5.0.20
  • Docker 1.12.1
  • Logstash 5.0.1
  • Elasticsearch 5.0.1

Prerequisites:

  • Free Memory >= 3 GB for the Elasticsearch step and >= 4 GB for the Logstash + Elasticsearch pipeline (see Appendix A).
  • The max virtual memory areas vm.max_map_count must be at least 262144, see this note on the official documentation.
    See also Appendix B below for how to set the value on Linux temporarily, permanently, and also for the next Vagrant-created Linux VM.
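
Both prerequisites can be checked quickly on the Docker host:

(dockerhost)$ free -m                    # free memory in MB
(dockerhost)$ sysctl vm.max_map_count    # should print at least 262144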

Step 1: Install a Docker Host via Vagrant and Connect to the Host via SSH

We will run Elasticsearch and Logstash in Docker containers in order to allow for maximum interoperability. This way, we can always use the latest Elasticsearch and Logstash versions without the need to control the java version used: e.g. Logstash v 1.4.x works with java 7, while version 5.0.x currently works with java 8 only.

If you are new to Docker, you might want to read this blog post.

Installing Docker on Windows and Mac can be a real challenge, but no worries: we will show an easy way here that is much quicker than the one described in Docker’s official documentation:

Prerequisites of this step:

  • I recommend having direct access to the Internet: via firewall, but without HTTP proxy. However, if you cannot get rid of your HTTP proxy, read this blog post.
  • Administration rights on your computer.

Steps to install a Docker Host VirtualBox VM:

1. Download and install Virtualbox (if the installation fails with error message “Oracle VM Virtualbox x.x.x Setup Wizard ended prematurely”, see Appendix A of this blog post: Virtualbox Installation Workaround below)

2. Download and Install Vagrant (requires a reboot)

3. Download Vagrant Box containing an Ubuntu-based Docker Host and create a VirtualBox VM like follows:

basesystem# mkdir ubuntu-trusty64-docker ; cd ubuntu-trusty64-docker
basesystem# vagrant init williamyeh/ubuntu-trusty64-docker
basesystem# vagrant up
basesystem# vagrant ssh

Now you are logged into the Docker host and we are ready for the next step: downloading the Elasticsearch image.

Note: I have experienced problems with the vi editor when running vagrant ssh in a Windows terminal. In case of Windows, consider following Appendix C of this blog post and using putty instead.

Step 2 (optional): Download Elasticsearch Image

This extra download step is optional, since the Elasticsearch Docker image will be downloaded automatically in step 3, if it is not already found on the system:

(dockerhost)$ docker pull elasticsearch
Using default tag: latest
latest: Pulling from library/elasticsearch

386a066cd84a: Already exists
75ea84187083: Already exists
3e2e387eb26a: Already exists
eef540699244: Already exists
1624a2f8d114: Already exists
7018f4ec6e0a: Already exists
6ca3bc2ad3b3: Already exists
424638b495a6: Pull complete
2ff72d0b7bea: Pull complete
d0d6a2049bf2: Pull complete
51dc322097cb: Pull complete
5d6cdd5ecea8: Pull complete
51cdecfd285e: Pull complete
29a05afcfde6: Pull complete
Digest: sha256:c7eaa97e9b898b65f8f8588ade1c9c6187420b8ce6efb7d3300d9213cd5cb0dc
Status: Downloaded newer image for elasticsearch:latest

The version of the downloaded Elasticsearch image can be checked with following command:

(dockerhost)$ sudo docker run -it --rm elasticsearch --version
Version: 5.0.1, Build: 080bb47/2016-11-11T22:08:49.812Z, JVM: 1.8.0_111

We are currently using version 5.0.1. If you want to make sure that you use the exact same version as in this blog, use the image name elasticsearch:5.0.1 instead of elasticsearch in all docker commands.
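
For example (a sketch of the pinned-version variant):

(dockerhost)$ sudo docker pull elasticsearch:5.0.1
(dockerhost)$ sudo docker run -it --rm elasticsearch:5.0.1 --version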

Step 3: Run Elasticsearch in interactive Terminal Mode

In this step, we will run Elasticsearch interactively (with the -it switch instead of the -d switch) to better see what is happening (I had some memory issues, see Appendix A and B, which cannot be seen easily in detached mode):

(dockerhost)$ sudo docker run -it --rm --name elasticsearch -p9200:9200 -p9300:9300 --entrypoint bash elasticsearch

By analyzing the Elasticsearch image via the online imagelayer tool, we have found out that the default command is to run /docker-entrypoint.sh elasticsearch. Let us do that now. The output should look something like follows:

root@8e7170639d98:/usr/share/elasticsearch# /docker-entrypoint.sh elasticsearch
[2016-11-18T14:34:36,149][INFO ][o.e.n.Node               ] [] initializing ...
[2016-11-18T14:34:36,395][INFO ][o.e.e.NodeEnvironment    ] [iqF8643] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/dm-0)]], net usable_space [32.3gb], net total_space [38.2gb], spins? [possibly], types [ext4]
[2016-11-18T14:34:36,396][INFO ][o.e.e.NodeEnvironment    ] [iqF8643] heap size [1.9gb], compressed ordinary object pointers [true]
[2016-11-18T14:34:36,398][INFO ][o.e.n.Node               ] [iqF8643] node name [iqF8643] derived from node ID; set [node.name] to override
[2016-11-18T14:34:36,403][INFO ][o.e.n.Node               ] [iqF8643] version[5.0.1], pid[41], build[080bb47/2016-11-11T22:08:49.812Z], OS[Linux/4.2.0-42-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_111/25.111-b14]
[2016-11-18T14:34:38,606][INFO ][o.e.p.PluginsService     ] [iqF8643] loaded module [aggs-matrix-stats]
[2016-11-18T14:34:38,607][INFO ][o.e.p.PluginsService     ] [iqF8643] loaded module [ingest-common]
[2016-11-18T14:34:38,607][INFO ][o.e.p.PluginsService     ] [iqF8643] loaded module [lang-expression]
[2016-11-18T14:34:38,607][INFO ][o.e.p.PluginsService     ] [iqF8643] loaded module [lang-groovy]
[2016-11-18T14:34:38,607][INFO ][o.e.p.PluginsService     ] [iqF8643] loaded module [lang-mustache]
[2016-11-18T14:34:38,608][INFO ][o.e.p.PluginsService     ] [iqF8643] loaded module [lang-painless]
[2016-11-18T14:34:38,608][INFO ][o.e.p.PluginsService     ] [iqF8643] loaded module [percolator]
[2016-11-18T14:34:38,608][INFO ][o.e.p.PluginsService     ] [iqF8643] loaded module [reindex]
[2016-11-18T14:34:38,608][INFO ][o.e.p.PluginsService     ] [iqF8643] loaded module [transport-netty3]
[2016-11-18T14:34:38,609][INFO ][o.e.p.PluginsService     ] [iqF8643] loaded module [transport-netty4]
[2016-11-18T14:34:38,610][INFO ][o.e.p.PluginsService     ] [iqF8643] no plugins loaded
[2016-11-18T14:34:39,104][WARN ][o.e.d.s.g.GroovyScriptEngineService] [groovy] scripts are deprecated, use [painless] scripts instead
[2016-11-18T14:34:42,833][INFO ][o.e.n.Node               ] [iqF8643] initialized
[2016-11-18T14:34:42,833][INFO ][o.e.n.Node               ] [iqF8643] starting ...
[2016-11-18T14:34:43,034][INFO ][o.e.t.TransportService   ] [iqF8643] publish_address {172.17.0.2:9300}, bound_addresses {[::]:9300}
[2016-11-18T14:34:43,040][INFO ][o.e.b.BootstrapCheck     ] [iqF8643] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2016-11-18T14:34:43,839][INFO ][o.e.m.j.JvmGcMonitorService] [iqF8643] [gc][1] overhead, spent [434ms] collecting in the last [1s]
[2016-11-18T14:34:46,211][INFO ][o.e.c.s.ClusterService   ] [iqF8643] new_master {iqF8643}{iqF86430QRmm70Y5fDzVQw}{KsVmKueNQL6UBOMpiMsa5w}{172.17.0.2}{172.17.0.2:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2016-11-18T14:34:46,263][INFO ][o.e.h.HttpServer         ] [iqF8643] publish_address {172.17.0.2:9200}, bound_addresses {[::]:9200}
[2016-11-18T14:34:46,265][INFO ][o.e.n.Node               ] [iqF8643] started
[2016-11-18T14:34:46,276][INFO ][o.e.g.GatewayService     ] [iqF8643] recovered [0] indices into cluster_state

At this point the system is waiting for input on port 9200.
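
From a second terminal on the Docker host, we can verify that the REST API answers (the root endpoint returns node and version information):

(dockerhost)$ curl localhost:9200/
# returns a JSON document with name, cluster_name and version fields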

Step 4: Create sample Data

With the -p9200:9200 docker run option in the previous step, we have mapped the Docker container port 9200 to the Docker host port 9200. We now can send API calls to the Docker host’s port 9200.

Let us open a new terminal on the Docker host and type:

(dockerhost)$ curl -XPOST localhost:9200/twitter/tweed/1 -d '
{
"user": "oveits",
"message": "this is my first elasticsearch message",
"postDate": "2016-11-18T15:55:00"
}'

This will return a result like

{"_index":"twitter","_type":"tweed","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"created":true}

On the Elasticsearch terminal we see that a new index has been created with the name “twitter”, and a new mapping has been created with the name “tweed”:

[2016-11-18T14:56:46,777][INFO ][o.e.c.m.MetaDataCreateIndexService] [iqF8643] [twitter] creating index, cause [auto(index api)], templates [], shards [5]/[1], mappings []
[2016-11-18T15:01:01,361][INFO ][o.e.c.m.MetaDataMappingService] [iqF8643] [twitter/p9whAy1-TeSVZbUbz-3VVQ] create_mapping [tweed]

Step 5: Read Data from the Database

We can read the data with an HTTP GET command:

curl -XGET localhost:9200/twitter/tweed/1

This will return

{"_index":"twitter","_type":"tweed","_id":"1","_version":1,"found":true,"_source":
{
"user": "oveits",
"message": "this is my first elasticsearch message",
"postDate": "2016-11-18T15:55:00"
}}

Let us send a second tweed a little bit later (postDate: 16:11 instead of 15:55):

(dockerhost)$ curl -XPOST localhost:9200/twitter/tweed/2 -d '
{
"user": "oveits",
"message": "this is my second message",
"postDate": "2016-11-18T16:11:00"
}'

The same can be achieved with an HTTP PUT, which is equivalent to POST when an explicit document ID is specified:

curl -XPUT localhost:9200/twitter/tweed/2 -d '
{
"user": "oveits",
"message": "this is my second message",
"postDate": "2016-11-18T16:11:00"
}'

Step 6: Search Data based on Content

Now we will test some search capabilities of Elasticsearch. Let us search for all entries with a message that contains the string “elasticsearch”:

Step 6.1: Search String in Message

curl -XGET localhost:9200/twitter/_search?q=message:elasticsearch

This will return our first message only, since it contains the “elasticsearch” string:

{"took":58,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.25316024,"hits":[{"_index":"twitter","_type":"tweed","_id":"1","_score":0.25316024,"_source":
{
"user": "oveits",
"message": "this is my first elasticsearch message",
"postDate": "2016-11-18T15:55:00"
}}]}}

Note that the answer contains a _source field with the full text of the data.
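
If the full document text is not needed, the _source field can be suppressed via the _source=false query parameter (a quick sketch; this is a standard Elasticsearch option):

(dockerhost)$ curl -XGET 'localhost:9200/twitter/tweed/1?_source=false'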

Step 6.2: Search String in any Field

We can also search across all fields if we remove the field name message: from the query, e.g.

$ curl -XGET localhost:9200/twitter/_search?q=2016
{"took":4,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":0.25316024,"hits":[{"_index":"twitter","_type":"tweed","_id":"1","_score":0.25316024,"_source":
{
"user": "oveits",
"message": "this is my first elasticsearch message",
"postDate": "2016-11-18T15:55:00"
}},{"_index":"twitter","_type":"tweed","_id":"2","_score":0.24257512,"_source":
{
"user": "oveits",
"message": "this is my second message",
"postDate": "2016-11-18T16:11:00"

The query has found both entries, since they both contain the string “2016” in one of the fields.
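
By the way, all of these JSON responses become much easier to read when the pretty flag is appended to the URL (a standard Elasticsearch query parameter):

(dockerhost)$ curl -XGET 'localhost:9200/twitter/_search?q=2016&pretty'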

Step 6.3: Search for Entries within a Time Range

We can also filter database entries based on a time range. The command

$ curl -XGET localhost:9200/twitter/_search? -d '
{ "query": { "range": { "postDate": { "from": "2016-11-18T15:00:00", "to": "2016-11-18T17:00:00" } } } }'

returns both entries while

$ curl -XGET localhost:9200/twitter/_search? -d '
{ "query": { "range": { "postDate": { "from": "2016-11-18T15:00:00", "to": "2016-11-18T16:00:00" } } } }'

returns the first entry only:

{ "query": { "range": { "postDate": { "from": "2016-11-18T15:00:00", "to": "2016-11-18T16:00:00" } } } }'
{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"twitter","_type":"tweed","_id":"1","_score":1.0,"_source":
{
"user": "oveits",
"message": "this is my first elasticsearch message",
"postDate": "2016-11-18T15:55:00"
}}]}}

Step 7: Logstash as Input Source to Elasticsearch

Our final step for this Hello World post is to use Logstash as the data source for Elasticsearch. The target pipeline of this step is:

2016-11-18-20_21_44

It does not really make a difference, but for simplicity of this demonstration, we will replace the input file by command line STDIN input. We have already shown in the Logstash blog post that both input sources create the same results. This helps us reduce the number of needed terminals: we can use the Logstash terminal to add the data, and there is no need to open a separate terminal for manipulating the input file.

2016-11-18-20_27_19

Note: For this step, make sure to have at least 500 MB memory left on your (Docker) host after starting Elasticsearch, e.g. by checking with top. In my tests, I have created a Docker host VM with a total memory of 4 GB. I have seen Elasticsearch occupy up to 2.9 GB, while Logstash may need another 0.5 GB.

On the Docker host, we create a configuration file logstash_to_elasticsearch.conf like follows:

#logstash_to_elasticsearch.conf
input {
  stdin { }
}

output {
  elasticsearch {
    action => "index"
    index => "logstash"
    hosts => "10.0.2.15"
    workers => 1
  }
  stdout { }
}

Here, 10.0.2.15 is the IP address of the Docker host (interface docker0). We have used STDIN and STDOUT for simplicity. This way, we can just type the input data into the Logstash terminal, similar to yesterday’s Logstash blog post, like follows:

(dockerhost)$ sudo docker run -it --rm --name logstash -v "$PWD":/app --entrypoint bash logstash

And within the container we start Logstash with this configuration file:

(container)# logstash -f /app/logstash_to_elasticsearch.conf
...
18:43:58.751 [[main]-pipeline-manager] INFO  logstash.outputs.elasticsearch - Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>["http://10.0.2.15:9200"]}}

In a second terminal on the Docker host, we clean the Elasticsearch database and verify that the database is empty by checking that the total number of entries is 0:

(dockerhost)$ curl -XDELETE 'http://localhost:9200/_all'
{"acknowledged":true}
(dockerhost)$ curl -XGET localhost:9200/_search
{"took":1,"timed_out":false,"_shards":{"total":0,"successful":0,"failed":0},"hits":{"total":0,"max_score":0.0,"hits":[]}}

Caution: this will delete all data in the database!
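
If only the Logstash data should be removed instead, the DELETE can be restricted to a single index (a sketch; the index name logstash matches the configuration above):

(dockerhost)$ curl -XDELETE 'http://localhost:9200/logstash'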

Now we type into the Logstash terminal:

This is a testlog<Enter>

In the Elasticsearch terminal, we see the log:

[2016-11-18T19:12:15,275][INFO ][o.e.c.m.MetaDataCreateIndexService] [kam5hQi] [logstash] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2016-11-18T19:12:15,422][INFO ][o.e.c.m.MetaDataMappingService] [kam5hQi] [logstash/TbRsmMiFRbuGyP_THANk3w] create_mapping [logs]

And with the following command, we can review the data Logstash has forwarded to Elasticsearch:

(dockerhost)$ curl -XGET localhost:9200/_search
{"took":4,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"logstash","_type":"logs","_id":"AVh42oA25J6ZuRKS_qBB","_score":1.0,"_source":{"@timestamp":"2016-11-18T19:12:14.442Z","@version":"1","host":"adf58f139fd3","message":"This is a testlog","tags":[]}}]}}

Perfect! With that we have verified that data is sent from Logstash to Elasticsearch.

thumps_up_3

Appendix A: Error: Cannot allocate memory

This error has been seen when running Elasticsearch as a Docker container on a Docker host with only 250 MB RAM left (as seen with top).

(dockerhost)$ sudo docker run -it --rm elasticsearch --version
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x000000008a660000, 1973026816, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 1973026816 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /tmp/hs_err_pid1.log

Resolution:

A temporary resolution is to:

  1. shut down the Vagrant Docker host via vagrant halt
  2. open the Virtualbox console
  3. increase the memory by ~500 MB (right-click the VM on the left pane of the Virtualbox console -> change -> system -> increase memory)
  4. start the Vagrant Docker host again via vagrant up

A permanent solution is to increase the value of vb.memory in the Vagrantfile, e.g. to replace

vb.memory = "1536"

by

vb.memory = "4096"

With that, the next time a Virtualbox VM is created by Vagrant, the new value will be used. Also, I have seen that the reboot has freed up quite some resources…
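
For reference, a sketch of the relevant Vagrantfile section (standard Vagrant/VirtualBox provider syntax):

config.vm.provider "virtualbox" do |vb|
  vb.memory = "4096"   # memory in MB for the next VM created by 'vagrant up'
end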

Appendix B: vm.max_map_count too low

The Elasticsearch application requires a minimum vm.max_map_count of 262144. See the official documentation for details. If this minimum requirement is not met, we see the following log during startup of Elasticsearch:

$ sudo docker run -it --rm --name elasticsearch -p9200:9200 -p9300:9300 elasticsearch
[2016-11-18T13:29:35,124][INFO ][o.e.n.Node ] [] initializing ...
[2016-11-18T13:29:35,258][INFO ][o.e.e.NodeEnvironment ] [SfJmZdJ] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/dm-0)]], net usable_space [32.3gb], net total_space [38.2gb], spins? [possibly], types [ext4]
[2016-11-18T13:29:35,258][INFO ][o.e.e.NodeEnvironment ] [SfJmZdJ] heap size [1.9gb], compressed ordinary object pointers [true]
[2016-11-18T13:29:35,261][INFO ][o.e.n.Node ] [SfJmZdJ] node name [SfJmZdJ] derived from node ID; set [node.name] to override
[2016-11-18T13:29:35,267][INFO ][o.e.n.Node ] [SfJmZdJ] version[5.0.1], pid[1], build[080bb47/2016-11-11T22:08:49.812Z], OS[Linux/4.2.0-42-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_111/25.111-b14]
[2016-11-18T13:29:37,449][INFO ][o.e.p.PluginsService ] [SfJmZdJ] loaded module [aggs-matrix-stats]
[2016-11-18T13:29:37,450][INFO ][o.e.p.PluginsService ] [SfJmZdJ] loaded module [ingest-common]
[2016-11-18T13:29:37,451][INFO ][o.e.p.PluginsService ] [SfJmZdJ] loaded module [lang-expression]
[2016-11-18T13:29:37,452][INFO ][o.e.p.PluginsService ] [SfJmZdJ] loaded module [lang-groovy]
[2016-11-18T13:29:37,452][INFO ][o.e.p.PluginsService ] [SfJmZdJ] loaded module [lang-mustache]
[2016-11-18T13:29:37,453][INFO ][o.e.p.PluginsService ] [SfJmZdJ] loaded module [lang-painless]
[2016-11-18T13:29:37,455][INFO ][o.e.p.PluginsService ] [SfJmZdJ] loaded module [percolator]
[2016-11-18T13:29:37,455][INFO ][o.e.p.PluginsService ] [SfJmZdJ] loaded module [reindex]
[2016-11-18T13:29:37,456][INFO ][o.e.p.PluginsService ] [SfJmZdJ] loaded module [transport-netty3]
[2016-11-18T13:29:37,456][INFO ][o.e.p.PluginsService ] [SfJmZdJ] loaded module [transport-netty4]
[2016-11-18T13:29:37,457][INFO ][o.e.p.PluginsService ] [SfJmZdJ] no plugins loaded
[2016-11-18T13:29:37,807][WARN ][o.e.d.s.g.GroovyScriptEngineService] [groovy] scripts are deprecated, use [painless] scripts instead
[2016-11-18T13:29:43,310][INFO ][o.e.n.Node ] [SfJmZdJ] initialized
[2016-11-18T13:29:43,310][INFO ][o.e.n.Node ] [SfJmZdJ] starting ...
[2016-11-18T13:29:43,716][INFO ][o.e.t.TransportService ] [SfJmZdJ] publish_address {172.17.0.3:9300}, bound_addresses {[::]:9300}
[2016-11-18T13:29:43,725][INFO ][o.e.b.BootstrapCheck ] [SfJmZdJ] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
ERROR: bootstrap checks failed
max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2016-11-18T13:29:43,741][INFO ][o.e.n.Node ] [SfJmZdJ] stopping ...
[2016-11-18T13:29:43,763][INFO ][o.e.n.Node ] [SfJmZdJ] stopped
[2016-11-18T13:29:43,764][INFO ][o.e.n.Node ] [SfJmZdJ] closing ...
[2016-11-18T13:29:43,791][INFO ][o.e.n.Node ] [SfJmZdJ] closed

Resolution:

Temporary solution:

(dockerhost)$ sudo sysctl -w vm.max_map_count=262144

The setting takes effect immediately; note, however, that it will be lost after the next reboot.

Permanent solution on LINUX hosts:

Update the vm.max_map_count setting to 262144 or more in /etc/sysctl.conf. To verify after rebooting, run sysctl vm.max_map_count.
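
A minimal sketch of persisting and activating the setting without waiting for a reboot:

(dockerhost)$ echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
(dockerhost)$ sudo sysctl -p               # reload /etc/sysctl.conf
(dockerhost)$ sysctl vm.max_map_count      # verify: should print vm.max_map_count = 262144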

Permanent solution for future Vagrant-created LINUX hosts:

In case we use Vagrant to create Linux VMs, we also need to make sure the next VM is created with the correct vm.max_map_count setting. For that, we can run a startup.sh file as described here:

In the Vagrantfile we set:

config.vm.provision :file, :source => "elasticsearchpreparation.sh", :destination => "/tmp/elasticsearchpreparation.sh"  
config.vm.provision :shell, :inline => "sudo sed -i 's/\r//g' /tmp/elasticsearchpreparation.sh && chmod +x /tmp/elasticsearchpreparation.sh && /tmp/elasticsearchpreparation.sh", :privileged => true

with the file elasticsearchpreparation.sh:

#!/usr/bin/env bash
# file: elasticsearchpreparation.sh
sudo sysctl -w vm.max_map_count=262144
ulimit -n 65536

The sed and chmod commands make sense on Windows hosts in order to make sure the file has UNIX format and has the required rights. Also here, make sure to run sysctl vm.max_map_count in order to check that the configuration is active (might require a reboot).

Summary

In this blog post we have performed following Hello World tasks:

  1. we have fed Elasticsearch with JSON style data using simple CURL commands
  2. we have shown how to read and search data by full text search and by time range
  3. we have shown how Logstash can be used as the data source to feed data into the Elasticsearch database

In order to avoid any compatibility issues with the java version on the host, we have run both Elasticsearch and Logstash in Docker containers. In order to better see what happens under the hood, we have chosen Docker containers in interactive terminal mode. In the course of the tests, we had hit two memory resource issues: too little memory and a too low number of mapped memory areas. Those issues and their workarounds/solutions are described in Appendix A and B.


Logstash “Hello World” Example – Part 1 of the ELK Stack Series


2016-11-17-17_10_26-https___static-www-elastic-co_assets_bltdf06b3795cdbfb45_elastic-logstash-fw-svg

Today, we will first introduce Logstash, an open source project created by Elastic, before we perform a little Logstash “Hello World”: we will show how to read data from the command line or from a file, transform the data and send it back to the command line or to a file. In the appendix, you will find a note on Logstash CSV input performance and on how to replace the timestamp by a custom timestamp read from the input message (e.g. the input file).

For maximum interoperability with the host system (so the java version used becomes irrelevant), Logstash will be run in a Docker-based container sandbox.

This is the first blog post of a series about the Elastic Stack (a.k.a. ELK stack).

What is Logstash?

Logstash can collect logging data from a multitude of sources, transform the data, and send the data to a multitude of “stashes”.

logstash_input_output

Elastic’s “favorite stash” is Elasticsearch, another open source project driven by Elastic. Together with Kibana, Logstash and Elasticsearch form the so-called ELK pipeline:

  • Elasticsearch is for searching, analyzing, and storing your data
  • Logstash (and Beats) is for collecting and transforming data, from any source, in any format
  • Kibana is a portal for visualizing the data and for navigating within the Elastic Stack

 

2016-11-17-18_31_39

 

In the current blog post, we will restrict ourselves to simplified Hello World Pipelines like follows:

2016-11-17-19_52_26

and:

2016-11-17-18_34_43

We will first read from and write to the command line, before we use log files as input sources and output destinations.

Tools used

  • Vagrant 1.8.6
  • Virtualbox 5.0.20
  • Docker 1.12.1
  • Logstash 5.0.1

Step 1: Install a Docker Host via Vagrant and Connect to the Host via SSH

We will run Logstash in a Docker container in order to allow for maximum interoperability. This way, we can always use the latest Logstash version without the need to control the java version used: e.g. Logstash v 1.4.x works with java 7, while version 5.0.x currently works with java 8 only.

If you are new to Docker, you might want to read this blog post.

Installing Docker on Windows and Mac can be a real challenge, but no worries: we will show an easy way here that is much quicker than the one described in Docker’s official documentation:

Prerequisites of this step:

  • I recommend having direct access to the Internet: via firewall, but without HTTP proxy. However, if you cannot get rid of your HTTP proxy, read this blog post.
  • Administration rights on your computer.

Steps to install a Docker Host VirtualBox VM:

1. Download and install Virtualbox (if the installation fails with error message “Oracle VM Virtualbox x.x.x Setup Wizard ended prematurely”, see Appendix A of this blog post: Virtualbox Installation Workaround below)

2. Download and Install Vagrant (requires a reboot)

3. Download Vagrant Box containing an Ubuntu-based Docker Host and create a VirtualBox VM like follows:

(basesystem)# mkdir ubuntu-trusty64-docker ; cd ubuntu-trusty64-docker
(basesystem)# vagrant init williamyeh/ubuntu-trusty64-docker
(basesystem)# vagrant up
(basesystem)# vagrant ssh
(dockerhost)$

Now you are logged into the Docker host and we are ready for the next step: downloading the Logstash image.

Note: I have experienced problems with the vi editor when running vagrant ssh in a Windows terminal. In case of Windows, consider following Appendix C of this blog post and using putty instead.

Step 2 (optional): Download Logstash Image

This extra download step is optional, since the Logstash Docker image will be downloaded automatically in step 3, if it is not already found on the system:

(dockerhost)$ sudo docker pull logstash
Unable to find image 'logstash:latest' locally
latest: Pulling from library/logstash

386a066cd84a: Already exists
75ea84187083: Already exists
3e2e387eb26a: Pull complete
eef540699244: Pull complete
1624a2f8d114: Pull complete
7018f4ec6e0a: Pull complete
6ca3bc2ad3b3: Pull complete
3829939e7052: Pull complete
1cf20bb3ce62: Pull complete
f737f281552e: Pull complete
f1b7aca72edd: Pull complete
fb821ca73c54: Pull complete
c1543e80c12a: Pull complete
566f64970d2a: Pull complete
de88d0e92195: Pull complete
Digest: sha256:048a18100f18cdec3a42ebaa42042d5ee5bb3acceacea027dee4ae3819039da7
Status: Downloaded newer image for logstash:latest

The version of the downloaded Logstash image can be checked with following command:

(dockerhost)$ sudo docker run -it --rm logstash --version
logstash 5.0.1

We are using version 5.0.1 currently.

Step 3: Run Logstash as a Translator from Command Line to Command Line

In this step, we will use Logstash to translate the command line standard input (STDIN) to command line standard output (STDOUT).

2016-11-17-19_52_26

Once a docker host is available, downloading, installing and running Logstash is as simple as typing the following command. If the image has already been downloaded, because Step 2 was accomplished before, the download part will be skipped:

(dockerhost)$ sudo docker run -it --rm logstash -e 'input { stdin { } } output { stdout { } }'

With the -e option, we tell Logstash to read from the command line input (STDIN) and to send all output to the command line output (STDOUT).

The output looks like follows:

Unable to find image 'logstash:latest' locally
latest: Pulling from library/logstash

386a066cd84a: Already exists
75ea84187083: Already exists
3e2e387eb26a: Pull complete
eef540699244: Pull complete
1624a2f8d114: Pull complete
7018f4ec6e0a: Pull complete
6ca3bc2ad3b3: Pull complete
3829939e7052: Pull complete
1cf20bb3ce62: Pull complete
f737f281552e: Pull complete
f1b7aca72edd: Pull complete
fb821ca73c54: Pull complete
c1543e80c12a: Pull complete
566f64970d2a: Pull complete
de88d0e92195: Pull complete
Digest: sha256:048a18100f18cdec3a42ebaa42042d5ee5bb3acceacea027dee4ae3819039da7
Status: Downloaded newer image for logstash:latest
Sending Logstash's logs to /var/log/logstash which is now configured via log4j2.properties
The stdin plugin is now waiting for input:
11:19:07.293 [[main]-pipeline-manager] INFO  logstash.pipeline - Starting pipeline {"id"=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>250}
11:19:07.334 [[main]-pipeline-manager] INFO  logstash.pipeline - Pipeline main started
11:19:07.447 [Api Webserver] INFO  logstash.agent - Successfully started Logstash API endpoint {:port=>9600}

In the first part, the Logstash Docker image is downloaded from Docker Hub, if the image is not already available locally. Then Logstash logs its startup, and the output stops: Logstash is waiting for your input. Now, if we type

hello logstash

we get an output similar to

2016-11-17T11:35:10.764Z 828389ba165b hello logstash

We can stop the container by typing <Ctrl>-D and we will get an output like

11:51:20.132 [LogStash::Runner] WARN  logstash.agent - stopping pipeline {:id=>"main"}

Now let us try another output format:

(dockerhost)$ sudo docker run -it --rm logstash -e 'input { stdin { } } output { stdout { codec => rubydebug } }'
Sending Logstash's logs to /var/log/logstash which is now configured via log4j2.properties
The stdin plugin is now waiting for input:
11:48:05.746 [[main]-pipeline-manager] INFO logstash.pipeline - Starting pipeline {"id"=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>250}
11:48:05.760 [[main]-pipeline-manager] INFO logstash.pipeline - Pipeline main started
11:48:05.827 [Api Webserver] INFO logstash.agent - Successfully started Logstash API endpoint {:port=>9600}

You will need to wait for ~8 sec before you can send your first log to the STDIN. Let us do that now and type:

hello logstash in ruby style

This will produce an output like

{
 "@timestamp" => 2016-11-17T11:50:24.571Z,
 "@version" => "1",
 "host" => "9cd979a20db4",
 "message" => "hello logstash in ruby style",
 "tags" => []
}

Step 4: Run Logstash as a Translator from File to File

In this example, we will use (log) files as input source and output destination:

2016-11-17-18_34_43

For this, we will create a Logstash configuration file on the Docker host as follows:

#logstash.conf
input {
  file {
    path => "/app/input.log"
  }
}

output {
  file {
    path => "/app/output.log"
  }
}

For being able to read a file in the current directory on the Docker host, we need to map the current directory to a directory inside the Docker container using the -v switch. This time we need to override the entrypoint, since we need access to the command line of the container itself: we cannot just run the Logstash image directly and manipulate the mapped input file from the Docker host, since that fails with a permission error (see Appendix A below):

(dockerhost-terminal1)$ sudo docker run -it --rm --name logstash -v "$PWD":/app --entrypoint bash logstash

Then within the container we run logstash:

(container-terminal1)# logstash -f /app/logstash.conf

In a second terminal on the docker host, we need to run a second bash terminal within the container by issuing the command:

(dockerhost-terminal2)$ sudo docker exec -it logstash bash

Now, on the container command line, we prepare to see the output like follows:

(container-terminal2)# touch /app/output.log; tail -f /app/output.log

Now we need a third terminal, and connect to the container again. Then we send a “Hello Logstash” to the input file:

(dockerhost-terminal3)$ sudo docker exec -it logstash bash
(container-terminal3)# echo "Hello Logstash" >> /app/input.log

This will create following output on terminal 2:

{"path":"/app/input.log","@timestamp":"2016-11-17T19:53:02.728Z","@version":"1","host":"88a342b6b385","message":"Hello Logstash","tags":[]}

The output is in a format Elasticsearch understands.

In order to improve the readability of the output, we can specify a “plain” output codec in the configuration file:

#logstash.conf
input {
  file {
    path => "/app/input.log"
  }
}

output {
  file {
    path => "/app/output.log"
    codec => "plain"
  }
}

Note that a change of the Logstash configuration file content requires the logstash process to be restarted for the change to take effect; i.e. we can stop it with Ctrl-C and restart it in terminal 1 with

(container-terminal1)# logstash -f /app/logstash.conf

Now again

(container-terminal-3)# echo "Hello Logstash" >> /app/input.log

in terminal 3. That will produce following syslog-style output on terminal 2:

2016-11-17T20:10:39.861Z 88a342b6b385 Hello Logstash

Appendix A: Error Errno::EACCES: Permission denied if the logfile is changed on a mapped Volume

This error has been seen when running Logstash as a Docker container with a mapped folder and manipulating the input file from the Docker host:

(dockerhost)$ sudo docker run -it --rm -v "$PWD":/app logstash -f /app/logstash.conf
Sending Logstash's logs to /var/log/logstash which is now configured via log4j2.properties
19:15:59.927 [[main]-pipeline-manager] INFO logstash.pipeline - Starting pipeline {"id"=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>250}
19:15:59.940 [[main]-pipeline-manager] INFO logstash.pipeline - Pipeline main started
19:16:00.005 [Api Webserver] INFO logstash.agent - Successfully started Logstash API endpoint {:port=>9600}

If we now change the input file on the docker host in a second terminal like follows:

(dockerhost)$ echo "Hello Logstash" >> input.log

we receive following output on the first terminal:

19:22:47.732 [[main]>worker1] INFO logstash.outputs.file - Opening file {:path=>"/app/output.log"}
19:22:47.779 [LogStash::Runner] FATAL logstash.runner - An unexpected error occurred! {:error=>#<Errno::EACCES: Permission denied - /app/output.log>, :backtrace=>["org/jruby/RubyFile.java:370:in `initialize'", "org/jruby/RubyIO.java:871:in `new'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-file-4.0.1/lib/logstash/outputs/file.rb:280:in `open'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-file-4.0.1/lib/logstash/outputs/file.rb:132:in `multi_receive_encoded'", "org/jruby/RubyHash.java:1342:in `each'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-file-4.0.1/lib/logstash/outputs/file.rb:131:in `multi_receive_encoded'", "org/jruby/ext/thread/Mutex.java:149:in `synchronize'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-file-4.0.1/lib/logstash/outputs/file.rb:130:in `multi_receive_encoded'", "/usr/share/logstash/logstash-core/lib/logstash/outputs/base.rb:90:in `multi_receive'", "/usr/share/logstash/logstash-core/lib/logstash/output_delegator_strategies/shared.rb:12:in `multi_receive'", "/usr/share/logstash/logstash-core/lib/logstash/output_delegator.rb:42:in `multi_receive'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:297:in `output_batch'", "org/jruby/RubyHash.java:1342:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:296:in `output_batch'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:252:in `worker_loop'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:225:in `start_workers'"]}
(dockerhost)$

There is a problem with the synchronization of the input.log file from the Docker host to the container, causing the docker container to stop. The workaround is to run the container with a bash entrypoint and to manipulate the file from within the container, as shown in the step by step guide above.

Appendix B: How to apply a custom Time Stamp

In a real customer project, I had the task of visualizing the data of certain data dump files, which had their own time stamps in a custom format like follows:

2016-11-21|00:00:00|<other data>

Okay, you are right in thinking that this is a CSV with a pipe (|) separator, and that the CSV Logstash plugin should be applied. However, before doing so, we can take it as an example of how to replace the built-in Logstash timestamp variable called @timestamp. This is better than creating your own timestamp variable with a different name. The latter is also possible and works with normal Kibana visualizations, but it does not seem to work with Timelion for more complex visualizations. So let us do it the right way now:

We will create a simple demonstration Logstash configuration file for demonstration of the topic like follows:

# logstash_custom_timestamp.conf
input {
  stdin { }
  file {
    path => "/app/input/*.*"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

With that, we allow for STDIN input as well as for file input from any file you dump into the path /app/input/*. For testing, we have set the start_position to “beginning”, i.e. Logstash will always read the files from the beginning, even if it has already read part of them. In addition, by setting the sincedb_path to "/dev/null", we make sure that Logstash forgets which files have already been processed. This way, we can restart Logstash and re-process any files in the folder.

Now let us find the time variable with a grok filter and replace the time variable with the date plugin:

filter {
  grok {
    match => {"message" => "(?<mydate>[1-9][0-9]{3}-[0-9]{2}-[0-9]{2}\|[0-9]{2}:[0-9]{2}:[0-9]{2})"}
  }

  date {
    match => ["mydate", "YYYY-MM-dd|HH:mm:ss", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss", "ISO8601"]
    target => "@timestamp"
  }

}

The grok filter allows us to define a new interim variable named mydate, if the specified regular expression is found in the input message. In our case, we want to match something like 2016-11-21|00:00:00, i.e. one digit between 1 and 9 ([1-9]) and 3 digits between 0 and 9 ([0-9]{3}), then a dash (-), then two digits ([0-9]{2}), and so on.

Then we can use the date plugin to overwrite the built-in @timestamp with our variable mydate we have created with the grok filter. Within the date we can match clauses like YYYY-MM-dd|HH:mm:ss in the mydate variable and push it to the @timestamp variable.

Note that it is not possible to just use the replace directive: if we try to overwrite @timestamp with mydate that way, Logstash will complain that you cannot overwrite a time variable with a String variable.

output {
  stdout { codec => rubydebug }
}

Now, let us start Logstash in a Docker container and test the configuration:

(dockerhost)$ sudo docker run -it --rm --name logstash -v "$PWD":/app --entrypoint bash logstash
(container)$ logstash -f /app/logstash_custom_timestamp.conf

And now, the container is waiting for input. We do not let it wait and type in the first line shown below; the remaining lines are Logstash’s response:

1966-10-23|12:00:00|birthday
{
 "mydate" => "1966-10-23|12:00:00",
 "@timestamp" => 1966-10-23T12:00:00.000Z,
 "@version" => "1",
 "host" => "02cec85c3aac",
 "message" => "1966-10-23|12:00:00|birthday",
 "tags" => []
}

Success: the built-in timestamp variable @timestamp has been updated with the date found in the input message.

Let us observe, what happens with input messages, which do not match:

this is a message that does not match
{
    "@timestamp" => 2016-12-04T10:19:17.501Z,
      "@version" => "1",
          "host" => "02cec85c3aac",
       "message" => "this is a message that does not match",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}

We can see that the output is tagged with "_grokparsefailure" in this case, and the timestamp is set to the current date and time, as expected.
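
If such unmatched events should not be indexed with the wrong timestamp, they can be discarded with a conditional drop in the filter section (a sketch using standard Logstash conditionals):

filter {
  if "_grokparsefailure" in [tags] {
    drop { }   # discard events whose custom timestamp could not be extracted
  }
}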

Appendix C: Logstash CSV read Performance

In a real project, I had to read in many millions of lines from a large set of CSV files. I have experienced that it took quite a bit of time to read in the data, so I want to measure the input performance of Logstash to be able to estimate the time consumption.

Note: we will reduce the data volume by random sampling of the input. This optimizes input and Elasticsearch performance, with the trade-off that the data analysis becomes less accurate. However, if each data point still has more than 100 samples, the error is expected to be lower than a few percent, provided the input data has no “unhealthy” value distribution (e.g. many records with low values and only few records with very large values).

Tools used:

  • Notebook with i7-6700HQ CPU and 64 GB RAM and Windows 10 Pro
  • VirtualBox 5.0.20 r106931
  • VirtualBox VM with Ubuntu 14.04, 4GB RAM and 2 vCPU
  • Docker installed 1.12.1, build 23cf638
  • 3 Docker containers running in interactive mode (the performance in detached mode might be higher, so we will measure a lower bound of the performance):
    • Logstash 5.0.1
    • Elasticsearch 5.0.1
    • Kibana 5.0.1
  • Data input files: CSV files with 12,200 lines each
  • Sample data lines (note that the first line of each file will be dropped by Logstash):
DATUM|ZEIT|IPV4_SRC_ADDR|IPV4_DST_ADDR|ROUTER_IP|INTF_IN|INTF_OUT|TOS|FLAGS|IP_PROTOCOL_VERSION|PROTOCOL|L4_SRC_PORT|L4_DST_PORT|IN_PKTS|IN_BYTES|FLOWS
2016-11-23|15:58:10|9.1.7.231|164.25.118.50|9.0.253.1|2|0|0|0|4|17|49384|161|6|1602|1
2016-10-23|15:58:12|9.1.7.231|9.60.64.1|9.0.253.1|2|2|0|0|4|17|51523|161|1|78|1
...

Logstash configuration

# logstash_netflow_csv_to_elasticsearch.conf
input {
  stdin { }
  file {
    path => "/app/input/*.*"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  ruby {
    # Sampling:
    code => "event.cancel if rand <= 0.90" # 10% sampling (i.e. cancel 90% of events) 
    #code => "event.cancel if rand <= 0.99" # 1% sampling (i.e. cancel 99% of events)
  } 

  # set timestamp: read from message 
  grok { 
    match => {"message" => "(?[1-9][0-9]{3}-[0-9]{2}-[0-9]{2}\|[0-9]{2}:[0-9]{2}:[0-9]{2})"}
  }

  # set timestamp: overwrite time stamp
  date {
    match => ["mydate", "YYYY-MM-dd|HH:mm:ss", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss", "ISO8601"]
    target => "@timestamp"
  }

  csv {

    columns => [
      "DATUM",
      "ZEIT",
      "IPV4_SRC_ADDR",
      "IPV4_DST_ADDR",
      "ROUTER_IP",
      "INTF_IN",
      "INTF_OUT",
      "TOS",
      "FLAGS",
      "IP_PROTOCOL_VERSION",
      "PROTOCOL",
      "L4_SRC_PORT",
      "L4_DST_PORT",
      "IN_PKTS",
      "IN_BYTES",
      "FLOWS"
    ]

    separator => "|"
    remove_field => ["mydate"]
  }

  if ([DATUM] == "DATUM") {
    drop { }
  }

}

output {
  stdout { codec => dots }

  elasticsearch {
    action => "index"
    index => "csv"
    hosts => "elasticsearch"
    document_type => "data"
    workers => 1
  }
}

Results without Elasticsearch output

As a baseline, we will first perform tests with the elasticsearch output commented out:

Test 1) 100% Sampling -> 3,400 lines/sec (i.e. 3,400 data sets/sec)
(10 files with 12,200 lines each in ~35 sec)

Test 2) 10% Sampling -> 6,100 lines/sec (i.e. 610 data sets/sec)
(10 files with 12,200 lines each in ~20 sec)

Test 3) 1% Sampling -> 8,100 lines/sec (i.e. 81 data sets/sec)
(10 files with 12,200 lines each in ~15 sec)

Results with Elasticsearch output

Now let us test the performance in case the data is sent to Elasticsearch:

Test 1) 100% Sampling -> 1,700 lines/sec (i.e. 1,700 data sets/sec)
(10 files with 12,200 lines each in ~70 sec)

Test 2) 10% Sampling -> 3,500 lines/sec (i.e. 350 data sets/sec)
(10 files with 12,200 lines each in ~35 sec)

Test 3) 1% Sampling -> 6,100 lines/sec (i.e. 61 data sets/sec)
(10 files with 12,200 lines each in ~20 sec)

2016-12-05-14_27_24-logstash-input-performance-with-and-without-elasticsearch-output-ov-v0-1-ods-l

As we can see, the input rate is about 2,000 lines/sec lower if the output is sent to Elasticsearch instead of being sent to the console only (dots) (yellow vs. blue line).

In case of output to Elasticsearch, we get following rates graph:

2016-12-05-15_45_32-logstash-input-performance-with-and-without-elasticsearch-output-ov-v0-1-ods-l

  • Sampling rate 1%: if only 1% of the data records are sent to the output, the input rate is increased to 6,100 (factor ~3.6 compared to a sampling rate of 100%).
  • Sampling rate 10%: if only 10% of the data records are sent to the output, one could expect an input rate increase by the factor 10 compared to 100% sampling, if the output pipe was the bottleneck. This does not seem to be the case, since we observe an increase by the factor 2 only (3,500 lines/sec).
  • Sampling rate 100%: if all input lines are sent to the output, we can reach ~ 1,700 lines/sec

The optimum sampling rate is determined by increasing the sampling rate until the required data accuracy is reached. The data accuracy can be checked by randomly sampling the same set of data several times and observing the variance of the output.

Summary

In this blog post we have created two simple Hello World examples:

  1. one for translation between command line input and command line output and
  2. a second one for translation from a file to a file.

In order to avoid any compatibility issues with the java version on the host, we have run Logstash in a Docker container. This works fine, if the input file is manipulated from within the container. As seen in Appendix A, we cannot manipulate the file on a mapped volume on the Docker Host, though.


How to set up Docker Monitoring via cAdvisor, InfluxDB and Grafana


Have you ever tried to monitor a docker solution? In this blog post, we will discuss three open source docker monitoring alternatives, before we go through a step by step guide for a docker monitoring alternative that consists of the components Google cAdvisor as data source, InfluxDB as the database and Grafana for creating the graphs.

The post is built upon a blog post of Brian Christner. However, we will take a shortcut via a docker compose file created by Dale Kate-Murray and Ross Jimenez, which helps us to spin up the needed docker containers within minutes (depending on your Internet speed).


Docker Monitoring Alternatives

Other free docker monitoring solutions are discussed in this youtube video of Brian Christner:

  • Google cAdvisor (standalone): easy to use, no config needed
  • cAdvisor + InfluxDB + Grafana: flexible, adaptable (the one we will get hands-on experience below)
  • Prometheus: all-in-one complete monitoring solution

He summarizes the capabilities of those solutions as follows:

2016-10-25-17_32_32-docker-monitoring-youtube

@Brian Christner: I hope it is okay that I have copied this slide from your youtube video?

Those are open source alternatives. Brian Christner points out that you might need more complete, enterprise-level solutions than the open source alternatives can offer, e.g. Data Dog (offers a free service for up to five monitored hosts) or Sysdig (the latter also seems to be open source, though). See also this Rancher post, which compares seven docker monitoring alternatives.

Step by Step guide: “Installing” cAdvisor + InfluxDB + Grafana

Here, we will lead through a step by step guide on how to deploy a flexible docker monitoring solution consisting of Google cAdvisor as data source, InfluxDB as the database and Grafana for creating the graphs. We will make use of a docker compose file Ross Jimenez has created and Brian Christner has included in his git repository.

Step 0: Prerequisites

We assume that following prerequisites are met

  • Docker is installed. A nice way to install an Ubuntu Docker host via Vagrant is described here (search for the term “Install a Docker Host”).
  • You have direct Internet access. If you need to cope with a HTTP proxy, see the official docker instructions, or, if it does not work you may try this blog post.

Step 1: Install docker-compose via Container:

On a Docker host, we will install docker-compose via a Docker container using the following script:

# detect whether sudo is needed:
sudo echo hallo > /dev/null 2>&1 && SUDO=sudo

# download the docker-compose wrapper, if docker-compose is not yet installed:
$SUDO docker-compose --version || \
 $SUDO docker --version && \
 curl -L https://github.com/docker/compose/releases/download/1.8.1/run.sh | $SUDO tee /usr/local/bin/docker-compose && \
 $SUDO chmod +x /usr/local/bin/docker-compose

You might prefer a native installation of docker compose. Please check out the official documentation in this case.
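
A quick check that the wrapper works (the exact build string may differ):

(dockerhost)$ sudo docker-compose --version
# should print the docker-compose version (1.8.1 in this setup)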

Step 2: Download Docker Compose File

Now we download Brian Christner’s docker monitoring git repository. The CircleCI tests of Brian’s repository are failing currently (as of 2016-10-25), but the software seems to work anyway.

git clone https://github.com/vegasbrianc/docker-monitoring && \
cd docker-monitoring

Step 3: Start Containers

Now let us start the containers via

$ docker-compose up
Starting dockermonitoringrepaired_influxdbData_1
Starting dockermonitoringrepaired_influxdb_1
Starting dockermonitoringrepaired_grafana_1
Starting dockermonitoringrepaired_cadvisor_1
Attaching to dockermonitoringrepaired_influxdbData_1, dockermonitoringrepaired_influxdb_1, dockermonitoringrepaired_cadvisor_1, dockermonitoringrepaired_grafana_1
dockermonitoringrepaired_influxdbData_1 exited with code 0
influxdb_1 | influxdb configuration:
influxdb_1 | ### Welcome to the InfluxDB configuration file.
influxdb_1 |
influxdb_1 | # Once every 24 hours InfluxDB will report anonymous data to m.influxdb.com
influxdb_1 | # The data includes raft id (random 8 bytes), os, arch, version, and metadata.
influxdb_1 | # We don't track ip addresses of servers reporting. This is only used
influxdb_1 | # to track the number of instances running and the versions, which
...
(trunkated; see full log in the Appendix)
...
influxdb_1 | [admin] 2016/10/25 16:48:44 Listening on HTTP: [::]:8083
influxdb_1 | [continuous_querier] 2016/10/25 16:48:44 Starting continuous query service
influxdb_1 | [httpd] 2016/10/25 16:48:44 Starting HTTP service
influxdb_1 | [httpd] 2016/10/25 16:48:44 Authentication enabled: false
influxdb_1 | [httpd] 2016/10/25 16:48:44 Listening on HTTP: [::]:8086
influxdb_1 | [retention] 2016/10/25 16:48:44 Starting retention policy enforcement service with check interval of 30m0s
influxdb_1 | [monitor] 2016/10/25 16:48:44 Storing statistics in database '_internal' retention policy 'monitor', at interval 10s
influxdb_1 | 2016/10/25 16:48:44 Sending anonymous usage statistics to m.influxdb.com
influxdb_1 | [run] 2016/10/25 16:48:44 Listening for signals

Note: if you see a continuous message “Waiting for confirmation of InfluxDB service startup”, you might hit a problem described in an Appendix below. Search for “Waiting for confirmation of InfluxDB service startup” on this page.

Step 4 (optional): In a different window on the Docker host, we can test the connection like follows:

$ curl --retry 10 --retry-delay 5 -v http://localhost:8083
* Rebuilt URL to: http://localhost:8083/
* Hostname was NOT found in DNS cache
* Trying ::1...
* Connected to localhost (::1) port 8083 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.35.0
...
</body>

</html>
* Connection #0 to host localhost left intact

Step 5: Connect to cAdvisor, InfluxDB, Grafana

Step 5.1 (optional): Connect to cAdvisor

Now let us connect to cAdvisor. For that, you need to find out which IP address your docker host is using. In my case, I am using a Vagrant-based Docker host, and I have added an additional line to the Vagrantfile:

config.vm.network "private_network", ip: "192.168.33.11"

The TCP port can be seen in the docker-compose.yml file: it is 8080. This allows me to connect to cAdvisor’s dashboard via http://192.168.33.11:8080/containers/:

2016-10-25-22_36_11-cadvisor-_

Step 5.2 (optional): connect to InfluxDB

InfluxDB is reachable via http://192.168.33.11:8083/:

2016-10-25-22_38_50-influxdb-admin-interface

Step 5.3 (required): Connect to Grafana

And Grafana can be reached via http://192.168.33.11:3000/: log in as admin with password admin, if you are prompted for it:

2016-10-25-22_42_32-grafana-home

Okay, the dashboard is still empty.

Step 6: Add Data Sources to Grafana manually

Connect to Grafana (http://192.168.33.11:3000/ in my case)

Click on Data Sources -> Add new

and add following data:

Name: influxdb
Type: InfluxDB 0.9.x

Note: Be sure to check the default box! Otherwise, you will see random data created by Grafana below!

Http settings
Url: http://192.168.33.11:8086 (please adapt the IP address to your environment)
Access: proxy
Basic Auth: Enabled
User: admin
Password: admin

InfluxDB Details
Database: cadvisor
User: root
Password: root

Click Add -> Test Connection (should be successful) -> Save

Step 7: Add New Dashboard to Grafana via json File

Connect to Grafana (http://192.168.33.11:3000/ in my case)

Click on Grafana Home, then Grafana Import, navigate to the cloned github repository, click on the button below Import File and pick the file docker-monitoring-0.9.json:

As if by an invisible hand, we get a dashboard with information on Filesystem Usage, CPU Usage, Memory Usage and Network Usage of the Containers on the host.

2016-10-30-20_05_15-grafana-new-dashboard

Note: if the graphs look like follows

2016-10-28-21_28_18-grafana-new-dashboard

and the graphs change substantially by clicking Grafana Dashboard Refresh, then you most probably have forgotten to check the “default” box in step 6. In this case, you need to click on the title of the graph -> Edit -> choose influxdb as data source.

Step 8 (optional): CPU Stress Test

Since only the Docker monitoring containers are running, the absolute numbers we see are quite low. Let us start a container that stresses the CPU a little bit:

docker run -it petarmaric/docker.cpu-stress-test

The graphs of cAdvisor are reacting right away:

[Screenshot: cAdvisor CPU usage reacting to the stress test]

Let us wait a few minutes, refresh the Grafana graphs, and put our focus on the CPU usage:

[Screenshot: Grafana CPU usage graph]
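When you are done with the stress test, stop the container again. If you have started it in the foreground with -it, Ctrl-C usually suffices; otherwise, a sketch (the ancestor filter assumes a reasonably recent Docker version):

(docker host) $ docker ps -q --filter ancestor=petarmaric/docker.cpu-stress-test | xargs docker stop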

The data does not seem to be reliable. I have opened issue #10 for this. When looking at the InfluxDB data by specifying the following URL in a browser:

http://192.168.33.11:8086/query?pretty=true&db=cadvisor&q=SELECT%20%22value%22%20FROM%20%22cpu_usage_system%22

(you need to adapt the IP address to your environment), we get data that changes by a factor of 10,000 within milliseconds!

{
    "results": [
        {
            "series": [
                {
                    "name": "cpu_usage_system",
                    "columns": [
                        "time",
                        "value"
                    ],
                    "values": [
                        [
                            "2016-10-24T17:12:49.021559212Z",
                            910720000000
                        ],
                        [
                            "2016-10-24T17:12:49.032153994Z",
                            20000000
                        ],
                        [
                            "2016-10-24T17:12:49.033316234Z",
                            5080000000
                        ],
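The same query can also be issued from the command line; a sketch using curl (adapt the IP address to your environment):

(docker host) $ curl -sG 'http://192.168.33.11:8086/query?pretty=true' \
    --data-urlencode "db=cadvisor" \
    --data-urlencode 'q=SELECT "value" FROM "cpu_usage_system" LIMIT 5'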

Summary

Following Brian Christner's YouTube video on Docker Monitoring, we have compared three open source Docker monitoring solutions:

  1. Google cAdvisor (standalone): easy to use, no config needed
  2. cAdvisor + InfluxDB + Grafana: flexible, adaptable (the one we have gained hands-on experience with above)
  3. Prometheus: all-in-one complete monitoring solution

By using a pre-defined docker-compose file, solution 2 can be spun up in minutes (unless you are working in a Vagrant synced folder of type “vboxsf” on Windows, which leads to a continuous ‘Waiting for confirmation of InfluxDB service startup’ message; see Appendix B below; that problem, which I have reported here, had caused quite a headache on my side).

After the data source has been configured manually, a JSON file helps to create a nice Grafana dashboard within minutes. The dashboard shows graphs of Filesystem Usage, CPU Usage, Memory Usage, and Network Usage.

At the moment, there is a caveat: the data displayed is not trustworthy. This is being investigated in the framework of issue #10 of Brian Christner's repository. I will report here when it is resolved.

Next Steps:

Appendix A: full startup log of successful ‘docker-compose up’

$ docker-compose up
Starting dockermonitoringrepaired_influxdbData_1
Starting dockermonitoringrepaired_influxdb_1
Starting dockermonitoringrepaired_grafana_1
Starting dockermonitoringrepaired_cadvisor_1
Attaching to dockermonitoringrepaired_influxdbData_1, dockermonitoringrepaired_influxdb_1, dockermonitoringrepaired_grafana_1, dockermonitoringrepaired_cadvisor_1
dockermonitoringrepaired_influxdbData_1 exited with code 0
influxdb_1 | influxdb configuration:
influxdb_1 | ### Welcome to the InfluxDB configuration file.
influxdb_1 |
influxdb_1 | # Once every 24 hours InfluxDB will report anonymous data to m.influxdb.com
influxdb_1 | # The data includes raft id (random 8 bytes), os, arch, version, and metadata.
influxdb_1 | # We don't track ip addresses of servers reporting. This is only used
influxdb_1 | # to track the number of instances running and the versions, which
influxdb_1 | # is very helpful for us.
influxdb_1 | # Change this option to true to disable reporting.
influxdb_1 | reporting-disabled = false
influxdb_1 |
influxdb_1 | # we'll try to get the hostname automatically, but if it the os returns something
influxdb_1 | # that isn't resolvable by other servers in the cluster, use this option to
influxdb_1 | # manually set the hostname
influxdb_1 | # hostname = "localhost"
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [meta]
influxdb_1 | ###
influxdb_1 | ### Controls the parameters for the Raft consensus group that stores metadata
influxdb_1 | ### about the InfluxDB cluster.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [meta]
influxdb_1 | # Where the metadata/raft database is stored
influxdb_1 | dir = "/data/meta"
influxdb_1 |
influxdb_1 | retention-autocreate = true
influxdb_1 |
influxdb_1 | # If log messages are printed for the meta service
influxdb_1 | logging-enabled = true
influxdb_1 | pprof-enabled = false
influxdb_1 |
influxdb_1 | # The default duration for leases.
influxdb_1 | lease-duration = "1m0s"
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [data]
influxdb_1 | ###
influxdb_1 | ### Controls where the actual shard data for InfluxDB lives and how it is
influxdb_1 | ### flushed from the WAL. "dir" may need to be changed to a suitable place
influxdb_1 | ### for your system, but the WAL settings are an advanced configuration. The
influxdb_1 | ### defaults should work for most systems.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [data]
influxdb_1 | # Controls if this node holds time series data shards in the cluster
influxdb_1 | enabled = true
influxdb_1 |
influxdb_1 | dir = "/data/data"
influxdb_1 |
influxdb_1 | # These are the WAL settings for the storage engine >= 0.9.3
influxdb_1 | wal-dir = "/data/wal"
influxdb_1 | wal-logging-enabled = true
influxdb_1 | data-logging-enabled = true
influxdb_1 |
influxdb_1 | # Whether queries should be logged before execution. Very useful for troubleshooting, but will
influxdb_1 | # log any sensitive data contained within a query.
influxdb_1 | # query-log-enabled = true
influxdb_1 |
influxdb_1 | # Settings for the TSM engine
influxdb_1 |
influxdb_1 | # CacheMaxMemorySize is the maximum size a shard's cache can
influxdb_1 | # reach before it starts rejecting writes.
influxdb_1 | # cache-max-memory-size = 524288000
influxdb_1 |
influxdb_1 | # CacheSnapshotMemorySize is the size at which the engine will
influxdb_1 | # snapshot the cache and write it to a TSM file, freeing up memory
influxdb_1 | # cache-snapshot-memory-size = 26214400
influxdb_1 |
influxdb_1 | # CacheSnapshotWriteColdDuration is the length of time at
influxdb_1 | # which the engine will snapshot the cache and write it to
influxdb_1 | # a new TSM file if the shard hasn't received writes or deletes
influxdb_1 | # cache-snapshot-write-cold-duration = "1h"
influxdb_1 |
influxdb_1 | # MinCompactionFileCount is the minimum number of TSM files
influxdb_1 | # that need to exist before a compaction cycle will run
influxdb_1 | # compact-min-file-count = 3
influxdb_1 |
influxdb_1 | # CompactFullWriteColdDuration is the duration at which the engine
influxdb_1 | # will compact all TSM files in a shard if it hasn't received a
influxdb_1 | # write or delete
influxdb_1 | # compact-full-write-cold-duration = "24h"
influxdb_1 |
influxdb_1 | # MaxPointsPerBlock is the maximum number of points in an encoded
grafana_1 | 2016/10/25 17:30:59 [I] Starting Grafana
grafana_1 | 2016/10/25 17:30:59 [I] Version: 2.6.0, Commit: v2.6.0, Build date: 2015-12-14 14:18:01 +0000 UTC
grafana_1 | 2016/10/25 17:30:59 [I] Configuration Info
grafana_1 | Config files:
grafana_1 | [0]: /usr/share/grafana/conf/defaults.ini
grafana_1 | [1]: /etc/grafana/grafana.ini
grafana_1 | Command lines overrides:
grafana_1 | [0]: default.paths.data=/var/lib/grafana
grafana_1 | [1]: default.paths.logs=/var/log/grafana
grafana_1 | Paths:
grafana_1 | home: /usr/share/grafana
grafana_1 | data: /var/lib/grafana
grafana_1 | logs: /var/log/grafana
grafana_1 |
grafana_1 | 2016/10/25 17:30:59 [I] Database: sqlite3
grafana_1 | 2016/10/25 17:30:59 [I] Migrator: Starting DB migration
grafana_1 | 2016/10/25 17:30:59 [I] Listen: http://0.0.0.0:3000
influxdb_1 | # block in a TSM file. Larger numbers may yield better compression
influxdb_1 | # but could incur a performance penalty when querying
influxdb_1 | # max-points-per-block = 1000
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [cluster]
influxdb_1 | ###
influxdb_1 | ### Controls non-Raft cluster behavior, which generally includes how data is
influxdb_1 | ### shared across shards.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [cluster]
influxdb_1 | shard-writer-timeout = "5s" # The time within which a remote shard must respond to a write request.
influxdb_1 | write-timeout = "10s" # The time within which a write request must complete on the cluster.
influxdb_1 | max-concurrent-queries = 0 # The maximum number of concurrent queries that can run. 0 to disable.
influxdb_1 | query-timeout = "0s" # The time within a query must complete before being killed automatically. 0s to disable.
influxdb_1 | max-select-point = 0 # The maximum number of points to scan in a query. 0 to disable.
influxdb_1 | max-select-series = 0 # The maximum number of series to select in a query. 0 to disable.
influxdb_1 | max-select-buckets = 0 # The maximum number of buckets to select in an aggregate query. 0 to disable.
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [retention]
influxdb_1 | ###
influxdb_1 | ### Controls the enforcement of retention policies for evicting old data.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [retention]
influxdb_1 | enabled = true
influxdb_1 | check-interval = "30m"
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [shard-precreation]
influxdb_1 | ###
influxdb_1 | ### Controls the precreation of shards, so they are available before data arrives.
influxdb_1 | ### Only shards that, after creation, will have both a start- and end-time in the
influxdb_1 | ### future, will ever be created. Shards are never precreated that would be wholly
influxdb_1 | ### or partially in the past.
influxdb_1 |
influxdb_1 | [shard-precreation]
influxdb_1 | enabled = true
influxdb_1 | check-interval = "10m"
influxdb_1 | advance-period = "30m"
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### Controls the system self-monitoring, statistics and diagnostics.
influxdb_1 | ###
influxdb_1 | ### The internal database for monitoring data is created automatically if
influxdb_1 | ### if it does not already exist. The target retention within this database
influxdb_1 | ### is called 'monitor' and is also created with a retention period of 7 days
influxdb_1 | ### and a replication factor of 1, if it does not exist. In all cases the
influxdb_1 | ### this retention policy is configured as the default for the database.
influxdb_1 |
influxdb_1 | [monitor]
influxdb_1 | store-enabled = true # Whether to record statistics internally.
influxdb_1 | store-database = "_internal" # The destination database for recorded statistics
influxdb_1 | store-interval = "10s" # The interval at which to record statistics
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [admin]
influxdb_1 | ###
influxdb_1 | ### Controls the availability of the built-in, web-based admin interface. If HTTPS is
influxdb_1 | ### enabled for the admin interface, HTTPS must also be enabled on the [http] service.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [admin]
influxdb_1 | enabled = true
influxdb_1 | bind-address = ":8083"
influxdb_1 | https-enabled = false
influxdb_1 | https-certificate = "/etc/ssl/influxdb.pem"
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [http]
influxdb_1 | ###
influxdb_1 | ### Controls how the HTTP endpoints are configured. These are the primary
influxdb_1 | ### mechanism for getting data into and out of InfluxDB.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [http]
influxdb_1 | enabled = true
influxdb_1 | bind-address = ":8086"
influxdb_1 | auth-enabled = false
influxdb_1 | log-enabled = true
influxdb_1 | write-tracing = false
influxdb_1 | pprof-enabled = false
influxdb_1 | https-enabled = false
influxdb_1 | https-certificate = "/etc/ssl/influxdb.pem"
influxdb_1 | max-row-limit = 10000
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [[graphite]]
influxdb_1 | ###
influxdb_1 | ### Controls one or many listeners for Graphite data.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [[graphite]]
influxdb_1 | enabled = false
influxdb_1 | database = "graphitedb"
influxdb_1 | bind-address = ":2003"
influxdb_1 | protocol = "tcp"
influxdb_1 | # consistency-level = "one"
influxdb_1 |
influxdb_1 | # These next lines control how batching works. You should have this enabled
influxdb_1 | # otherwise you could get dropped metrics or poor performance. Batching
influxdb_1 | # will buffer points in memory if you have many coming in.
influxdb_1 |
influxdb_1 | # batch-size = 5000 # will flush if this many points get buffered
influxdb_1 | # batch-pending = 10 # number of batches that may be pending in memory
influxdb_1 | # batch-timeout = "1s" # will flush at least this often even if we haven't hit buffer limit
influxdb_1 | # udp-read-buffer = 0 # UDP Read buffer size, 0 means OS default. UDP listener will fail if set above OS max.
influxdb_1 |
influxdb_1 | ### This string joins multiple matching 'measurement' values providing more control over the final measurement name.
influxdb_1 | # separator = "."
influxdb_1 |
influxdb_1 | ### Default tags that will be added to all metrics. These can be overridden at the template level
influxdb_1 | ### or by tags extracted from metric
influxdb_1 | # tags = ["region=us-east", "zone=1c"]
influxdb_1 |
influxdb_1 | ### Each template line requires a template pattern. It can have an optional
influxdb_1 | ### filter before the template and separated by spaces. It can also have optional extra
influxdb_1 | ### tags following the template. Multiple tags should be separated by commas and no spaces
influxdb_1 | ### similar to the line protocol format. There can be only one default template.
influxdb_1 | templates = [
influxdb_1 | # filter + template
influxdb_1 | #"*.app env.service.resource.measurement",
influxdb_1 | # filter + template + extra tag
influxdb_1 | #"stats.* .host.measurement* region=us-west,agent=sensu",
influxdb_1 | # default template. Ignore the first graphite component "servers"
influxdb_1 | "instance.profile.measurement*"
influxdb_1 | ]
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [collectd]
influxdb_1 | ###
influxdb_1 | ### Controls one or many listeners for collectd data.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [[collectd]]
influxdb_1 | enabled = false
influxdb_1 | # bind-address = ":25826"
influxdb_1 | # database = "collectd"
influxdb_1 | # typesdb = "/usr/share/collectd/types.db"
influxdb_1 | # retention-policy = ""
influxdb_1 |
influxdb_1 | # These next lines control how batching works. You should have this enabled
influxdb_1 | # otherwise you could get dropped metrics or poor performance. Batching
influxdb_1 | # will buffer points in memory if you have many coming in.
influxdb_1 |
influxdb_1 | # batch-size = 1000 # will flush if this many points get buffered
influxdb_1 | # batch-pending = 5 # number of batches that may be pending in memory
influxdb_1 | # batch-timeout = "1s" # will flush at least this often even if we haven't hit buffer limit
influxdb_1 | # read-buffer = 0 # UDP Read buffer size, 0 means OS default. UDP listener will fail if set above OS max.
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [opentsdb]
influxdb_1 | ###
influxdb_1 | ### Controls one or many listeners for OpenTSDB data.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [[opentsdb]]
influxdb_1 | enabled = false
influxdb_1 | # bind-address = ":4242"
influxdb_1 | # database = "opentsdb"
influxdb_1 | # retention-policy = ""
influxdb_1 | # consistency-level = "one"
influxdb_1 | # tls-enabled = false
influxdb_1 | # certificate= ""
influxdb_1 | # log-point-errors = true # Log an error for every malformed point.
influxdb_1 |
influxdb_1 | # These next lines control how batching works. You should have this enabled
influxdb_1 | # otherwise you could get dropped metrics or poor performance. Only points
influxdb_1 | # metrics received over the telnet protocol undergo batching.
influxdb_1 |
influxdb_1 | # batch-size = 1000 # will flush if this many points get buffered
cadvisor_1 | I1025 17:30:59.170040 1 storagedriver.go:42] Using backend storage type "influxdb"
cadvisor_1 | I1025 17:30:59.170881 1 storagedriver.go:44] Caching stats in memory for 2m0s
cadvisor_1 | I1025 17:30:59.171032 1 manager.go:131] cAdvisor running in container: "/docker/9839f9c5c9d674016006e4d4144f984ea91320686356235951f21f0b51306c47"
cadvisor_1 | I1025 17:30:59.194143 1 fs.go:107] Filesystem partitions: map[/dev/dm-0:{mountpoint:/rootfs major:252 minor:0 fsType: blockSize:0} /dev/sda1:{mountpoint:/rootfs/boot major:8 minor:1 fsType: blockSize:0}]
influxdb_1 | # batch-pending = 5 # number of batches that may be pending in memory
influxdb_1 | # batch-timeout = "1s" # will flush at least this often even if we haven't hit buffer limit
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [[udp]]
influxdb_1 | ###
influxdb_1 | ### Controls the listeners for InfluxDB line protocol data via UDP.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [[udp]]
influxdb_1 | enabled = false
influxdb_1 | bind-address = ":4444"
influxdb_1 | database = "udpdb"
influxdb_1 | # retention-policy = ""
influxdb_1 |
influxdb_1 | # These next lines control how batching works. You should have this enabled
influxdb_1 | # otherwise you could get dropped metrics or poor performance. Batching
influxdb_1 | # will buffer points in memory if you have many coming in.
influxdb_1 |
influxdb_1 | # batch-size = 1000 # will flush if this many points get buffered
influxdb_1 | # batch-pending = 5 # number of batches that may be pending in memory
influxdb_1 | # batch-timeout = "1s" # will flush at least this often even if we haven't hit buffer limit
influxdb_1 | # read-buffer = 0 # UDP Read buffer size, 0 means OS default. UDP listener will fail if set above OS max.
influxdb_1 |
influxdb_1 | # set the expected UDP payload size; lower values tend to yield better performance, default is max UDP size 65536
influxdb_1 | # udp-payload-size = 65536
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [continuous_queries]
influxdb_1 | ###
influxdb_1 | ### Controls how continuous queries are run within InfluxDB.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [continuous_queries]
influxdb_1 | log-enabled = true
influxdb_1 | enabled = true
influxdb_1 | # run-interval = "1s" # interval for how often continuous queries will be checked if they need to run
influxdb_1 | => Starting InfluxDB ...
influxdb_1 | => About to create the following database: cadvisor
influxdb_1 | => Database had been created before, skipping ...
influxdb_1 | exec influxd -config=${CONFIG_FILE}
influxdb_1 |
influxdb_1 |  8888888           .d888 888                   8888888b.  888888b.
influxdb_1 |    888            d88P"  888                   888  "Y88b 888  "88b
influxdb_1 |    888            888    888                   888    888 888  .88P
influxdb_1 |    888   88888b.  888888 888 888  888 888  888 888    888 8888888K.
influxdb_1 |    888   888 "88b 888    888 888  888  Y8bd8P' 888    888 888  "Y88b
influxdb_1 |    888   888  888 888    888 888  888   X88K   888    888 888    888
influxdb_1 |    888   888  888 888    888 Y88b 888 .d8""8b. 888  .d88P 888   d88P
influxdb_1 |  8888888 888  888 888    888  "Y88888 888  888 8888888P"  8888888P"
influxdb_1 |
cadvisor_1 | I1025 17:30:59.228909 1 machine.go:50] Couldn't collect info from any of the files in "/rootfs/etc/machine-id,/var/lib/dbus/machine-id"
cadvisor_1 | I1025 17:30:59.229080 1 manager.go:166] Machine: {NumCores:2 CpuFrequency:2592000 MemoryCapacity:1569599488 MachineID: SystemUUID:B63CB367-870F-4E48-917F-7E524C2C67A0 BootID:e225b37a-b8e6-466b-9f67-84b74df8e90c Filesystems:[{Device:/dev/dm-0 Capacity:41092214784} {Device:/dev/sda1 Capacity:246755328}] DiskMap:map[252:0:{Name:dm-0 Major:252 Minor:0 Size:41884319744 Scheduler:none} 252:1:{Name:dm-1 Major:252 Minor:1 Size:805306368 Scheduler:none} 8:0:{Name:sda Major:8 Minor:0 Size:42949672960 Scheduler:deadline}] NetworkDevices:[{Name:br-067c518abd1f MacAddress:02:42:78:41:c0:71 Speed:0 Mtu:1500} {Name:br-1c136984ac6d MacAddress:02:42:0c:dc:89:ac Speed:0 Mtu:1500} {Name:br-9b7560132352 MacAddress:02:42:5c:df:9a:43 Speed:0 Mtu:1500} {Name:eth0 MacAddress:08:00:27:c7:ba:b5 Speed:1000 Mtu:1500} {Name:eth1 MacAddress:08:00:27:51:9c:7e Speed:1000 Mtu:1500}] Topology:[{Id:0 Memory:1569599488 Cores:[{Id:0 Threads:[0] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:6291456 Type:Unified Level:3}]} {Id:1 Threads:[1] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:6291456 Type:Unified Level:3}]}] Caches:[]}] CloudProvider:Unknown InstanceType:Unknown}
cadvisor_1 | I1025 17:30:59.229884 1 manager.go:172] Version: {KernelVersion:4.2.0-42-generic ContainerOsVersion:Alpine Linux v3.2 DockerVersion:1.12.1 CadvisorVersion:0.20.5 CadvisorRevision:9aa348f}
influxdb_1 | [run] 2016/10/25 17:30:58 InfluxDB starting, version 0.13.0, branch 0.13, commit e57fb88a051ee40fd9277094345fbd47bb4783ce
influxdb_1 | [run] 2016/10/25 17:30:58 Go version go1.6.2, GOMAXPROCS set to 2
influxdb_1 | [run] 2016/10/25 17:30:58 Using configuration at: /config/config.toml
influxdb_1 | [store] 2016/10/25 17:30:58 Using data dir: /data/data
influxdb_1 | [tsm1wal] 2016/10/25 17:30:58 tsm1 WAL starting with 10485760 segment size
influxdb_1 | [tsm1wal] 2016/10/25 17:30:58 tsm1 WAL writing to /data/wal/_internal/monitor/1
influxdb_1 | [tsm1wal] 2016/10/25 17:30:58 tsm1 WAL starting with 10485760 segment size
influxdb_1 | [tsm1wal] 2016/10/25 17:30:58 tsm1 WAL writing to /data/wal/cadvisor/default/2
influxdb_1 | [filestore] 2016/10/25 17:30:58 /data/data/_internal/monitor/1/000000001-000000001.tsm (#0) opened in 1.243404ms
influxdb_1 | [cacheloader] 2016/10/25 17:30:58 reading file /data/wal/_internal/monitor/1/_00001.wal, size 1777379
influxdb_1 | [filestore] 2016/10/25 17:30:58 /data/data/cadvisor/default/2/000000001-000000001.tsm (#0) opened in 1.725916ms
influxdb_1 | [cacheloader] 2016/10/25 17:30:58 reading file /data/wal/cadvisor/default/2/_00001.wal, size 4130244
influxdb_1 | [tsm1wal] 2016/10/25 17:30:58 tsm1 WAL starting with 10485760 segment size
influxdb_1 | [tsm1wal] 2016/10/25 17:30:58 tsm1 WAL writing to /data/wal/_internal/monitor/3
influxdb_1 | [cacheloader] 2016/10/25 17:30:58 reading file /data/wal/_internal/monitor/3/_00001.wal, size 1097258
cadvisor_1 | E1025 17:30:59.248299 1 manager.go:208] Docker container factory registration failed: docker found, but not using native exec driver.
cadvisor_1 | I1025 17:30:59.262682 1 factory.go:94] Registering Raw factory
cadvisor_1 | I1025 17:30:59.327660 1 manager.go:1000] Started watching for new ooms in manager
cadvisor_1 | W1025 17:30:59.327883 1 manager.go:239] Could not configure a source for OOM detection, disabling OOM events: exec: "journalctl": executable file not found in $PATH
cadvisor_1 | I1025 17:30:59.328250 1 manager.go:252] Starting recovery of all containers
cadvisor_1 | I1025 17:30:59.371456 1 manager.go:257] Recovery completed
cadvisor_1 | I1025 17:30:59.395792 1 cadvisor.go:106] Starting cAdvisor version: 0.20.5-9aa348f on port 8080
influxdb_1 | [cacheloader] 2016/10/25 17:30:59 reading file /data/wal/cadvisor/default/2/_00002.wal, size 2232957
influxdb_1 | [cacheloader] 2016/10/25 17:30:59 reading file /data/wal/_internal/monitor/3/_00002.wal, size 197651
influxdb_1 | [cacheloader] 2016/10/25 17:30:59 reading file /data/wal/_internal/monitor/3/_00003.wal, size 0
influxdb_1 | [shard] 2016/10/25 17:30:59 /data/data/_internal/monitor/3 database index loaded in 1.387775ms
influxdb_1 | [store] 2016/10/25 17:30:59 /data/data/_internal/monitor/3 opened in 865.976354ms
influxdb_1 | [cacheloader] 2016/10/25 17:30:59 reading file /data/wal/_internal/monitor/1/_00004.wal, size 0
influxdb_1 | [shard] 2016/10/25 17:30:59 /data/data/_internal/monitor/1 database index loaded in 3.29894ms
influxdb_1 | [store] 2016/10/25 17:30:59 /data/data/_internal/monitor/1 opened in 896.765569ms
influxdb_1 | [cacheloader] 2016/10/25 17:30:59 reading file /data/wal/cadvisor/default/2/_00003.wal, size 444696
influxdb_1 | [cacheloader] 2016/10/25 17:30:59 reading file /data/wal/cadvisor/default/2/_00004.wal, size 0
influxdb_1 | [shard] 2016/10/25 17:30:59 /data/data/cadvisor/default/2 database index loaded in 2.465579ms
influxdb_1 | [store] 2016/10/25 17:30:59 /data/data/cadvisor/default/2 opened in 981.523781ms
influxdb_1 | [subscriber] 2016/10/25 17:30:59 opened service
influxdb_1 | [monitor] 2016/10/25 17:30:59 Starting monitor system
influxdb_1 | [monitor] 2016/10/25 17:30:59 'build' registered for diagnostics monitoring
influxdb_1 | [monitor] 2016/10/25 17:30:59 'runtime' registered for diagnostics monitoring
influxdb_1 | [monitor] 2016/10/25 17:30:59 'network' registered for diagnostics monitoring
influxdb_1 | [monitor] 2016/10/25 17:30:59 'system' registered for diagnostics monitoring
influxdb_1 | [cluster] 2016/10/25 17:30:59 Starting cluster service
influxdb_1 | [shard-precreation] 2016/10/25 17:30:59 Starting precreation service with check interval of 10m0s, advance period of 30m0s
influxdb_1 | [snapshot] 2016/10/25 17:30:59 Starting snapshot service
influxdb_1 | [copier] 2016/10/25 17:30:59 Starting copier service
influxdb_1 | [admin] 2016/10/25 17:30:59 Starting admin service
influxdb_1 | [admin] 2016/10/25 17:30:59 Listening on HTTP: [::]:8083
influxdb_1 | [continuous_querier] 2016/10/25 17:30:59 Starting continuous query service
influxdb_1 | [httpd] 2016/10/25 17:30:59 Starting HTTP service
influxdb_1 | [httpd] 2016/10/25 17:30:59 Authentication enabled: false
influxdb_1 | [httpd] 2016/10/25 17:30:59 Listening on HTTP: [::]:8086
influxdb_1 | [retention] 2016/10/25 17:30:59 Starting retention policy enforcement service with check interval of 30m0s
influxdb_1 | [run] 2016/10/25 17:30:59 Listening for signals
influxdb_1 | [monitor] 2016/10/25 17:30:59 Storing statistics in database '_internal' retention policy 'monitor', at interval 10s
influxdb_1 | 2016/10/25 17:30:59 Sending anonymous usage statistics to m.influxdb.com


Appendix B: ‘Waiting for confirmation of InfluxDB service startup’

After issuing the command

docker-compose up

I hit a problem, described here, that was caused by using a Vagrant synced folder as the working directory.

vagrant@openshift-installer /vagrant/Monitoring/docker-monitoring_master $ docker-compose up
Starting dockermonitoringmaster_influxdbData_1
Starting dockermonitoringmaster_influxdb_1
Starting dockermonitoringmaster_cadvisor_1
Starting dockermonitoringmaster_grafana_1
Attaching to dockermonitoringmaster_influxdbData_1, dockermonitoringmaster_influxdb_1, dockermonitoringmaster_grafana_1, dockermonitoringmaster_cadvisor_1
dockermonitoringmaster_influxdbData_1 exited with code 0
influxdb_1      | => Starting InfluxDB in background ...
influxdb_1      | => Waiting for confirmation of InfluxDB service startup ...
influxdb_1      |
influxdb_1      |  8888888           .d888 888                   8888888b.  888888b.
influxdb_1      |    888            d88P"  888                   888  "Y88b 888  "88b
influxdb_1      |    888            888    888                   888    888 888  .88P
influxdb_1      |    888   88888b.  888888 888 888  888 888  888 888    888 8888888K.
influxdb_1      |    888   888 "88b 888    888 888  888  Y8bd8P' 888    888 888  "Y88b
influxdb_1      |    888   888  888 888    888 888  888   X88K   888    888 888    888
influxdb_1      |    888   888  888 888    888 Y88b 888 .d8""8b. 888  .d88P 888   d88P
influxdb_1      |  8888888 888  888 888    888  "Y88888 888  888 8888888P"  8888888P"
influxdb_1      |
influxdb_1      | 2016/10/28 12:34:49 InfluxDB starting, version 0.9.6.1, branch 0.9.6, commit 6d3a8603cfdaf1a141779ed88b093dcc5c528e5e, built 2015-12-10T23:40:23+0000
influxdb_1      | 2016/10/28 12:34:49 Go version go1.4.2, GOMAXPROCS set to 2
influxdb_1      | 2016/10/28 12:34:49 Using configuration at: /config/config.toml
influxdb_1      | [metastore] 2016/10/28 12:34:49 Using data dir: /data/meta
influxdb_1      | [retention] 2016/10/28 12:34:49 retention policy enforcement terminating
influxdb_1      | [monitor] 2016/10/28 12:34:49 shutting down monitor system
influxdb_1      | [handoff] 2016/10/28 12:34:49 shutting down hh service
influxdb_1      | [subscriber] 2016/10/28 12:34:49 closed service
influxdb_1      | run: open server: open meta store: raft: new bolt store: invalid argument
grafana_1       | 2016/10/28 12:34:50 [I] Starting Grafana
grafana_1       | 2016/10/28 12:34:50 [I] Version: 2.6.0, Commit: v2.6.0, Build date: 2015-12-14 14:18:01 +0000 UTC
grafana_1       | 2016/10/28 12:34:50 [I] Configuration Info
grafana_1       | Config files:
grafana_1       |   [0]: /usr/share/grafana/conf/defaults.ini
grafana_1       |   [1]: /etc/grafana/grafana.ini
grafana_1       | Command lines overrides:
grafana_1       |   [0]: default.paths.data=/var/lib/grafana
grafana_1       |   [1]: default.paths.logs=/var/log/grafana
grafana_1       | Paths:
grafana_1       |   home: /usr/share/grafana
grafana_1       |   data: /var/lib/grafana
grafana_1       |   logs: /var/log/grafana
grafana_1       |
grafana_1       | 2016/10/28 12:34:50 [I] Database: sqlite3
grafana_1       | 2016/10/28 12:34:50 [I] Migrator: Starting DB migration
grafana_1       | 2016/10/28 12:34:50 [I] Listen: http://0.0.0.0:3000
cadvisor_1      | I1028 12:34:50.214917       1 storagedriver.go:42] Using backend storage type "influxdb"
cadvisor_1      | I1028 12:34:50.215243       1 storagedriver.go:44] Caching stats in memory for 2m0s
cadvisor_1      | I1028 12:34:50.215376       1 manager.go:131] cAdvisor running in container: "/docker/2da85f53aaf23024eb2016dc330b05634972252eea2f230831e3676ad3b6fa73"
cadvisor_1      | I1028 12:34:50.238721       1 fs.go:107] Filesystem partitions: map[/dev/dm-0:{mountpoint:/rootfs major:252 minor:0 fsType: blockSize:0} /dev/sda1:{mountpoint:/rootfs/boot major:8 minor:1 fsType: blockSize:0}]
cadvisor_1      | I1028 12:34:50.249690       1 machine.go:50] Couldn't collect info from any of the files in "/rootfs/etc/machine-id,/var/lib/dbus/machine-id"
cadvisor_1      | I1028 12:34:50.249806       1 manager.go:166] Machine: {NumCores:2 CpuFrequency:2592000 MemoryCapacity:1569599488 MachineID: SystemUUID:B63CB367-870F-4E48-917F-7E524C2C67A0 BootID:e225b37a-b8e6-466b-9f67-84b74df8e90c Filesystems:[{Device:/dev/dm-0 Capacity:41092214784} {Device:/dev/sda1 Capacity:246755328}] DiskMap:map[252:0:{Name:dm-0 Major:252 Minor:0 Size:41884319744 Scheduler:none} 252:1:{Name:dm-1 Major:252 Minor:1 Size:805306368 Scheduler:none} 8:0:{Name:sda Major:8 Minor:0 Size:42949672960 Scheduler:deadline}] NetworkDevices:[{Name:br-067c518abd1f MacAddress:02:42:78:41:c0:71 Speed:0 Mtu:1500} {Name:br-1c136984ac6d MacAddress:02:42:0c:dc:89:ac Speed:0 Mtu:1500} {Name:br-3b100a8c826a MacAddress:02:42:11:2c:a0:4c Speed:0 Mtu:1500} {Name:br-5573a4076799 MacAddress:02:42:97:14:9a:fc Speed:0 Mtu:1500} {Name:br-9b7560132352 MacAddress:02:42:5c:df:9a:43 Speed:0 Mtu:1500} {Name:eth0 MacAddress:08:00:27:c7:ba:b5 Speed:1000 Mtu:1500} {Name:eth1 MacAddress:08:00:27:51:9c:7e Speed:1000 Mtu:1500}] Topology:[{Id:0 Memory:1569599488 Cores:[{Id:0 Threads:[0] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:6291456 Type:Unified Level:3}]} {Id:1 Threads:[1] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:6291456 Type:Unified Level:3}]}] Caches:[]}] CloudProvider:Unknown InstanceType:Unknown}
cadvisor_1      | I1028 12:34:50.251115       1 manager.go:172] Version: {KernelVersion:4.2.0-42-generic ContainerOsVersion:Alpine Linux v3.2 DockerVersion:1.12.1 CadvisorVersion:0.20.5 CadvisorRevision:9aa348f}
cadvisor_1      | E1028 12:34:50.273526       1 manager.go:208] Docker container factory registration failed: docker found, but not using native exec driver.
cadvisor_1      | I1028 12:34:50.279684       1 factory.go:94] Registering Raw factory
cadvisor_1      | I1028 12:34:50.316816       1 manager.go:1000] Started watching for new ooms in manager
cadvisor_1      | W1028 12:34:50.316960       1 manager.go:239] Could not configure a source for OOM detection, disabling OOM events: exec: "journalctl": executable file not found in $PATH
cadvisor_1      | I1028 12:34:50.317927       1 manager.go:252] Starting recovery of all containers
cadvisor_1      | I1028 12:34:50.336674       1 manager.go:257] Recovery completed
cadvisor_1      | I1028 12:34:50.352618       1 cadvisor.go:106] Starting cAdvisor version: 0.20.5-9aa348f on port 8080
influxdb_1      | => Waiting for confirmation of InfluxDB service startup ...
influxdb_1      | => Waiting for confirmation of InfluxDB service startup ...
influxdb_1      | => Waiting for confirmation of InfluxDB service startup ...
influxdb_1      | => Waiting for confirmation of InfluxDB service startup ...

To confirm the issue, you can try to connect to port 8083 in a different window on the Docker host:

(docker host) $ curl --retry 10 --retry-delay 5 -v http://localhost:8083
* Rebuilt URL to: http://localhost:8083/
* Hostname was NOT found in DNS cache
*   Trying ::1...
* Connected to localhost (::1) port 8083 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.35.0
> Host: localhost:8083
> Accept: */*
>
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer

I.e., the port answers with a TCP RST.
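If you want to monitor whether the port eventually comes up, a small polling loop helps; a sketch:

(docker host) $ for i in $(seq 1 12); do curl -s -o /dev/null http://localhost:8083 && echo "InfluxDB UI is up" && break; sleep 5; done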

Reason:

The reason for this problem is an issue with Vagrant synced folders of type “vboxsf”.

Workaround 1a: do not use synced folders

The problem disappears if you clone the repository into a non-synced folder.

Workaround 1b: use synced folders of different type

The problem also disappears if you use a synced folder of a different type. I have tested a Vagrant synced folder of type “smb”:

  • add the line
     config.vm.synced_folder ".", "/vagrant", type: "smb"

    to the Vagrantfile inside the configure section

  • start CMD as Administrator. E.g. run
     runas.exe /savecred /user:Administrator "cmd"

    in a non-privileged CMD

  • run
    vagrant up

    in the privileged CMD session.

After that, you can SSH into the Docker host, clone the repository, and run docker-compose up as described in step 1, without hitting the InfluxDB problem.
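Put together, the sequence looks roughly like this (a sketch; <repository-url> and <repository-directory> are placeholders for the docker-monitoring repository used above):

(Windows, privileged CMD) > vagrant up
(Windows, privileged CMD) > vagrant ssh
(docker host) $ cd /vagrant                    # now a synced folder of type "smb"
(docker host) $ git clone <repository-url>
(docker host) $ cd <repository-directory>
(docker host) $ docker-compose up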

Workaround 2: upgrade InfluxDB to 0.13

The problem disappears, even with “vboxsf” synced folders, if we upgrade InfluxDB to 0.13:

I have found this InfluxDB 0.9 issue with the same symptoms. This is why I have tried the upgrade, still working within the Vagrant synced folder /vagrant.

Step 1: Upgrade InfluxDB

In the docker-compose.yml file replace

influxdb:
 image: tutum/influxdb:0.9

by

influxdb:
 image: tutum/influxdb:0.13
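Equivalently, as a one-liner on the Docker host (a sketch):

(docker host) $ sed -i 's|tutum/influxdb:0.9|tutum/influxdb:0.13|' docker-compose.yml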
Step 2: Remove the ./data folder (important! Otherwise, the problem will persist!)
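A sketch of this clean-up step (stop the stack first; depending on how the data volume was written, sudo may be needed):

(docker host) $ docker-compose stop
(docker host) $ sudo rm -rf ./data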
Step 3: Try again:

$ docker-compose up

Starting dockermonitoringrepaired_influxdbData_1
Starting dockermonitoringrepaired_influxdb_1
Starting dockermonitoringrepaired_grafana_1
Starting dockermonitoringrepaired_cadvisor_1
...
influxdb_1 | [monitor] 2016/10/25 16:48:44 Storing statistics in database '_internal' retention policy 'monitor', at interval 10s
influxdb_1 | 2016/10/25 16:48:44 Sending anonymous usage statistics to m.influxdb.com
influxdb_1 | [run] 2016/10/25 16:48:44 Listening for signals
Step 4: curl Test

Now, the curl test is successful:

$ curl --retry 10 --retry-delay 5 -v http://localhost:8083
* Rebuilt URL to: http://localhost:8083/
* Hostname was NOT found in DNS cache
* Trying ::1...
* Connected to localhost (::1) port 8083 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.35.0
...
</body>

</html>
* Connection #0 to host localhost left intact

This, too, is successful now.

Appendix C: Error: load error nokogiri/nokogiri LoadError (Vagrant)

This is an error I have encountered after installing Vagrant 1.8.1 on Windows 10 and …