
Superset Deployment

07 Apr 2017

Literate Programming with Jupyter Notebooks

The following will look like a relatively normal article. This is one of the benefits I want to illustrate here. Jupyter Notebooks, which I used to write this, provide an executable, documentable environment for programming work that can be exported to multiple formats.

You can inspect, download, use, change and critique everything you see here at: https://github.com/aaronmyatt/superset_deployment

I have used Jupyter Notebooks since I first learned about them; I saw an immediate place for them in my day-to-day workflow. Yet I have yearned to find a way to integrate the notebooks into my work beyond serving as an excellent experimental and exploratory playground.

What follows is an effort to extract more utility out of this tool. By the end of this tutorial I hope you will come away with several things:

  1. How Jupyter Notebooks look and feel
  2. How the Ansible provisioning tool works
  3. How Jupyter Notebooks can be used to develop literate programming workflows that encourage iteration, experimentation, self-documentation and a reproducible, transparent, sharable outcome.

Superset Provisioning Script

We will build out an Ansible deployment script for Airbnb’s amazing data visualization tool called Superset.

I will not explain too much about the tools employed here. Instead I will rely on the workflow's micro-iterations, scattered with light commentary, to illustrate the tools' functionality. Let us begin.

Get the repo using Ansible's Git module

http://docs.ansible.com/ansible/git_module.html

Command Line

Wherever you see the above heading, we will test the Ansible module, via Jupyter, using a bash command. You could download and install Ansible and run any of these commands in your own terminal - assuming you’re on a Unix machine!

%%bash
ansible localhost -m git -a "repo=https://github.com/airbnb/superset.git dest=~/experiment/superset"
localhost | SUCCESS => {
    "after": "9ba5b49d8ac197a5ba908b229bd9061ce98c5fca", 
    "before": null, 
    "changed": true, 
    "warnings": []
}


 [WARNING]: Host file not found: /usr/local/etc/ansible/hosts
 [WARNING]: provided hosts list is empty, only localhost is available

Here we call a module (-m) and pass it arguments (-a) as key=value pairs.
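To make the shape of that ad-hoc call concrete, here is a minimal Python sketch that assembles the same command from a module name and its parameters. The helper name adhoc_command is my own invention for illustration, not part of Ansible, and it assumes no parameter value contains spaces.

```python
import shlex

def adhoc_command(host, module, **params):
    """Build the argv for an `ansible` ad-hoc call (hypothetical helper)."""
    # Module parameters go to -a as space-separated key=value pairs.
    # Assumption: values contain no spaces, so no extra quoting is needed.
    args = " ".join(f"{key}={value}" for key, value in params.items())
    return ["ansible", host, "-m", module, "-a", args]

argv = adhoc_command(
    "localhost",
    "git",
    repo="https://github.com/airbnb/superset.git",
    dest="~/experiment/superset",
)

# Show the command as it would be typed in a terminal.
print(shlex.join(argv))
```

Passing argv to subprocess.run would execute it, provided Ansible is installed; in the notebook the %%bash magic does that job for us.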

Playbook

Assuming the command line trial meets our expectations we can next build the task into Ansible’s reproducible Playbook format directly from Jupyter.

%%writefile deploy_superset.yml

---
- hosts: localhost

  tasks:
    - name: clone Superset git repo
      git:
        repo: https://github.com/airbnb/superset.git
        dest: ~/experiment/superset
Overwriting deploy_superset.yml

One critical part of the above bash and playbook examples is the hosts value, which is localhost in both cases. This is how Ansible determines where the tasks should be executed. We could easily replace localhost with a remote IP and, provided you have SSH access and the server has Python installed, Ansible would happily execute the same commands there - or anywhere!
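As a rough illustration of swapping the target, the sketch below renders the same playbook for an arbitrary host and builds a matching ansible-playbook command. The helper names and the address 203.0.113.10 (a documentation-only IP) are assumptions of mine; the trailing comma after the host tells Ansible to treat the -i value as an inline host list rather than an inventory file.

```python
from string import Template

# The playbook from above, with the hosts value left as a placeholder.
PLAYBOOK = Template("""\
---
- hosts: $hosts

  tasks:
    - name: clone Superset git repo
      git:
        repo: https://github.com/airbnb/superset.git
        dest: ~/experiment/superset
""")

def playbook_for(hosts):
    """Return the playbook text targeting the given host (hypothetical helper)."""
    return PLAYBOOK.substitute(hosts=hosts)

def remote_run_command(host, playbook="deploy_superset.yml"):
    """Build an ansible-playbook argv with a one-host inline inventory."""
    return ["ansible-playbook", "-i", host + ",", playbook]

remote_playbook = playbook_for("203.0.113.10")
remote_command = remote_run_command("203.0.113.10")
print(remote_playbook)
print(" ".join(remote_command))
```

For anything beyond a quick experiment, a proper inventory file is the more usual way to list remote machines.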

Test

Finally, for confidence, we can use Ansible's command line tool to check that the playbook works properly and that we don't have any syntax errors. The -C argument (short for --check) performs a dry run: Ansible reports what it would change without actually changing anything. You might wonder what would happen if we just executed the playbook without the -C argument. In most cases, nothing! Most Ansible modules are idempotent, which means they will not redundantly execute tasks whose requirements are already satisfied. So we can execute Ansible commands repeatedly without fear.
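If idempotency and check mode feel abstract, this small Python sketch mimics the idea with a single "make sure this directory exists" operation. It is only an analogy under my own naming (ensure_directory is not an Ansible function): the return value plays the role of Ansible's changed flag.

```python
import os
import tempfile

def ensure_directory(path, check=False):
    """Make sure `path` exists; return True if a change was (or would be) made."""
    if os.path.isdir(path):
        return False       # requirement already satisfied: nothing to do
    if not check:
        os.makedirs(path)  # only touch the system outside check mode
    return True

target = os.path.join(tempfile.mkdtemp(), "superset")

dry_run = ensure_directory(target, check=True)  # reports a change, makes none
exists_after_dry_run = os.path.isdir(target)
first_run = ensure_directory(target)            # actually creates the directory
second_run = ensure_directory(target)           # nothing left to do

print(dry_run, exists_after_dry_run, first_run, second_run)
```

The dry run reports a pending change without creating anything, the first real run makes the change, and every run after that is a no-op.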

Let’s make sure we’re in the correct directory before proceeding - a simple thing with Jupyter notebooks! Don’t forget to change any paths if you decide to follow along at home.

cd ~/experiment/deploy_superset/
/Users/lsp/experiment/deploy_superset
%%bash
ansible-playbook -C deploy_superset.yml
 __________________ 
< PLAY [localhost] >
 ------------------ 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

 ______________ 
< TASK [setup] >
 -------------- 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

ok: [localhost]
 ________________________________ 
< TASK [clone Superset git repo] >
 -------------------------------- 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

changed: [localhost]
 ____________ 
< PLAY RECAP >
 ------------ 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

localhost                  : ok=2    changed=1    unreachable=0    failed=0   



 [WARNING]: Host file not found: /usr/local/etc/ansible/hosts
 [WARNING]: provided hosts list is empty, only localhost is available

Ok, we have the repo downloaded locally, what’s next? Let’s consult the Superset documentation.

Install system dependencies with Homebrew

http://docs.ansible.com/ansible/homebrew_module.html

Command Line

I will write this script for execution on a Mac but we could quite easily customise the script to work on any platform.

%%bash
ansible localhost -m homebrew -a "name=pkg-config"
localhost | SUCCESS => {
    "changed": false, 
    "msg": "Package already installed: pkg-config"
}


 [WARNING]: Host file not found: /usr/local/etc/ansible/hosts
 [WARNING]: provided hosts list is empty, only localhost is available

Since I already have Homebrew and use it regularly, this package is already installed. Now we will see one of the limitations of using the command line - I need to perform the same Ansible action 4 times!

%%bash
ansible localhost -m homebrew -a "name=libffi"
localhost | SUCCESS => {
    "changed": true, 
    "msg": "Package installed: libffi"
}


 [WARNING]: Host file not found: /usr/local/etc/ansible/hosts
 [WARNING]: provided hosts list is empty, only localhost is available

Playbook

But I won't, because that's slow and frustrating. Let's use the playbook we're building to save our effort.

%%writefile deploy_superset.yml

---
- hosts: localhost

  tasks:
    - name: clone Superset git repo
      git:
        repo: https://github.com/airbnb/superset.git
        dest: ~/experiment/superset
    - name: install Superset system dependencies
      homebrew:
        name: "{{ item }}"
        state: present
      with_items:
        - pkg-config
        - libffi
        - openssl
Overwriting deploy_superset.yml

Take note of the simple iteration syntax with_items: each entry in the list is substituted for {{ item }} in turn.
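For readers new to with_items, the following Python sketch shows the substitution idea: every list entry is made available to the task as the variable item, which a task normally references as {{ item }}. This is a toy string replacement of my own, not how Ansible's templating is actually implemented.

```python
def expand_with_items(task, items):
    """Return one concrete task per item, filling in the {{ item }} placeholder."""
    expanded = []
    for item in items:
        concrete = {
            key: value.replace("{{ item }}", item) if isinstance(value, str) else value
            for key, value in task.items()
        }
        expanded.append(concrete)
    return expanded

# The module arguments of the homebrew task, written as a plain dictionary.
task = {"name": "{{ item }}", "state": "present"}
packages = ["pkg-config", "libffi", "openssl"]

concrete_tasks = expand_with_items(task, packages)
for concrete in concrete_tasks:
    print(concrete)
```

One template therefore stands in for three near-identical tasks, which is exactly the repetition the command line forced on us.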

Test

Since we didn’t manually install every package using the command line, let’s run the whole playbook.

%%bash
ansible-playbook deploy_superset.yml
 __________________ 
< PLAY [localhost] >
 ------------------ 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

 ______________ 
< TASK [setup] >
 -------------- 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

ok: [localhost]
 ________________________________ 
< TASK [clone Superset git repo] >
 -------------------------------- 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

changed: [localhost]
 ______________________________________ 
< TASK [install Superset dependencies] >
 -------------------------------------- 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

ok: [localhost] => (item=[u'pkg-config', u'libffi', u'openssl'])
 ____________ 
< PLAY RECAP >
 ------------ 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

localhost                  : ok=3    changed=1    unreachable=0    failed=0   



 [WARNING]: Host file not found: /usr/local/etc/ansible/hosts
 [WARNING]: provided hosts list is empty, only localhost is available

I’ll skip this stdout spam throughout the rest of this tutorial.

Install the Cryptography Python library

Next the docs tell us to install the Python library cryptography. However, they also ask for some temporary environment variables to be set for this install, which ensure that the library builds against the correct installation of OpenSSL (the one we just installed with Homebrew).

env LDFLAGS="-L$(brew --prefix openssl)/lib" CFLAGS="-I$(brew --prefix openssl)/include" pip install cryptography

We can easily test the installation of one library using our usual method:

Test

%%bash

ansible localhost -m pip -a "name=cryptography"
localhost | SUCCESS => {
    "changed": true, 
    "cmd": "/usr/local/bin/pip2 install cryptography", 
    "name": [
        "cryptography"
    ], 
    "requirements": null, 
    "state": "present", 
    "stderr": "", 
    "stdout": "Collecting cryptography\n  Downloading cryptography-1.8.1-cp27-cp27m-macosx_10_10_intel.whl (1.8MB)\nCollecting six>=1.4.1 (from cryptography)\n  Using cached six-1.10.0-py2.py3-none-any.whl\nRequirement already satisfied: setuptools>=11.3 in /usr/local/lib/python2.7/site-packages (from cryptography)\nCollecting cffi>=1.4.1 (from cryptography)\n  Downloading cffi-1.10.0-cp27-cp27m-macosx_10_6_intel.whl (226kB)\nCollecting idna>=2.1 (from cryptography)\n  Using cached idna-2.5-py2.py3-none-any.whl\nCollecting enum34 (from cryptography)\n  Downloading enum34-1.1.6-py2-none-any.whl\nCollecting asn1crypto>=0.21.0 (from cryptography)\n  Using cached asn1crypto-0.22.0-py2.py3-none-any.whl\nCollecting ipaddress (from cryptography)\n  Downloading ipaddress-1.0.18-py2-none-any.whl\nCollecting packaging (from cryptography)\n  Using cached packaging-16.8-py2.py3-none-any.whl\nCollecting pycparser (from cffi>=1.4.1->cryptography)\nCollecting pyparsing (from packaging->cryptography)\n  Using cached pyparsing-2.2.0-py2.py3-none-any.whl\nInstalling collected packages: six, pycparser, cffi, idna, enum34, asn1crypto, ipaddress, pyparsing, packaging, cryptography\nSuccessfully installed asn1crypto-0.22.0 cffi-1.10.0 cryptography-1.8.1 enum34-1.1.6 idna-2.5 ipaddress-1.0.18 packaging-16.8 pycparser-2.17 pyparsing-2.2.0 six-1.10.0\n", 
    "stdout_lines": [
        "Collecting cryptography", 
        "  Downloading cryptography-1.8.1-cp27-cp27m-macosx_10_10_intel.whl (1.8MB)", 
        "Collecting six>=1.4.1 (from cryptography)", 
        "  Using cached six-1.10.0-py2.py3-none-any.whl", 
        "Requirement already satisfied: setuptools>=11.3 in /usr/local/lib/python2.7/site-packages (from cryptography)", 
        "Collecting cffi>=1.4.1 (from cryptography)", 
        "  Downloading cffi-1.10.0-cp27-cp27m-macosx_10_6_intel.whl (226kB)", 
        "Collecting idna>=2.1 (from cryptography)", 
        "  Using cached idna-2.5-py2.py3-none-any.whl", 
        "Collecting enum34 (from cryptography)", 
        "  Downloading enum34-1.1.6-py2-none-any.whl", 
        "Collecting asn1crypto>=0.21.0 (from cryptography)", 
        "  Using cached asn1crypto-0.22.0-py2.py3-none-any.whl", 
        "Collecting ipaddress (from cryptography)", 
        "  Downloading ipaddress-1.0.18-py2-none-any.whl", 
        "Collecting packaging (from cryptography)", 
        "  Using cached packaging-16.8-py2.py3-none-any.whl", 
        "Collecting pycparser (from cffi>=1.4.1->cryptography)", 
        "Collecting pyparsing (from packaging->cryptography)", 
        "  Using cached pyparsing-2.2.0-py2.py3-none-any.whl", 
        "Installing collected packages: six, pycparser, cffi, idna, enum34, asn1crypto, ipaddress, pyparsing, packaging, cryptography", 
        "Successfully installed asn1crypto-0.22.0 cffi-1.10.0 cryptography-1.8.1 enum34-1.1.6 idna-2.5 ipaddress-1.0.18 packaging-16.8 pycparser-2.17 pyparsing-2.2.0 six-1.10.0"
    ], 
    "version": null, 
    "virtualenv": null
}


 [WARNING]: Host file not found: /usr/local/etc/ansible/hosts
 [WARNING]: provided hosts list is empty, only localhost is available

Setting the environment variables, however, is less easy. So let’s put them straight into the playbook.

Playbook

%%writefile deploy_superset.yml

---
- hosts: localhost

  tasks:
    - name: clone Superset git repo
      git:
        repo: https://github.com/airbnb/superset.git
        dest: ~/experiment/superset
    - name: install Superset system dependencies
      homebrew:
        name: "{{ item }}"
        state: present
      with_items:
        - pkg-config
        - libffi
        - openssl
    - name: install cryptography with homebrew installed openssl
      pip:
        name: cryptography
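      environment:
        # Ansible does not expand $(...) in environment values, so resolve the
        # Homebrew prefix with the pipe lookup instead.
        LDFLAGS: "-L{{ lookup('pipe', 'brew --prefix openssl') }}/lib"
        CFLAGS: "-I{{ lookup('pipe', 'brew --prefix openssl') }}/include"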
Overwriting deploy_superset.yml
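One subtlety with environment variables is worth a quick experiment. As far as I understand, Ansible hands the values under environment: to the module's process as plain strings, with no shell in between to evaluate a $(...) command substitution. The Python sketch below shows the same effect with an ordinary child process; the "fake brew" command is a stand-in I made up so the example runs without Homebrew.

```python
import os
import subprocess
import sys

def ldflags_seen_by_child(value):
    """Start a child process with LDFLAGS=value and return what it sees."""
    env = dict(os.environ, LDFLAGS=value)
    probe = [sys.executable, "-c", "import os; print(os.environ['LDFLAGS'])"]
    result = subprocess.run(probe, env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()

def output_of(argv):
    """Run a command without a shell and return its trimmed stdout."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout.strip()

# No shell evaluates the value, so the $(...) text arrives untouched.
literal = ldflags_seen_by_child("-L$(brew --prefix openssl)/lib")

# Hypothetical stand-in for `brew --prefix openssl`.
fake_brew_prefix = [sys.executable, "-c", "print('/usr/local/opt/openssl')"]

# Resolve the prefix first, then build the flag from the result.
resolved = ldflags_seen_by_child("-L" + output_of(fake_brew_prefix) + "/lib")

print(literal)
print(resolved)
```

The env one-liner from the docs works in a terminal because the shell expands $(brew --prefix openssl) before pip ever starts. In a playbook, the path has to be resolved some other way before it reaches the environment.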

Test

%%bash
ansible-playbook deploy_superset.yml

Install other Python dependencies

Since we’re already familiar with how to install Python dependencies from the Cryptography example, we can add these straight to the playbook with confidence.

%%writefile deploy_superset.yml

---
- hosts: localhost

  tasks:
    - name: clone Superset git repo
      git:
        repo: https://github.com/airbnb/superset.git
        dest: ~/experiment/superset

    - name: install Superset system dependencies
      homebrew:
        name: "{{ item }}"
        state: present
      with_items:
        - pkg-config
        - libffi
        - openssl

    - name: install cryptography with homebrew installed openssl
      pip:
        name: cryptography
        state: present
        executable: ~/miniconda3/bin/pip
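      environment:
        # Ansible does not expand $(...) in environment values, so resolve the
        # Homebrew prefix with the pipe lookup instead.
        LDFLAGS: "-L{{ lookup('pipe', 'brew --prefix openssl') }}/lib"
        CFLAGS: "-I{{ lookup('pipe', 'brew --prefix openssl') }}/include"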

    - name: install superset and postgresql database adapter library
      pip:
        name:
          - superset
          - psycopg2
        state: present
        executable: ~/miniconda3/bin/pip

Overwriting deploy_superset.yml

Test

Don’t forget to test it though!

%%bash

ansible-playbook -C deploy_superset.yml

Finishing touches

Finally, there are the last few commands that the Superset docs require before running the server. These are basic bash commands for which Ansible does not provide a specific module, so we will use Ansible's shell module, which enables execution of arbitrary shell commands.

%%writefile deploy_superset.yml

---
- hosts: localhost

  tasks:
    - name: clone Superset git repo
      git:
        repo: https://github.com/airbnb/superset.git
        dest: ~/experiment/superset

    - name: install Superset system dependencies
      homebrew:
        name: "{{ item }}"
        state: present
      with_items:
        - pkg-config
        - libffi
        - openssl

    - name: install cryptography with homebrew installed openssl
      pip:
        name: cryptography
        state: present
        executable: ~/miniconda3/bin/pip
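      environment:
        # Ansible does not expand $(...) in environment values, so resolve the
        # Homebrew prefix with the pipe lookup instead.
        LDFLAGS: "-L{{ lookup('pipe', 'brew --prefix openssl') }}/lib"
        CFLAGS: "-I{{ lookup('pipe', 'brew --prefix openssl') }}/include"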

    - name: install superset and postgresql database adapter library
      pip:
        name:
          - superset
          - psycopg2
        state: present
        executable: ~/miniconda3/bin/pip

    - name: run app init commands
      shell: superset {{ item }}
      args:
        chdir: ~/experiment/superset
      with_items:
        - db upgrade
        - load_examples
        - init
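To picture what this last task asks for, here is a small Python sketch that runs one base command with several subcommands from a chosen working directory, much like the shell task combined with chdir and with_items. The run_each helper and the fake superset executable are stand-ins of my own so the example runs on any machine; nothing here touches a real Superset install.

```python
import subprocess
import sys
import tempfile

def run_each(base_argv, subcommands, cwd):
    """Run base_argv plus each subcommand from inside cwd; collect exit codes."""
    return_codes = {}
    for sub in subcommands:
        argv = base_argv + sub.split()
        return_codes[sub] = subprocess.run(argv, cwd=cwd).returncode
    return return_codes

# Stand-in for the real `superset` executable: a one-liner that echoes its arguments.
fake_superset = [sys.executable, "-c", "import sys; print('superset', *sys.argv[1:])"]

workdir = tempfile.mkdtemp()  # plays the role of ~/experiment/superset
codes = run_each(fake_superset, ["db upgrade", "load_examples", "init"], cwd=workdir)
print(codes)
```

Unlike the git or homebrew modules, arbitrary commands like these carry no built-in notion of "already done", so they run again each time the playbook is executed for real.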

Test

%%bash

ansible-playbook -C deploy_superset.yml

Live!

Make sure every task has been executed at least once on your machine, then run the entire playbook.

%%bash
ansible-playbook deploy_superset.yml

Finally

There are two remaining things we need to do, according to the documentation.

  1. Run: fabmanager create-admin --app superset to set up an initial admin login user
  2. Run the server! superset server

Both of these could be turned into tasks. However, the admin would probably be made by interacting with the database directly and we probably don’t want the server run as a background job on our local machine, so best to control that ourselves for now.

