Autoscaling HDInsight Spark Cluster using Unravel API

Prerequisites

  1. Install requests using pip:

    pip install requests

  2. Install Azure CLI 1.0 (Azure CLI 2.0 does not support HDInsight clusters). See the Azure CLI 1.0 installation instructions.
  3. After installing Azure CLI 1.0, run the following command to log in:

    azure login

  4. Once you are logged in to Azure, you should be able to list your existing HDInsight clusters with this command:

    info

  5. Download the customizable script from here: https://raw.githubusercontent.com/unravel-data/public/master/hdi/unravel-autoscaling/unravel_HDInsight_autoscaling.py

  6. Open unravel_HDInsight_autoscaling.py and edit these variables:

    Property            Notes                                                      Example Value
    unravel_base_url    -                                                          http://localhost:3000/
    memory_threshold    Scale up/down when memory_usage is higher/lower than 80%   80
    cpu_threshold       Scale up/down when cpu_usage is higher/lower than 10%      10
    min_nodes           Minimum number of worker nodes                             4
    max_nodes           Maximum number of worker nodes to scale up to              10
    resource_group      -                                                          UNRAVEL01
    cluster_name        -                                                          estspk2rh75
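
To make the interplay of the thresholds and node limits concrete, here is a minimal sketch of how such settings could drive a scaling decision. It is not taken from unravel_HDInsight_autoscaling.py: the ScalingConfig class, the target_node_count function, the step parameter, and the policy itself (scale up if either metric is above its threshold, scale down only if both are below) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScalingConfig:
    # Field names mirror the variables listed above; defaults are the example values.
    memory_threshold: int = 80  # percent
    cpu_threshold: int = 10     # percent
    min_nodes: int = 4
    max_nodes: int = 10


def target_node_count(cfg, current_nodes, memory_usage, cpu_usage, step=1):
    """Return the worker-node count to resize to.

    Hypothetical policy: add `step` nodes if either metric exceeds its
    threshold, remove `step` nodes if both are below, otherwise hold.
    The result is always clamped to [min_nodes, max_nodes].
    """
    if memory_usage > cfg.memory_threshold or cpu_usage > cfg.cpu_threshold:
        wanted = current_nodes + step
    elif memory_usage < cfg.memory_threshold and cpu_usage < cfg.cpu_threshold:
        wanted = current_nodes - step
    else:
        wanted = current_nodes
    return max(cfg.min_nodes, min(cfg.max_nodes, wanted))


if __name__ == "__main__":
    cfg = ScalingConfig()
    # High memory usage on a 5-node cluster asks for one more node.
    print(target_node_count(cfg, current_nodes=5, memory_usage=90, cpu_usage=5))
```

Clamping the result means that min_nodes and max_nodes always win over the thresholds, so a sustained load spike cannot grow the cluster past max_nodes and an idle cluster never shrinks below min_nodes.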


Run the autoscaling script
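
Once started, an autoscaler of this kind typically runs as a long-lived polling loop: read the current metrics, decide on a node count, resize if needed, then wait. The sketch below illustrates that shape only and is not the code of unravel_HDInsight_autoscaling.py. The run_autoscaler function and its parameters are hypothetical; fetching metrics from Unravel and resizing the cluster are passed in as callables because their real implementations are not shown here.

```python
import time


def run_autoscaler(fetch_metrics, current_nodes, decide, resize,
                   interval_seconds=60, max_iterations=None, sleep=time.sleep):
    """Poll metrics and resize the cluster when the decision changes.

    fetch_metrics() -> (memory_usage, cpu_usage)   # hypothetical callable
    decide(nodes, memory_usage, cpu_usage) -> int  # hypothetical callable
    resize(target_nodes) -> None                   # hypothetical callable
    Runs forever when max_iterations is None; returns the final node count.
    """
    nodes = current_nodes
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        memory_usage, cpu_usage = fetch_metrics()
        target = decide(nodes, memory_usage, cpu_usage)
        if target != nodes:
            # Only issue a resize when the target actually differs.
            resize(target)
            nodes = target
        iterations += 1
        # Skip the final wait so a bounded run returns promptly.
        if max_iterations is None or iterations < max_iterations:
            sleep(interval_seconds)
    return nodes
```

Passing max_iterations and a no-op sleep makes the loop easy to dry-run against recorded metrics before pointing it at a real cluster.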

Below is a screenshot (Operations | Dashboard) showing the cluster autoscaling in response to incoming jobs over a period of 48 hours.