Autoscaling HDInsight Spark Cluster using Unravel API
Prerequisites
Install requests using pip:
# pip install requests
- Install Azure CLI 1.0 (Azure CLI 2.0 does not support HDInsight clusters). See the Azure CLI 1.0 installation instructions.
After installing Azure CLI 1.0, run the following command to log in:
# azure login
Once you are logged in to Azure, you should be able to list the existing HDInsight clusters with this command:
# azure hdinsight cluster list
5. Download the customizable script from here: https://raw.githubusercontent.com/unravel-data/public/master/hdi/unravel-autoscaling/unravel_HDInsight_autoscaling.py
6. Open unravel_HDInsight_autoscaling.py and edit these variables:
Property | Notes | Example Value |
---|---|---|
unravel_base_url | Base URL of the Unravel server | http://localhost:3000/ |
memory_threshold | Scale up/down when memory usage is above/below this percentage | 80 |
cpu_threshold | Scale up/down when CPU usage is above/below this percentage | 10 |
min_nodes | Minimum number of worker nodes | 4 |
max_nodes | Maximum number of worker nodes the cluster can scale up to | 10 |
resource_group | Azure resource group that contains the cluster | UNRAVEL01 |
cluster_name | Name of the HDInsight cluster | estspk2rh75 |
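
To illustrate how these variables could interact, here is a minimal sketch of a threshold-based scaling decision. It is not taken from unravel_HDInsight_autoscaling.py: the `ScalingConfig` class, the `target_node_count` function, the one-node step size, and the rule for combining the memory and CPU checks are all assumptions made for illustration. Only the variable names and example values come from the table above.

```python
from dataclasses import dataclass


@dataclass
class ScalingConfig:
    # Example values from the variables table; the class itself is hypothetical.
    memory_threshold: float = 80.0  # percent
    cpu_threshold: float = 10.0     # percent
    min_nodes: int = 4
    max_nodes: int = 10


def target_node_count(current_nodes: int, memory_usage: float,
                      cpu_usage: float, cfg: ScalingConfig) -> int:
    """Return the desired worker node count for the given usage percentages.

    Assumed policy (not confirmed by the script):
      - scale up by one node if memory OR CPU usage is above its threshold;
      - scale down by one node if BOTH are below their thresholds;
      - otherwise keep the current size.
    The result is always clamped to [min_nodes, max_nodes].
    """
    if memory_usage > cfg.memory_threshold or cpu_usage > cfg.cpu_threshold:
        desired = current_nodes + 1
    elif memory_usage < cfg.memory_threshold and cpu_usage < cfg.cpu_threshold:
        desired = current_nodes - 1
    else:
        desired = current_nodes
    # Never go below min_nodes or above max_nodes.
    return max(cfg.min_nodes, min(cfg.max_nodes, desired))
```

Under these assumptions, a 4-node cluster at 90% memory usage would be asked to grow to 5 nodes, while a 4-node cluster with low usage would stay at 4 because min_nodes prevents any further scale-down.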
Run the autoscaling script:
# python unravel_HDInsight_autoscaling.py
Below is a screenshot (Operations | Dashboard) showing the cluster autoscaling in response to incoming jobs over a period of 48 hours.