Under-utilized Redshift Cluster Nodes

Risk level: High

Rule ID: RS-001

Identify any Amazon Redshift clusters that appear to be under-utilised. Either downsize them or reduce the number of nodes to lower your monthly AWS bill.

By default, an AWS Redshift cluster is considered under-utilised when it matches the following criteria:

  • The average CPU utilization has been less than 60% for the last 30 days.
  • The total number of ReadIOPS and WriteIOPS registered per day for the last 30 days has been less than 100 on average.

The AWS CloudWatch metrics utilized to detect underused Redshift clusters are:

  • CPUUtilization - the percentage of CPU utilization (Units: Percent).
  • ReadIOPS - The average number of disk read operations per second. (Units: Count/Second)
  • WriteIOPS - The average number of disk write operations per second. (Units: Count/Second)

You can change the default threshold values for this rule on the nOps console and set your own values for CPU utilization, the total number of ReadIOPS and WriteIOPS to configure the underuse level for your Redshift clusters.
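
The default criteria can be expressed as a small check. The following is a minimal sketch, not the nOps implementation: the function name is_underutilized and its argument order are assumptions made for illustration, and the thresholds default to the 60% CPU and 100 IOPS values stated above.

```shell
# Hypothetical helper (not part of nOps or the AWS CLI).
# $1 = average CPU utilization in percent over the window
# $2 = average daily total of ReadIOPS + WriteIOPS over the window
# $3 = CPU threshold (defaults to 60), $4 = IOPS threshold (defaults to 100)
is_underutilized() {
    # awk handles the floating-point comparison; exit status 0 means "underutilized".
    awk -v cpu="$1" -v iops="$2" -v cpu_max="${3:-60}" -v iops_max="${4:-100}" \
        'BEGIN { exit !(cpu < cpu_max && iops < iops_max) }'
}

# Example usage:
#   if is_underutilized 3.7 12.5; then echo "candidate for downsizing"; fi
```

Passing a third and fourth argument mirrors the custom thresholds that can be set on the nOps console.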

This rule can help you work with the AWS Well-Architected Framework.

Audit

To identify any underused Redshift clusters provisioned within your AWS account, perform the following:

 

Using AWS Console

1. Sign in to the AWS Management Console.
 
2. Navigate to Redshift dashboard at https://console.aws.amazon.com/redshift/.
 
3. In the left navigation panel, under Redshift Dashboard, click Clusters.
 
4. Choose the Redshift cluster that you want to examine then click on its identifier link, listed in the Cluster column.
 
5. On the selected cluster settings page, choose the Cluster Performance tab to access the monitoring panel.
 
6. On the Cluster metrics displayed for the selected cluster, select the following from drop down menus:

  • From the Time Range dropdown list, select Last 30 Days.
  • From the Period list, select Data for every 1 Hour.
  • From the Statistic dropdown list, select Average.

Once the monitoring data is loaded:

a. Check the cluster CPU usage for the last 30 days. If the average usage (percent) has been less than 60%, the selected Redshift cluster qualifies as a candidate underutilized cluster.
 

 
b. The ReadIOPS and WriteIOPS metrics are not displayed directly in the AWS Redshift console. To view them, go to CloudWatch and perform the following operations:

  • Click All Metrics from the side navigation menu and search for Redshift.
  • Click on Redshift > Aggregated by Cluster.
  • Choose ReadIOPS and WriteIOPS for the cluster you are examining (e.g. redshift-cluster-1) and wait for the graphs to load above.


 
If the ReadIOPS and WriteIOPS values for the last 30 days are below 100, the selected Redshift cluster qualifies as a candidate underutilized cluster.


 
7. Repeat steps no. 4 – 6 to verify the CPU, ReadIOPS and WriteIOPS metrics usage data recorded within the selected time frame (30 days) for the rest of the Redshift clusters available in the current region.
 
8. Change the AWS region from the navigation bar and repeat the audit process for other regions.
 

Using AWS CLI

1. Run describe-clusters command (OSX/Linux/UNIX) using custom query filters to list the IDs of all AWS Redshift clusters created in the selected region:

aws redshift describe-clusters \
    --region us-east-1 \
    --query 'Clusters[*].ClusterIdentifier'

 
2. The command output should return all active Redshift Cluster Ids in the region:

[
    "redshift-cluster-1",
    "xcm-dev-analytics"
]

 
3. Run get-metric-statistics command (OSX/Linux/UNIX) to get the statistics recorded by CloudWatch for the CPUUtilization metric which represents the CPU usage of the selected Redshift cluster.

The following command example returns the average CPU utilization for an AWS Redshift cluster identified by the ID redshift-cluster-1, usage data captured during a 30-day time range, using 1 hour time frame as the granularity of the returned datapoints:

aws cloudwatch get-metric-statistics \
    --region us-east-1 \
    --metric-name CPUUtilization \
    --start-time 2021-07-24T00:00:00 \
    --end-time 2021-08-24T00:00:00 \
    --period 3600 \
    --namespace AWS/Redshift \
    --statistics Average \
    --dimensions Name=ClusterIdentifier,Value=redshift-cluster-1
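
The start and end timestamps in the command above are fixed dates. As a rough sketch, a rolling window can be computed at run time instead. The helper name metric_window is an assumption for illustration; it tries the GNU date syntax first and falls back to the BSD (macOS) syntax.

```shell
# Hypothetical helper: print "START END" as UTC timestamps covering the
# last N days (default 30), in the format accepted by --start-time/--end-time.
metric_window() {
    local days end start
    days="${1:-30}"
    end=$(date -u +%Y-%m-%dT%H:%M:%S)
    # GNU date (Linux) understands -d; BSD date (macOS) uses -v instead.
    start=$(date -u -d "$days days ago" +%Y-%m-%dT%H:%M:%S 2>/dev/null) ||
        start=$(date -u -v-"${days}"d +%Y-%m-%dT%H:%M:%S)
    printf '%s %s\n' "$start" "$end"
}

# Example usage:
#   set -- $(metric_window 30)
#   aws cloudwatch get-metric-statistics ... --start-time "$1" --end-time "$2" ...
```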

 
4. The command output should return the cluster CPU usage details requested:

{
    "Label": "CPUUtilization",
    "Datapoints": [
        {
            "Timestamp": "2021-08-23T15:00:00+00:00",
            "Average": 3.713087740421722,
            "Unit": "Percent"
        },
        {
            "Timestamp": "2021-08-22T11:00:00+00:00",
            "Average": 3.9820199541024768,
            "Unit": "Percent"
        },
        ...
        {
            "Timestamp": "2021-08-22T22:00:00+00:00",
            "Average": 3.9409044950757934,
            "Unit": "Percent"
        },
        {
            "Timestamp": "2021-08-23T21:00:00+00:00",
            "Average": 3.7483632284688104,
            "Unit": "Percent"
        }
    ]
}

If the average CPU usage returned is less than 60%, the selected AWS Redshift cluster qualifies as a candidate underused cluster.
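
The command returns one datapoint per hour, so a single figure for the whole window still has to be computed. The sketch below assumes a helper named mean_of (hypothetical, not part of the AWS CLI) that averages whatever numbers it reads from standard input.

```shell
# Hypothetical helper: read whitespace-separated numbers on stdin and
# print their arithmetic mean, or "no-data" when nothing was read.
mean_of() {
    awk '{ for (i = 1; i <= NF; i++) { sum += $i; n++ } }
         END { if (n > 0) printf "%.2f\n", sum / n; else print "no-data" }'
}

# Example usage (needs AWS credentials): append
#   --query 'Datapoints[].Average' --output text
# to the CPUUtilization command from step no. 3 and pipe it into mean_of.
```

The --query and --output options are standard AWS CLI global options; with them the command prints only the hourly averages, which the helper reduces to one value to compare against the 60% threshold.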
 
5. Run the get-metric-statistics command (OSX/Linux/UNIX) again to get the statistics recorded by AWS CloudWatch for the ReadIOPS metric, representing the number of Read I/O operations per second.

The following command example returns the total number of ReadIOPS used by an Amazon Redshift cluster identified by the name redshift-cluster-1, IOPS usage data captured during a 30-day time period, using 1 hour time range as the granularity of the returned datapoints:

aws cloudwatch get-metric-statistics \
    --region us-east-1 \
    --metric-name ReadIOPS \
    --start-time 2021-07-24T00:00:00 \
    --end-time 2021-08-24T00:00:00 \
    --period 3600 \
    --namespace AWS/Redshift \
    --statistics Sum \
    --dimensions Name=ClusterIdentifier,Value=redshift-cluster-1

 
6. The command output should return the ReadIOPS usage details requested:

{
    "Label": "ReadIOPS",
    "Datapoints": [
        {
            "Timestamp": "2021-08-23T07:00:00+00:00",
            "Sum": 0.0,
            "Unit": "Count/Second"
        },
        {
            "Timestamp": "2021-08-23T02:00:00+00:00",
            "Sum": 4.399962222851841,
            "Unit": "Count/Second"
        },
        {
            "Timestamp": "2021-08-22T13:00:00+00:00",
            "Sum": 0.0,
            "Unit": "Count/Second"
        },
        ...
        {
            "Timestamp": "2021-08-23T18:00:00+00:00",
            "Sum": 4.266951149042095,
            "Unit": "Count/Second"
        },
        {
            "Timestamp": "2021-08-23T13:00:00+00:00",
            "Sum": 0.0,
            "Unit": "Count/Second"
        }
    ]
}

If the total number of ReadIOPS has been less than 100 in the last 30 days, the selected Redshift cluster qualifies as a candidate underutilized cluster.
 
7. Run get-metric-statistics command (OSX/Linux/UNIX) to get the statistics recorded by Amazon CloudWatch for the WriteIOPS metric, representing the number of Write I/O operations per second.

The following command example returns the total number of WriteIOPS used by an AWS Redshift cluster identified by the name redshift-cluster-1, usage data captured during a 30-day time range, using 1 hour time period as the granularity of the returned datapoints:

aws cloudwatch get-metric-statistics \
    --region us-east-1 \
    --metric-name WriteIOPS \
    --start-time 2021-07-24T00:00:00 \
    --end-time 2021-08-24T00:00:00 \
    --period 3600 \
    --namespace AWS/Redshift \
    --statistics Sum \
    --dimensions Name=ClusterIdentifier,Value=redshift-cluster-1

 
8. The command output should return the WriteIOPS usage details requested:

{
    "Label": "WriteIOPS",
    "Datapoints": [
        {
            "Timestamp": "2021-08-23T07:00:00+00:00",
            "Sum": 0.0,
            "Unit": "Count/Second"
        },
        {
            "Timestamp": "2021-08-23T02:00:00+00:00",
            "Sum": 8.133265556685167,
            "Unit": "Count/Second"
        },
        ...
        {
            "Timestamp": "2021-08-23T18:00:00+00:00",
            "Sum": 7.867191181046362,
            "Unit": "Count/Second"
        },
        {
            "Timestamp": "2021-08-23T13:00:00+00:00",
            "Sum": 0.0,
            "Unit": "Count/Second"
        }
    ]
}

If the total number of WriteIOPS has been less than 100 within the past 30 days, the selected Redshift cluster qualifies as a candidate underused cluster.
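
The rule compares the combined ReadIOPS and WriteIOPS total per day, averaged over the window, against the threshold. One possible way to derive that figure from the two outputs is sketched below; the helper name daily_avg_total is hypothetical, and dividing the summed datapoints by the number of days is an assumption about how the rule aggregates the data.

```shell
# Hypothetical helper: sum every number read from stdin and divide by the
# number of days in the window (first argument, default 30).
daily_avg_total() {
    awk -v days="${1:-30}" '
        { for (i = 1; i <= NF; i++) total += $i }
        END { printf "%.2f\n", total / days }'
}

# Example usage (needs AWS credentials): append
#   --query 'Datapoints[].Sum' --output text
# to the ReadIOPS and WriteIOPS commands, then feed both outputs to the helper:
#   { <ReadIOPS command>; <WriteIOPS command>; } | daily_avg_total 30
```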
 
9. If the usage data returned at steps no. 3 - 8 satisfy the conditions set by the nOps rules, the selected Redshift cluster is considered "underutilized" and should be resized in order to reduce your AWS Redshift usage costs.
 
10. Repeat steps no. 3 – 8 to verify the CPU, ReadIOPS and WriteIOPS metrics usage data recorded within the selected time range for the rest of the Redshift clusters provisioned in the current region.
 
11. Change the AWS region by updating the --region command parameter value and repeat steps no. 1 - 10 to perform the entire audit process for other regions.
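
Repeating the audit by hand for every cluster and region can be tedious. The following sketch enumerates every region and cluster pair so the metric commands can be run in a loop. The function name list_all_clusters is an assumption; it relies on aws ec2 describe-regions, which needs a default region to be configured and only returns regions enabled for the account.

```shell
# Hypothetical helper: print "region cluster-id" for every Redshift cluster
# in every region returned by describe-regions.
list_all_clusters() {
    local region cluster
    for region in $(aws ec2 describe-regions \
            --query 'Regions[].RegionName' --output text); do
        for cluster in $(aws redshift describe-clusters --region "$region" \
                --query 'Clusters[].ClusterIdentifier' --output text); do
            printf '%s %s\n' "$region" "$cluster"
        done
    done
}

# Example usage:
#   list_all_clusters | while read -r region cluster; do
#       echo "auditing $cluster in $region"
#   done
```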
 

Remediation / Resolution

To resize any Amazon Redshift cluster that is currently running in "underutilized" mode, perform the following actions:

 

Using AWS Console

1. Sign in to the AWS Management Console.
 
2. Navigate to Redshift dashboard at https://console.aws.amazon.com/redshift/.
 
3. In the navigation panel, under Redshift Dashboard, click Clusters.
 
4. Select the Redshift cluster that you want to resize then click on its identifier link, listed in the Cluster column (see Audit section part I to identify the right resource).
 
5. Click the Actions dropdown button from the dashboard top menu and select Create Snapshot.


 
6. Inside the Create Snapshot dialog box,

a. enter a unique name for your cluster snapshot in the Snapshot Identifier box

b. Specify a Retention period

c. click Create to take the snapshot.

The process may take several minutes. Once the snapshot is created it will be listed on your Redshift Snapshots page under Maintenance tab.


 
7. Select the Amazon Redshift cluster snapshot created at step no. 6.
 
8. Click Restore From Snapshot.
 
9. In the Restore From Snapshot options, choose the following:

a. Select the node type to downsize to (e.g. dc1.large) from the Node Type dropdown list.

b. In the Cluster Identifier box, enter a unique name (e.g. redshift-cluster-2) for the new (downsized) Redshift cluster.

c. Reduce the number of Nodes if needed. It's recommended to reduce by halves to ensure the data is distributed evenly.

d. Configure the rest of the options (Cluster Parameter Group, Availability Zone, VPC Security Groups, etc) based on the configuration information taken from the original Redshift cluster.

e. Click Restore Cluster from Snapshot to create the new (resized) Redshift cluster.

 
10. As soon as the build process is complete, update your application configuration to refer to the new cluster endpoint, e.g: redshift-cluster-2.ciuhjksagsoe.us-east-1.redshift.amazonaws.com
 
11. Once the Redshift cluster endpoint is changed within your application configuration, you can remove the initial (original) cluster from your AWS account by performing the following actions:

a. In the navigation panel, under Redshift Dashboard, click Clusters.

b. Choose the Redshift cluster that you want to remove, then click on its identifier link listed in the Cluster column.

c. From the Actions dropdown button, select Delete.

d. Inside the Delete Cluster dialog box, enter a unique name for the final snapshot in the Snapshot name box then click Delete Cluster to confirm the action. Once the snapshot is created the selected cluster removal process begins.


 
12. Repeat steps no. 4 - 11 to downsize (resize) any other underutilized Amazon Redshift clusters provisioned within the current region.
 
13. Change the AWS region from the navigation bar and repeat the entire remediation process for other regions.
 

Using AWS CLI

1. Run describe-clusters command (OSX/Linux/UNIX) to describe the configuration information of the AWS Redshift cluster that you want to downsize (see Audit section part II to identify the right cluster):

aws redshift describe-clusters \
    --region us-east-1 \
    --cluster-identifier redshift-cluster-1

 
2. The command output should return the requested configuration metadata, information that will be useful later when the new Redshift cluster will be created:

{
    "Clusters": [
        {
            "ClusterIdentifier": "redshift-cluster-1",
            "NodeType": "dc2.large",
            "ClusterStatus": "available",
            "ClusterAvailabilityStatus": "Available",
            "MasterUsername": "awsuser",
            "DBName": "dev",
            "Endpoint": {
                "Address": "redshift-cluster-1.ciuhjksagsoe.us-east-1.redshift.amazonaws.com",
                "Port": 5439
            },
            "ClusterCreateTime": "2021-08-22T10:33:03.468000+00:00",
            "AutomatedSnapshotRetentionPeriod": 1,
            ...
            ],
            "ClusterRevisionNumber": "29551",
            "Tags": [],
            "EnhancedVpcRouting": false,
            "IamRoles": [],
            "MaintenanceTrackName": "current",
            "ElasticResizeNumberOfNodeOptions": "[4]",
            "DeferredMaintenanceWindows": [],
            "NextMaintenanceWindowStartTime": "2021-08-29T00:00:00+00:00",
            "AvailabilityZoneRelocationStatus": "disabled",
            "ClusterNamespaceArn": "arn:aws:redshift:us-east-1:695292474035:namespace:2c061636-f6ce-4094-8cee-e5c87ac2f8f0",
            "TotalStorageCapacityInMegaBytes": 800000,
            "AquaConfiguration": {
                "AquaStatus": "disabled",
                "AquaConfigurationStatus": "auto"
            }
        }
    ]
}

 
3. Run create-cluster-snapshot command (OSX/Linux/UNIX) to create a manual snapshot of the existing Redshift cluster:

aws redshift create-cluster-snapshot \
    --region us-east-1 \
    --cluster-identifier redshift-cluster-1 \
    --snapshot-identifier redshift-cluster-1-snapshot-20210823

 
4. The command output should return the snapshot configuration metadata:

{
    "Snapshot": {
        "SnapshotIdentifier": "redshift-cluster-1-snapshot-20210823",
        "ClusterIdentifier": "redshift-cluster-1",
        "SnapshotCreateTime": "2021-08-25T19:40:48.573000+00:00",
        "Status": "creating",
        "Port": 5439,
        "AvailabilityZone": "us-east-1b",
        "ClusterCreateTime": "2021-08-22T10:33:03.468000+00:00",
        "MasterUsername": "awsuser",
        "ClusterVersion": "1.0",
        "EngineFullVersion": "1.0.29551",
        "SnapshotType": "manual",
        "NodeType": "dc2.large",
        "NumberOfNodes": 2,
        "DBName": "dev",
        "VpcId": "vpc-53ed4929",
        "Encrypted": false,
        "EncryptedWithHSM": false,
        "OwnerAccount": "695292474035",
        "TotalBackupSizeInMegaBytes": -1.0,
        "ActualIncrementalBackupSizeInMegaBytes": -1.0,
        "BackupProgressInMegaBytes": 0.0,
        "CurrentBackupRateInMegaBytesPerSecond": 0.0,
        "EstimatedSecondsToCompletion": -1,
        "ElapsedTimeInSeconds": 0,
        "Tags": [],
        "EnhancedVpcRouting": false,
        "MaintenanceTrackName": "current",
        "ManualSnapshotRetentionPeriod": -1
    }
}
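
The snapshot is returned with the status "creating", so it may not be usable yet. A minimal sketch of waiting for it before restoring is shown below, using the AWS CLI waiter aws redshift wait snapshot-available. The wrapper name wait_for_snapshot is an assumption for illustration.

```shell
# Hypothetical wrapper: block until the snapshot is available, then report it.
# $1 = region, $2 = snapshot identifier
wait_for_snapshot() {
    local region="$1" snapshot_id="$2"
    aws redshift wait snapshot-available \
        --region "$region" \
        --snapshot-identifier "$snapshot_id" &&
        echo "snapshot $snapshot_id is available"
}

# Example usage:
#   wait_for_snapshot us-east-1 redshift-cluster-1-snapshot-20210823
```

The waiter exits with a non-zero status if the snapshot does not become available in time, in which case the wrapper prints nothing and returns that status.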

 
5. Now run the restore-from-cluster-snapshot command (OSX/Linux/UNIX) to create a new AWS Redshift cluster from the snapshot created at step no. 3, using the configuration information returned at step no. 2 and the desired cluster node type name and number of nodes as parameters:

aws redshift restore-from-cluster-snapshot \
    --region us-east-1 \
    --cluster-identifier redshift-cluster-2 \
    --snapshot-identifier redshift-cluster-1-snapshot-20210823 \
    --node-type dc1.large \
    --vpc-security-group-ids sg-06d93f48 \
    --cluster-subnet-group-name default \
    --availability-zone us-east-1a \
    --cluster-parameter-group-name default.redshift-1.0 \
    --publicly-accessible

 
6. The command output should return the metadata of the new (downsized) Redshift cluster:

{
    "Cluster": {
        "ClusterIdentifier": "redshift-cluster-2",
        "NodeType": "dc1.large",
        "ClusterStatus": "creating",
        "ClusterAvailabilityStatus": "Modifying",
        "MasterUsername": "awsuser",
        "DBName": "dev",
        "AutomatedSnapshotRetentionPeriod": 1,
        "ManualSnapshotRetentionPeriod": -1,
        "ClusterSecurityGroups": [],
        "VpcSecurityGroups": [
            {
                "VpcSecurityGroupId": "sg-06d93f48",
                "Status": "active"
            }
        ],
        "ClusterParameterGroups": [
            {
                "ParameterGroupName": "default.redshift-1.0",
                "ParameterApplyStatus": "in-sync"
            }
        ],
        "ClusterSubnetGroupName": "default",
        "VpcId": "vpc-53ed4929",
        "AvailabilityZone": "us-east-1a",
        "PreferredMaintenanceWindow": "sun:00:00-sun:00:30",
        "PendingModifiedValues": {},
        "ClusterVersion": "1.0",
        "AllowVersionUpgrade": true,
        "NumberOfNodes": 2,
        "PubliclyAccessible": true,
        "Encrypted": false,
        "Tags": [],
        "EnhancedVpcRouting": false,
        "IamRoles": [],
        "MaintenanceTrackName": "current",
        "DeferredMaintenanceWindows": [],
        "NextMaintenanceWindowStartTime": "2021-08-29T00:00:00+00:00",
        "AquaConfiguration": {
            "AquaStatus": "disabled",
            "AquaConfigurationStatus": "auto"
        }
    }
}

 
7. Run describe-clusters command (OSX/Linux/UNIX) using the appropriate query filters to describe the new Redshift cluster endpoint:

aws redshift describe-clusters \
    --region us-east-1 \
    --cluster-identifier redshift-cluster-2 \
    --query 'Clusters[*].Endpoint.Address' \
    --output text

 
8. The command output should return the new cluster endpoint URL:

redshift-cluster-2.ciuhjksagsoe.us-east-1.redshift.amazonaws.com

 
9. Once the build process is complete, update your application configuration to point to the AWS Redshift cluster endpoint address returned at step no. 8.
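
Before switching the application over and deleting the original cluster, it can be worth confirming that the new cluster has finished building. The sketch below checks the ClusterStatus field returned by describe-clusters. The function name cluster_is_available is hypothetical.

```shell
# Hypothetical helper: succeed only when the cluster reports the
# ClusterStatus value "available".
# $1 = region, $2 = cluster identifier
cluster_is_available() {
    local region="$1" cluster_id="$2" status
    status=$(aws redshift describe-clusters \
        --region "$region" \
        --cluster-identifier "$cluster_id" \
        --query 'Clusters[0].ClusterStatus' \
        --output text)
    [ "$status" = "available" ]
}

# Example usage:
#   if cluster_is_available us-east-1 redshift-cluster-2; then
#       echo "safe to update the application endpoint"
#   fi
```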
 
10. Once the Redshift cluster endpoint is changed within your application configuration, run delete-cluster command (OSX/Linux/UNIX) to remove the original Redshift cluster from your AWS account:

aws redshift delete-cluster \
    --region us-east-1 \
    --cluster-identifier redshift-cluster-1 \
    --final-cluster-snapshot-identifier redshift-cluster-1-final-snapshot

 
11. The command output should return the metadata of the cluster selected for deletion:

{
    "Cluster": {
        "ClusterIdentifier": "redshift-cluster-1",
        "NodeType": "dc2.large",
        "ClusterStatus": "final-snapshot",
        "ClusterAvailabilityStatus": "Modifying",
        "MasterUsername": "awsuser",
        "DBName": "dev",
        "Endpoint": {
            "Address": "redshift-cluster-1.ciuhjksagsoe.us-east-1.redshift.amazonaws.com",
            "Port": 5439
        },
        "ClusterCreateTime": "2021-08-22T10:33:03.468000+00:00",
        "AutomatedSnapshotRetentionPeriod": 1,
        "ManualSnapshotRetentionPeriod": -1,
        "ClusterSecurityGroups": [],
        "VpcSecurityGroups": [
            {
                "VpcSecurityGroupId": "sg-06d93f48",
                "Status": "active"
            }
        ],
        "ClusterParameterGroups": [
            {
                "ParameterGroupName": "default.redshift-1.0",
                "ParameterApplyStatus": "in-sync"
            }
        ],
        "ClusterSubnetGroupName": "default",
        "VpcId": "vpc-53ed4929",
        "AvailabilityZone": "us-east-1b",
        "PreferredMaintenanceWindow": "sun:00:00-sun:00:30",
        "PendingModifiedValues": {},
        "ClusterVersion": "1.0",
        "AllowVersionUpgrade": true,
        "NumberOfNodes": 2,
        "PubliclyAccessible": false,
        "Encrypted": false,
        "Tags": [],
        "EnhancedVpcRouting": false,
        "IamRoles": [],
        "MaintenanceTrackName": "current",
        "DeferredMaintenanceWindows": [],
        "NextMaintenanceWindowStartTime": "2021-08-29T00:00:00+00:00",
        "TotalStorageCapacityInMegaBytes": 800000,
        "AquaConfiguration": {
            "AquaStatus": "disabled",
            "AquaConfigurationStatus": "auto"
        }
    }
}

 
12. Repeat steps no. 1 – 11 to downsize (resize) any other underused Redshift clusters available in the selected region.
 
13. Change the AWS region by updating the --region command parameter value and repeat the entire process for other regions.
