This project is a full monitoring stack that starts an EC2 instance to monitor your RDS MySQL clusters using Enhanced Monitoring metrics as well as the RDS slow-query and error log streams.
To make the whole system work, you need to install Terraform and Packer from HashiCorp.
Follow these instructions if you are new to these tools:
This is an example of what the dashboard looks like:
The dashboard covers every critical part of monitoring RDS traffic: the RDS traffic panel, CPU usage, average task load, general log input flow, and average connections. Each panel shows one aspect of the RDS traffic; combined, they give you a high-level view of the whole traffic. The memory panel simply shows memory usage. Read/write latency is a key indicator of system health. The log stream helps engineers quickly locate the instance that is misbehaving.
The whole system is built on AWS and launched by Terraform. RDS first sends all of its metrics to CloudWatch. We then pull the data out of CloudWatch using rds_exporter and cloudwatch_exporter. Prometheus scrapes these exporters to collect the data it needs, and Grafana visualizes that data. Finally, Alertmanager handles and fires the alerts.
My code structure follows this flow:
Project
|
|---README.md
|
|---configs
| |...
|
|---templates
| |...
|
|---Terraform_Scripts
| |
| |---dev
| |
| |---modules
| | |
| | |---db_parameter_group
| | | |...
| | |---monitoring_ec2
| | | |...
| | |---rds
| | | |...
|
|...
The configs and templates directories hold the scripts and configuration files that Packer uses to create a new AWS AMI. If you want to build an image similar to mine, please check the Packer_Config section. Terraform_Scripts contains all the modules I launched using Terraform.
Before we start, make sure you configure your AWS credential variables following these instructions: Credential Set Up. (Don't hard-code them in your code!)
To keep the whole system running smoothly, I recommend adding these IAM policies to your IAM user: AmazonRDSFullAccess, AmazonRDSEnhancedMonitoringRole, and a new policy for rds_exporter:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1508404837000",
      "Effect": "Allow",
      "Action": [
        "rds:DescribeDBInstances",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:ListMetrics"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "Stmt1508410723001",
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogStreams",
        "logs:GetLogEvents",
        "logs:FilterLogEvents"
      ],
      "Resource": [
        "arn:aws:logs:*:*:log-group:RDSOSMetrics:*"
      ]
    }
  ]
}
A new policy for cloudwatch_exporter:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReadingMetricsFromCloudWatch",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:DescribeAlarmsForMetric",
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:GetMetricData"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowReadingTagsInstancesRegionsFromEC2",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeTags",
        "ec2:DescribeInstances",
        "ec2:DescribeRegions"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowReadingResourcesForTags",
      "Effect": "Allow",
      "Action": "tag:GetResources",
      "Resource": "*"
    }
  ]
}
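Before attaching a policy like the one above, it can help to confirm that it really grants the actions an exporter needs. The following is a minimal sketch, not part of this repo: the helper names `allowed_actions` and `missing_actions` are hypothetical, the policy is the cloudwatch_exporter policy with the `Sid` fields dropped, and the check is a plain string comparison that ignores wildcards, `Deny` statements, and `Resource` scoping.

```python
import json

# The cloudwatch_exporter policy from this README, reduced to the fields the check reads.
CLOUDWATCH_EXPORTER_POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow",
     "Action": ["cloudwatch:DescribeAlarmsForMetric", "cloudwatch:ListMetrics",
                "cloudwatch:GetMetricStatistics", "cloudwatch:GetMetricData"],
     "Resource": "*"},
    {"Effect": "Allow",
     "Action": ["ec2:DescribeTags", "ec2:DescribeInstances", "ec2:DescribeRegions"],
     "Resource": "*"},
    {"Effect": "Allow", "Action": "tag:GetResources", "Resource": "*"}
  ]
}
""")


def allowed_actions(policy):
    """Collect every action named in an Allow statement (exact names only)."""
    statements = policy["Statement"]
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
    actions = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        listed = statement.get("Action", [])
        if isinstance(listed, str):  # "Action" may be one string or a list of strings
            listed = [listed]
        actions.update(listed)
    return actions


def missing_actions(policy, required):
    """Return the required actions that the policy does not list, sorted."""
    return sorted(set(required) - allowed_actions(policy))
```

For example, `missing_actions(CLOUDWATCH_EXPORTER_POLICY, ["cloudwatch:GetMetricData"])` returns an empty list, while asking for an action the policy does not name returns that action.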
To configure the stack, you need to know the tools used in the monitoring stack. I used Prometheus along with rds_exporter and cloudwatch_exporter. To configure Prometheus, change prometheus.yml in the configs folder; please check Prometheus_Configuration for more information. To add alert rules to Prometheus, refer to Alert_Rules and change alert_rules.yml.
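To make the Prometheus side more concrete, here is a minimal sketch that renders a scrape configuration with one job per exporter. It is an illustration only and not the prometheus.yml shipped in configs: the ports 9042 and 9106 are assumed defaults for rds_exporter and cloudwatch_exporter, the `/metrics` path and the job names are assumptions (some rds_exporter versions expose different paths), and `render_scrape_config` is a hypothetical helper.

```python
# Assumed exporter endpoints; adjust them to match your own setup.
EXPORTERS = [
    # (job name, host:port, metrics path)
    ("rds_exporter", "localhost:9042", "/metrics"),
    ("cloudwatch_exporter", "localhost:9106", "/metrics"),
]


def render_scrape_config(exporters, scrape_interval="60s"):
    """Return prometheus.yml text with one scrape job per exporter."""
    lines = [
        "global:",
        f"  scrape_interval: {scrape_interval}",
        "scrape_configs:",
    ]
    for job, target, path in exporters:
        lines += [
            f"  - job_name: {job}",
            f"    metrics_path: {path}",
            "    static_configs:",
            f"      - targets: ['{target}']",
        ]
    return "\n".join(lines) + "\n"


# Print the rendered configuration so it can be compared with prometheus.yml.
print(render_scrape_config(EXPORTERS))
```

Generating the file from a short list keeps the exporter endpoints in one place, which may be convenient if you later add more scrape targets.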
To configure rds_exporter, change config.yml as shown in RDS_Exporter. To configure Alertmanager, change alertmanager.yml following Alertmanager.
As the last step before starting the node, set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
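A quick way to confirm that both variables are visible to a process is a small check like the one below. This is a minimal sketch; `missing_credentials` is a hypothetical helper, and it only tests that the variables are non-empty, not that the keys are valid.

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Report the result without printing any secret values.
missing = missing_credentials()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("AWS credential variables are set.")
```

Because the check reads the environment instead of a file, it fits the advice above about not hard-coding credentials.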
After configuring all the files, run the following steps:
cd ./templates
packer validate example.json
packer build example.json
You now have your own AMI for the monitoring stack!
To start up the whole system, change the image and owner of the AMI in Terraform_Scripts/modules/monitoring_ec2/main.tf to those of the AMI you created.
Then run the following steps:
cd ./Terraform_Scripts/dev
terraform init
terraform plan
terraform apply
After spinning up all the instances, we need to set up the EC2 instance to run the monitoring stack.
Refer to move_configs.sh and move each configuration file to the right place.
Then add AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to your system environment.
Configure your Go environment with:
export GOPATH=$HOME/go
export PATH=$PATH:/usr/local/go/bin
export GOROOT=/usr/local/go
Finally, run:
sudo systemctl start cloudwatch_exporter
sudo systemctl start rds_exporter
sudo systemctl start alertmanager
Then check Grafana at localhost:3000 and load the same dashboard as mine using the template grafana_dashboard.json!
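If a panel stays empty, a first thing to check is whether each service is actually listening. The sketch below tries a TCP connection to each port. Only Grafana's port 3000 comes from this README; the other ports are assumed defaults for Prometheus, Alertmanager, and the two exporters, and `port_is_open` and `check_services` are hypothetical helpers.

```python
import socket

# Port 3000 is from this README; the remaining ports are assumed defaults.
SERVICES = {
    "grafana": 3000,
    "prometheus": 9090,
    "alertmanager": 9093,
    "cloudwatch_exporter": 9106,
    "rds_exporter": 9042,
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services, host="localhost"):
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in services.items()}
```

Calling `check_services(SERVICES)` on the EC2 instance returns a dictionary of service names to booleans. An open port only shows that something is listening, so a `True` value does not prove the service is healthy.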