AWS Kubernetes Guide
This document describes how to deploy OpenIAM in AWS.
Set up the environment
Configure the AWS CLI. Use version 2.0.30.
Export the access and secret keys as environment variables. Supported regions are listed in the AWS documentation.
```
cat ~/.bash_profile

export AWS_REGION=us-west-2
export AWS_ACCESS_KEY_ID=<ACCESS_KEY>
export AWS_SECRET_KEY=<SECRET_KEY>
export AWS_SECRET_ACCESS_KEY=${AWS_SECRET_KEY}
```

Set the region variable in terraform.tfvars to a region that supports EKS.
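To confirm that the CLI actually picks up the exported credentials, a quick check such as the following should work:

```bash
# Should print the account ID and ARN associated with the exported access key.
aws sts get-caller-identity
```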
Set the AWS-specific variables in terraform.tfvars (a sketch of this file follows the table below).
| Variable Name | Required | Default Value | Description |
| --- | --- | --- | --- |
| region | Y | | The region to deploy to. For example, us-west-2 |
| replica_count | Y | | The total number of nodes to be created in the Kubernetes cluster |
| database.root.user | Y | | The root username for the database |
| database.root.password | Y | | The root password for the database |
| database.port | Y | | Database port |
| elasticsearch.aws.instance_class | Y | | The instance class of the ES instance. See https://aws.amazon.com/elasticsearch-service/pricing/ |
| elasticsearch.aws.ebs_enabled | Y | | Certain instance types (listed at https://aws.amazon.com/elasticsearch-service/pricing/) use EBS. For these, you MUST set this flag to true. If your instance class uses SSD, and not EBS, then you MUST set this flag to false |
| elasticsearch.aws.ebs_volume_size | Y | | The volume size of the EBS volume, in GB. Only used if ebs_enabled is set to true |
| redis.aws.instance_class | Y | | The instance class of the Redis instance. See https://aws.amazon.com/elasticache/pricing/ |
| database.aws.port | Y | | The port on which the RDS instance will run |
| database.aws.engine | Y | | See the 'Engine' parameter for the API: https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_CreateDBInstance.html. We support mariadb, postgres, and oracle. We do NOT currently support MSSQL in AWS RDS. We do NOT support oracle-se or oracle-se1; they have a maximum version of Oracle 11, which is not compatible with our version of flyway. Also NOTE: oracle-ee has NOT been tested due to licensing limitations, so use it at your own risk. Thus, the only version of Oracle in AWS that we fully support is oracle-se2 |
| database.aws.instance_class | N | mariadb - db.t2.medium; postgres - db.t2.medium; oracle-ee - db.m5.large; oracle-se2 - db.m5.large | The instance class for the database instance. See https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.DBInstanceClass.html for a complete list. If not specified, we will use a sensible default value based on the database engine; however, setting this manually is highly recommended |
| database.aws.multi_az | Y | | Is this a multi-AZ deployment? See https://aws.amazon.com/rds/details/multi-az/ |
| database.aws.parameters | Y | | Any additional parameters passed to the database instance upon creation. See https://www.terraform.io/docs/providers/aws/r/db_parameter_group.html |
| database.aws.allocated_storage | Y | | See the 'AllocatedStorage' parameter for the API: https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_CreateDBInstance.html |
| database.aws.storage_type | Y | | See the 'StorageType' parameter for the API: https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_CreateDBInstance.html |
| database.aws.backup_retention_period | N | | See the 'BackupRetentionPeriod' parameter for the API: https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_CreateDBInstance.html. A value of '0' disables backups. A non-zero value will result in terraform destroy failing: terraform destroy will attempt to delete the option group, which is associated with a database snapshot (which cannot be deleted) |
| kubernetes.aws.machine_type | Y | | Machine type of the EKS cluster. See https://aws.amazon.com/ec2/pricing/on-demand/. Minimum is m4.xlarge |
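For orientation, a hypothetical terraform.tfvars is sketched below. Every value is a placeholder, and the nested key layout is an assumption based on the table above; check the project's variables.tf for the exact structure it expects:

```hcl
# Hypothetical sketch only -- placeholder values, not recommendations.
region        = "us-west-2"
replica_count = 3

database = {
  root = {
    user     = "root"
    password = "<STRONG_PASSWORD>"
  }
  port = 3306
  aws = {
    engine                  = "mariadb"
    port                    = 3306
    instance_class          = "db.t2.medium"
    multi_az                = true
    parameters              = []
    allocated_storage       = 100
    storage_type            = "gp2"
    backup_retention_period = 0   # '0' disables backups (see the table above)
  }
}

elasticsearch = {
  aws = {
    instance_class  = "r5.large.elasticsearch"
    ebs_enabled     = true        # required for EBS-backed instance classes
    ebs_volume_size = 35
  }
}

redis = {
  aws = {
    instance_class = "cache.t3.medium"
  }
}

kubernetes = {
  aws = {
    machine_type = "m4.xlarge"    # documented minimum
  }
}
```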
Finally, you will have to set up the AWSServiceRoleForAmazonElasticsearchService role-link for your account. To do this automatically via terraform, run the following commands:
```
cd bootstrap/aws
terraform init
terraform apply -var "region=${AWS_REGION}" -auto-approve
```
If you get an error like the one below, you can safely ignore it; it just means that the role-link has already been created:
```
Error creating service-linked role with name es.amazonaws.com: InvalidInput: Service role name AWSServiceRoleForAmazonElasticsearchService has been taken in this account, please try a different suffix
```
Note: You only need to run this step once per AWS organization. Any subsequent runs will result in the above error (which you can ignore).
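If you want to verify the role-link yourself before (or instead of) running the bootstrap, the standard IAM CLI call below should work. If it returns the role, the bootstrap step has already been done for your account:

```bash
# Prints the service-linked role if it exists; fails with NoSuchEntity otherwise.
aws iam get-role --role-name AWSServiceRoleForAmazonElasticsearchService
```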
Metricbeat and Filebeat
Metricbeat and Filebeat are not available in AWS, due to incompatibilities with the latest version of the Elasticsearch SaaS offering in AWS.
Important Note about Deploying
If you ever need to delete a pod, NEVER delete it with the '--force' flag. If you do, all DNS lookups from that pod will fail. This appears to be a bug in AWS EKS.
OK:
```
kubectl delete pod/<pod_name>
```
NOT OK:
```
kubectl delete pod/<pod_name> --force --grace-period=0
```
If, when redeploying, a pod continuously fails to come up, it's likely because DNS is not resolving from within that pod. To fix this, simply delete the pod (again, do NOT use the --force flag!).
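A quick way to confirm the DNS symptom before deleting anything is a lookup from inside the failing pod. This sketch assumes the container image ships nslookup (for example via busybox or bind-tools):

```bash
# If DNS is broken inside the pod, this lookup will fail or time out.
kubectl exec -it <pod_name> -- nslookup kubernetes.default

# Recreate the pod WITHOUT --force and let the controller reschedule it.
kubectl delete pod/<pod_name>
```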
IAM
By default, only the user who created the cluster has access to the EKS cluster. That means that only that user can execute kubectl commands against that cluster. To fix this, you should modify the following file:
iam.auto.tfvars
There, you can add the necessary users and roles which should have access to this EKS cluster.
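The exact variable names in iam.auto.tfvars depend on this project's variables.tf, but a hypothetical sketch following the common EKS aws-auth mapping convention might look like this (the ARNs, usernames, and the variable names themselves are placeholders):

```hcl
# Hypothetical sketch only -- confirm the variable names against variables.tf.
map_users = [
  {
    userarn  = "arn:aws:iam::111122223333:user/jane.doe"
    username = "jane.doe"
    groups   = ["system:masters"]
  }
]

map_roles = [
  {
    rolearn  = "arn:aws:iam::111122223333:role/DevOpsRole"
    username = "devops"
    groups   = ["system:masters"]
  }
]
```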
Destroying
Due to a bug with terraform's helm provider in AWS, destroying the objects in AWS must be performed in several automated and manual steps. In the commands below, replace ${region} with your region (e.g., us-west-2).
First, run these commands:
```
terraform state rm module.deployment.module.helm
terraform state rm module.deployment.module.openiam-app
terraform state rm module.deployment.module.kubernetes
```
Next, delete the associated Load Balancers (created by helm):
https://${region}.console.aws.amazon.com/ec2/v2/home?region=${region}#LoadBalancers:sort=loadBalancerName
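If you prefer the CLI to the console, the classic ELB commands below should work for listing and then deleting the helm-created load balancers (this assumes helm created classic ELBs, the default for LoadBalancer services on EKS of this vintage). Verify each name before deleting, since this is destructive:

```bash
# List load balancer names in the region, then delete the ones created by helm.
aws elb describe-load-balancers --region "${region}" \
  --query 'LoadBalancerDescriptions[].LoadBalancerName' --output text

aws elb delete-load-balancer --region "${region}" --load-balancer-name <lb_name>
```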
And run the destroy command:
```
terraform destroy   # enter 'yes' when asked to do so
```
Most likely, you will encounter the following error.
```
Error: Error deleting VPC: DependencyViolation: The vpc 'vpc-0ef7fda5d9d6cbc32' has dependencies and cannot be deleted.
status code: 400, request id: 9abbe8ea-a647-4007-97a1-238aaa8371e8
```
This just means that you need to delete the VPC manually via the AWS Console. All other resources have been cleaned up.
Finally, you will have to delete terraform's state files:
```
rm -rf terraform.tfstate*
```
Known Issues:
When running terraform apply, you may get this error:
```
Error: error creating IAM policy test2021-CSI: EntityAlreadyExists: A policy called test2021-CSI already exists. Duplicate names are not allowed.
status code: 409, request id: 384473b0-2f1f-4a1f-a000-cbcb016e984a

  on modules/core/aws/main.tf line 134, in resource "aws_iam_policy" "eks_worknode_ebs_policy":
 134: resource "aws_iam_policy" "eks_worknode_ebs_policy" {
```
In this case, simply delete the policy in the AWS UI and re-run terraform apply.
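If you'd rather not use the UI, the equivalent CLI calls are sketched below (if the policy has non-default versions, delete those first with aws iam delete-policy-version):

```bash
# Find the ARN of the duplicate customer-managed policy by name...
aws iam list-policies --scope Local \
  --query "Policies[?PolicyName=='test2021-CSI'].Arn" --output text

# ...then delete it and re-run terraform apply.
aws iam delete-policy --policy-arn <policy_arn>
```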
Update Notes
Updating to 4.2.1.2
We've updated to a newer version of the Elasticsearch Driver (7.15). Unfortunately, AWS only supports up to 7.10. Thus, for 4.2.1.2, you must use an in-house version of Elasticsearch.
Follow these steps:
- Run terraform state rm module.deployment.module.elasticsearch
- Run kubectl edit secret secrets, and add the following two secrets (see the base64 example after this list):
  - elasticsearchUserName: <base64 value of elasticsearch.helm.authentication.username in terraform.tfvars>
  - elasticsearchPassword: <base64 value of elasticsearch.helm.authentication.password in terraform.tfvars>
- Update the terraform project to HEAD of RELEASE-4.2.1.2.
- Use the dev tag in env.sh and run ./setup.sh
- Run kubectl delete jobs --all
- Run terraform init
- Run terraform apply -auto-approve
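One way to produce the base64 values for the two secrets (the -n matters, since a trailing newline would corrupt the encoded credential):

```bash
# Encode the exact strings from terraform.tfvars.
echo -n '<elasticsearch.helm.authentication.username>' | base64
echo -n '<elasticsearch.helm.authentication.password>' | base64
```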
You'll have to manually delete the Elasticsearch SaaS domain from the AWS console.
Updating to 4.2.1.3
We've updated our Kubernetes version to 1.23. Upgrade one minor version at a time (a sketch of a single version bump follows this list):

- In main.tf, set the cluster_version to 1.21;
- Follow the upgrade steps in the main upgrading file;
- Run terraform apply;
- In main.tf, set the cluster_version to 1.22;
- Run terraform apply;
- In main.tf, set the cluster_version to 1.23;
- Run terraform apply.
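For illustration, one version bump might look like the sketch below, assuming main.tf contains a line of the form cluster_version = "..." (adjust the sed pattern to your formatting; on macOS, use sed -i ''):

```bash
# Bump the EKS control-plane version by one minor release, then apply.
sed -i 's/cluster_version = ".*"/cluster_version = "1.21"/' main.tf
terraform apply
# Repeat with "1.22" and then "1.23", applying after each change.
```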