AWS Kubernetes guide
This document is aimed at helping you deploy OpenIAM in AWS.
Setting up the environment
- Configure the AWS CLI. Use version 2.0.30.
- Export the access and secret keys as environment variables. Supported regions are listed in the AWS documentation.
cat ~/.bash_profile
export AWS_REGION=us-west-2
export AWS_ACCESS_KEY_ID=<ACCESS_KEY>
export AWS_SECRET_KEY=<SECRET_KEY>
export AWS_SECRET_ACCESS_KEY=${AWS_SECRET_KEY}
- Set the region variable in terraform.tfvars to a region that supports EKS.
- Set the AWS-specific variables in terraform.tfvars, as described in the table below. A sample sketch follows the table.
Variable Name | Required | Default value | Description |
---|---|---|---|
region | Y | | The region to deploy to. For example, us-west-2. |
replica_count | Y | | The total number of nodes to be created in the Kubernetes cluster. |
database.root.user | Y | | The root username for the database. |
database.root.password | Y | | The root password for the database. |
database.port | Y | | Database port. |
elasticsearch.aws.instance_class | Y | | The instance class of the Elasticsearch instance. See https://aws.amazon.com/elasticsearch-service/pricing/. |
elasticsearch.aws.ebs_enabled | Y | | Certain instance types (listed at https://aws.amazon.com/elasticsearch-service/pricing/) use EBS. For these, you MUST set this flag to true. If your instance class uses SSD, and not EBS, then you MUST set this flag to false. |
elasticsearch.aws.ebs_volume_size | Y | | The volume size of the EBS volume, in GB. Only used if ebs_enabled is set to true. |
redis.aws.instance_class | Y | | The instance class of the Redis instance. See https://aws.amazon.com/elasticache/pricing/. |
database.aws.port | Y | | The port on which the RDS instance will run. |
database.aws.engine | Y | | See the 'Engine' parameter of the API: https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_CreateDBInstance.html. We support MariaDB, Postgres, and Oracle. We do NOT currently support MSSQL in AWS RDS. We do NOT support oracle-se or oracle-se1: they have a maximum version of Oracle 11, which is not compatible with our version of Flyway. Oracle-EE has NOT been tested due to licensing limitations; use it at your own risk. Thus, the only version of Oracle in AWS that we fully support is oracle-se2. |
database.aws.instance_class | N | mariadb: db.t2.medium; postgres: db.t2.medium; oracle-ee: db.m5.large; oracle-se2: db.m5.large | Instance class for the database instance. If not specified, we will use a sensible default value based on the database engine; however, it is highly recommended to set this manually. You can see the complete list of instance classes in the AWS documentation. |
database.aws.multi_az | Y | | Is this a multi-AZ deployment? See https://aws.amazon.com/rds/details/multi-az/. |
database.aws.parameters | Y | | Any additional parameters passed to the database instance upon creation. See https://www.terraform.io/docs/providers/aws/r/db_parameter_group.html. |
database.aws.allocated_storage | Y | | See the AllocatedStorage parameter of the API: https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_CreateDBInstance.html. |
database.aws.storage_type | Y | | See the StorageType parameter of the API: https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_CreateDBInstance.html. |
database.aws.backup_retention_period | N | | See the BackupRetentionPeriod parameter of the API: https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_CreateDBInstance.html. A value of '0' disables backups. A non-zero value will cause terraform destroy to fail: terraform destroy attempts to delete the option group, which is associated with a database snapshot that cannot be deleted. |
kubernetes.aws.machine_type | Y | | Machine type of the EKS cluster. See https://aws.amazon.com/ec2/pricing/on-demand/. Minimum is m4.xlarge. |
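As a rough illustration only, a minimal terraform.tfvars could start out as shown below. The variable names follow the table above, but their exact shape (flat values versus nested objects such as database.root.user) is defined by the project's variables.tf, so treat this purely as a sketch with placeholder values.
# Sketch only: append illustrative top-level values to terraform.tfvars.
# The dotted names from the table (database.*, elasticsearch.*, redis.*, kubernetes.*)
# are supplied in the same file, nested as defined by the project's variables.tf.
cat >> terraform.tfvars <<'EOF'
region        = "us-west-2"
replica_count = 3
EOF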
- Finally, you will have to set up the AWSServiceRoleForAmazonElasticsearchService role-link for your account. To do this automatically via Terraform, run the following commands.
cd bootstrap/aws
terraform init
terraform apply -var "region=${AWS_REGION}" -auto-approve
If you get an error like the one below, you can safely ignore it; it just means that the role-link has already been created.
Error creating service-linked role with name es.amazonaws.com: InvalidInput: Service role name AWSServiceRoleForAmazonElasticsearchService has been taken in this account, please try a different suffix
Metricbeat and FileBeat
Metricbeat and Filebeat are not available in AWS, due to incompatibilities with the latest version of Elasticsearch SaaS in AWS.
Important note about deploying
If you ever need to delete a pod, NEVER delete it with the --force flag. Doing so will result in all DNS lookups from that pod failing. This appears to be a bug in AWS EKS.
OK:
kubectl delete pod/<pod_name>
NOT OK:
kubectl delete pod/<pod_name> --force --grace-period=0
If, when redeploying, a pod continuously fails to come up, it is likely because DNS is not resolving from within that pod. To fix this, simply delete the pod (again, do NOT use the --force flag!).
IAM
By default, only the user who created the cluster has access to the EKS cluster, which means that only that user can execute kubectl commands against it. To grant access to other users and roles, modify the following file.
iam.auto.tfvars
There, you can add the users and roles that should have access to this EKS cluster.
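After the change has been applied with terraform apply, a quick way for a newly added user to verify their access is sketched below. This assumes the user's own AWS credentials are configured and that they know the EKS cluster name; <cluster_name> is a placeholder.
# Confirm which IAM identity the AWS CLI is using.
aws sts get-caller-identity
# Fetch kubeconfig credentials for the cluster, then check that kubectl works.
aws eks update-kubeconfig --name <cluster_name> --region ${AWS_REGION}
kubectl get nodes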
Destroying
Due to a bug with Terraform's Helm provider in AWS, destroying the objects in AWS must be performed in a combination of automated and manual steps.
In the commands below, replace ${region} with your region (e.g., us-west-2).
First, run the commands below.
terraform state rm module.deployment.module.helm
terraform state rm module.deployment.module.openiam-app
terraform state rm module.deployment.module.kubernetes
Next, delete the associated Load Balancers (created by Helm).
https://${region}.console.aws.amazon.com/ec2/v2/home?region=${region}#LoadBalancers:sort=loadBalancerName
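If you prefer the CLI over the console, a sketch like the following can help locate and remove the Helm-created (classic) load balancers. <elb_name> is a placeholder; confirm each name belongs to this cluster before deleting it.
# List the classic load balancer names in the region.
aws elb describe-load-balancers --region ${region} --query "LoadBalancerDescriptions[].LoadBalancerName"
# Delete a load balancer once you have confirmed it was created for this cluster.
aws elb delete-load-balancer --load-balancer-name <elb_name> --region ${region}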
Then run the destroy command.
terraform destroy
# enter 'yes' when asked to do so
Most likely, you will encounter the following error.
Error: Error deleting VPC: DependencyViolation: The vpc 'vpc-0ef7fda5d9d6cbc32' has dependencies and cannot be deleted.
status code: 400, request id: 9abbe8ea-a647-4007-97a1-238aaa8371e8
This just means that you need to delete the VPC manually via the AWS Console. All other resources have been cleaned up.
Finally, you will have to delete Terraform's state files.
rm -rf terraform.tfstate*
Known issues
When running terraform apply, you may get the following error.
Error: error creating IAM policy test2021-CSI: EntityAlreadyExists: A policy called test2021-CSI already exists. Duplicate names are not allowed.
status code: 409, request id: 384473b0-2f1f-4a1f-a000-cbcb016e984a
on modules/core/aws/main.tf line 134, in resource "aws_iam_policy" "eks_worknode_ebs_policy":
134: resource "aws_iam_policy" "eks_worknode_ebs_policy" {
In this case, simply delete the policy in the AWS UI and re-run terraform apply.
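If you would rather use the CLI than the UI, a sketch along these lines works as well. Replace the policy name with the one from your error message; <POLICY_ARN> and <ROLE_NAME> are placeholders.
# Find the ARN of the duplicate customer-managed policy (use the name from your error).
aws iam list-policies --scope Local --query "Policies[?PolicyName=='test2021-CSI'].Arn" --output text
# If the policy is still attached to a role, detach it first.
aws iam list-entities-for-policy --policy-arn <POLICY_ARN>
aws iam detach-role-policy --role-name <ROLE_NAME> --policy-arn <POLICY_ARN>
# Delete the policy, then re-run terraform apply.
aws iam delete-policy --policy-arn <POLICY_ARN>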
Update notes
Updating to 4.2.1.2
We've updated to a newer version of the Elasticsearch Driver (7.15). Unfortunately, AWS only supports up to 7.10. Thus, for 4.2.1.2, you must use an in-house version of Elasticsearch.
Perform the following steps.
- Run terraform state rm module.deployment.module.elasticsearch.
- Run kubectl edit secret secrets, and add the following two secrets (see the sketch after this list for generating the base64 values):
  - elasticsearchUserName: <base64 value of elasticsearch.helm.authentication.username in terraform.tfvars>
  - elasticsearchPassword: <base64 value of elasticsearch.helm.authentication.password in terraform.tfvars>
- Update the terraform project to HEAD of RELEASE-4.2.1.2.
- Use the dev tag in env.sh and run ./setup.sh.
- Run kubectl delete jobs --all.
- Run terraform init.
- Run terraform apply -auto-approve.
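As a sketch, the base64 values referenced in the secrets step above can be generated as follows; substitute the username and password values from terraform.tfvars. Patching the secret is an alternative to kubectl edit.
# Generate the base64-encoded values to place into the 'secrets' secret.
echo -n '<elasticsearch.helm.authentication.username value>' | base64
echo -n '<elasticsearch.helm.authentication.password value>' | base64
# Alternatively, patch the secret directly instead of editing it interactively.
kubectl patch secret secrets --type merge -p '{"data":{"elasticsearchUserName":"<base64_username>","elasticsearchPassword":"<base64_password>"}}'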
You will have to manually delete the Elasticsearch SaaS domain from the AWS console.
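If you prefer the CLI over the console, the domain can also be removed with something along these lines; <domain_name> is a placeholder for the domain created by this deployment.
# List the Elasticsearch domains to identify the one created by this deployment.
aws es list-domain-names --region ${AWS_REGION}
# Delete the domain. This is irreversible.
aws es delete-elasticsearch-domain --domain-name <domain_name> --region ${AWS_REGION}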
Updating to 4.2.1.3
We've updated our Kubernetes version to 1.23. To continue working with your existing deployment, follow the steps below; a scripted sketch of the version bumps follows the list.
- In main.tf, set the cluster_version to 1.21.
- Follow the upgrade steps in the main upgrading file, choosing the version you want to upgrade to.
- Run terraform apply.
- In main.tf, set the cluster_version to 1.22.
- Run terraform apply.
- In main.tf, set the cluster_version to 1.23.
- Run terraform apply.
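The version bumps and applies above can be scripted roughly as follows. This is only a sketch: it assumes main.tf contains a single cluster_version = "..." line, that GNU sed is available, and it does not replace the manual upgrade steps referenced in the list.
# Bump the EKS cluster version one minor release at a time and apply each step.
for v in 1.21 1.22 1.23; do
  sed -i -E "s/(cluster_version[[:space:]]*=[[:space:]]*)\"[0-9.]+\"/\1\"${v}\"/" main.tf
  terraform apply -auto-approve
done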