Troubleshooting steps in a container-based cluster

If an OpenIAM installation fails, you can identify potential issues by collecting and examining logs. This guide describes how to collect logs for failed installations and individual service failures, which is the first step in troubleshooting. Following the steps below lets you systematically identify and resolve issues in your container-based cluster.

Troubleshooting steps

  1. Check cluster status.

For Docker Swarm

  • Run the following command to check the status of nodes.
docker node ls

Ensure all nodes show Ready in the STATUS column. If a node shows Down, or a manager shows Unreachable in the MANAGER STATUS column, check network connectivity and the node’s status.

For Kubernetes

  • Run the following command to verify node status.
kubectl get nodes

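Ensure all nodes are in the Ready state. If a node is NotReady, run kubectl describe node <node_name> to review its conditions and events, and check the kubelet logs and system resources on that node.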
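On a cluster with many nodes, scanning the node list by eye is error-prone. The following is a minimal sketch that reduces the output of kubectl get nodes to the names of nodes that are not Ready. The function name not_ready_nodes is hypothetical, and the sketch assumes kubectl's default column order (NAME first, STATUS second).

```shell
# Hypothetical helper: print the names of nodes whose STATUS does not start
# with "Ready". Reads `kubectl get nodes --no-headers` output on stdin.
# Assumes the default column order: NAME STATUS ROLES AGE VERSION.
# A cordoned node ("Ready,SchedulingDisabled") is still treated as Ready.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get nodes --no-headers | not_ready_nodes
```

An empty result means every node reported Ready at the time of the check.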

  2. Check container and service status.

For Docker Swarm

  • Check the running services by running the following command.
docker service ls
  • If needed, restart a service with the following command.
docker service update --force <service_name>

For Kubernetes

  • Run the following command to list all pods across all namespaces.
kubectl get pods -A
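  • To check a specific pod's details, run the following. Add -n <namespace> if the pod is not in the default namespace; the same option applies to the other kubectl commands in this guide that target a single pod.
kubectl describe pod <pod_name>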
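For Docker Swarm, a service that is running fewer replicas than requested is usually the first thing to look for in the service list. The sketch below filters for such services. The function name under_replicated_services is hypothetical, and the sketch assumes the list is produced with the --format template shown in the usage comment, so that each line is a service name followed by its running/desired replica count.

```shell
# Hypothetical helper: print services whose running replica count is below
# the desired count. Reads lines of the form "<name> <running>/<desired>".
under_replicated_services() {
  awk '{
    split($2, r, "/")                       # r[1] = running, r[2] = desired
    if (r[1] + 0 < r[2] + 0) print $1, $2
  }'
}

# Usage (requires access to a Swarm manager):
#   docker service ls --format '{{.Name}} {{.Replicas}}' | under_replicated_services
```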
  3. Check container logs.

For Docker Swarm

  • Check the logs for a specific container with the following command.
docker logs <container_id>

For Kubernetes

  • You can check the logs for a pod with the command below.
kubectl logs <pod_name>

To follow live logs, use the command below.

kubectl logs -f <pod_name>
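Full container logs are often dominated by routine messages. As a rough sketch, the log commands above can be piped through a filter that keeps only lines that look like errors. The function name error_lines is hypothetical, and the pattern list (ERROR, FATAL, Exception) is an assumption that should be adjusted to the log format of the service being examined.

```shell
# Hypothetical helper: keep only log lines that look like errors.
# The patterns are an assumption, not a fixed OpenIAM log format.
error_lines() {
  grep -E 'ERROR|FATAL|Exception'
}

# Usage (requires cluster access):
#   kubectl logs --tail=500 <pod_name> | error_lines
#   kubectl logs --previous <pod_name> | error_lines   # previous instance of a restarted container
#   docker service logs --tail 500 <service_name> 2>&1 | error_lines
```

The --previous option is useful when a pod is restarting repeatedly, because the current container may not have logged the failure yet.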
  4. Check network connectivity.
  • Verify connectivity between nodes with the command below.
ping <node_ip>
  • Check service reachability as follows.
curl http://<service_ip>:<port>
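A plain curl call can hang for a long time when a service does not answer. The sketch below wraps the reachability check with a timeout and reports the HTTP status code. The function name service_reachable and the 5-second timeout are assumptions made for illustration; treating 2xx and 3xx responses as reachable is also an assumption.

```shell
# Hypothetical helper: report whether a URL answers with a 2xx or 3xx status.
# curl prints 000 as the status code when no HTTP response was received.
service_reachable() {
  code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1" || true)
  case "$code" in
    2??|3??) echo "reachable ($code)" ;;
    *)       echo "unreachable ($code)"; return 1 ;;
  esac
}

# Usage:
#   service_reachable "http://<service_ip>:<port>"
```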
  5. Check system resources.
  • Monitor system usage using the commands below.
top # Check CPU and Memory
df -h # Check disk space
free -m # Check available memory
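A full disk is a common cause of container failures, and it is easy to miss in long df output. The sketch below lists only the mount points at or above a usage threshold. The function name full_filesystems and the default threshold of 90 percent are assumptions. The sketch relies on the POSIX output format of df -P and does not handle mount points that contain spaces.

```shell
# Hypothetical helper: print mount points whose usage is at or above a
# threshold (default 90). Reads `df -P` output on stdin; the columns are
# Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on.
full_filesystems() {
  awk -v limit="${1:-90}" 'NR > 1 {
    use = $5
    sub(/%/, "", use)                       # "95%" -> "95"
    if (use + 0 >= limit + 0) print $6, $5
  }'
}

# Usage:
#   df -P | full_filesystems 85
```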
  6. Check event logs.
  • In Docker Swarm, run the following.
docker events
  • In Kubernetes, use the following command.
kubectl get events -A
kubectl get events -A
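Unfiltered event streams can be noisy, and docker events keeps streaming until it is interrupted. The sketch below narrows both commands to the events most likely to matter during troubleshooting. The function names and the default 30-minute window are assumptions; the options themselves (--since, --until, --filter, --field-selector, --sort-by) are standard options of the respective CLIs.

```shell
# Hypothetical helper: show containers that exited within a recent window
# (default 30m) and then return instead of streaming indefinitely.
recent_container_deaths() {
  docker events --since "${1:-30m}" --until "$(date +%s)" \
    --filter type=container --filter event=die
}

# Hypothetical helper: show only Warning events across all namespaces,
# oldest first.
warning_events() {
  kubectl get events -A --field-selector type=Warning --sort-by=.lastTimestamp
}

# Usage:
#   recent_container_deaths 15m
#   warning_events
```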
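  7. Restart services or nodes (if needed).
  • Restart a failing container with the following command.
docker restart <container_id>
  • Restart a Kubernetes pod by deleting it. A replacement is created only if the pod is managed by a controller such as a Deployment or StatefulSet.
kubectl delete pod <pod_name>
  • Restart the Docker daemon on a node with the following command. This typically interrupts the containers running on that node, so treat it as a last resort.
systemctl restart docker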
  8. Debug further using shell access.
  • Access a container for debugging by running the following commands.
docker exec -it <container_id> /bin/sh    # For Docker Swarm
kubectl exec -it <pod_name> -- /bin/sh    # For Kubernetes
  9. Verify image versions and configurations.
  • For Docker, check the running image version. The image name and tag appear in the Config.Image field of the output.
docker inspect <container_id>
  • Check environment variables from a shell inside the container.
env
  • For Kubernetes, check deployment configurations.
kubectl get deployment <deployment_name> -o yaml
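The full inspect and YAML output is long when only the image version is needed. The sketch below extracts just the image references. The function names container_image and deployment_images are hypothetical; the sketch assumes the container's image is recorded in Config.Image and that the deployment's containers are listed under spec.template.spec.containers.

```shell
# Hypothetical helper: print the image (name and tag) a container was
# started from.
container_image() {
  docker inspect --format '{{.Config.Image}}' "$1"
}

# Hypothetical helper: print the images used by every container in a
# deployment. $1 = deployment name, $2 = namespace (defaults to "default").
deployment_images() {
  kubectl get deployment "$1" -n "${2:-default}" \
    -o jsonpath='{.spec.template.spec.containers[*].image}'
}

# Usage:
#   container_image <container_id>
#   deployment_images <deployment_name> <namespace>
```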
  10. Check storage and volume issues.
  • List Docker volumes and inspect a specific volume with the following commands.
docker volume ls
docker volume inspect <volume_name>
  • Check persistent volume claims in Kubernetes by running the following command.
kubectl get pvc -A
Note: If the problem persists, consider checking external logs (e.g., system logs or application logs) and reviewing any recent updates or changes that may have impacted the cluster.
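For the storage check in the steps above, a persistent volume claim that is not Bound will keep the pods that depend on it from starting. The following sketch lists only those claims. The function name unbound_pvcs is hypothetical, and the sketch assumes kubectl's default column order when listing across all namespaces (NAMESPACE, NAME, STATUS first).

```shell
# Hypothetical helper: print claims whose STATUS is not "Bound".
# Reads `kubectl get pvc -A --no-headers` output on stdin.
unbound_pvcs() {
  awk '$3 != "Bound" { print $1 "/" $2, $3 }'
}

# Usage (requires cluster access):
#   kubectl get pvc -A --no-headers | unbound_pvcs
```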

Using the tail command to capture relevant logs

When troubleshooting or preparing for performance testing, you might not need to capture full application log files. Log files can grow very large, making them difficult to share and time-consuming to review. In these cases, the Linux tail command is an efficient way to extract only the most relevant portions of a log.

The tail command displays the end of a file, which is usually where the most recent activity is recorded. By default, it shows the last 10 lines:

tail application.log

You can customize the number of lines with the -n option, as follows.

tail -n 100 application.log

This command fetches the last 100 lines, which is often enough to capture the error or performance event in question.

For real-time monitoring, use the -f option, which continuously streams new log entries as they are written.

tail -f application.log
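When a log excerpt has to be sent to someone else, it helps to write the tail output straight to a compressed file. The sketch below does this in one step. The function name save_log_tail and the file names in the usage comment are hypothetical.

```shell
# Hypothetical helper: save the last N lines of a log file as a gzip archive.
# $1 = log file, $2 = number of lines, $3 = output file
save_log_tail() {
  tail -n "$2" "$1" | gzip > "$3"
}

# Usage:
#   save_log_tail application.log 1000 application-tail.log.gz
#   gzip -dc application-tail.log.gz | less    # review the excerpt
```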

Docker application

In Docker, you don’t usually run tail directly on a log file inside a container, since containers often write logs to stdout and stderr instead of local files. Instead, you use docker logs, whose --tail option behaves like tail -n.

The common way to show the last N lines of logs in a Docker environment is as follows.

docker logs --tail 100 <container_name_or_id>

This is the equivalent of tail -n 100 file.log, but for container logs.

When you are asked to share a relevant snippet of logs, using tail avoids transferring very large files. This reduces delays and keeps the focus on the specific performance issue being analyzed.
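One detail matters when saving container logs to a file: docker logs replays the container's stderr output on its own stderr, so a plain redirect or pipe can silently drop error messages. The sketch below merges both streams. The function name container_log_tail and the default of 100 lines are assumptions.

```shell
# Hypothetical helper: print the last N lines of a container's logs with
# stderr merged into stdout, so both streams survive a pipe or redirect.
# $1 = container name or ID, $2 = number of lines (default 100)
container_log_tail() {
  docker logs --tail "${2:-100}" "$1" 2>&1
}

# Usage:
#   container_log_tail <container_name_or_id> 200 > snippet.log
```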