Troubleshooting steps in a container-based cluster
If an OpenIAM installation fails, start by collecting and examining logs. The guide below shows how to collect logs for failed installations and for individual service failures, which is the first step in troubleshooting. Working through the steps in order lets you systematically identify and resolve issues in your container-based cluster.
Troubleshooting steps
- Check cluster status.
For Docker Swarm
- Run the following command to check the status of nodes.
docker node ls
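Ensure every node shows Ready in the STATUS column. If a node shows Down, or a manager node shows Unreachable in the MANAGER STATUS column, check network connectivity to that node and confirm that the Docker daemon on it is running.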
For Kubernetes
- Run the following command to verify node status.
kubectl get nodes
Ensure all nodes are in the Ready state. If a node is NotReady, inspect it with kubectl describe node <node_name>, then check the kubelet logs and system resources on that node.
- Check container and service status.
For Docker Swarm
- List the services with the following command and confirm that the REPLICAS column shows the expected count (for example, 1/1).
docker service ls
- If needed, restart a service with the following command.
docker service update --force <service_name>
For Kubernetes
- Run the following command to list all pods across all namespaces.
kubectl get pods -A
- To inspect a specific pod, run the following. Add -n <namespace> if the pod is not in the current namespace.
kubectl describe pod <pod_name>
- Check container logs.
For Docker Swarm
- Check the logs for a specific container with the following command. Run it on the node where the container is running.
docker logs <container_id>
For Kubernetes
- You can check the logs for a pod with the command below.
kubectl logs <pod_name>
To follow live logs, use the command below.
kubectl logs -f <pod_name>
- Check network connectivity.
- Verify connectivity between nodes with the command below.
ping <node_ip>
- Check service reachability as follows.
curl http://<service_ip>:<port>
- Check system resources.
- Monitor system usage using the commands below.
top # Check CPU and memory
df -h # Check disk space
free -m # Check available memory
- Check event logs.
- In Docker Swarm, run the following.
docker events
- In Kubernetes, use the following command.
kubectl get events -A
- Restart services or nodes (if needed).
- Restart a failing container with the following command.
docker restart <container_id>
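- Restart a Kubernetes pod by deleting it. If the pod is managed by a controller such as a Deployment, a replacement pod is created automatically.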
kubectl delete pod <pod_name>
- As a last resort, restart the Docker service on the affected node with the following command.
systemctl restart docker
- Debug further using shell access.
- Access a container for debugging by running the following commands.
docker exec -it <container_id> /bin/sh # For Docker Swarm
kubectl exec -it <pod_name> -- /bin/sh # For Kubernetes
- Verify image versions and configurations.
- For Docker, check the running image version.
docker inspect <container_id>
- Check environment variables from a shell inside a container.
env
- For Kubernetes, check deployment configurations.
kubectl get deployment <deployment_name> -o yaml
- Check storage and volume issues.
- List Docker volumes with the following commands.
docker volume ls
docker volume inspect <volume_name>
- Check persistent volume claims in Kubernetes by running the following command.
kubectl get pvc -A
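On a larger cluster, the status listings from the first two steps can be long. The following is a minimal sketch of helper functions that read that output on standard input and print only the entries that need attention. The function names are hypothetical and are not part of Docker, Kubernetes, or OpenIAM. The sketch assumes the default column layout of kubectl get nodes and kubectl get pods -A, and replicated or global Swarm services.

```shell
#!/bin/sh
# Hypothetical helpers: each one filters command output passed on stdin.

# Input: `kubectl get nodes` (columns: NAME STATUS ROLES AGE VERSION).
# Prints every node whose STATUS is not exactly "Ready". A cordoned node
# ("Ready,SchedulingDisabled") is therefore listed as well.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 " " $2 }'
}

# Input: `kubectl get pods -A`
# (columns: NAMESPACE NAME READY STATUS RESTARTS AGE).
# Prints pods whose STATUS is neither Running nor Completed.
# A Running pod that is not yet ready (READY 0/1) is not flagged.
unhealthy_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2 " " $4 }'
}

# Input: `docker service ls --format '{{.Name}} {{.Replicas}}'`,
# one "name running/desired" pair per line.
# Prints services that run fewer replicas than desired.
degraded_services() {
  awk '{ split($2, r, "/"); if (r[1] + 0 < r[2] + 0) print $1 " " $2 }'
}
```

Typical use is to pipe the command into the matching function, for example kubectl get nodes | not_ready_nodes or docker service ls --format '{{.Name}} {{.Replicas}}' | degraded_services. Empty output means nothing was flagged.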
Using the tail command to capture relevant logs
When troubleshooting or preparing for performance testing, you might not need to capture full application log files. Log files can grow very large, making them difficult to share and time-consuming to review. In these cases, the Linux tail command is an efficient way to extract only the most relevant portions of a log.
The tail command displays the end of a file, which is usually where the most recent activity is recorded. By default, it shows the last 10 lines:
tail application.log
You can customize the number of lines with the -n option, as follows.
tail -n 100 application.log
This command fetches the last 100 lines, which is often enough to capture the error or performance event in question.
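When the snippet has to be shared, it can help to write it to a separate file. The following is a minimal sketch. The function name capture_tail and the .tail suffix for the output file are assumptions made for illustration.

```shell
#!/bin/sh
# capture_tail FILE [LINES] [OUT]
# Hypothetical helper: copies the last LINES lines (default 100) of FILE
# into OUT (default FILE.tail) so that only the snippet needs to be shared.
capture_tail() {
  file=$1
  lines=${2:-100}
  out=${3:-$file.tail}
  # Fail early with a message if the log file cannot be read.
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  tail -n "$lines" "$file" > "$out"
}
```

For example, capture_tail application.log 200 writes the last 200 lines to application.log.tail.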
For real-time monitoring, use the tail -f option. It continuously streams new log entries as they are written.
tail -f application.log
Docker application
In Docker, you don’t usually run tail directly on a log file inside a container, since containers often write logs to stdout and stderr instead of local files. Instead, you use docker logs, which behaves much like tail.
The common way to show the last N lines of logs in a Docker environment is as follows.
docker logs --tail 100 <container_name_or_id>
This is the same as tail -n 100 file.log, but for container logs.
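To save the snippet to a file, keep in mind that docker logs writes the container's stderr stream to its own stderr, so a plain redirect with > can miss error output. The following is a minimal sketch. The function name save_container_logs and the default output file name are assumptions made for illustration.

```shell
#!/bin/sh
# save_container_logs CONTAINER [LINES] [OUT]
# Hypothetical helper: writes the last LINES log lines (default 100) of a
# container to OUT (default CONTAINER.log), with timestamps.
# 2>&1 keeps the container's stderr output in the same file as stdout.
save_container_logs() {
  container=$1
  lines=${2:-100}
  out=${3:-$container.log}
  docker logs --timestamps --tail "$lines" "$container" > "$out" 2>&1
}
```

For example, save_container_logs <container_name_or_id> 200 writes the last 200 lines to a .log file named after the container.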
If you are asked to share a relevant log snippet, using tail avoids transferring very large files. This reduces delays and keeps the focus on the specific performance issue being analyzed.