Analyze Pacemaker events using open source Log Parser – Part 4

This blog is the fourth in a series and follows the blog Analyze Pacemaker events in Cloud Logging, which describes how you can install and configure Google Cloud Ops Agent to stream the Pacemaker logs of all your high availability clusters to Cloud Logging, so you can analyze Pacemaker events happening to any of your clusters in one central place. But what if you don’t have this agent installed and want to know what happened to your cluster?

Let’s look at the open source Python script logparser, which helps you consolidate relevant Pacemaker logs from the cluster nodes and filter the log entries for critical events such as fencing or resource failures. It takes the following log files as input and generates an output file that lists the log entries for critical events in chronological order:

System logs such as /var/log/messages
Pacemaker logs such as /var/log/pacemaker.log and /var/log/corosync/corosync.log
hb_report on SUSE
sosreport on RedHat

How to use this script?

The script is available to download from its GitHub repository and supports multiple platforms.

Prerequisites

The program requires Python 3.6+ and runs on Linux, Windows and macOS. As the first step, install or update your Python environment. Second, clone the GitHub repository.

Run the script

See '-h' for help. Specify the input log files and, optionally, a time range or an output file name. By default, the output file is 'logparser.out' in the current directory. A typical workflow is sketched after this section.

The hb_report is a utility provided by SUSE to capture all relevant Pacemaker logs in one package. If passwordless ssh login is set up between the cluster nodes, it gathers the information from all nodes; if not, collect the hb_report on each cluster node.

The sosreport is a similar utility provided by RedHat to collect system log files, configuration details and system information; Pacemaker logs are also collected. Collect the sosreport on each cluster node.

You can also parse individual system logs or Pacemaker logs. On Windows, execute the Python file logparser.py instead.
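The commands below are a minimal sketch of that workflow, not an exact transcript: the repository URL is a placeholder for the link on the GitHub page, and the way the input files are passed is an assumption, so check 'python3 logparser.py -h' for the actual options.

# Clone the repository (placeholder URL; use the actual link from the GitHub page)
git clone https://github.com/<org>/logparser.git
cd logparser

# Collect logs first, e.g. hb_report on SUSE or sosreport on RedHat, or copy
# /var/log/messages and /var/log/pacemaker.log from each cluster node.

# Run the parser on the collected logs (illustrative invocation; see -h for the real options,
# including the time range and output file name).
python3 logparser.py node1/pacemaker.log node2/pacemaker.log node1/messages node2/messages

# Critical events are written to logparser.out in the current directory by default.
# On Windows, run: python logparser.py <input files>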
Next, let's analyze the output information of the log parser.

Understanding the Output Information

The output file may contain a variety of information, including but not limited to fencing actions, resource actions, failures, or Corosync subsystem events.

Fencing action reason and result

The example below shows a fencing (reboot) action targeting a cluster node because the node left the cluster. The subsequent log entry shows that the fencing operation was successful (OK).

2021-03-26 03:10:38 node1 pengine: notice: LogNodeActions: * Fence (reboot) node2 'peer is no longer part of the cluster'
2021-03-26 03:10:57 node1 stonith-ng: notice: remote_op_done: Operation 'reboot' targeting node1 on node2 for crmd.2569@node1.9114cbcc: OK

Pacemaker actions to manage cluster resources

The example below illustrates multiple actions affecting the cluster resources, such as actions moving resources from one cluster node to another, or an action stopping a resource on a specific cluster node.

2021-03-26 03:10:38 node1 pengine: notice: LogAction: * Move rsc_vip_int-primary ( node2 -> node1 )
2021-03-26 03:10:38 node1 pengine: notice: LogAction: * Move rsc_ilb_hltchk ( node2 -> node1 )
2021-03-26 03:10:38 node1 pengine: notice: LogAction: * Stop rsc_SAPHanaTopology_SID_HDB00:1 ( node2 ) due to node availability

Failed resource operations

Pacemaker manages cluster resources by calling resource operations such as monitor, start or stop, which are defined in the corresponding resource agents (shell or Python scripts). The log parser filters log entries of failed operations. The example below shows a monitor operation that failed because the virtual IP resource is not running.

2020-07-23 13:11:44 node2 crmd: info: process_lrm_event: Result of monitor operation for rsc_vip_gcp_ers on node2: 7 (not running)

Resource agent, fence agent warnings and errors

A resource agent or fence agent writes detailed logs for its operations. When you observe a resource operation failure, the agent logs can help identify the root cause. The log parser filters the ERROR logs for all agents. Additionally, it filters WARNING logs for the SAPHana agent.

2021-03-16 14:12:31 node1 SAPHana(rsc_SAPHana_SID_HDB01): ERROR: ACT: HANA SYNC STATUS IS NOT 'SOK' SO THIS HANA SITE COULD NOT BE PROMOTED
2021-01-15 07:15:05 node1 gcp:stonith: ERROR - gcloud command not found at /usr/bin/gcloud
2021-02-08 17:05:30 node1 SAPInstance(rsc_sap_SID_ASCS10): ERROR: SAP instance service msg_server is not running with status GRAY !

Corosync communication error or failure

Corosync is the messaging layer that the cluster nodes use to communicate with each other. A failure in Corosync communication between the nodes may trigger a fencing action. The example below shows a Corosync message being retransmitted multiple times until Corosync eventually reports an error that the other cluster node left the cluster.
2021-11-25 03:19:33 node2 corosync: message repeated 214 times: [ [TOTEM ] Retransmit List: 31609]
2021-11-25 03:19:34 node2 corosync [TOTEM ] FAILED TO RECEIVE
2021-11-25 03:19:58 node2 corosync [TOTEM ] A new membership (10.236.6.30:272) was formed. Members left: 1
2021-11-25 03:19:58 node2 corosync [TOTEM ] Failed to receive the leave message. failed: 1

This next example shows that a Corosync TOKEN was not received within the defined time period, and eventually Corosync reported an error that the other cluster node left the cluster.

2021-11-25 03:19:32 node1 corosync: [TOTEM ] A processor failed, forming new configuration.
2021-11-25 03:19:33 node1 corosync: [TOTEM ] Failed to receive the leave message. failed: 2

Reach migration threshold and force resource off

When the number of failures of a resource reaches the defined migration threshold (parameter migration-threshold), the resource is forced to migrate to another cluster node.

check_migration_threshold: Forcing rsc_name away from node1 after 1000000 failures (max=5000)

When a resource fails to start on a cluster node, its failure count is set to INFINITY, which implicitly reaches the migration threshold and forces a resource migration. If a location constraint prevents the resource from running on the other cluster nodes, or no other cluster nodes are available, the resource is stopped and cannot run anywhere.

2021-03-15 23:28:33 node1 pengine: info: native_color: Resource STONITH-sap-sid-sec cannot run anywhere
2021-03-15 23:28:33 node1 pengine: info: native_color: Resource rsc_vip_int_failover cannot run anywhere
2021-03-15 23:28:33 node1 pengine: info: native_color: Resource rsc_vip_gcp_failover cannot run anywhere
2021-03-15 23:28:33 node1 pengine: info: native_color: Resource rsc_sap_SID_ERS90 cannot run anywhere

Location constraint added due to manual resource movement

Location constraints with the prefix 'cli-prefer' or 'cli-ban' are added implicitly when a user triggers a cluster resource move or ban command. These constraints should be cleared after the resource movement, because they restrict the resource to run only on a certain node. The example below shows that a 'cli-ban' location constraint was created and a 'cli-prefer' location constraint was deleted.

2021-02-11 10:49:43 node2 cib: info: cib_perform_op: ++ /cib/configuration/constraints: <rsc_location id="cli-ban-grp_sap_cs_sid-on-node1" rsc="grp_sap_cs_sid" role="Started" node="node1" score="-INFINITY"/>
2021-02-11 11:26:29 node2 stonith-ng: info: update_cib_stonith_devices_v2: Updating device list from the cib: delete rsc_location[@id='cli-prefer-grp_sap_cs_sid']
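As a side note, one way to clear these implicit constraints after a manual move is with your cluster management tool. This is only a sketch, reusing the resource name grp_sap_cs_sid from the log above; the exact subcommand depends on your distribution and tool version.

# RedHat / pcs: remove constraints created by 'pcs resource move' or 'pcs resource ban'
pcs resource clear grp_sap_cs_sid

# SUSE / crmsh: newer versions provide 'clear'; older versions use 'crm resource unmigrate'
crm resource clear grp_sap_cs_sid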
Cluster/Node/Resource maintenance/standby/manage mode change

The log parser filters log entries when any maintenance commands are issued on the cluster, cluster nodes or resources. The examples below show that cluster maintenance mode was enabled and that a node was set to standby.

(cib_perform_op) info: + /cib/configuration/crm_config/cluster_property_set[@id='cib-bootstrap-options']/nvpair[@id='cib-bootstrap-options-maintenance-mode']: @value=true
(cib_perform_op) info: + /cib/configuration/nodes/node[@id='2']/instance_attributes[@id='nodes-2']/nvpair[@id='nodes-2-standby']: @value=on

Conclusion

This Pacemaker log parser can give you one simplified view of the critical events in your high availability cluster. If further support is needed from the Google Cloud Customer Care team, follow this guide to collect the diagnostics files and open a support case.

If you are interested in learning more about running SAP on Google Cloud with Pacemaker, read the previous blogs in this series:

Using Pacemaker for SAP high availability on Google Cloud – Part 1
What’s happening in your SAP systems? Find out with Pacemaker Alerts – Part 2
Analyze Pacemaker events in Cloud Logging – Part 3
Source: Google Cloud Platform

Slim.AI Docker Extension for Docker Desktop

Extensions are great for expanding the capabilities of Docker Desktop. We’re proud to feature this extension from Slim.AI, which promises deep container insight and optimization features. Follow along as Slim.AI walks through how to install, use, and connect with these handy new features!
A version of this article was first published on Slim.AI’s blog.

We’re excited to announce that we’ve been working closely with the team at Docker on developing our own Slim.AI Docker Extension to help developers build secure, optimized containers faster. You can find the Slim Extension in the Docker Extensions Marketplace or on Docker Hub.
Docker Extensions give developers a simple way to install and run helpful container development tools directly within Docker Desktop. For more information about Docker Extensions, check out https://docs.docker.com/desktop/extensions/.
The Slim.AI Docker Extension brings some of the Slim Platform’s capabilities directly to your local environment. Our initial release, available to everyone, focuses on being the easiest way for developers to get visibility into the composition and construction of their images, and on reducing friction when selecting, troubleshooting, optimizing, and securing images.

Why should I install the Slim.AI Extension?
At Slim, we believe that knowing your software is a key building block to creating secure, small, production-ready containers and reducing software supply chain risk. One big challenge many of us face when attempting to optimize and secure container images is that images often lack important documentation. This leaves us in a pinch when trying to figure out even basic details about whether or not an image is usable, well constructed, and secure.
This is where, we believe, the Slim Docker Extension can help.
Currently, the Slim Docker extension is free to developers and includes the following capabilities:
Available Free to All Developers without Authentication to the Slim Platform

Easy access to deep analyses of your local container images by tag, including information like the local architecture, exposed ports, shells, volumes, and certs
Security insights, including whether the container runs as the root user and a complete list of files that have special permissions
Optimization opportunities, including counts of deleted and duplicate files
A fully searchable File Explorer, filterable by layer, instruction, and file type, with the ability to view the contents of any text-based file
The ability to compare any two local images or image tags with deep analyses and File Explorer capabilities
Reverse-engineered Dockerfiles for each image when the originals are not available

Features available to developers with Slim.AI accounts (https://portal.slim.dev):

Search across multiple public and authenticated registries for quality images, including support for Docker Hub, GitHub, DigitalOcean, ECR, MCR, and GCR, with more coming soon.
View deep analysis, insights, and File Explorer for images across available registries prior to pulling them down to your local machine.

How do I install the extension?

Make sure you’re running Docker Desktop version 4.10 or greater. You can get the latest version of Docker Desktop at docker.com/desktop.
Go to Slim.AI Extension on Docker Hub. (https://hub.docker.com/extensions/slimdotai/dd-ext)
Click “Open in Docker Desktop”.
This will open the Slim.AI extension in the Extensions Marketplace in Docker Desktop. Click “Install.” The installation should take just a few seconds.
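If you prefer the command line, the Docker Extensions CLI that ships with recent Docker Desktop releases can install the extension directly. This is a sketch that assumes the extension image name shown on its Docker Hub page (slimdotai/dd-ext):

# Install the Slim.AI extension from Docker Hub
docker extension install slimdotai/dd-ext

# Confirm it appears in the list of installed extensions
docker extension ls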

How do I use the Slim.AI Docker Desktop Extension?

Once installed, click on the Slim.AI extension in the left nav of Docker Desktop.
You should see a “Welcome” screen. Go ahead and click to view our Terms and Privacy Policy. Then, click “Start analyzing local containers”.
The Slim.AI Docker Desktop Extension will list the images on your local machine.
You can use the search bar to find specific images by name.
Click the “Explore” button or caret icon to view details about your images:

The File Explorer is a complete, searchable view into your image’s file system. You can filter by layer, file type, and whether a file is an Addition, Modification, or Deletion in a given layer. You can also view the content of non-binary files by clicking on the file name and then clicking File Contents in the new window.
The Overview displays important metadata and insights about your image, including the user, certs, exposed ports, volumes, environment variables, and more.
The Docker File shows a reverse-engineered Dockerfile that we generate when the original is not available.

Click the “Compare” button to compare two images or image tags.

Select the tag via the dropdown under the image name. Then, click the “Compare” button in its card.
Select a second image or tag, and click the “Compare” button in its card.
You will be taken to a comparison view where you can explore the differences in the files, metadata, and reverse engineered dockerfiles.

How do I connect the Slim.AI Docker Desktop Extension to my Slim.AI Account?

Once installed, click on the “Login” button at the top of the extension.
Sign in using your GitHub, GitLab, or Bitbucket account. (Accounts are free for individual developers.)
Navigate back to the Slim Docker Desktop Extension.
Once successfully connected, you can use the search bar to search over all of your connected registries and explore remote images before pulling them down to your local machine.

What if I don’t have a Slim.AI account?
The Slim platform is currently free to use. You can create an account from the Docker Desktop Extension by clicking the Login button in the top right of the extension. You will be taken to a sign-in page where you can authenticate using GitHub, GitLab, or Bitbucket.
What’s on the roadmap?
We have a number of features and refinements planned for the extension, but we need your feedback to help us improve. Please provide your feedback here.
Planned capabilities include:

Improvements to the Overview to provide more useful insights
Design and UX updates to make the experience even easier to use
More capabilities that connect your local experience to the Slim Portal

 

Interested in learning more about how extensions can expand your experience in Docker Desktop? Check out our Docker Hub extensions library or see below for further reading: 
 

Install the Slim.AI Docker Desktop Extension.
Read similar articles covering new Docker Extensions.
Learn how to create your own extensions for Docker Desktop.
Get started and download Docker Desktop for Windows, Mac, or Linux. 

Source: https://blog.docker.com/feed/