New Theme: Twenty Twenty

We’re pleased to announce that Twenty Twenty — the new WordPress default theme designed by Anders Norén — is available to all WordPress.com sites.

Twenty Twenty was designed with the flexibility of the new WordPress Editor at its core. If you want to use it for an organization or a business, you can combine columns, groups, and media to create dynamic layouts that show off your services or products. If you want to use it for a traditional blog, the centered content column and considered typography make it perfect for that as well.

It has also been designed and developed to take maximum advantage of the creative freedom enabled by the block editor. Extra care has been given to the Columns and Group blocks, which can be combined into impressive landing pages with intricate block layouts. Twenty Twenty includes full editor styles for the block editor, so what you see in the editor will almost exactly match the end result.

Learn more about WordPress’s latest default theme here, or check out the demo site!
Source: RedHat Stack

Power Users Rejoice: You've Got SFTP and Database Access

Three new hosting management tools give you direct access to your site’s files and data.

Have a site on a Business or eCommerce plan? Now you have three new ways to customize your WordPress.com site: SFTP access, database access, and PHP version switching give you behind-the-scenes access to the nuts and bolts of your site, which means more freedom and flexibility than ever before. If you’ve ever gotten stuck trying to modify your WordPress.com site because you couldn’t manually upload file changes or delete something from your database, website management is about to get a lot easier.

You’ll find all these features in a new section of your dashboard: Manage > Hosting Configuration.

The Hosting Configuration Screen

SFTP credentials

SFTP is a secure way to access the files and folders on your WordPress.com site using a program on your local computer like FileZilla. Some custom plugins and themes ask you to create specific folders or add files via SFTP. While many of those tasks can be accomplished with tools already built into your WordPress.com dashboard, folks who prefer SFTP now have direct access so they can make these changes themselves. Check out our step-by-step guide to get started.

Database access

At WordPress.com, we regularly optimize your database so you don’t have to, but there might still be times when you need an efficient way to modify data, like purging all the tables created by a plugin you’ve decided to delete. Accessing your site’s database is an effective way to do this. 

Database access is a powerful tool, so if you’re at all unsure about working with a database, reach out to our Happiness Engineers! If you’re unfamiliar with databases, you can also find information on getting started in our help documentation.

PHP version switching

PHP is still one of the key languages used to build the web, and a new version, PHP 7.4, was recently released. WordPress.com sites currently run PHP 7.3, which has been tested extensively across all of WordPress.com, but sites on Business or eCommerce plans can switch to version 7.4 immediately. Learn more.

Since these new tools let you dig into some of the code and data that powers your site, you’ll find a link for our support team right from the dashboard so you can get help if you need it. Have fun getting under the hood, power users!
Source: RedHat Stack

Improving customer service with an intelligent virtual assistant using IBM Watson

Gartner predicts that “by 2022, 70 percent of white-collar workers will interact with conversational platforms on a daily basis.” As a result, the research group found that more organizations are investing in chatbot development and deployment.
IBM Business Partners like Sopra Steria Group are making chatbot and virtual assistant technology available to businesses. Sopra Steria Group, a European leader in digital transformation, has developed an intelligent virtual assistant for organizations across several industries that want to use an AI conversational interface to answer recurring customer service questions.
Choosing IBM AI technology
In developing our solution, we at Sopra Steria Group were looking for AI technology that was easy to configure and could support multiple languages and complex dialogs. We created a training set of 2,000 sentences such as “I would like to buy a car,” or “I would like to have a coffee,” and tested 15 different AI solutions. Of all the solutions, we found that IBM AI technologies were the most accurate and could best meet our requirements. We are now using IBM Watson Assistant, IBM Watson Language Translator, IBM Watson Natural Language Understanding and IBM Watson Text to Speech as the AI foundation for our intelligent virtual assistant accelerator.
We have developed an orchestrator and a front-end interface to manage dialogs and connectivity to external systems and functionalities, such as support for multiple languages. Our solution enables a faster deployment of customer chatbots than any other on the market because of our enhancements.
Sopra Steria developers use Red Hat OpenShift sandbox environments to experiment with new concepts and build applications for the company’s employees. The company also uses the Red Hat OpenShift environment to support intelligent virtual assistants and operate other managed services for its customers.
Our intelligent virtual assistant solution runs primarily in the cloud, but it can also be deployed on premises at a client’s location if that is what the client needs. Running the solution on premises allows us to connect the dialogs easily with internal systems like billing and HR information systems.
Accelerating deployment of the intelligent virtual assistant
The Sopra Steria Group intelligent virtual assistant accelerator engine enables our solution to be deployed rapidly. There’s an intuitive graphical user interface, so our clients can begin creating dialogs for their chatbots and using it in customer service scenarios almost immediately.
Currently the intelligent virtual assistant is live in 10 sites throughout Europe. In its most mature instance, the solution is processing 80 percent of customer queries. In the first month alone, the solution processed 75,000 conversations. The solution has scaled from 450,000 users to more than 1.5 million, with the expectation of supporting three million within the coming year.
Based on 700 user evaluations of the quality of service provided by the intelligent virtual assistant at our most mature client, more than 64 percent of users felt that their request had been handled satisfactorily. This is a much higher satisfaction rate than the support provided by another solution that we had implemented in the past, which did not use Watson Assistant.
The Sopra Steria Group solution enables our clients to be available for their customers day and night, while reducing operating costs. It also frees customer service representatives to focus on more complex and high-value customer issues.
Read the case study for more details.
Source: Thoughts on Cloud

Building and running SAP Commerce in OpenShift

This post describes the process for building and deploying the SAP Commerce platform (formerly known as SAP Hybris) application server and applications to the Red Hat OpenShift Container Platform. SAP Commerce is a comprehensive commerce platform that includes product content management, experience management, personalization and order management.
Architecture and approach
The deployment of SAP Commerce to OpenShift consists of two main components:

SAP Commerce Server
Deployed applications, components and configurations

Because the OpenShift Container Platform leverages container images as its packaging model, a layered file system allows a common base to be used regardless of the number of applications. Since images are atomic in nature, there is a guarantee that the same base can be replicated across all of the applications. In addition, a container delivery pipeline can be created that allows applications to be rebuilt automatically whenever the base is updated, such as when updates are installed or a security vulnerability is discovered.
Taking advantage of this, the approach used in this post builds the application in two phases (a sketch of the corresponding BuildConfig objects follows the list):

During the first phase, the SAP Commerce ‘base image’ is built. To do that, we use a Docker build strategy (dockerStrategy), building the container image by reading the Dockerfile specifications from our source repository. The final result will be an SAP Commerce container image stored in the OpenShift internal registry that can be reused later as many times as we need.
During the second phase, the application and different components will be built and initialized using a ‘Source to Image’ (s2i) strategy, where the application source is merged with the SAP Commerce base image built during the previous phase. As a result, the custom SAP Commerce application container image will be pushed to the internal registry, so it can be used to deploy the application instance later and reused for different environments and purposes (Dev, QA, Prod…).
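To make the two phases concrete, here is a minimal sketch of what the pair of BuildConfig objects could look like. The repository URLs and object names are placeholders, not the contents of the actual templates referenced later:

apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: sap-commerce-base            # phase 1: Docker build of the base image
spec:
  source:
    git:
      uri: https://git.example.com/sap-commerce-base.git   # hypothetical repo with the Dockerfile and 'hybris' directory
  strategy:
    type: Docker
    dockerStrategy: {}
  output:
    to:
      kind: ImageStreamTag
      name: sap-commerce-base:latest                       # stored in the internal registry
---
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: sap-commerce-app             # phase 2: s2i build on top of the base image
spec:
  source:
    git:
      uri: https://git.example.com/sap-commerce-app.git    # hypothetical application source
  strategy:
    type: Source
    sourceStrategy:
      from:
        kind: ImageStreamTag
        name: sap-commerce-base:latest
  output:
    to:
      kind: ImageStreamTag
      name: sap-commerce-app:latest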

OpenShift configuration
For this post, we’re using OpenShift Container Platform 4.2. To support the build and deploy stages, we are using Nexus and a git server deployed in OpenShift. This procedure can be easily adapted to use any other artifact repository and/or source code management tool. These tools will contain the initial artifacts to build the SAP Commerce base image and the application code that will be deployed on top of that base image. Additionally, we are using Secrets and Config Maps that are not part of the templates to store the credentials used to log in to these systems and allow us to pull in objects.
To support application portability between environments, it is recommended that application configurations be externalized and injected at runtime. This enables the core components of the application (JAR file(s) packaged into an atomic image) to have an independent lifecycle from the configurations. Configurations in SAP Commerce that both override a portion of the default server configuration and support deployed applications are specified within a file called local.properties.
OpenShift supports storing application configurations within the platform in an object called a ConfigMap and sensitive data in an object called a Secret. For this use case, we are storing the application configuration that we want to change during runtime in a ConfigMap called ‘hybrisconfig’ and the sensitive data in a Secret called ‘hybrislicense’.
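As a rough sketch (the property keys and license file name below are illustrative assumptions, not taken from the actual templates), the two objects could look like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: hybrisconfig
data:
  local.properties: |
    # hypothetical runtime overrides merged into the server configuration
    website.commerce.https=https://shop.example.com
    log4j2.threshold=INFO
---
apiVersion: v1
kind: Secret
metadata:
  name: hybrislicense
type: Opaque
stringData:
  license.txt: "<contents of the SAP Commerce license file>"  # placeholder value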
Build and deploy process
One of the benefits of the OpenShift Container Platform is that it provides a robust build facility for creating container images. Multiple build strategies are available, giving administrators and developers the flexibility to select an option that best suits their needs.
The process to build the SAP Commerce container base image, the final application container image and the deployment of this application into the platform is fully managed by OpenShift. Templates are used (so they are reusable as many times as needed) to describe how our application is going to be built and deployed, and what objects are created in the platform. The use of ‘everything as code’ (Infrastructure as Code, Configuration as Code…) is a great practice to achieve platform automation, idempotency, reusability and portability. 
All the instructions to deploy the required objects are captured in the following GitHub repository. As part of the process we need to download SAP Commerce and extract the ‘hybris’ directory to be used as the base for our SAP Commerce deployment. This is the directory we push to our Artifact Repository to be used during the SAP Commerce base image build process. For the purpose of this post a ‘develop’ environment is set up from the given config templates, but this can be adapted to your specific requirements. This initial “plain” configuration and SAP Commerce platform files, along with the instructions from our Dockerfile will form the final SAP Commerce base image.
For the final application build, the previously created base image is used. This image already has the instructions (the s2i directory previously injected) to build and run the new resultant image from this build process. During the build process, the source from our Git server repository is injected into the build container and the instructions from the ‘assemble’ script are executed.

We won’t go deep into the building and customizing process for the SAP Commerce application. The best sources for those details are the SAP Help for SAP Commerce page or the SAP Customer Experience Wiki.
Once the final image has been built containing the application, it can be deployed. As part of the templates used, additional objects are created in OpenShift to run the application deployments and manage the internal and external requests to the application endpoints. A DeploymentConfig, which already has the information of the application container image to be used as well as instructions on how this container image should be deployed, is then created.

Once the application has been deployed, it will be accessible via an OpenShift Route that exposes services externally.
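As a condensed, illustrative sketch of those objects (names are placeholders, and 9002 is assumed here as the SAP Commerce HTTPS port), the deployment and routing pieces could look like this:

apiVersion: apps.openshift.io/v1
kind: DeploymentConfig
metadata:
  name: sap-commerce
spec:
  replicas: 1
  selector:
    app: sap-commerce
  template:
    metadata:
      labels:
        app: sap-commerce
    spec:
      containers:
      - name: platform
        image: sap-commerce-app:latest   # resolved from the ImageStreamTag built earlier
        ports:
        - containerPort: 9002
  triggers:
  - type: ImageChange                    # redeploy automatically when a new application image is built
    imageChangeParams:
      automatic: true
      containerNames:
      - platform
      from:
        kind: ImageStreamTag
        name: sap-commerce-app:latest
---
apiVersion: v1
kind: Service
metadata:
  name: sap-commerce
spec:
  selector:
    app: sap-commerce
  ports:
  - name: https
    port: 9002
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: sap-commerce
spec:
  to:
    kind: Service
    name: sap-commerce
  port:
    targetPort: https
  tls:
    termination: passthrough            # TLS is terminated by the application itself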

Source: OpenShift

Simplifying OpenShift Case Information Gathering Workflow: Must-Gather Operator

Introduction
Collecting debugging information from a large set of nodes (such as when creating SOS reports) can be a time-consuming task to perform manually. Additionally, in the context of Red Hat OpenShift 4.x and Kubernetes, it is considered a bad practice to ssh into a node and perform debugging actions. To better accomplish this type of operation in OpenShift Container Platform 4, there is a new command: oc adm must-gather, which will collect debugging information across the entire cluster (nodes and control plane). More detailed information on the must-gather command can be found in the platform documentation.
While using the must-gather command is fairly straightforward, the full end-to-end process can be time consuming. This process involves issuing the command, waiting for the associated tasks to complete, and then uploading the resulting information to the Red Hat case management system.
A way to further streamline the process is to automate these actions.
Must-Gather Operator
The must-gather operator streamlines running the must-gather command and uploading the results to the Red Hat case management system. The must-gather operator is intended to be used only by the cluster administrator, as it requires elevated permissions on the cluster. A must-gather run can be started by creating a MustGather custom resource (CR) similar to the following:
apiVersion: redhatcop.redhat.io/v1alpha1
kind: MustGather
metadata:
  name: example
spec:
  caseID: 'XXXXXXXX'
  caseManagementAccountSecretRef:
    name: case-management-creds
  serviceAccountRef:
    name: must-gather-admin

Within the MustGather CR, three parameters can be defined:

caseID: the Red Hat Support case to which the resulting output will be attached.
caseManagementAccountSecretRef: secret containing the credentials needed to log in and upload files to the Red Hat case management system.
serviceAccountRef: service account with the cluster-admin role that is used to run the must-gather command. Running as cluster-admin is a must-gather requirement (a sketch of these prerequisite objects follows below).
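As a sketch of these prerequisites (the secret key names and the operator namespace below are assumptions; check the operator’s documentation for the exact contract), the referenced objects could be created along these lines:

apiVersion: v1
kind: Secret
metadata:
  name: case-management-creds
  namespace: must-gather-operator          # hypothetical namespace where the operator runs
type: Opaque
stringData:
  username: my-redhat-portal-user          # hypothetical key names for the case management credentials
  password: my-redhat-portal-password
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: must-gather-admin
  namespace: must-gather-operator
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: must-gather-admin
subjects:
- kind: ServiceAccount
  name: must-gather-admin
  namespace: must-gather-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin                      # must-gather requires cluster-admin, as noted above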

When this CR is created, the operator creates a job that runs the must-gather operations and uploads the resulting information as a compressed file.
The must-gather operator watches only the namespace in which it is deployed. This should make it easier for a cluster administrator to configure limited access to that namespace. This is recommended because that namespace needs to contain a service account with cluster-admin privileges, for the reasons described above, and therefore needs to be properly protected.
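For example, access could be limited to a dedicated group with a standard RoleBinding; the group and namespace names here are hypothetical:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: must-gather-operators
  namespace: must-gather-operator     # hypothetical operator namespace
subjects:
- kind: Group
  name: support-team                  # hypothetical group allowed to create MustGather resources
  apiGroup: rbac.authorization.k8s.io
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit                          # may need a custom role that also grants access to the MustGather CRD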
Running Additional Must-Gather Images
The must-gather command supports the option of running multiple must-gather compatible images that can be used for collecting additional information. This option is typically limited to OpenShift add-ons, such as KubeVirt and OpenShift Container Storage (OCS). The must-gather operator supports this functionality by allowing these images to be specified as in the following example:
apiVersion: redhatcop.redhat.io/v1alpha1
kind: MustGather
metadata:
  name: example-more-images
spec:
  caseID: 'XXXXXXX'
  caseManagementAccountSecretRef:
    name: case-management-creds
  serviceAccountRef:
    name: must-gather-admin
  mustGatherImages:
  - quay.io/kubevirt/must-gather:latest
  - quay.io/ocs-dev/ocs-must-gather

As you can see, the mustGatherImages property is an array of strings representing images. When added to a MustGather CR, all the specified images will be run in addition to the default must-gather image.
Installation
The must-gather operator can be installed via the OperatorHub or with a Helm chart.
The project GitHub repository contains detailed information on how to install the must-gather operator.
Conclusions
Being able to provide diagnostic information in a consistent fashion makes it easier for Red Hat support to aid in the resolution of issues. A more streamlined and automatic information collecting process makes it more likely that the customer can provide timely debugging information to Red Hat support. The must-gather operator aims to help in this space.
Source: OpenShift

Community Blog Round Up 06 January 2020

Welcome to the new DECADE! It was super awesome to run the blog script and see not one, not two, but THREE new articles by the amazing Adam Young who tinkered with Keystone, TripleO, and containers over the break. And while Lars only wrote one article, it’s the ultimate guide to the Open Virtual Network within OpenStack. Sit back, relax, and inhale four great articles from the RDO Community.
Running the TripleO Keystone Container in OpenShift by Adam Young
Now that I can run the TripleO version of Keystone via podman, I want to try running it in OpenShift.
Read more at https://adam.younglogic.com/2019/12/running-the-tripleo-keystone-container-in-openshift/
Official TripleO Keystone Images by Adam Young
My recent forays into running containerized Keystone images have been based on a Centos base image with RPMs installed on top of it. But TripleO does not run this way; it runs via containers. Some notes as I look into them.
Read more at https://adam.younglogic.com/2019/12/official-tripleo-keystone-images/
OVN and DHCP: A minimal example by Lars Kellogg-Stedman
Introduction A long time ago, I wrote an article all about OpenStack Neutron (which at that time was called Quantum). That served as an excellent reference for a number of years, but if you’ve deployed a recent version of OpenStack you may have noticed that the network architecture looks completely different. The network namespaces previously used to implement routers and dhcp servers are gone (along with iptables rules and other features), and have been replaced by OVN (“Open Virtual Network”).
Read more at https://blog.oddbit.com/post/2019-12-19-ovn-and-dhcp/
keystone-db-init in OpenShift by Adam Young
Before I can run Keystone in a container, I need to initialize the database. This is as true for running in Kubernetes as it was using podman. Here’s how I got keystone-db-init to work.
Read more at https://adam.younglogic.com/2019/12/keystone-db-init-in-openshift/
Source: RDO

Disaster Recovery Strategies for Applications Running on OpenShift

Introduction
There is increasing pressure to deploy stateful applications in Red Hat OpenShift. These applications require a more sophisticated disaster recovery (DR) strategy than stateless applications, as state must also be taken into account, as opposed to just traffic redirection.
Disaster recovery strategies become less generic and more application-specific as applications increase in complexity. That said, this document will attempt to illustrate high-level disaster recovery strategies that can be applied to common stateful applications.
It is also important to acknowledge that the disaster recovery strategy for the OpenShift platform itself is a separate topic from disaster recovery for applications running within OpenShift.
For the purpose of this discussion, we can make the assumption that OpenShift is deployed in one of the topologies depicted below:

One could argue that “two independent clusters” and “three or more independent clusters” are the same architectural pattern. However, as we will see, when we have more than two datacenters at our disposal, additional options become available in terms of disaster recovery and so it behooves us to distinguish those two architectures.
Notably, the option of an OpenShift cluster stretching across two datacenters is excluded from this discussion. With this type of OpenShift deployment, when a disaster strikes, some operations are needed to recover OpenShift (exactly which ones depends on the details of the deployment). We don’t want to find ourselves in a situation where we need to recover OpenShift’s control plane and, at the same time, the applications running on it.
Disaster recovery strategies can be grouped into two main categories: 

Active/passive: with this approach, under normal circumstances, all the traffic goes to one datacenter. The second datacenter is in a standby mode in case of a disaster. If a disaster occurs, it is assumed that there will be downtime in order to perform tasks needed to recover the service in the other datacenter. 
Active/active: with this approach, load is spread across all available datacenters. If a datacenter is lost due to a disaster, there can be an expectation that there is no impact to the services.

For the remainder of the document, a sample application will be used to illustrate the alternative disaster recovery strategies. While this approach will make the discussion more opinionated, it will make for a more realistic and easy to understand use case.
A Sample Stateful Application
Our application can be depicted as the following:

A stateless front-end receives customer requests through a route and communicates to a stateful workload. In our example, this is a relational database. The database pod mounts a persistent volume for its data.
Active / Passive Strategies
Active / passive strategies are suitable for those scenarios where only two datacenters are available. For more insight on why two datacenter deployments only lend themselves to active / passive strategies, and to understand what types of compromises are possible to overcome this limitation, please see the following blog post.
In an active/passive scenario, the overall architecture is depicted below:
In the preceding diagram, a global load balancer (referred to in the diagram as a Global Traffic Manager [GTM]) directs traffic to one of the datacenters.
The application is configured to replicate its state to the passive site.
When a disaster strikes, the following needs to occur:

The application is activated (either started, or configured to be master) in the passive site.
The global load balancer needs to be switched to the passive site.

These actions can be automated and performed in a relatively timely fashion. However, the decision to trigger that automation depends on declaring a disaster on the primary site (a task that typically involves human interaction). As a result, downtime is typically experienced in the application.
Once the disaster has been resolved, we have to switch back to the primary site. Likely, the easiest way to accomplish this task is to perform the disaster procedure in the opposite direction. So, again, this procedure can be automated, but it will likely imply some downtime.
State Synchronization Options
Previously, we have described a very generic process to design an active/passive disaster recovery scenario. The entire process hinges on the ability to replicate state from the active site to the passive site. The following are some ways to accomplish this task. Each workload is different, so these various approaches should be chosen according to their applicability to your environments.
Volume-level Replication
With volume replication, state is replicated at the storage level. Volume replication can be synchronous (typically used on low latency scenarios) or asynchronous. In the latter case, the application must be designed to work in a way that guarantees storage is always consistent or at least recoverable.

Most storage products support volume replication. However, Kubernetes does not offer a standard primitive to set up volume replication between two different clusters. So, at least for now, we need to rely on proprietary extensions for this capability; Portworx, for example, supports it.
Configuring volume replication outside of the Kubernetes abstraction is always a possibility. However, since the static nature of this kind of configuration usually conflicts with dynamic volume provisioning, this type of configuration must be carefully designed.
Backups and restores
While backups can provide invaluable protection against application misconfiguration, bugs, or human error, they are not a recommended approach for a DR strategy.
In the context of DR, backups and restores can be seen as a form of rudimentary asynchronous volume replication. The following are examples of issues found with backups and restore operations:

Full backup and restore exercises are done too infrequently (or never). The primary risk is not being able to recover data exactly when it is actually needed.
There is no abstraction layer in Kubernetes to issue or schedule a backup, or a restore. Proprietary extensions (such as Velero or Fossul) must be used, or configuration must occur directly at the storage product layer.
For very large datastores, the restore process can take a long time, potentially longer than the acceptable downtime for an application. 

Application-level Replication
With application level replication, the stateful application takes care of replicating the state. Again, the replication can be synchronous or asynchronous. Because the replication is application-driven, at least in this case, we can be certain that the storage will always be in a consistent state. Most traditional databases can be configured in this fashion with a master running in the active site, and a slave running in the passive site.

In order for the master to synchronize with the slave, it must be possible to establish a connection from the master to the slave (and vice versa when recovering after a disaster). One way to establish the connection is to expose the stateful workload via a Route or a LoadBalancer service and have the master connect to that endpoint.
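As a minimal sketch, assuming a PostgreSQL-style database replicating on port 5432 (the labels and port are placeholders), such an endpoint could be exposed with a LoadBalancer service:

apiVersion: v1
kind: Service
metadata:
  name: db-replication
spec:
  type: LoadBalancer      # provides an externally reachable endpoint for the peer site
  selector:
    app: database         # hypothetical label on the stateful workload's pods
  ports:
  - name: replication
    port: 5432
    targetPort: 5432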

While this is a possible approach, it has the drawback that our stateful application is now exposed outside of the cluster. Also, it can be complicated to configure egress and ingress paths while maintaining individual pods’ identity when there is more than one pod per cluster (horizontal scaling). This is due to the fact that stateful application instances typically need to contact peer instances individually (not via a load balancer) to be able to form a cluster. When there is more than one instance per cluster, it is not possible to use the usual ingress solutions (load balancer services, ingresses, routers) to load balance across the instances of each cluster.
A solution to this issue is to establish a network tunnel between the clusters in such a way that pods in one cluster can directly communicate with pods in the other clusters.

Unfortunately, Kubernetes does not offer a standard abstraction to create network tunnels between clusters. However, there are community projects that offer this functionality, including Submariner and Cilium.
Proxy-level Replication
A third option to achieve replication is to create a proxy in front of the stateful workload and have the proxy be responsible for maintaining the state replication.

Such a proxy would have to be written for the specific network protocol used by the stateful workload, making this approach not always an option.
As with application-level replication, we need the ability to establish inter-cluster pod-to-pod communication (from the proxy to the stateful workload).
Examples of this approach include Vitess (MySQL) and Citus (PostgreSQL). These are relatively complex applications that were originally created to allow scale-out solutions for databases by intelligently sharding tables. So, while these solutions can be used as a disaster recovery strategy, they should be adopted only if they are also needed to meet other requirements (e.g., large-scale deployments).
Active / Active Strategies
For active / active strategies, we assume we have the requirement of consistency and availability for our stateful workloads. As described in this article, in order to meet these requirements, we will need at least three datacenters and an application with a consensus protocol that allows it to determine which instances of the cluster are active and healthy.
In this approach, the application is responsible for synchronising the state across the various instances.
When OpenShift is introduced, we can deploy this type of architecture on one single cluster stretched across multiple datacenters (or availability zones [AZs] if in the cloud) or multiple independent clusters on multiple datacenters (or AZs).
Single Stretched Openshift Cluster across Multiple Datacenters
In order to achieve this strategy, the latency between datacenters must be relatively small (etcd requires latency of at most 10 ms). Since organizations typically do not have three datacenters in the same metropolitan area (low latency), this approach is more likely to be set up in the cloud. Cloud regions consist of multiple datacenters called Availability Zones (AZs), with very low latency between them. It is therefore possible to stretch an OpenShift cluster across three or more AZs, and in fact, it is the recommended method of installing OpenShift in the cloud.
The resulting architecture is depicted as the following:

When a disaster hits one of the AZs, no action needs to occur as both OpenShift and the stateful workload will autonomously react to the situation. In particular, the stateful workload will sense the loss of one of the instances and will continue using the remaining instances.
The same is true when the affected AZ is recovered. When the stateful instance in the recovered AZ comes back online, before the instance is allowed to join the cluster, it will need to resync its state. Again, this is handled autonomously and is part of the clustering features of these kinds of stateful workloads.
Examples of databases that have these features include:

CockroachDB (binary compatible with PostgreSQL)
YugabyteDB (binary compatible with PostgreSQL)
TiDB (binary compatible with MySQL)

This new generation of databases (offspring of Google Spanner) is slowly gaining popularity. As you can see, they are binary compatible with existing major open source databases, so in theory you will not need to change your client applications when migrating to them.
At the same time though, since these are relatively new products, they may present some operational risk (lack of skills, low product maturity, lack of management tools).
Multiple OpenShift Clusters in Multiple Datacenters
This deployment topology can be depicted as the following:

In this case, we have multiple datacenters (at least three) that are potentially geographically distributed. In each datacenter, we have independent OpenShift clusters. A global load balancer balances traffic between the datacenters (for design advice on how to configure a global load balancer using this approach, see this article). The stateful workload is deployed across the OpenShift clusters. This approach is more suitable than the previous one for geographically distributed, on-premises, and hybrid deployments.
Furthermore, it provides better availability because while we have the same assurances in terms of reaction to a disaster, in this configuration OpenShift does not act as a single failure domain.
When a disaster does occur, our global load balancer must be able to sense the unavailability of one of the datacenters and redirect all traffic to the remaining active datacenters. No action needs to occur on the stateful workload as it will self-reorganize to manage the loss of a cluster member.
In this configuration, the members of the stateful workload cluster need to be able to communicate with each other (pod to pod communication must be established). The same consideration as for the active/passive scenario with application-level replication applies.
Finally, deploying a stateful workload across multiple OpenShift clusters is not trivial. The process of standing up a complex stateful workload in a single OpenShift cluster is made simple by using operators. But nearly all operators today are cluster-bound and are not capable of controlling the configuration across multiple clusters. Multi-CassKop is one (rare) example of a multi-cluster operator, for Cassandra. Moreover, this project showcases a possible framework for creating multi-cluster controllers.
Conclusions
In this article, a set of disaster recovery strategies for applications running on OpenShift has been introduced. This list is certainly not complete; however, the hope is that these alternatives will help OpenShift practitioners start the disaster recovery design for their applications in the right direction. Also, it should be noted that the options presented are valid for generic stateful workloads. In a real world use case, knowing the specific features of a given stateful product may uncover additional options.
During the discussion, we have also identified a set of limitations that are found in the current Kubernetes feature set which make these types of deployments relatively complex. These limitations were centered around the following items:

Standard way to create volume replication across two Kubernetes clusters
Standard way to create network tunnels across two or more Kubernetes clusters
Standard way of creating multi-cluster operators.

The hope is that as more awareness of these limitations is created, they will be addressed by the Kubernetes community in the future.
Source: OpenShift

Trident + OpenShift: A NetApp + Red Hat solution to dynamically provision storage for OpenShift

This is a guest post written by Bala Ramesh, Technical Marketing Engineer, NetApp

Trident is an open-source storage provisioner for Kubernetes and Kubernetes-based container orchestrators such as Red Hat OpenShift. Trident v19.10 is optimized for OpenShift 4. The simplicity you can obtain by using Trident to dynamically create PVCs, coupled with its production-grade CSI drivers and data management capabilities, makes it a key option for stateful storage requirements in OpenShift. Applications generate data, and access to storage should be painless and on-demand.
Here’s how to install Trident in an OpenShift cluster and use Trident to orchestrate storage on NetApp backends.

My environment

This blog post uses a multi-node OpenShift 4.2 cluster (3 masters + 2 worker nodes) that was deployed using the bare metal method [I have deployed my OpenShift nodes as virtual machines on a KVM hypervisor]. You can choose to install OpenShift on other infrastructure providers (for example, your choice of a hyperscaler, Red Hat OpenStack Platform, or VMware vSphere). The install procedure detailed for Trident will remain the same, independent of where you choose to run OpenShift.
export KUBECONFIG=/root/ocp4/auth/kubeconfig
# oc login -u kubeadmin -p
Login successful.

You have access to 53 projects, the list has been suppressed. You can list all projects with 'oc projects'

Using project "default".
# oc get nodes
NAME STATUS ROLES AGE VERSION
master0.ocp4.example.com Ready master 39d v1.14.6+463c73f1f
master1.ocp4.example.com Ready master 39d v1.14.6+463c73f1f
master2.ocp4.example.com Ready master 39d v1.14.6+463c73f1f
worker0.ocp4.example.com Ready worker 39d v1.14.6+463c73f1f
worker1.ocp4.example.com Ready worker 39d v1.14.6+463c73f1f

On the storage side, I have a NetApp All Flash FAS (AFF) cluster that runs ONTAP 9.5, with dedicated Storage Virtual Machines (SVMs) created for NFS and iSCSI workloads. Think of an SVM as a logical container meant to isolate data between teams and manage lifecycles independently. Each storage cluster can house multiple SVMs. Ideally you should create multiple SVMs to isolate your workloads and create backends on a per-SVM basis. In this blog, I have one SVM that I will use for creating NFS PVCs.
Installing Trident
The installation procedure for Trident is straightforward; you can find the documentation here. Start by retrieving the Trident installer from the GitHub site (https://github.com/NetApp/trident/releases). I am using the latest available version of Trident at this moment, which happens to be Trident 19.10.

After downloading and extracting the installer, your directory should look like this:

The sample-input directory contains a number of sample definitions for StorageClasses, PVCs and Trident backends to help you get started with Trident. There are a couple of ways you can install Trident:

Generic install: This is the easiest way to install Trident. If your OpenShift cluster does not have any network restrictions and has access to pull images from the outside world, this is the way to go.
Customized install: You can also choose to customize your install. In air gapped environments you can point to a private image repository for Trident to pull its images from. Take a look at https://netapp-trident.readthedocs.io/en/stable-v19.10/kubernetes/deploying.html#customized-installation to get started.

I’m performing a generic installation in this blog. This is what I do to install Trident:
1. Add the tridentctl binary to my path.
PATH=$PATH:$PWD/trident-installer

2. Install Trident by running the tridentctl install command. I specify a namespace for Trident ("trident-ns") to install Trident's resources and CRDs in. As part of installing, Trident creates CRDs to maintain its state, and these CRD objects will be namespaced, only accessible in "trident-ns".

# tridentctl install -n trident-ns
INFO Starting Trident installation. namespace=trident-ns
INFO Created service account.
INFO Created cluster role.
INFO Created cluster role binding.
INFO Added security context constraint user. scc=privileged user=trident-csi
INFO Added finalizers to custom resource definitions.
INFO Created Trident pod security policy.
INFO Created Trident service.
INFO Created Trident secret.
INFO Created Trident deployment.
INFO Created Trident daemonset.
INFO Waiting for Trident pod to start.
INFO Trident pod started. namespace=trident-ns pod=trident-csi-7c88dd6588-x95bz
INFO Waiting for Trident REST interface.
INFO Trident REST interface is up. version=19.10.0
INFO Trident installation succeeded.

As can be seen from the log trace, Trident creates CRDs and defines a Service, a Deployment, and a DaemonSet that runs on all worker nodes. Once the Trident pod comes up and is able to communicate with the storage cluster, the installation completes and you can move on to the next step.
I can examine the resources that Trident has created.
# oc get pods -n trident-ns -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
trident-csi-7c88dd6588-x95bz 4/4 Running 0 17m 10.254.3.22 worker0.ocp4.example.com
trident-csi-7v2zj 2/2 Running 0 17m 192.168.7.11 worker0.ocp4.example.com
trident-csi-c5czk 2/2 Running 0 17m 192.168.7.21 master0.ocp4.example.com
trident-csi-lxrw9 2/2 Running 0 17m 192.168.7.12 worker1.ocp4.example.com
trident-csi-mwhdq 2/2 Running 0 17m 192.168.7.22 master1.ocp4.example.com

Trident creates a Controller server which runs on one of the worker nodes. This is managed by a Deployment (named “trident-csi”) that creates and maintains a replica of the controller pod (trident-csi-7c88dd6588-x95bz). The Controller pod contains the Trident container and the necessary CSI sidecars to use NetApp’s CSI Drivers. The other pods are created by Trident’s DaemonSet on all available worker nodes. These pods contain a Trident container that talks to the Controller server to provision and attach volumes. You can choose the node(s) the Controller server is scheduled on by generating custom YAMLs and modifying the selector.
Creating a backend
Now that Trident is installed, I proceed to create a Trident backend. Before continuing any further, let’s take a look at what a Trident backend is:
A Trident backend represents a storage cluster that will be used by Trident to provision PVCs. Each instance of Trident can manage multiple backends and orchestrate storage. After you install Trident, you must (1) create a backend and (2) map StorageClasses to backends. Once this is done, creating PVCs with these StorageClasses will instruct Trident to provision volumes on the storage cluster and expose them as PVs to the OpenShift cluster. This way, Trident manages the lifecycle of the PV from creation to deletion.
The sample-input directory contains sample definitions for all supported Trident backends. Here is my backend definition:
# cat backend.json
{
  "debug": true,
  "managementLIF": "10.11.12.13",
  "dataLIF": "10.11.12.14",
  "svm": "test",
  "backendName": "nas_backend",
  "aggregate": "aggr_01",
  "username": "admin",
  "password": "",
  "storageDriverName": "ontap-nas",
  "storagePrefix": "bala_",
  "version": 1
}

# tridentctl create backend -f backend.json -n trident-ns
+-------------+----------------+--------------------------------------+--------+---------+
|    NAME     | STORAGE DRIVER |                 UUID                 | STATE  | VOLUMES |
+-------------+----------------+--------------------------------------+--------+---------+
| nas_backend | ontap-nas      | 2d55fcc6-dec3-4c5f-960a-c1cc7a3678df | online |       0 |
+-------------+----------------+--------------------------------------+--------+---------+

My backend definition mentions the StorageDriver that should be used (“ontap-nas”) and contains the details about the storage cluster. Trident offers a number of storage drivers that target different use cases and scalability requirements.
Creating a StorageClass and PVCs
Once a backend is available, I can create StorageClasses that map to backends and use them to create PVCs. Here’s a simple StorageClass definition that I use to create a “nas” StorageClass:
# cat sc.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nas
provisioner: csi.trident.netapp.io
parameters:
  backendType: "ontap-nas"
  snapshots: "True"
  provisioningType: "thin"
  encryption: "true"

The provisioner string is set to "csi.trident.netapp.io", instructing that requests for this StorageClass be handled by Trident's CSI drivers. Parameters can also be specified as key-value pairs to narrow down the candidate list of backends. This StorageClass will be used to create PVCs that support snapshot creation and encrypted volumes.
You can always examine the list of backends that satisfy a StorageClass by running tridentctl get backend -o json. Each backend has a storage.storageClasses attribute indicating the StorageClasses which can be used to provision a volume.
Here’s my PVC definition:
# cat pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nas-volume-claim
  annotations:
    trident.netapp.io/exportPolicy: "default"
    trident.netapp.io/snapshotPolicy: "default"
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: "nas"

Trident supports annotations that enable users to pass feature arguments on a per-PVC basis. In this example, I explicitly set the export policy (used to configure access from the storage cluster to the PV) and snapshot policy (for taking automatic storage-level snapshots) to reference pre-created policies.
# oc create -f sc.yaml
storageclass.storage.k8s.io/nas created
# oc create -f pvc.yaml
persistentvolumeclaim/nas-volume-claim created
# oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
nas-volume-claim Bound pvc-9e72871d-0fc2-11ea-9d76-525400af4de6 5Gi RWO nas 12s

There it is! Trident took care of creating a volume and presenting it as a PV to my OpenShift cluster, and I can reference this PVC in my pod/deployment definitions just like any other.
By now, it’s easy to see how Trident can add significant value to a stateful application. Persistent storage is a key requirement to successfully adopt container-centric DevOps and the focus must be on simplifying the consumption of storage natively at the Kubernetes layer. This eliminates the hard dependency on infrastructure/storage admins and empowers developers to focus on appdev, and DevOps/SRE professionals to keep the operation running. NetApp is a leader in the storage industry and provides a comprehensive suite of data management solutions to meet performance and efficiency service levels. Coupled with Trident, OpenShift users can scale workloads with less effort and avoid the storage bottleneck to provide a more seamless end-to-end experience. Trident’s documentation and Slack workspace are great places to get started and ask questions. In addition, netapp.io is home to NetApp’s contributions in the Open Ecosystems space and contains a rich set of blogs on Trident.
Source: OpenShift

Taking OpenShift’s Security for Containerized Applications to the next level with Aqua

The Red Hat OpenShift Container Platform has a number of built-in security capabilities. Aqua provides an additional layer of security in development and protects containerized applications at runtime. Aqua recently developed a Kubernetes Operator that was successfully tested and validated against Red Hat OpenShift standards for integration and supportability. Aqua completed technical validations to become a Red Hat OpenShift Certified Operator, allowing our joint customers to deploy Aqua seamlessly on the OpenShift platform.
One key differentiator of OpenShift Container Platform is that it allows users to leverage image streams when building environments using different registries.
Install, Deploy, and Check
You can use the OperatorHub embedded in Red Hat OpenShift to download Aqua’s Operator. After installing the Aqua Operator and logging on to the Aqua Command Center, you can deploy the Aqua Enforcer container through a DaemonSet, which ensures that the Aqua Enforcer runs on the worker nodes in the OpenShift cluster.
What are Image Streams?
In an earlier Aqua blog, we spoke at length about image streams. Image streams are an abstraction layer that provides mapping between image stream tags and actual images stored either in the internal OpenShift registry or in any external registry. Image streams can also be seen as pointers to actual images. A single image stream may consist of multiple tags, each of them pointing to an image from a different registry.
Red Hat’s OpenShift Container Platform allows users to build environments that work more efficiently for large and diversified setups, by using Image Streams instead of regular images when building and deploying applications. From a security perspective, this requires a different approach for tracking security issues that should work natively with OpenShift. 
Once created, image streams can be referenced by all deployments and builds within the same project and used just like a regular image, without making any special configuration to support them.
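As an illustrative sketch (the names and registries below are placeholders), an image stream whose tags point at images in two different registries could look like this:

apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: myapp
spec:
  tags:
  - name: "1.0"
    from:
      kind: DockerImage
      name: registry.example.com/team/myapp:1.0   # hypothetical external registry
  - name: latest
    from:
      kind: DockerImage
      name: quay.io/team/myapp:latest             # a different registry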
The Aqua platform automatically discovers and connects to the image stream engine, providing the same experience and feature set as when scanning regular images from regular registries. 
Automating the Mundane
Aqua recently built a RHEL-based Operator to automate the maintenance of mundane operational duties. This makes the use of Aqua’s Cloud Native Security Platform (CSP), particularly the deployment and scanning pieces, more seamless.
When deploying Aqua CSP, you can leverage the Operator as an alternative to a deployment that uses a Helm chart or large, complicated YAML files. The Operator only requires one YAML file to deploy the Aqua infrastructure components, and another YAML file to deploy Aqua Enforcers in your production environment.
The Aqua Operator can also be configured to manage the Aqua Scanner container and scale it automatically when more resources are needed. You can configure the minimum and maximum number of scanners you would like the Operator to deploy. You can even decide how many images you would like to allocate per scanner. For example, if you have one scanner deployed, 500 images in your scan queue, and your maximum number of scanners is configured to 5, you’ll have 5 Aqua Scanners scaled automatically to scan all 500 images.
Aqua’s OpenShift certified operator is also available to deploy through the OpenShift console and OperatorHub.io.
OpenShift Hardening Made Easy
Kubernetes CIS benchmarks were designed to check security configurations before running Kubernetes. Red Hat took this opportunity to create a hardening guide of its own to determine whether various parts of the CI pipeline were configured correctly. Aqua took this hardening guide and put it directly into its product. With this guide, you can automatically check and run tests to see if the clusters are configured correctly according to Red Hat’s guidelines.
In the image below, you can see a list of failures, warnings, pauses, and info. You can drill down for more information.
Collaboration and Innovation
Becoming a Red Hat Certified Technology Partner was a significant step in our continued work with OpenShift. Among other developments, the Aqua Operator allows OpenShift customers to scale Aqua runtime protection components more easily and handle a large number of Aqua Enforcers automatically. This capability, coupled with image streams and OpenShift hardening, extends OpenShift’s security capabilities and contributes to upgrading enterprises’ security posture.
Source: OpenShift

Liveness and Readiness Probes

The scope of use
Liveness and readiness probes are Kubernetes capabilities that enable teams to make their containerised applications more reliable and robust. However, used inappropriately, they can deliver none of the intended benefits and can actually make a microservice-based application unstable.
The purpose of each probe is quite simple and is described well in the OpenShift documentation here. The use of each probe is best understood by examining the action that will take place if the probe is activated. 

Liveness: Under what circumstances is it appropriate to restart the pod?
Readiness: Under what circumstances should we take the pod out of the list of service endpoints so that it no longer responds to requests?

Coupled with the action of the probe is the type of test that can be performed within the pod (each is sketched in YAML after the list):

HTTP GET request – For success, a request to a specific HTTP endpoint must result in a response between 200 and 399.
Execute a command – For success, the execution of a command within the container must result in a return code of 0.
TCP socket check – For success, a specific TCP socket must be successfully opened on the container.
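In a pod specification, the three test types look like the following alternative stanzas; the paths, ports and command are illustrative, not tied to any particular application:

# HTTP GET request: success is a response code between 200 and 399
livenessProbe:
  httpGet:
    path: /health
    port: 8080

# Execute a command: success is a return code of 0
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy

# TCP socket check: success is the socket being opened
livenessProbe:
  tcpSocket:
    port: 3306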

Liveness
I come from a background of real-time embedded systems applications that need to run for many years without service interruption. I have never been a huge fan of the universal computer fix: “switch it off and back on again.” So to use a liveness probe we must identify a suitable test from the three available options that will sensibly result in a requirement to perform a pod restart if the test fails. 
It is important to ensure that the pod with the liveness probe attached is the pod that needs to be restarted. This means not running tests with the liveness probe that result in a transaction involving other containers in other pods. However, if we make the liveness probe too simple, it may never give a meaningful indication of the poor health of the container. Ultimately, liveness probes are not the only facility that should be used in the management and monitoring of complex microservices-based applications; a structured logging and monitoring capability, such as Prometheus, should be designed into the architecture from the start.
As a simple example of the use of liveness probes, consider a web application server for a travel booking site. The application server responds to website requests on URL endpoints such as /flights, /trains and /hotels. Each of these URLs results in a transaction that will require interaction with further containerised applications that look up seat or room availability. They may also perform tasks such as obtaining the user’s profile and looking up their frequent traveller points. In short, this is not a trivial transaction, and it involves multiple downstream interfaces. If we use one of the above URL endpoints as part of the liveness probe, then the result may be that we restart the container after a failure or slow response from one of the downstream services. The application server may be performing perfectly, but we may restart it for the wrong reasons. Alternatively, we look for a different endpoint to give an indication of pod health. Adding a new URL such as /health, which simply validates that the application server is operating and serving requests, is a more atomic test that will only result in restarting the pod if the application server is unable to respond to a simple request.
The process of finding a credible liveness test is not yet complete, however. Having identified a suitable endpoint, we now have to identify the appropriate parameters for the liveness test to ensure that it operates under the right circumstances. The parameters required for the configuration of a URL response test are the following (a complete example follows the list):

initialDelaySeconds – Number of seconds after the container has started before the probe is initiated.
periodSeconds – The delay between performing probes.
timeoutSeconds – Number of seconds of inactivity after which the probe times out and the containerised application is assumed to be failing. Note that this parameter is not used with a probe that involves the execution of a command in the container. The user must find some alternative way to stop the probe, such as using the timeout command line utility.
failureThreshold – The number of times that the probe is allowed to fail before the liveness probe restarts the container (or in the case of a readiness probe marks the pod as unavailable).
successThreshold – The number of times that the probe must report success after it begins to fail in order to reset the probe process.
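Putting these parameters together for the /health endpoint discussed above, a liveness probe definition might look like the following sketch; the port and timings are illustrative and should be tuned per application:

livenessProbe:
  httpGet:
    path: /health
    port: 8080            # hypothetical application server port
  initialDelaySeconds: 60 # long enough for the application server to come up
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3
  successThreshold: 1     # Kubernetes requires 1 for liveness probes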

The initialDelaySeconds parameter must be set to an appropriate value at which the health check probe should begin. Given that the /health probe runs on the same application server platform as the other, more resource-consuming URLs, the initial delay must be long enough to ensure that the health check URL will be active. Setting this value too high will leave a period of time during which the container applications are active and the probe is not.
The periodSeconds parameter should be treated with care, as this will dictate the frequency at which the Kubernetes platform probes the pod to see if it is operating successfully. Too aggressive a setting results in unnecessary workload, while too long an interval between probes results in slow detection of failures.
Probe Timing
The example below illustrates the probe in action through a timeline. The first probe is successful, but the second, third and fourth fail. Assuming a default setting of 3 for the failureThreshold, the pod will restart after the failure of the fourth probe.

Assuming that a pod fails to start successfully, the lowest amount of time that can elapse before the pod is restarted due to the liveness probe is given by:
time = initialDelaySeconds + (failureThreshold - 1) * periodSeconds + timeoutSeconds
Under normal steady state operation, and assuming that the pod has operated successfully for a period of time, the initialDelaySeconds parameter becomes irrelevant. In such a case, the minimum amount of time that will elapse between a failure and a pod restart is given by:
time = (failureThreshold - 1) * periodSeconds + timeoutSeconds
As shown by the image below, the failure point can occur just before a probe occurs or just after a successful probe has occurred. If periodSeconds is long, such that there is minimal interference on the pod, then almost a full additional periodSeconds interval may be added before the pod restarts.

The failureThreshold parameter must be used with caution. If the parameter is set too high there is a danger that time is wasted while a pod has failed and it is not being restarted. If this parameter is set too low then there is a danger that the pod may be restarted prematurely if the pod comes under an unusually heavy load such that, while serving hotel and train booking requests, it does not respond to the health probe quickly enough. If this situation arises and the pod is restarted, then the system has lost part of the workload serving customer requests and more workloads will be placed on the remaining pods, which will make their overall performance spiral down further. 
Readiness
Everything stated above regarding liveness probes applies equally to readiness probes. The obvious difference is the end result when the probe takes action, and in the case of a readiness probe the action is to remove the pod from the list of available service endpoints. Under normal circumstances the endpoint reports all pods that support it as shown below:
oc get ep/node-app-slave -o json
{
    "apiVersion": "v1",
    "kind": "Endpoints",

    "subsets": [
        {
            "addresses": [
                {
                    "ip": "10.128.2.147",
After a readiness probe has been actioned the addresses line changes to:
oc get ep/node-app-slave -o json
{
    "apiVersion": "v1",
    "kind": "Endpoints",

    "subsets": [
        {
            "notReadyAddresses": [
                {
                    "ip": "10.128.2.147",

One of the obvious differences between a liveness probe and a readiness probe is that the pod is still running after a readiness probe has taken action. This means that the successThreshold parameter has a greater role to play. Even after taking the pod out of the list of endpoints, the readiness probe will continue to probe the pod. If the pod somehow manages to self-correct (perhaps because it was temporarily under severe workload and unable to respond to the probes), then it may start to respond successfully to the probes. The default value for the successThreshold is just 1, so there only needs to be one successful probe response to cancel the period of isolation and add the pod back into the service endpoint list. Consequently, readiness probes are particularly good at giving the pod some breathing space to recover from being overwhelmed.
Once again, the question must be asked: given the overall architecture of the application and the expected workloads under which it must operate, what action do we want to take when a pod is overwhelmed? Similarly to unnecessary pod restarts that place additional workload on other pods within the system, temporarily taking a pod ‘out of service’ may simply indicate that the functionality of the application or the overall architecture needs a rethink in terms of how it behaves under load.
Taking a pod out of service
For a readiness probe, the failureThreshold parameter defines how many times the probe must fail before the pod is removed from the endpoint list. Consider the situation in which a readiness probe has a failureThreshold of 5 and a default successThreshold of 1. In the diagram below, the pod suffers three consecutive failures to respond to the probe, followed by one successful response (probe 5). This successful response resets the failure counter, such that a further sequence of five failures occurs (probes 6 to 10) before the removal of the pod from the endpoint list at probe 10.

Returning a pod to service
For a readiness probe, the successThreshold parameter works alongside the failureThreshold to define the circumstances under which the pod should be returned to the endpoint list. (The successThreshold parameter has no such impact on a liveness probe.) Consider the situation in which a readiness probe has a failureThreshold of 5 and a successThreshold of 3. In the diagram below, the pod suffers a fifth failure to respond at probe number 5, resulting in the removal of the pod from the endpoint list. Note that the pod remained in the endpoint list from probe 1 to probe 5 even though it was struggling to respond successfully. One further probe failure occurs at probe 6 before the health of the pod improves and it responds successfully at probe 7. Since the successThreshold is set to three, it takes a further two successful probe responses at probes 8 and 9 before the pod is returned to the endpoint list.

The minimum amount of time for which a pod can be returned to health but still not be servicing requests is given by:
time = (successThreshold - 1) * periodSeconds
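Expressed as configuration, the scenario above (failureThreshold of 5, successThreshold of 3) corresponds to a readiness probe along these lines; the path and port are illustrative:

readinessProbe:
  httpGet:
    path: /health
    port: 8080          # hypothetical application server port
  periodSeconds: 10
  failureThreshold: 5   # five consecutive failures remove the pod from the endpoint list
  successThreshold: 3   # three consecutive successes return it to service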
Consider an Istio Circuit Breaker
Many microservices-based applications will benefit from the introduction of Red Hat Service Mesh for the management, monitoring and observability of services. As a consequence it may be appropriate to introduce the Istio circuit breaker capability. This operates in a similar manner to the readiness probe, in that it will remove the poorly performing pods from the routing rules it manages. Istio can take more granular actions on pods, however. If one begins to fail, for example, Istio can turn the flow of traffic to that pod down to a trickle, then reopen the flow when performance improves. 
Conclusion
Liveness and readiness probes have a part to play in the reliable and efficient delivery of application services. By considering exactly what is being used as the probe and what action we want to take for failure and recovery, it is possible to make good use of probes to help to manage the continued delivery of a microservice application.
Source: OpenShift