Enabling OpenShift 4 Clusters to Stop and Resume Cluster VMs

There are many reasons to stop and resume OpenShift cluster VMs:

Save money on cloud hosting costs.

Use a cluster only during daytime hours – for example for exploratory or development work. A cluster that serves just one person does not need to be running while that person is not using it.

Deploy a few clusters ahead of time when teaching a workshop or class, with all prerequisites already installed before the students arrive.

Background
When an OpenShift 4 cluster is installed, a bootstrap certificate is created. It is used on the master nodes to create certificate signing requests (CSRs) for kubelet client certificates (one for each kubelet) that identify each kubelet to the cluster.
Because certificates cannot be revoked, this certificate is issued with a short lifetime: it is rotated 24 hours after cluster installation and then again every 30 days. All nodes other than the master nodes use a service account token instead, which is revocable.
If the master kubelets do not yet have a 30-day client certificate (the first one only lasts 24 hours), missing the kubelet client certificate refresh window renders the cluster unusable, because the expired bootstrap credential cannot be used when the cluster is woken back up.
Practically, this requires an OpenShift 4 cluster to be running for at least 25 hours after installation before it can be shut down.
The following process enables cluster shutdown right after installation and cluster resume at any time within the next 30 days.
Note that this process only works up until the 30-day certificate rotation. For most test, development, and classroom clusters this is a usable approach, because these types of clusters are usually rather short lived.
Preparing the Cluster to support stopping of VMs
These steps will enable a successful restart of a cluster after its VMs
have been stopped. This process has been tested on OpenShift 4.1.11 and
higher – including developer preview builds of OpenShift 4.2.

From the VM that you ran the OpenShift installer from, create the
following DaemonSet manifest. This DaemonSet pulls down the same
service account token bootstrap credential that is used on all the
non-master nodes in the cluster.
cat <<'EOF' >$HOME/kubelet-bootstrap-cred-manager-ds.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kubelet-bootstrap-cred-manager
  namespace: openshift-machine-config-operator
  labels:
    k8s-app: kubelet-bootstrap-cred-manager
spec:
  selector:
    matchLabels:
      k8s-app: kubelet-bootstrap-cred-manager
  template:
    metadata:
      labels:
        k8s-app: kubelet-bootstrap-cred-manager
    spec:
      containers:
      - name: kubelet-bootstrap-cred-manager
        image: quay.io/openshift/origin-cli:v4.0
        command: ['/bin/bash', '-ec']
        args:
        - |
          #!/bin/bash

          set -eoux pipefail

          while true; do
            unset KUBECONFIG

            echo "----------------------------------------------------------------------"
            echo "Gather info..."
            echo "----------------------------------------------------------------------"
            # context
            intapi=$(oc get infrastructures.config.openshift.io cluster -o "jsonpath={.status.apiServerURL}")
            context="$(oc --config=/etc/kubernetes/kubeconfig config current-context)"
            # cluster
            cluster="$(oc --config=/etc/kubernetes/kubeconfig config view -o "jsonpath={.contexts[?(@.name==\"$context\")].context.cluster}")"
            server="$(oc --config=/etc/kubernetes/kubeconfig config view -o "jsonpath={.clusters[?(@.name==\"$cluster\")].cluster.server}")"
            # token
            ca_crt_data="$(oc get secret -n openshift-machine-config-operator node-bootstrapper-token -o "jsonpath={.data.ca\.crt}" | base64 --decode)"
            namespace="$(oc get secret -n openshift-machine-config-operator node-bootstrapper-token -o "jsonpath={.data.namespace}" | base64 --decode)"
            token="$(oc get secret -n openshift-machine-config-operator node-bootstrapper-token -o "jsonpath={.data.token}" | base64 --decode)"

            echo "----------------------------------------------------------------------"
            echo "Generate kubeconfig"
            echo "----------------------------------------------------------------------"

            export KUBECONFIG="$(mktemp)"
            kubectl config set-credentials "kubelet" --token="$token" >/dev/null
            ca_crt="$(mktemp)"; echo "$ca_crt_data" > $ca_crt
            kubectl config set-cluster $cluster --server="$intapi" --certificate-authority="$ca_crt" --embed-certs >/dev/null
            kubectl config set-context kubelet --cluster="$cluster" --user="kubelet" >/dev/null
            kubectl config use-context kubelet >/dev/null

            echo "----------------------------------------------------------------------"
            echo "Print kubeconfig"
            echo "----------------------------------------------------------------------"
            cat "$KUBECONFIG"

            echo "----------------------------------------------------------------------"
            echo "Whoami?"
            echo "----------------------------------------------------------------------"
            oc whoami
            whoami

            echo "----------------------------------------------------------------------"
            echo "Moving to real kubeconfig"
            echo "----------------------------------------------------------------------"
            cp /etc/kubernetes/kubeconfig /etc/kubernetes/kubeconfig.prev
            chown root:root ${KUBECONFIG}
            chmod 0644 ${KUBECONFIG}
            mv "${KUBECONFIG}" /etc/kubernetes/kubeconfig

            echo "----------------------------------------------------------------------"
            echo "Sleep 60 seconds..."
            echo "----------------------------------------------------------------------"
            sleep 60
          done
        securityContext:
          privileged: true
          runAsUser: 0
        volumeMounts:
        - mountPath: /etc/kubernetes/
          name: kubelet-dir
      nodeSelector:
        node-role.kubernetes.io/master: ""
      priorityClassName: "system-cluster-critical"
      restartPolicy: Always
      securityContext:
        runAsUser: 0
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 120
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 120
      volumes:
      - hostPath:
          path: /etc/kubernetes/
          type: Directory
        name: kubelet-dir
EOF

Deploy the DaemonSet to your cluster.
oc apply -f $HOME/kubelet-bootstrap-cred-manager-ds.yaml
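Before moving on, you can verify that the DaemonSet pods came up. A quick check along these lines (the label selector matches the labels in the manifest above; expect one pod per master node) should show them all Running:
oc get pods -n openshift-machine-config-operator -l k8s-app=kubelet-bootstrap-cred-manager -o wide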

Delete the secrets csr-signer-signer and csr-signer from the
openshift-kube-controller-manager-operator namespace
oc delete secrets/csr-signer-signer secrets/csr-signer -n openshift-kube-controller-manager-operator

This will trigger the Cluster Operators to re-create the CSR signer
secrets which are used when the cluster starts back up to sign the
kubelet client certificate CSRs. You can watch as various operators
switch from Progressing=False to Progressing=True and back to
Progressing=False. The operators that will cycle are
kube-apiserver, openshift-controller-manager,
kube-controller-manager and monitoring.
watch oc get clusteroperators

Sample Output.
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication 4.2.0-0.nightly-2019-08-27-072819 True False False 18h
cloud-credential 4.2.0-0.nightly-2019-08-27-072819 True False False 18h
cluster-autoscaler 4.2.0-0.nightly-2019-08-27-072819 True False False 18h
console 4.2.0-0.nightly-2019-08-27-072819 True False False 18h
dns 4.2.0-0.nightly-2019-08-27-072819 True False False 18h
image-registry 4.2.0-0.nightly-2019-08-27-072819 True False False 18h
ingress 4.2.0-0.nightly-2019-08-27-072819 True False False 3h46m
insights 4.2.0-0.nightly-2019-08-27-072819 True False False 18h
kube-apiserver 4.2.0-0.nightly-2019-08-27-072819 True True False 18h
kube-controller-manager 4.2.0-0.nightly-2019-08-27-072819 True False False 18h
kube-scheduler 4.2.0-0.nightly-2019-08-27-072819 True False False 18h
machine-api 4.2.0-0.nightly-2019-08-27-072819 True False False 18h
machine-config 4.2.0-0.nightly-2019-08-27-072819 True False False 18h
marketplace 4.2.0-0.nightly-2019-08-27-072819 True False False 3h46m
monitoring 4.2.0-0.nightly-2019-08-27-072819 True False False 3h45m
network 4.2.0-0.nightly-2019-08-27-072819 True False False 18h
node-tuning 4.2.0-0.nightly-2019-08-27-072819 True False False 3h46m
openshift-apiserver 4.2.0-0.nightly-2019-08-27-072819 True False False 18h
openshift-controller-manager 4.2.0-0.nightly-2019-08-27-072819 True False False 18h
openshift-samples 4.2.0-0.nightly-2019-08-27-072819 True False False 18h
operator-lifecycle-manager 4.2.0-0.nightly-2019-08-27-072819 True False False 18h
operator-lifecycle-manager-catalog 4.2.0-0.nightly-2019-08-27-072819 True False False 18h
operator-lifecycle-manager-packageserver 4.2.0-0.nightly-2019-08-27-072819 True False False 3h46m
service-ca 4.2.0-0.nightly-2019-08-27-072819 True False False 18h
service-catalog-apiserver 4.2.0-0.nightly-2019-08-27-072819 True False False 18h
service-catalog-controller-manager 4.2.0-0.nightly-2019-08-27-072819 True False False 18h
storage 4.2.0-0.nightly-2019-08-27-072819 True False False 18h

Once all Cluster Operators show Available=True,
Progressing=False and Degraded=False the cluster is ready
for shutdown.
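If you prefer to wait non-interactively rather than keep watching, a small shell loop like the following (a sketch, not part of the original procedure; it simply checks the PROGRESSING column of the output shown above) polls until the operators have settled:
while oc get clusteroperators --no-headers | awk '{print $4}' | grep -q True; do
  echo "Cluster Operators still progressing..."
  sleep 30
done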

Stopping the cluster VMs
Use the tools native to the cloud environment that your cluster is
running on to shut down the VMs.
The following command will shut down the VMs that make up a cluster on
Amazon Web Services.
Prerequisites:

The Amazon Web Services Command Line Interface, awscli, is
installed.

$HOME/.aws/credentials has the proper AWS credentials available to
execute the command.

REGION points to the region your VMs are deployed in.

CLUSTERNAME is set to the Cluster Name you used during
installation. For example cluster-${GUID}.

export REGION=us-east-2
export CLUSTERNAME=cluster-${GUID}

aws ec2 stop-instances --region ${REGION} --instance-ids $(aws ec2 describe-instances --filters "Name=tag:Name,Values=${CLUSTERNAME}-*" "Name=instance-state-name,Values=running" --query Reservations[*].Instances[*].InstanceId --region ${REGION} --output text)

Starting the cluster VMs
Use the tools native to the cloud environment that your cluster is
running on to start the VMs.
The following commands will start the cluster VMs in Amazon Web
Services.
export REGION=us-east-2
export CLUSTERNAME=cluster-${GUID}

aws ec2 start-instances --region ${REGION} --instance-ids $(aws ec2 describe-instances --filters "Name=tag:Name,Values=${CLUSTERNAME}-*" "Name=instance-state-name,Values=stopped" --query Reservations[*].Instances[*].InstanceId --region ${REGION} --output text)
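Whether stopping or starting, you can confirm the instances reached the expected state with a quick query (a sketch that assumes the same REGION and CLUSTERNAME variables as above):
aws ec2 describe-instances --region ${REGION} --filters "Name=tag:Name,Values=${CLUSTERNAME}-*" --query 'Reservations[*].Instances[*].[InstanceId,State.Name]' --output table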

Recovering the cluster
If the cluster missed the initial 24 hour certificate rotation, some nodes
in the cluster may be in NotReady state. Check whether any nodes are
NotReady. Note that immediately after waking up the cluster the nodes
may show Ready – but they will switch to NotReady within a few seconds.
oc get nodes

Sample Output.
NAME STATUS ROLES AGE VERSION
ip-10-0-132-82.us-east-2.compute.internal NotReady worker 18h v1.14.0+b985ea310
ip-10-0-134-223.us-east-2.compute.internal NotReady master 19h v1.14.0+b985ea310
ip-10-0-147-233.us-east-2.compute.internal NotReady master 19h v1.14.0+b985ea310
ip-10-0-154-126.us-east-2.compute.internal NotReady worker 18h v1.14.0+b985ea310
ip-10-0-162-210.us-east-2.compute.internal NotReady master 19h v1.14.0+b985ea310
ip-10-0-172-133.us-east-2.compute.internal NotReady worker 18h v1.14.0+b985ea310

If some nodes show NotReady the nodes will start issuing Certificate
Signing Requests (CSRs). Repeat the following command until you see a
CSR for each NotReady node in the cluster with Pending in the
Condition column.
oc get csr

Once you see the CSRs they need to be approved. The following command
approves all outstanding CSRs.
oc get csr -oname | xargs oc adm certificate approve
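The command above approves every CSR it finds. If you prefer to approve only CSRs that have not yet been signed, a slightly more selective variant (a sketch; it filters on CSRs that have no status yet) would be:
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve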

When you check the CSRs again (using oc get csr) you should see that
they have been Approved and Issued (again in the Condition column).
Double check that all nodes now show Ready. Note that this may take a
few seconds after approving the CSRs.
oc get nodes

Sample Output.
NAME STATUS ROLES AGE VERSION
ip-10-0-132-82.us-east-2.compute.internal Ready worker 18h v1.14.0+b985ea310
ip-10-0-134-223.us-east-2.compute.internal Ready master 19h v1.14.0+b985ea310
ip-10-0-147-233.us-east-2.compute.internal Ready master 19h v1.14.0+b985ea310
ip-10-0-154-126.us-east-2.compute.internal Ready worker 18h v1.14.0+b985ea310
ip-10-0-162-210.us-east-2.compute.internal Ready master 19h v1.14.0+b985ea310
ip-10-0-172-133.us-east-2.compute.internal Ready worker 18h v1.14.0+b985ea310

Your cluster is now fully ready to be used again.
Ansible Playbook to recover cluster
The following Ansible Playbook should recover a cluster after wake up.
Note the 5 minute pause to give the nodes enough time to settle, start
all pods and issue CSRs.
Prerequisites:

Ansible installed

Current user either has a .kube/config that grants cluster-admin
permissions or a KUBECONFIG environment variable set that points
to a kube config file with cluster-admin permissions.

OpenShift Command Line interface (oc) in the current user’s PATH.

- name: Run cluster recover actions
  hosts: localhost
  connection: local
  gather_facts: False
  become: no
  tasks:
  - name: Wait 5 minutes for Nodes to settle and pods to start
    pause:
      minutes: 5

  - name: Get CSRs that need to be approved
    command: oc get csr -oname
    register: r_csrs
    changed_when: false

  - name: Approve all Pending CSRs
    when: r_csrs.stdout_lines | length > 0
    command: "oc adm certificate approve {{ item }}"
    loop: "{{ r_csrs.stdout_lines }}"
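Assuming you saved the playbook as recover-cluster.yaml (the file name is arbitrary), running it after waking the cluster is a single command:
ansible-playbook recover-cluster.yaml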

Summary
Following this process enables you to stop OpenShift 4 Cluster VMs right
after installation without having to wait for the 24h certificate
rotation to occur.
It also enables you to resume cluster VMs that were stopped across the
window in which the 24h certificate rotation would have occurred.
The post Enabling OpenShift 4 Clusters to Stop and Resume Cluster VMs appeared first on Red Hat OpenShift Blog.
Quelle: OpenShift

OpenShift 4.2 Disconnected Install

In a previous blog, it was announced that Red Hat is making the OpenShift nightly builds available to everyone. This gives users a chance to test upcoming features before their general availability. One of the features planned for OpenShift 4.2 is the ability to perform a “disconnected” or “air gapped” install, allowing you to install in an environment without access to the Internet or outside world.
NOTE: Nightly builds are unsupported and are for testing purposes only!
In this blog I will be going over how to perform a disconnected install in a lab environment. I will also give an overview of my environment, how to mirror the needed images, and any other tips and tricks I’ve learned along the way.
Environment Overview
In my environment, I have two networks. One network is completely disconnected and has no access to the Internet. The other network is connected to the Internet and has full access. I will use a bastion host that has access to both networks. This bastion host will perform the following functions.

Registry server (where I will mirror the content)
Apache web server (where I will store installation artifacts)
Installation host (where I will be performing the installation from)

Here is a high-level overview of the environment I’ll be working on.

In my environment, I have already set up DNS, DHCP, and other ancillary services for my network. Also, it’s important to get familiar with the OpenShift 4 prerequisites before attempting an install.
Doing a disconnected install can be challenging, so I recommend trying a fully connected OpenShift 4 install first to familiarize yourself with the install process (as they are quite similar).
Registry Set Up
You can use your own registry or build one from scratch. I used the following steps to build one from scratch. Since I’ll be using a container for my registry, and Apache for my webserver, I will need podman and httpd on my host.
yum -y install podman httpd httpd-tools

Create the directories you’ll need to run the registry. These directories will be mounted in the container running the registry.
mkdir -p /opt/registry/{auth,certs,data}

Next, generate an SSL certificate for the registry.  This can, optionally, be self-signed if you don’t have an existing, trusted, certificate authority. I’ll be using registry.ocp4.example.com as the hostname of my registry. Make sure your hostname is in DNS and resolves to the correct IP.
cd /opt/registry/certs
openssl req -newkey rsa:4096 -nodes -sha256 -keyout domain.key -x509 -days 365 -out domain.crt
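Depending on your OpenSSL version and how strict your clients are, you may also want a subjectAltName on the certificate, since many recent TLS clients ignore the CN. With OpenSSL 1.1.1 or later, a variant of the command above that adds one could look like this (registry.ocp4.example.com is the example hostname used throughout this post):
openssl req -newkey rsa:4096 -nodes -sha256 -keyout domain.key -x509 -days 365 -out domain.crt -subj "/CN=registry.ocp4.example.com" -addext "subjectAltName=DNS:registry.ocp4.example.com"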

Generate a username and password (htpasswd must use bcrypt-formatted passwords) for access to your registry.
htpasswd -bBc /opt/registry/auth/htpasswd dummy dummy

Make sure to open port 5000 on your host, as this is the default port for the registry. Since I am using Apache to stage the files I need for installation, I will open port 80 as well.
firewall-cmd --add-port=5000/tcp --zone=internal --permanent
firewall-cmd --add-port=5000/tcp --zone=public   --permanent
firewall-cmd --add-service=http  --permanent
firewall-cmd --reload

Now you’re ready to run the container. Here I specify the directories I want to mount inside the container. I also specify I want to run on port 5000. I recommend you put this in a shell script for ease of starting.
podman run --name poc-registry -p 5000:5000 \
  -v /opt/registry/data:/var/lib/registry:z \
  -v /opt/registry/auth:/auth:z \
  -e "REGISTRY_AUTH=htpasswd" \
  -e "REGISTRY_AUTH_HTPASSWD_REALM=Registry" \
  -e "REGISTRY_HTTP_SECRET=ALongRandomSecretForRegistry" \
  -e REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd \
  -v /opt/registry/certs:/certs:z \
  -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt \
  -e REGISTRY_HTTP_TLS_KEY=/certs/domain.key \
  docker.io/library/registry:2

Verify connectivity to your registry with curl. Provide it the username and password you created.
curl -u dummy:dummy -k https://registry.ocp4.example.com:5000/v2/_catalog

Note, this should return an “empty” repo

If you have issues connecting try to stop the container.
podman stop poc-registry

Once it’s down, you can start it back up using the same podman run command as before.
Obtaining Artifacts
You will need the preview builds for 4.2 in order to do a disconnected install. Specifically, you will need the client binaries along with the install artifacts. This can be found in the dev preview links provided below.

Client Binaries
Install Artifacts

Download the binaries and any installation artifacts you may need for the installation. The file names will differ depending on when you choose to download the preview builds (they get updated often).
You can inspect the nightly release notes and extract the build number from there. I did this with the curl command.
export BUILDNUMBER=$(curl -s https://mirror.openshift.com/pub/openshift-v4/clients/ocp-dev-preview/latest/release.txt | grep 'Name:' | awk '{print $NF}')
echo ${BUILDNUMBER}

To download the client binaries to your staging server/area (in my case, it’s the registry server itself) use curl:
curl -o /var/www/html/openshift-client-linux-${BUILDNUMBER}.tar.gz https://mirror.openshift.com/pub/openshift-v4/clients/ocp-dev-preview/latest/openshift-client-linux-${BUILDNUMBER}.tar.gz

curl -o /var/www/html/openshift-install-linux-${BUILDNUMBER}.tar.gz https://mirror.openshift.com/pub/openshift-v4/clients/ocp-dev-preview/latest/openshift-install-linux-${BUILDNUMBER}.tar.gz

You’ll also need these clients on your registry host, so feel free to un-tar them now.
tar -xzf /var/www/html/openshift-client-linux-${BUILDNUMBER}.tar.gz -C /usr/local/bin/
tar -xzf /var/www/html/openshift-install-linux-${BUILDNUMBER}.tar.gz -C /usr/local/bin/
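As a quick sanity check that the binaries landed in your PATH (the exact version strings will vary with the nightly build you downloaded), you can run:
oc version --client
openshift-install version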

Depending on what kind of install you will do, you will need to do one of the following.
PXE Install
If you’re doing a PXE install, you’ll need the BIOS, initramfs, and the kernel files. For example:
curl -o /var/www/html/rhcos-${BUILDNUMBER}-metal-bios.raw.gz https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/latest/rhcos-${BUILDNUMBER}-metal-bios.raw.gz

curl -o /var/www/html/rhcos-${BUILDNUMBER}-installer-initramfs.img https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/latest/rhcos-${BUILDNUMBER}-installer-initramfs.img

curl -o /var/www/html/rhcos-${BUILDNUMBER}-installer-kernel https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/latest/rhcos-${BUILDNUMBER}-installer-kernel

Once you have staged these, copy them over into your environment. Once they are in your PXE install server and your configuration updated, you can proceed to mirror your images.
ISO Install
If you’re doing an ISO install, you’ll still need the BIOS file but only the ISO for the install.
curl -o /var/www/html/rhcos-${BUILDNUMBER}-metal-bios.raw.gz https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/latest/rhcos-${BUILDNUMBER}-metal-bios.raw.gz

curl -o /var/www/html/rhcos-${BUILDNUMBER}-installer.iso https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/latest/rhcos-${BUILDNUMBER}-installer.iso

Once these are staged, copy them over to where you’ll need them for the installation. The BIOS file will need to be on a web server accessible to the OpenShift nodes. The ISO can be burned onto a disk/usb drive or mounted via your virtualization platform.
Once that’s done, you can proceed to mirror the container images.
Mirroring Images
The installation images will need to be mirrored in order to complete the installation. Before you begin you need to make sure you have the following in place.

An internal registry to mirror the images to (like the one I just built)

You’ll also need the certificate of this registry
The username/password for access

A pullsecret obtained at https://cloud.redhat.com/openshift/install/pre-release

I downloaded mine and saved it as ~/pull-secret.json

The oc and openshift-install CLI tools installed
The jq command is also helpful

First, you will need to get the information to mirror. This information can be obtained via the dev-preview release notes. With this information, I constructed the following environment variables.
export OCP_RELEASE="4.2.0-0.nightly-2019-08-29-062233"
export AIRGAP_REG='registry.ocp4.example.com:5000'
export AIRGAP_REPO='ocp4/openshift4'
export UPSTREAM_REPO='openshift-release-dev'   ## or 'openshift'
export AIRGAP_SECRET_JSON="$HOME/pull-secret-2.json"
export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=${AIRGAP_REG}/${AIRGAP_REPO}:${OCP_RELEASE}
export RELEASE_NAME="ocp-release-nightly"

I will now go over how to construct these environment variables from the release notes:

OCP_RELEASE – Can be obtained by the Release Metadata.Version section of the release page.
AIRGAP_REG – This is your registry’s hostname with port
AIRGAP_REPO – This is the name of the repo in your registry (you don’t have to create it beforehand)
UPSTREAM_REPO – Can be obtained from the Pull From section of the release page.
AIRGAP_SECRET_JSON – This is the path to your pull secret  with your registry’s information (which we will create later)
OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE – This environment variable is set so the installer knows to use your registry.
RELEASE_NAME – This can be obtained in the Pull From section of the release page.

Before you can mirror the images, you’ll need to add the authentication for your registry to your pull secret file (the one you got from try.openshift.com). This will look something like this:
{
  "registry.ocp4.example.com:5000": {
    "auth": "ZHVtbXk6ZHVtbXk=",
    "email": "noemail@localhost"
  }
}

The base64 is a construction of the registry’s auth in the username:password format. For example, with the username of dummy and password of dummy; I created the base64 by running:
echo -n 'dummy:dummy' | base64 -w0

You can add your registry’s information to your pull secret by using jq and the pull secret you downloaded (thus creating a new pull secret file with your registry’s information).
jq '.auths += {"registry.ocp4.example.com:5000": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"}}' < ~/pull-secret.json > ~/pull-secret-2.json

Also, if needed and you haven’t done so already, make sure you trust the self-signed certificate. This is needed in order for oc to be able to login to your registry during the mirror process.
cp /opt/registry/certs/domain.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust extract
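Once the certificate is trusted, the earlier registry check should succeed without the -k flag, which is a simple way to confirm the trust store update took effect:
curl -u dummy:dummy https://registry.ocp4.example.com:5000/v2/_catalog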

With this in place, you can mirror the images with the following command.
oc adm release mirror -a ${AIRGAP_SECRET_JSON} \
  --from=quay.io/${UPSTREAM_REPO}/${RELEASE_NAME}:${OCP_RELEASE} \
  --to-release-image=${AIRGAP_REG}/${AIRGAP_REPO}:${OCP_RELEASE} \
  --to=${AIRGAP_REG}/${AIRGAP_REPO}

Part of the output will have an example imageContentSources to put in your install-config.yaml file. It’ll look something like this.
imageContentSources:
- mirrors:
  - registry.ocp4.example.com:5000/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release-nightly
- mirrors:
  - registry.ocp4.example.com:5000/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev

Save this output, as you’ll need it later
Installation
At this point you can proceed with the normal installation procedure, with the main difference being what you specify in the install-config.yaml file when you create the ignition configs.
Please refer to the official documentation for specific installation information. You’re most likely doing a Bare Metal install, so my previous blog would be helpful to look over as well.
When creating an install-config.yaml file, you need to specify additional parameters like the example below.
apiVersion: v1
baseDomain: example.com
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: ocp4
networking:
  clusterNetworks:
  - cidr: 10.254.0.0/16
    hostPrefix: 24
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: '{"auths":{"registry.ocp4.example.com:5000": {"auth": "ZHVtbXk6ZHVtbXk=","email": "noemail@localhost"}}}'
sshKey: 'ssh-rsa ... root@helper'
additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
  ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ
  -----END CERTIFICATE-----
imageContentSources:
- mirrors:
  - registry.ocp4.example.com:5000/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release-nightly
- mirrors:
  - registry.ocp4.example.com:5000/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev

Some things to note here:

pullSecret – only the information about your registry is needed.
sshKey – the contents of your id_rsa.pub file (or another ssh public key that you want to use).
additionalTrustBundle – this is the crt file for your registry (i.e. the output of cat domain.crt).
imageContentSources – the local registry mirror and the expected original source recorded in the image metadata (images from any other source should be considered tampered with).

You will also need to export the OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE environment variable. This tells OpenShift which image to use for bootstrapping. This is in the form of ${AIRGAP_REG}/${AIRGAP_REPO}:${OCP_RELEASE}. It looked like this in my environment:
export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=registry.ocp4.example.com:5000/ocp4/openshift4:4.2.0-0.nightly-2019-08-29-062233

I created my install-config.yaml under my ~/ocp4 install directory. At this point you can create your Ignition configs as you would normally.
# openshift-install create ignition-configs --dir=/root/ocp4
INFO Consuming "Install Config" from target directory
WARNING Making control-plane schedulable by setting MastersSchedulable to true for Scheduler cluster settings
WARNING Found override for ReleaseImage. Please be warned, this is not advised

Please note that it warns you about overriding the image and that, for the 4.2 dev preview, the masters are schedulable.

At this point, you can proceed with the installation as you would normally.
Troubleshooting
A good thing to do during the bootstrapping process is to login to the bootstrap server and tail the journal logs as the bootstrapping process progresses. Many errors or misconfigurations can be seen immediately when tailing this log.
[core@bootstrap ~]$ journalctl -b -f -u bootkube.service

There are times where you might have to approve the worker/master node’s CSR. You can check pending CSRs with the oc get csr command. This is important to check since the cluster operators won’t finish without any worker nodes added. You can approve all the pending CSRs in one shot with the following command.
[user@bastion ~]$ oc get csr --no-headers | awk '{print $1}' | xargs oc adm certificate approve

After the bootstrap process is done, it’s helpful to see your cluster operators running. You can do this with the oc get co command. It’s helpful to have this in a watch in a separate window.
[user@bastion ~]$ watch oc get co

The two most common issues come down to the openshift-install command waiting for the image-registry and ingress operators to come up before it considers the install a success. Make sure you’ve approved the CSRs for your machines and you’ve configured storage for your image-registry. The commands I’ve provided should help you navigate any issues you may have.
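If this is a lab cluster without persistent storage, one common (non-production) way to unblock the image-registry operator is to back the registry with emptyDir storage. A sketch of the patch follows – verify it against the documentation for your exact build before using it:
oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"managementState":"Managed","storage":{"emptyDir":{}}}}'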
Conclusion
In this blog, I went over how you can prepare for a disconnected install and how to perform a disconnected install using the nightly developer preview of OpenShift 4. Disconnected installs were a highly popular request for OpenShift 4, and we are excited to bring you a preview build.
Nightly builds are a great way to preview what’s up and coming with OpenShift, so you can test things before the GA release. We are excited to bring you this capability and hope that you find it useful. If you have any questions or comments, feel free to use the comment section below!
The post OpenShift 4.2 Disconnected Install appeared first on Red Hat OpenShift Blog.
Quelle: OpenShift

Simplify modernization and build cloud-native with open source technologies

Cloud-native technologies are the new normal for application development. Cloud-native creates immeasurable business value with increased velocity and reduced operational costs. Together, these support emerging business opportunities.
Advancements in application development have focused on net new applications. We have seen that existing applications that cannot easily move to the cloud have been left on traditional technologies. As a result, less than 20 percent of enterprise workloads are deployed to the cloud according to an IBM-commissioned study by McKinsey & Company.
At IBM, we see open source as a foundation for the new hybrid multicloud world, and our recent acquisition of Red Hat underscores our long commitment to open technologies.
Open source allows consistency and choice
Key open source technologies – containers, Kubernetes, Istio, Knative and others – together define the new hybrid multicloud platform, providing consistency and choice across any cloud provider. These technologies allow developers to build applications to support enterprise workloads using a common technology base with flexible vendor choices. They establish freedom for enterprises to deploy applications across public, private and hybrid cloud platforms.
Kubernetes provides a container orchestration layer that consistently manages workloads. Developers have full freedom of choice on languages, runtimes, and frameworks, while Kubernetes maintains a consistent operational platform across diverse technologies. This approach provides a basis for microservices-based container applications as well as existing enterprise applications.
New IBM open source project accelerates the cloud journey
In 2017, IBM began modernizing our software portfolio into containers and Kubernetes, and optimized more than 100 products for Red Hat OpenShift. In addition to our own journey, we’ve learned a lot about modernization from our clients and partners. Together, we’ve migrated or modernized more than 100,000 workloads.
With the new IBM Cloud Pak for Applications, we’ve encoded our experience into a set of technologies to accelerate the journey to cloud. Built on open source technologies, IBM Cloud Pak for Applications delivers tools, technologies and platforms designed to bring WebSphere workloads to any cloud through Kabanero.io and Red Hat Runtimes.
IBM Cloud Pak for Applications provides a rich set of open source technologies and functions that allow enterprises to secure and curate their favorite frameworks and runtimes, including those using Java, Open Liberty, SPRING BOOT® with Tomcat®, JBoss®, Node.js®, Vert.x and more. IBM Cloud Pak for Applications performs vulnerability scanning on all open source frameworks and runtimes to prevent security issues. All IBM Cloud Paks are supported by IBM and contain Docker-certified middleware.
Move WebSphere applications to any cloud
For existing applications, modernization tools in the new IBM Cloud Pak chart a path to modernize WebSphere applications into a fully open source stack. IBM Cloud Pak for Applications analyzes applications and provides a modernization plan specific for each application. Many WebSphere applications can be migrated to containers with automation and without code changes.
In the end, traditional applications are ready to deploy to any cloud — from OpenShift on IBM Cloud, an existing infrastructure, or to your cloud of choice.
IBM Cloud Pak for Applications delivers the open technology platform for the future and enables businesses to address the 80 percent of enterprise workloads that have yet to move to cloud, according to the report.
Learn more about IBM Cloud Pak for Applications and register to join us for the upcoming IBM Application Modernization Technical Conference 2019 in Chicago, IL, United States on 24-25 September 2019. Experience two days of in-depth technical sessions for developers, administrators and architects at the inaugural IBM Application Modernization Technical Conference 2019 and hear from top subject matter experts from our labs, IBM Business Partners and customers.
The post Simplify modernization and build cloud-native with open source technologies appeared first on Cloud computing news.
Quelle: Thoughts on Cloud

Kubeflow + OpenShift Container Platform + Dell EMC Hardware: A Complete Machine Learning Stack

Kubeflow is an open source machine learning toolkit for Kubernetes. It bundles popular ML/DL frameworks such as TensorFlow, MXNet, Pytorch, and Katib with a single deployment binary. By running Kubeflow on Red Hat OpenShift Container Platform, you can quickly operationalize a robust machine learning pipeline. However, the software stack is only part of the picture. You also need high performance servers, storage, and accelerators to deliver the stack’s full capability. To that end, Dell EMC and Red Hat’s Artificial Intelligence Center of Excellence recently collaborated on two white papers about sizing hardware for Kubeflow on OpenShift.
The first whitepaper is called “Machine Learning Using the Dell EMC Ready Architecture for Red Hat OpenShift Container  Platform.”  It describes how to deploy Kubeflow 0.5 and OpenShift Container Platform 3.11 on Dell PowerEdge servers. The paper builds on Dell’s Ready Architecture for OpenShift Container Platform 3.11 — a prescriptive architecture for running OpenShift Container Platform on Dell hardware. It includes a bill of materials for ordering the exact servers, storage and switches used in the architecture. The machine learning whitepaper extends the ready architecture to include workload-specific recommendations and settings. It also includes instructions for configuring OpenShift and validating Kubeflow with a distributed TensorFlow training job.
Kubeflow is developed on upstream Kubernetes, which lacks many of the security features enabled in OpenShift Container Platform by default. Several of OpenShift Container Platform default security controls are relaxed in this whitepaper to get Kubeflow up and running. Additional steps might be required to meet your organization’s security standards for running Kubeflow on OpenShift Container Platform in production. These steps may include defining cluster roles for the Kubeflow services with appropriate permissions, adding finalizers to Kubeflow resources for reconciliation, and/or creating liveness probes for Kubeflow pods.
The second whitepaper is called “Executing ML/DL Workloads Using Red Hat OpenShift Container Platform v3.11.” It explains how to leverage Nvidia GPUs with Kubeflow for best performance on inferencing and training jobs. The hardware profile used in this whitepaper is similar to the ready architecture used in the first paper except the servers are outfitted with Nvidia Tesla GPUs. The architecture uses two GPU models. The OpenShift worker nodes have Nvidia Tesla T4 GPUs. Based on the Turing architecture, the T4s deliver excellent inference performance in a 70-Watt power profile. The storage nodes have Nvidia Tesla V100 GPUs. The V100 is a state of the art data center GPU. Based on the Volta architecture, the V100s are deep learning workhorses for both training and inference. 

The researchers compared the GPU models when training the Resnet50 TensorFlow benchmark. This is shown in the figure above. Not surprisingly, the Tesla V100s outperformed the T4s when training. They have double the compute capability — both in terms of FP64 and TensorCores — along with higher memory bandwidth due to the HBM2 memory subsystem. But the T4s should give better performance per Watt than the V100s when running less floating-point intensive tasks, particularly inferencing in mixed precision.
These whitepapers make it easier for you to select hardware for running Kubeflow on premises. Dell and Red Hat are continuing to collaborate on updating these documents to the latest version of Kubeflow and OpenShift Container Platform 4.
The post Kubeflow + OpenShift Container Platform + Dell EMC Hardware: A Complete Machine Learning Stack appeared first on Red Hat OpenShift Blog.
Quelle: OpenShift

Red Hat OpenShift Service Mesh is now available: What you should know

Today, Red Hat OpenShift Service Mesh is now available. 
As Kubernetes and Linux-based infrastructure take hold in digitally transforming organizations, modern applications frequently run in a microservices architecture and therefore can have complex route requests from one service to another. With Red Hat OpenShift Service Mesh, we’ve gone beyond routing the requests between services and included tracing and visualization components that make deploying a service mesh more robust. The service mesh layer helps us simplify the connection, observability and ongoing management of every application deployed on Red Hat OpenShift, the industry’s most comprehensive enterprise Kubernetes platform.
Red Hat OpenShift Service Mesh is available through the OpenShift Service Mesh Operator, and we encourage teams to try this out on Red Hat OpenShift 4 here.
Better track, route and optimize application communication
With hardware-based load balancers, bespoke network devices, and more being the norm in modern IT environments, it was complex, if not nearly impossible, to have a consistent, general purpose way to manage and govern service-to-service communications between applications and their services. With a service mesh management layer, containerized applications can better track, route and optimize communications with Kubernetes as the core. Service mesh can help manage hybrid workloads in multiple locations and is more granularly aware of data locality. With the official introduction of OpenShift Service Mesh, we believe this critical layer of the microservices stack has the power to further enable a multi- and hybrid cloud strategy in enterprises.
OpenShift Service Mesh is based on a set of community projects, including Istio, Kiali and Jaeger, providing capability that encodes communication logic for microservices-based application architectures. This helps to free development teams to focus on coding applications and services that can add real business value. 
Making Developers’ Lives Easier
 
Like we said before, until service mesh, much of the burden of managing these complex services interactions has been placed on the application developer. Developers need a set of tools that can help them manage their entire application lifecycle beginning with determining the success of deploying their code all the way to managing the traffic patterns in production. Each service needs to properly interact with other services for the complete application to run. Tracing provides a way that developers can track how each service interacts with these other functions to determine if there are latency bottlenecks as they operate together. 
The ability to visualize the connections between all the services and look at the topology of how they interconnect can also be helpful in understanding these complex service interconnections. By packaging these features together as part of the OpenShift Service Mesh, Red Hat is making it easier for developers to access more of the tools they need to successfully develop and deploy cloud-native microservices. 
To ease the implementation of a service mesh, the Red Hat OpenShift Service Mesh can be added to your current OpenShift instance through the OpenShift Service Mesh Operator. The logic of installation, connecting the components together and the ongoing management of all the bits is built into the Operator, allowing you to get right to managing the service mesh for your application deployments. 
By reducing the overhead of implementing and managing the service mesh, it becomes easier to introduce the concept earlier in your application lifecycle and enjoy the benefits before things start to get out of hand. Why wait until it’s too late and you’re overwhelmed managing the communication layer? OpenShift Service Mesh can enable the scalable features you will need, in an easy to implement fashion, before you start to scale.
The features that OpenShift customers can benefit from with OpenShift Service Mesh include:

Tracing and Measurement (via Jaeger): While enabling service mesh can come with performance trade-offs for the betterment of management, OpenShift Service Mesh now measures baseline performance. This data can be analyzed and used to drive future performance improvements.

 

Visualization (via Kiali): Observability of the service mesh allows for an easier way to view the topology of the service mesh, and to observe how the services are interacting together.  

 

Service Mesh Operator: A Kubernetes Operator minimizes the administrative burdens of application management, allowing for automation of common tasks such as install, maintenance of the service and lifecycle management. Adding business logic with these applications helps to ease management and bring the latest upstream features to customers sooner. The OpenShift Service Mesh operator deploys Istio, Kiali and Jaeger bundles along with the configuration logic needed to make implementing all the features at once easier. 
Multiple interfaces for networking (via multus): OpenShift Service Mesh takes out manual steps and enables developers to execute code with increased security via Security Context Constraint (SCC). This allows additional lockdown of workloads on the cluster, such as outlining which workloads in a namespace can run as root and which cannot. With this feature, developers get the usability of Istio while cluster admins gain well-defined security controls.
Integrations with Red Hat 3scale API management: For developers or operations teams looking to further secure API access across services, OpenShift Service Mesh ships with the Red Hat 3scale Istio Mixer Adapter. This allows for deeper control over the API layer of service communications beyond the traffic management of the service mesh capabilities.

Looking to the future of service mesh capabilities and interoperability, we introduced the Service Mesh Interface (SMI) earlier this year. Our hope in collaborating across the industry is to continue working to abstract these components in an effort to make it easier for service meshes to run on Kubernetes. This collaboration can help maximize choice and flexibility for our Red Hat OpenShift customers, and once applied, has the potential to bring a “NoOps” environment for developers closer to reality.
Try OpenShift
Service mesh can help ease the journey in operating a microservices based stack across the hybrid cloud. Customers with a growing Kubernetes and containers based environment are invited to try out Red Hat OpenShift Service Mesh.
 
Learn more:

About OpenShift 4 and OpenShift Service Mesh
Attend a virtual event to learn more about building applications on OpenShift

The post Red Hat OpenShift Service Mesh is now available: What you should know appeared first on Red Hat OpenShift Blog.
Quelle: OpenShift

The silent revolution: the point where flash storage takes over for HDD

For years the old paradigm has held true: if you want fast, buy flash, but it is going to cost you; if you want cheap, large HDD-based servers were the go-to. The old standby for reasonably fast block storage, the 2TBx24 chassis, was ubiquitous. For years it looked like flash would be relegated to performance tiers. But is this actually true? I’ve been suspicious for some time, but a few days ago I did a comparison for an internal project, and what I saw surprised even me. 
Flash storage technology and comparative cost
Flash storage has developed at a breakneck pace in recent years. Not only have devices become more resilient and faster, but there is also a new interface to consider. Traditional SSDs are SATA, or in relatively rare cases, SAS based. This limits the performance envelope of the devices severely. SATA SSDs top out at about 550MB/s maximum throughput, and offer around 50k small file input/output operations per second (IOPS) regardless of the speed of the actual chips inside the device. 
This limitation is due to the data transfer speed of the bus and the need to translate the storage access request to a disk based protocol (SATA/SAS) and, inside the SSD, back to a memory protocol. The same thing happens on the way out when data is being read. 
Enter Non-Volatile Memory express (NVMe). This ‘interface’ is essentially a direct connection of the flash storage to PCIe lanes. A configuration of 4 lanes per NVMe is common, though technology exists to multiplex NVMes so more devices can be attached than there are PCIe lanes available. 
NVMe devices typically top out above 2GB/s, and can offer several hundred thousand IOPS – theoretically. They also consume a lot more CPU when operating in a software defined storage environment, which limits performance somewhat. However, in practical application they are still much faster than traditional SSDs – at what is usually a very moderate cost delta. 
If the performance of SATA SSDs is insufficient for a specific use case, moving to SAS SSDs is usually not worth the expense, as NVMe devices, which offer much better performance, are usually not more expensive than their SAS counterparts, so a move directly to NVMe is preferable.
One more note: If NVMes operate with the same number of CPU cores as SSDs, they are still somewhat faster and very comparable financially. The calculations below are designed to include more CPU cores for NVMe for performance applications.
Let’s look at how the numbers work out for different situations.
Small Environments
Let’s have a look at a 100TB environment with increasing performance requirements. Consider the following table that looks at HDDs, SSDs, and NVMe. Street prices are in US$x1000, and IOPS are rough estimates:


| 100TB      | HDD 6TB 12/4U cost [x1000 US$] | HDD 2TB 20/2U cost [x1000 US$] | SSD Layout     | SSD cost [x1000 US$] | NVMe Layout    | NVMe cost [x1000 US$] |
|------------|--------------------------------|--------------------------------|----------------|----------------------|----------------|-----------------------|
| 10k IOPS   | 132                            | 135                            | 5x 10x 7.68TB  | 91                   | 5x 4x 15.36TB  | 102                   |
| 30k IOPS   | 345                            | 271                            | 5x 10x 7.68TB  | 91                   | 5x 4x 15.36TB  | 102                   |
| 50k IOPS   | 559                            | 441                            | 5x 10x 7.68TB  | 91                   | 5x 4x 15.36TB  | 102                   |
| 100k IOPS  | 1,117                          | 883                            | 5x 10x 7.68TB  | 91                   | 5x 4x 15.36TB  | 102                   |
| 200k IOPS  | 2,206                          | 1,767                          | 7x 14x 3.84TB  | 113                  | 5x 10x 7.68TB  | 113                   |
| 500k IOPS  | 5,530                          | 4,419                          | 14x 14x 1.92TB | 168                  | 7x 14x 3.84TB  | 133                   |
| 1000k IOPS | 11,034                         | 8,804                          | 42x 14x 1.92TB | 470                  | 13x 14x 2TB    | 168                   |

In this relatively small cluster, as expected, HDDs are no longer viable. The more IOPS required, the more extra capacity must be purchased to provide enough spindles. This culminates in a completely absurd $11 million for a 1000K IOPS cluster built on 6TB hard disks.
Middle of the Road
Of course, we all know that larger amounts of SSD storage are more expensive, so let’s quadruple storage requirements and see where we get. HDDs should become more viable, wouldn’t you think?

| 400TB      | HDD 6TB 12/4U cost [x1000 US$] | HDD 2TB 20/2U cost [x1000 US$] | SSD Layout     | SSD cost [x1000 US$] | NVMe Layout     | NVMe cost [x1000 US$] |
|------------|--------------------------------|--------------------------------|----------------|----------------------|-----------------|-----------------------|
| 10k IOPS   | 250                            | 510                            | 14x 14x 7.68TB | 326                  | 7x 14x 15.36TB  | 348                   |
| 30k IOPS   | 405                            | 510                            | 14x 14x 7.68TB | 326                  | 7x 14x 15.36TB  | 348                   |
| 50k IOPS   | 655                            | 510                            | 14x 14x 7.68TB | 326                  | 7x 14x 15.36TB  | 348                   |
| 100k IOPS  | 1,311                          | 883                            | 14x 14x 7.68TB | 326                  | 7x 14x 15.36TB  | 348                   |
| 200k IOPS  | 2,593                          | 1,767                          | 14x 14x 7.68TB | 326                  | 7x 14x 15.36TB  | 348                   |
| 500k IOPS  | 6,495                          | 4,419                          | 14x 14x 7.68TB | 326                  | 7x 14x 15.36TB  | 348                   |
| 1000k IOPS | 12,961                         | 8,804                          | 27x 14x 3.84TB | 437                  | 14x 14x 7.68TB  | 413                   |

Surprise! Again we find that HDD is only viable for the slower speed requirements of archival storage. Note that 16TB NVMes are not much more expensive than the SSD solution!
A note about chassis: To get good performance out of NVMe devices, a lot more CPU cores are needed than in HDD based solutions. Four OSDs per NVMe and 2 cores per OSD are a rule of thumb. This means that stuffing 24 NVMes into a 2U chassis and calling it a day is not going to provide exceptional performance.  We recommend 1U chassis with 5-8 NVMe devices to reduce bottlenecking on the OSD code itself. (I’m also assuming that the network connectivity is up to transporting the enormous amount of data traffic.)
Petabyte Scale
If we enter petabyte scale, hard disks become slightly more viable, but at this scale (we are talking 64 4U nodes) the sheer physical size of the hard disk based cluster can become a problem:

| 1PB        | HDD 6TB 12/4U cost [x1000 US$] | HDD 2TB 20/2U cost [x1000 US$] | SSD Layout     | SSD cost [x1000 US$] | NVMe Layout     | NVMe cost [x1000 US$] |
|------------|--------------------------------|--------------------------------|----------------|----------------------|-----------------|-----------------------|
| 10k IOPS   | 453                            | 1257                           | 34x 14x 7.68TB | 789                  | 17x 14x 15.36TB | 871                   |
| 30k IOPS   | 453                            | 1257                           | 34x 14x 7.68TB | 789                  | 17x 14x 15.36TB | 871                   |
| 50k IOPS   | 488                            | 1257                           | 34x 14x 7.68TB | 789                  | 17x 14x 15.36TB | 871                   |
| 100k IOPS  | 1850                           | 1257                           | 34x 14x 7.68TB | 789                  | 17x 14x 15.36TB | 871                   |
| 200k IOPS  | 2101                           | 1767                           | 34x 14x 7.68TB | 789                  | 17x 14x 15.36TB | 871                   |
| 500k IOPS  | 4619                           | 4365                           | 34x 14x 7.68TB | 789                  | 17x 14x 15.36TB | 871                   |
| 1000k IOPS | 10465                          | 8720                           | 34x 14x 7.68TB | 789                  | 17x 14x 15.36TB | 871                   |

Note: The performance data for SSD and NVMe OSDs is estimated conservatively. Depending on the use case performance will vary.
So what do we learn from all this? 
The days of HDD are numbered. 
For most use cases even today SSD is superior. Also, SSD and NVMe are still nosediving in terms of cost/unit. SSD/NVMe based nodes also make for much more compact installations and are a lot less vulnerable to vibration, dust and heat. 
The health question
Of course, cost isn’t the only issue. SSDs do wear. The current crop is way more resilient long term than SSDs from a couple of years ago, but they will still eventually wear out. On the other hand, SSDs are not prone to sudden catastrophic failure triggered by either a mechanical event or marginal manufacturing tolerances.
The good news is that, in almost all cases, SSDs do not fail suddenly. They develop bad blocks, which for a time are replaced with fresh blocks from an invisible capacity reserve on the device. You will not see capacity degradation until the capacity reserve runs out of blocks; wear leveling does all this automatically.
You can check the health of the SSDs by using SMART (smartmontools on Linux), which will show how many blocks have been relocated, and the relative health of the drive as a percentage of the overall reserve capacity.  
Bonus round: SSD vs 10krpm
In the world of low latency and high IOPS, the answer for HDD manufacturers is to bump up rotation speed of the drives. Unfortunately, while this does make them faster, it also makes them more mechanically complex, more thermally stressed and in a word: expensive.
SSDs are naturally faster and mechanically simple. They also — traditionally at least — were more expensive than the 10krpm disks, which is why storage providers have still been selling NASes and SANs with 10 or 15krpm disks. (I know this from experience, as I used to run high performance environments for a web content provider.)
Now have a look at this:

| Device type                  | Cost [US$] | Cost/GB [US$] |
|------------------------------|------------|---------------|
| HDD 1.8TB SAS 10krpm Seagate | 370        | 0.21          |
| SSD 1.92TB Micron SATA       | 335        | 0.17          |
| NVMe 2.0TB Intel             | 399        | 0.20          |
| HDD 0.9TB Seagate            | 349        | 0.39          |

In other words, 10krpm drives are obsolete not only from the cost/performance ratio, but even from the cost/capacity ratio! The 15krpm drives are even worse. The hard disks in this sector have no redeeming qualities; they are more expensive, drastically slower, more mechanically complex, and cost enormous amounts of money to run.
So why is there so much resistance to moving beyond them? I have heard the two arguments against SSDs:
Lifespan: With today’s wear leveling, this issue has largely evaporated. Yes, it is possible to wear out an SSD, but have a look at the math: a read optimized SSD is rated for about one Device Writes Per Day (DWPD) (that is, a write of the whole capacity of the device) over 5 years. Let’s compare this with an 1.8TB 10krpm HDD. With a workload that averages out at 70MB/s write (with a mix of small and large writes) and a 70/30 read/write ratio, this 10krpm HDD can write 1.81TB/day. 
In other words, you won’t wear out the SSD under the same conditions within 5 years. If you want to step up to 3xDWPD (mixed use), the drive is still below the cost for the HDD (US$350), and you will have enough resilience even for very write heavy workloads.
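The 1.81TB/day figure above is easy to sanity-check with a quick back-of-the-envelope calculation (a sketch that assumes the 70MB/s is total sustained throughput, of which 30% are writes, over a 86,400-second day):
echo "scale=2; 70 * 0.30 * 86400 / 1000 / 1000" | bc    # ~1.81 TB written per day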
TCO: It is true that an SSD uses more power as throughput increases. Most top out at about 3x the power consumption of a comparable HDD if they are driven hard. They also will provide ~10x the throughput and >100x the small block IOPS of the HDD. If the SSD is ambling along at the sedate pace of the 10krpm HDD, it will consume less power than the HDD. If you stress the performance envelope of the SSD, you would have to have a large number of HDDs to match the single SSD performance, which would not even be in the same ballpark in both initial cost and TCO.
In other words, imagine having to put up a whole node with 20 HDDs to match the performance of this single $350 mixed use SSD that consumes 20W at full tilt operation. You would have to buy a $4000 server with 20 $370 HDDs — which would, by the way, consume an average of maybe 300W. 
So as you can see, an SSD is the better deal, even from a purely financial perspective, whether you drive it hard or not.
Of course there are always edge cases. Ask us about your specific use case, and we can do a tailored head-to-head comparison for your specific application.
So what’s next?
We are already nearing the point where NVMe will supersede the SATA or SAS interface in SSDs. So the SSDs, which came out on top when we started this discussion, are already on their way out again.
NVMe has the advantage of being an interface created specifically for flash memory. It does not pretend the device is an HDD, as SAS and SATA do, so it does not need the protocol translation that transforms the HDD access protocol internally into a flash memory access protocol and back again on the way out. You can see the difference by looking at the performance envelope of the devices. 
New flash memory technologies push the performance envelope and the interface increasingly hampers performance, so the shift from SAS/SATA to NVMe is imminent. NVMe even comes in multiple form factors, with one closely resembling 2.5” HDDs for hotswap purposes, and one (m.2, which resembles a memory module) for internal storage that does not need hot swap capability. Intel’s ruler design and Supermicro’s m.2 carriers will further increase storage density with NVMe devices.
On the horizon, new technologies such as Intel Optane again increase performance and resilience to wear, though currently at a much higher cost than traditional flash modules.
Maybe a few years from now everything is going to be nonvolatile memory and we can simply cut power to the devices. Either way, we will see further increases in density, performance and reliability and a further decrease in cost. 
Welcome to the future of storage!
The post The silent revolution: the point where flash storage takes over for HDD appeared first on Mirantis | Pure Play Open Cloud.
Quelle: Mirantis

How AIOps helps Nextel Brazil predict and prevent network outages

Mobile smartphones are playing a significant role in the lives and productivity of people around the world. Consider these statistics about smartphone usage from TechJury.

Internet users worldwide who visit the Web on a mobile device: 67%
Percent of emails read on mobile devices: 49.1%
Smartphone users who say they are addicted to their phones: 66%

Clearly, many people today don’t want to (or can’t) be without their smartphones. And like all telecommunications companies, Nextel Brazil is trying to be as customer centric as possible. We strive to make customer service part of the DNA of the company and treat customers as our primary asset, because they are.
I have a great team of 75 people working with me. There are three shifts working morning, noon and night. We do the best we can to satisfy our customer needs because we know our subscribers depend on their mobile phones to work and live their lives. Every second that there’s a network outage and customers don’t have service, we have to be there for them. That’s what being customer centric means to us, especially in operations.
Reducing network outages and meantime to repair with IBM Netcool
Mean time to repair (MTTR) is the key performance indicator for us. We started our partnership with IBM when we began using IBM Netcool Operations Insight software to correlate alarms and get to the root cause of problems faster. We have more than 25,000 established network elements and multiple management systems being monitored by Netcool. The solution has helped us reduce the MTTR to receive an alarm and solve a problem in the field or with some configuration from 30 minutes to less than five minutes.
Still, people don’t want to wait even one minute, never mind five, to get their services restored. And as our services and network have increased in complexity, so has the amount of data generated.
After approximately three years of maturing this solution, we started to say, “Hey, we can do better. We can be more proactive to treat the problem. Let’s start looking at the data. Let’s start with some analytics into the data.”
We have to be able to predict and to be prepared for network problems, because we know that they will happen. This is our day to day. We wanted to be better prepared for incidents and be able to make adjustments to avoid a network outage.
Moving from reactive to predictive with AIOps
We began working with IBM Watson technology to implement artificial intelligence for IT operations (AIOps). Watson helps us categorize all the incidents, so we have a better understanding of what is happening in the network, such as whether an outage is due to a utilities problem. More than just knowing we have a problem, Watson tells us why we have it. Now we can group incidents together and focus on fixing things at the source.
We’re also working with The Weather Company, an IBM Business, to predict weather-related incidents and prevent them from impacting service. Our Network Operations team has a high dependence on utility companies because our cell towers depend on electric power. We will have a problem when they have a problem, and they are very dependent on the weather.
With The Weather Company data, we can correlate it with our historical data and know that every time a certain threshold of rain, wind speed, soil moisture, or some other parameter is exceeded, we will have a problem with cell towers in a given region.
If we know that one of these conditions is going to happen in the next 72 hours, we can be more prepared to act. As a result, we might send a small generator or extra batteries to the site to keep it up longer. By better knowing the probability and duration of the fault, we can prepare such that we can help avoid an outage for our customers in that region.
AIOps with Watson and The Weather Company data has helped us complete the journey to being predictive in network operations. It’s a great feeling to know that we don’t just have to wait for something terrible to happen and then react to it. We can actually do something about it before it happens. And this means that our customers who depend on their mobile phone are less likely to be without service.
Watch the related video.
The post How AIOps helps Nextel Brazil predict and prevent network outages appeared first on Cloud computing news.
Quelle: Thoughts on Cloud

How Full is my Cluster – Part 5: A Capacity Management Dashboard

Introduction
This is the fifth installment in the series regarding capacity management with OpenShift.

In the first post, we saw the basic notions on how resource management works in OpenShift and a way to visualize node and pod resource consumption using kube-ops-view.
In the second post, we illustrated some best practices on how to protect the nodes from being overcommitted.
In the third post, we presented best practices on how to set up a capacity management process and which metrics to follow.
In the fourth post, we introduced the vertical pod autoscaler operator along with its potential use as a way to estimate the correct sizing of pods.

In this post, we will introduce a ready-to-use dashboard on which you can base your capacity management process.
The primary goal of this dashboard is to answer a very specific question: Do I need a new node?
Naturally, this is not the only question for a well-conceived capacity management process. But it is certainly a good starting point, and it can be used as the foundation for a more sophisticated dashboard that fully supports your capacity management process.
The Dashboard
This capacity management dashboard works on a group of nodes and will help you decide whether you need more or fewer nodes for that group. You can have many node groups in your cluster, so the dashboard lets you select the node group you want to work on from a drop-down list.
The following illustrates the contents of the capacity management dashboard:
The primary metrics presented are for memory and CPU, and they are collected in the same way for both. The metrics always refer to aggregates within the selected node group. The dashboard displays three metrics:

Quota/Allocatable ratio
Request/Allocatable ratio
Usage/Allocatable ratio

As you may recall from the second post on overcommitment, allocatable is the amount of resources actually available to pods on a given node. This value is derived from the total capacity of the node after the resources reserved for the operating system and other basic services, such as the container runtime and the node service (kubelet), have been subtracted.
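As a quick sanity check on a live cluster, you can compare a node's total capacity with its allocatable resources; the node name below is a placeholder:

# Compare capacity vs. allocatable for one node (replace <node-name>, see `oc get nodes`)
oc get node <node-name> -o jsonpath='capacity:    {.status.capacity.cpu} cpu / {.status.capacity.memory}{"\n"}allocatable: {.status.allocatable.cpu} cpu / {.status.allocatable.memory}{"\n"}'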
In OpenShift 4.x, Prometheus alerts are also set up during installation. These alerts trigger when the Quota/Allocatable ratio passes 100% and when the Request/Allocatable ratio passes 80%.
You can find the dashboard and its installation instructions at this repository. In order to work properly, the dashboard requires the cluster nodes and projects to be organized in a certain way; see the end of this post for more information.
In the next section we are going to explain how the collected metrics can be interpreted and used to deduce or forecast whether you need more nodes.
Interpretation of quota/allocatable ratio
This metric is calculated, separately for CPU and memory, as:

quota/allocatable ratio = (sum of the resource requests granted by the ClusterResourceQuotas associated with the node group) / (sum of the allocatable resources of the nodes in the node group)
This metric can be interpreted as: what has been promised (the granted quota) vs the actual available amount (the allocatable).
Changes to this metric do not occur frequently; fluctuations typically happen only when new projects are onboarded. This makes it suitable for long-term projections of the needed cluster capacity. Organizations whose OpenShift nodes cannot be scaled quickly (non-cloud deployments such as bare metal) should most likely use this metric to decide when to scale up nodes.
Depending on your tolerance for risk, it can be acceptable for this metric to exceed 100%, which signals that you have overcommitted the cluster.
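If you want to spot-check this ratio by hand rather than through the dashboard, a sketch along these lines works, assuming the nodegroup label and quota conventions described in the Assumptions section below (the label value group1 is illustrative); the returned values still need to be summed and normalized (cores vs. millicores) manually:

# CPU requests granted by the ClusterResourceQuotas assigned to the node group
oc get clusterresourcequota -l nodegroup=group1 \
  -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.quota.hard.requests\.cpu}{"\n"}{end}'

# Allocatable CPU of each node in the same node group
oc get nodes -l nodegroup=group1 \
  -o jsonpath='{range .items[*]}{.metadata.name}: {.status.allocatable.cpu}{"\n"}{end}'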
Interpretation of request/allocatable ratio
This metric is calculated, separately for CPU and memory, as:

request/allocatable ratio = (sum of the resource requests of the pods running on the node group) / (sum of the allocatable resources of the nodes in the node group)
This metric can be interpreted as: what the tenants estimate they will need versus the actual available amount (recall from the third post of this series that resource requests on containers should correspond to the amount estimated to be needed at runtime).
This metric is more volatile than the previous one because its value changes whenever pods are added or removed, and it is more suitable for making a scaling decision when a new node can be provisioned quickly, which is typically the case in cloud environments.
The OpenShift 4.x cluster autoscaler uses this metric indirectly: it triggers the addition of a new node when a pod is stuck in the pending state because it cannot be scheduled due to a lack of resources. This is approximately the same as triggering a scale-up when this metric approaches 100%.
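For reference, this is roughly how the OpenShift 4.x cluster autoscaler is scoped to a set of worker machines; the MachineAutoscaler and MachineSet names and the replica bounds below are placeholders, and a ClusterAutoscaler resource must also be created for scaling to take effect:

cat << EOF | oc apply -f -
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: worker-group1                  # placeholder name
  namespace: openshift-machine-api
spec:
  minReplicas: 1
  maxReplicas: 6
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: worker-group1                # placeholder MachineSet name
EOF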
With an approach based on node groups, however, we can be more flexible than the cluster autoscaler because we can decide at which threshold we want to scale. For example, we can proactively add a new node when the ratio hits 80%, so that no pod ever has to wait to be scheduled.
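To check the same ratio node by node, the "Allocated resources" section of oc describe node shows requested CPU and memory as a percentage of allocatable; the nodegroup label below follows the convention described in the Assumptions section:

# Requested vs. allocatable resources, per node, for one node group
for node in $(oc get nodes -l nodegroup=group1 -o name); do
  echo "== ${node}"
  oc describe "${node}" | grep -A 8 "Allocated resources"
done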
Interpretation of Usage/Allocatable ratio
This metric is calculated, separately for CPU and memory, as:

usage/allocatable ratio = (sum of the actual resource usage of the pods running on the node group) / (sum of the allocatable resources of the nodes in the node group)
This metric can be interpreted as what is currently being used versus the amount of available resources. It is clearly the most volatile of the three, as it depends on the instant-by-instant load of the running pods.
This metric is generally a poor basis for capacity forecasts because its value fluctuates too often. However, it can be used to provide a general overview of the cluster.
The primary function of this metric is to be able to compare the actual (what we are using) with the estimates (the sum of requests from the previous ratio). If these two measures diverge, it means that our tenants are not estimating their resources correctly, and corrective actions are needed.
Another function of this metric is that it allows us to judge whether we have enough resources for the current workload. However, we have to be aware that having enough resources at the node group level does not by itself guarantee that individual nodes are not resource constrained. For that reason, another set of metrics and alerts needs to be used; fortunately, these node-level metrics and alerts are included in the Prometheus setup that comes with OpenShift.
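For a quick, point-in-time view of actual consumption on the nodes of a group, something like the following works, assuming cluster metrics are available and the nodegroup label from the Assumptions section below:

# Current CPU and memory usage for the nodes of one group
oc adm top nodes -l nodegroup=group1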
Assumptions
In order for the dashboard to be effective and accurate, several assumptions must hold:
Nodes are grouped into non-overlapping groups. Groups are identified by a node label; the default label key is nodegroup. Each node must belong to exactly one group. The capacity management dashboard will tell us if we need to add a node to a specific group. Groups can be used to manage zones or areas of the cluster that house different workloads and may need to scale independently. For example, a cluster might have high-priority workload nodes, normal-priority workload nodes, PCI-dedicated nodes, GPU-enabled nodes, and so on. In OpenShift 3.x, there are always at least three groups: masters, infra nodes and worker nodes (worker nodes can obviously be fragmented further if needed). In OpenShift 4.x there is no requirement that infra nodes exist, but the same concepts apply.
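For example, assuming the default nodegroup label key (the node and group names below are placeholders), nodes can be assigned to their group as follows:

oc label node worker-0 nodegroup=group1
oc label node worker-1 nodegroup=group1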
ClusterResourceQuotas are defined to limit tenants' resource consumption. ClusterResourceQuota is an OpenShift-specific API object that allows a quota to be defined across multiple projects, as opposed to ResourceQuota, which is associated with just one project. The ability to define a ClusterResourceQuota across several projects lets the administrator choose the granularity at which quotas are applied, achieving higher flexibility. For example, an organization can choose to grant quotas at the application level (an application usually spans multiple projects to support the environments defined by its SDLC), at the business capability level (a business capability is usually provided by a set of applications), or at the line-of-business level (a set of business capabilities). As a result, ClusterResourceQuotas also allow for a flexible showback/chargeback model, should a company decide to enact those processes.
ClusterResourceQuotas must be defined for memory and CPU requests. At the moment, these are the only monitored resources. Other resources can be put under quota, but they will be ignored by the dashboard.
Each ClusterResourceQuota refers to only one node group. For the dashboard to function properly, each ClusterResourceQuota must have the same label as the label used to determine the node group. For example:
apiVersion: quota.openshift.io/v1
kind: ClusterResourceQuota
metadata:
  name: quota1
  labels:
    nodegroup: group1
spec:
  quota:
    hard:
      requests.cpu: "2"
      requests.memory: 4Gi
  selector:
    labels:
      matchLabels:
        quota: quota1
Each tenant project refers to a ClusterResourceQuota and deploys resources to the corresponding nodegroup. Each tenant project must be controlled by one and only one ClusterResourceQuota. The project default node selector must be configured to select the nodes belonging to the node group the ClusterResourceQuota refers to. For example:

kind: Namespace
apiVersion: v1
metadata:
  name: p1q1
  labels:
    quota: quota1
  annotations:
    openshift.io/node-selector: nodegroup=group1
Non-tenant projects, such as administrative projects, do not have to be under quota. However, the dashboard shows more accurate information if every project deployed on the monitored node groups is under quota.
The recommended approach is to define the node labeling at cluster setup and to define the ClusterResourceQuotas and the projects during the application onboarding process. The application onboarding process is the set of steps a development team must go through to be able to operate an application on OpenShift; most organizations have a formalized process detailing these steps.
The below pseudo entity relationship diagram represents the configuration one would attain by the end of this preparation:

Conclusions
In this article, a capacity management dashboard was introduced that can be used as the baseline for an organization’s capacity management process.
With OpenShift 4.x and the introduction of the cluster autoscaler, the urgency of having such a dashboard may be reduced. However, the autoscaler is not always an option, and even where it is implemented it is currently purely reactive (it triggers only when a pod cannot be scheduled due to lack of capacity). As a result, this dashboard should still provide value in understanding capacity management with OpenShift.
 
The post How Full is my Cluster – Part 5: A Capacity Management Dashboard appeared first on Red Hat OpenShift Blog.
Quelle: OpenShift