Tuning for Zero Packet Loss in Red Hat OpenStack Platform – Part 2

Ready for more Fast Packets?!
In Part 1 we reviewed the fundamentals of achieving zero packet loss, covering the concepts behind the process. In this next installment, Federico Iezzi, EMEA Cloud Architect with Red Hat, continues his series by diving deep into the details behind the tuning.

Buckle in and join the fast lane of packet processing!

Getting into the specifics
It’s important to understand the components we’ll be working with for the tuning. Achieving our goal of zero packet loss begins right at the core of Red Hat OpenStack Platform: Red Hat Enterprise Linux (RHEL).
The tight integration between these products is essential to our success here and really demonstrates how the solid RHEL foundation is an incredibly powerful aspect of Red Hat OpenStack Platform.
So, let’s do it …
SystemD CPUAffinity
The SystemD CPUAffinity setting allows you to indicate which CPU cores should be used when SystemD spawns new processes. Since it only works for SystemD-managed services, two things should be noted. Firstly, kernel threads have to be managed in a different way, and secondly, all user-executed processes must be handled very carefully as they
might interrupt either the PMD threads or the VNFs. So CPUAffinity is, in a way, a simplified replacement for the kernel boot parameter isolcpus. Of course, isolcpus does much more, such as disabling kernel and process thread balancing, but it is often counter-productive unless you are doing real-time work, and shouldn't be used.
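A minimal sketch of the setting, assuming cores 0 and 1 are kept for housekeeping while the rest of the host is isolated (the core numbers are illustrative, not a recommendation):

```shell
# /etc/systemd/system.conf -- pin SystemD-spawned processes to cores 0 and 1
[Manager]
CPUAffinity=0 1
```

A reboot is the safest way to make the SystemD manager pick up the new affinity for all of its services.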
So, what happened to isolcpus?
Isolcpus was the way, until a few years ago, to isolate both kernel and user processes to specific CPU cores. To make it more real-time oriented, load balancing between the isolated CPU cores is disabled. This means that once a thread (or a set of threads) is created on an isolated CPU core, even if that core is at 100% usage, the Linux process scheduler (SCHED_OTHER) will never move any of those threads away. For more info, check out this article on the Red Hat Customer Portal (registration required).
IRQBALANCE_BANNED_CPUS
The IRQBALANCE_BANNED_CPUS setting allows you to indicate which CPU cores should be skipped when rebalancing IRQs. CPU cores whose corresponding bits are set to one in this mask will not have any IRQs assigned to them on rebalance (this can be double-checked in /proc/interrupts).
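As a sketch, assuming cores 2-5 are the isolated PMD/VNF cores (the core numbers are purely illustrative), the banned-CPU mask can be built bit by bit:

```shell
# Build the IRQBALANCE_BANNED_CPUS bitmask: one bit per banned CPU core.
# Cores 2-5 are an assumption for illustration; use your isolated cores.
mask=0
for cpu in 2 3 4 5; do
  mask=$(( mask | (1 << cpu) ))
done
# Print the line for /etc/sysconfig/irqbalance.
printf 'IRQBALANCE_BANNED_CPUS=%08x\n' "$mask"
# -> IRQBALANCE_BANNED_CPUS=0000003c
```

Restart the irqbalance service after editing /etc/sysconfig/irqbalance so the new mask takes effect.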
Tickless Kernel
Setting the kernel boot parameter nohz prevents frequent timer interrupts; it is common practice to refer to such a system as "tickless." The tickless kernel feature enables "on-demand" timer interrupts: if there is no timer due to expire for, say, 1.5 seconds when the system goes idle, the system will stay totally idle for 1.5 seconds. The result is fewer interrupts per second, instead of scheduler interrupts occurring every 1 ms.
Adaptive-Ticks CPUs
Setting the kernel boot parameter nohz_full to the isolated CPU core values ensures the kernel doesn't send scheduling-clock interrupts to CPUs running a single runnable task. Such CPUs are said to be "adaptive-ticks CPUs." This is important for applications with aggressive real-time response constraints because it allows them to improve their
worst-case response times by the maximum duration of a scheduling-clock interrupt. It is also important for computationally intensive short-iteration workloads: If any CPU is delayed during a given iteration, all the other CPUs will be forced to wait in idle while the delayed CPU finishes. Thus, the delay is multiplied by one less than the number of CPUs. In these situations, there is again strong motivation to avoid sending scheduling-clock interrupts. Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
RCU Callbacks Offload
The kernel boot parameter rcu_nocbs, when set to the value of the isolated CPU cores, causes the offloaded CPUs to never queue RCU callbacks and therefore RCU never prevents offloaded CPUs from entering either dyntick-idle mode or adaptive-tick mode.
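Pulling the last three parameters together, a hypothetical kernel command line for a host where cores 2-19 are isolated (the core range is an assumption for illustration) might look like:

```shell
# /etc/default/grub (fragment) -- existing options elided
GRUB_CMDLINE_LINUX="... nohz=on nohz_full=2-19 rcu_nocbs=2-19"
# Regenerate the GRUB configuration and reboot to apply:
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```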
Fixed CPU frequency scaling
The kernel boot parameter intel_pstate, when set to disable, turns off the intel_pstate CPU frequency scaling, keeping the CPU frequency at the maximum allowed by the CPU. Adaptive, and therefore varying, CPU frequency results in unstable performance.
nosoftlockup
The kernel boot parameter nosoftlockup disables logging of backtraces when a process executes on a CPU for longer than the softlockup threshold (default is 120 seconds). Typical low-latency programming and tuning techniques might involve spinning on a core or modifying scheduler priorities/policies, which can lead to a task reaching this threshold. If a task has not relinquished the CPU for 120 seconds, the kernel will print a backtrace for diagnostic purposes.
Dirty pages affinity
Setting the /sys/bus/workqueue/devices/writeback/cpumask value to the CPU cores that are not isolated creates an affinity so that the kernel writeback threads, which flush dirty pages, prefer to run on those cores.
Execute workqueue requests
Setting the /sys/devices/virtual/workqueue/cpumask value to the CPU cores that are not isolated defines which CPUs the kworker threads may use for kernel tasks such as interrupts, timers, I/O, and so on.
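A minimal sketch for both workqueue masks, assuming cores 0 and 1 are the non-isolated housekeeping cores (adjust the mask to your own layout):

```shell
# Bitmask for housekeeping cores 0 and 1 (an assumption for illustration).
hk_mask=$(( (1 << 0) | (1 << 1) ))
printf '%x\n' "$hk_mask"   # -> 3
# Apply as root so writeback and generic workqueues stay off isolated cores:
#   echo 3 > /sys/bus/workqueue/devices/writeback/cpumask
#   echo 3 > /sys/devices/virtual/workqueue/cpumask
```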
Disable Machine Check Exception
Setting the /sys/devices/system/machinecheck/machinecheck*/ignore_ce value to 1 disables the checking for corrected machine check errors. A machine check exception (MCE) is a type of computer hardware error that occurs when a computer's central processing unit detects a hardware problem.
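Since the sysfs glob expands to one entry per CPU, a small loop is the usual way to apply the setting; a sketch (requires root):

```shell
# Stop checking for corrected errors on every machine check bank.
for f in /sys/devices/system/machinecheck/machinecheck*/ignore_ce; do
  if [ -w "$f" ]; then
    echo 1 > "$f"
  fi
done
```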
KVM Low Latency
Both the standard KVM module and the Intel KVM module support a number of options to
reduce latency by removing unwanted VM exits and interrupts.
Pause Loop Exiting
In the kvm_intel module, the parameter ple_gap is set to 0.
Full details can be found on page 37 of "Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3C: System Programming Guide, Part 3 (pdf)".
Periodic Kvmclock Sync
In the kvm module, the parameter kvmclock_periodic_sync is set to 0.
Full details are found in the upstream kernel commit.
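As a sketch, both options can be made persistent with a modprobe configuration file (the filename is arbitrary); the modules must be reloaded, or the host rebooted, for the options to take effect:

```shell
# /etc/modprobe.d/kvm-lowlatency.conf
options kvm kvmclock_periodic_sync=0
options kvm_intel ple_gap=0
```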
SYSCTL Parameters
Some sysctl parameters are inherited because the cpu-partitioning tuned profile is a child of the network-latency profile, which in turn builds on latency-performance. Below are the essential parameters for achieving zero packet loss:

kernel.hung_task_timeout_secs = 600
Increases the hung task timeout; however, no error will be reported given the nosoftlockup kernel boot parameter. From the cpu-partitioning profile.

kernel.nmi_watchdog = 0
Disables the NMI watchdog (a Non-Maskable Interrupt is a type of interrupt that cannot be ignored by the CPU). From the cpu-partitioning profile.

vm.stat_interval = 10
Sets the refresh interval of the virtual memory statistics update. The default value is 1 second. From the cpu-partitioning profile.

kernel.timer_migration = 1
In an SMP system, tasks are scheduled on different CPUs by the scheduler and interrupts are balanced across all available CPU cores by the irqbalance daemon, but timers remain stuck on the CPU core that created them. With timer_migration enabled, recent Linux kernels (see https://bugzilla.redhat.com/show_bug.cgi?id=1408308) will try to migrate timers away from the nohz_full CPU cores. From the cpu-partitioning profile.

kernel.numa_balancing = 0
Disables automatic NUMA process balancing across the NUMA nodes. From the network-latency profile.

Disable Transparent Hugepages
Transparent hugepages are a mechanism by which smaller memory pages (4K) are automatically merged into bigger memory pages (usually 2M). Setting the option "transparent_hugepage" to "never" disables this behavior.
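This can be set either on the kernel command line or at runtime; a sketch:

```shell
# At boot (kernel command line):
#   transparent_hugepage=never
# At runtime, as root:
#   echo never > /sys/kernel/mm/transparent_hugepage/enabled
# Verify -- the active setting is shown in brackets, e.g. "always madvise [never]":
#   cat /sys/kernel/mm/transparent_hugepage/enabled
```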
Tuned parameters
The following tuned parameters should be configured to provide low latency and to disable power-saving mechanisms. Setting the CPU governor to "performance" runs the CPU at the maximum frequency.

force_latency = 1
governor = performance
energy_perf_bias = performance
min_perf_pct = 100
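A hypothetical custom tuned profile combining these settings with the cpu-partitioning parent might look like the following (the profile name is an assumption; these options live in the [cpu] plugin section):

```shell
# /etc/tuned/nfv-host/tuned.conf
[main]
include=cpu-partitioning

[cpu]
force_latency=1
governor=performance
energy_perf_bias=performance
min_perf_pct=100
```

Activate it with `tuned-adm profile nfv-host`; note that the cpu-partitioning parent also expects the isolated cores to be listed in /etc/tuned/cpu-partitioning-variables.conf.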

Speeding to a conclusion
As you can see, there is a lot of preparation and tuning that goes into achieving zero packet loss. This blogpost detailed many parameters that require attention and tuning to make this happen.
Next time the series finishes with an example of how this all comes together!
Love all this deep tech? Want to ensure you keep your Red Hat OpenStack Platform deployment rock solid? Check out the Red Hat Services Webinar Don’t fail at scale: How to plan, build, and operate a successful OpenStack cloud today! 

The “Operationalizing OpenStack” series features real-world tips, advice and experiences from experts running and deploying OpenStack.
Quelle: RedHat Stack

The Most Important DevOps Metric to Measure

Over the past few weeks, I’ve had the great privilege to partner on a series of roadshows with Gene Kim (@realgenekim), author of “The Phoenix Project”, “The DevOps Handbook” and the annual “State of DevOps Report”. The events are called “Culture, Containers and accelerating DevOps, the path to Digital Transformation” and they provide us with an opportunity to […]
Quelle: OpenShift

Tech N’ Talk: Kubernetes 1.7 Update with Aparna Sinha (Google)

Watch the first installment of the newly launched Tech N’ Talk podcast series for a conversation with Google’s Aparna Sinha on what’s new in the latest release of Kubernetes 1.7, a milestone release that adds security, storage, and extensibility features motivated by widespread production use of Kubernetes in some of the most demanding enterprise environments.
Quelle: OpenShift

IBM and Automation Anywhere: A new partnership to reinvent business process

As the digital economy continues to grow and evolve, some companies face the challenge of managing increasingly complex business processes that involve massive amounts of data. The growth of the digital economy connects people and organizations in new ways. But it has also generated repetitive tasks like requiring employees to manually enter data. Companies need a way to more efficiently process and manage data so that employees can spend more time on truly value-added parts of their job.
Today, we’re excited to announce a new partnership that helps IBM do just that. We’re teaming up with Automation Anywhere, the market leader in robotic process automation, to help companies like yours digitize their operations and drive new efficiencies across business processes. How? We’re bringing a set of tools to the table that automate and simplify data-intensive tasks within processes managed by IBM’s process management software.
Through our collaboration, we aim to help companies free their employees to focus on more meaningful aspects of their jobs. We want your processes to work for you—not the other way around.
Why now?
Today, knowledge workers spend far too much of their time doing “swivel chair” integration by copying and entering data across disjointed systems. Robotic Process Automation (RPA) automates these repetitive tasks.  When RPA is used in conjunction with business process management (BPM) and IBM Operational Decision Manager (ODM), we can reduce the level of investment needed to automate a process while maintaining total visibility of the process. What’s more, you can bring software bots into any critical business processes, regardless of the new or heritage systems that are included. Importantly, you can integrate all of the elements—including people, systems and bots—into one seamless process.
What this means for our clients
Say you run a bank. Your employees deal with thousands of complicated forms every day to open new accounts. This requires accuracy, time and effort. Our new solution could help in two ways. IBM’s process management software could help manage the broader process of opening accounts, while Automation Anywhere’s bots could pull information from different sources to help fill out the forms more quickly and accurately. Immense opportunity can be created when robots seamlessly help your employees do their work or perform tasks alongside them. This level of collaboration can shorten project lifecycle times, accelerate time-to-value for automation investments and boost ROI.
Interested in learning more? Join us on August 2, 2017, at 12:30 PM Eastern Daylight Time (EDT) to hear more about this exciting new partnership. In the broadcast you can learn:

Why your peers are adopting RPA and why you should too
New opportunities for automation and how you can take advantage of them
How to integrate RPA into a broader process transformation strategy

The post IBM and Automation Anywhere: A new partnership to reinvent business process appeared first on Cloud computing news.
Quelle: Thoughts on Cloud

VCAS improves outcome for crime victims with a cloud-based case management system

In October 2007, Fiona Pilkington and her disabled daughter Francesca Hardwick died in a car fire Pilkington herself set. She saw no other way out of their unbearable circumstances. Over a 10-year period, the family was harassed and bullied by groups of local children.
The system failed the Pilkington family. Police missed opportunities to record information, investigate or take any action. The family should have been recognized as vulnerable but was not. Then it was too late.
This is why public service agencies need an effective multi-agency case management application such as the Flexible Information Reporting Tool (FIRsT).
Using FIRsT to support victims
The Victim Care and Advice Service (VCAS) helps victims of crime to cope and subsequently recover following their victimization. VCAS uses the FIRsT system to record details of the incident, and assist in the development of a victim needs assessment and subsequent recovery plan. The system allows the victim care officer and their supervisor to monitor the case under supervision, carry out “quality of service” surveys and provide “outcome and output” monitoring data.
FIRsT is a secure, cloud-based CRM application that can act as a multi-agency case management system to connect public safety teams such as police, fire and rescue services, health and social care agencies, domestic violence organizations, mental health facilities, and housing associations, enabling them to share information and collaborate.
Screenshot showing council, police and fire incidents plotted on a map with details displayed for all users to view.
Here’s how FIRsT works: a victim care officer manages cases and provides support. FIRsT keeps track of basic client data and information about the crimes. A victim care officer performs a needs assessment to identify recovery and support needs. The assessment is broken down into eight specific areas, including physical/mental health, risk of repeat victimization, housing security, family and friends, social isolation, financial support needs, drug or alcohol misuse as a result of the crime, and attitude to recovery. The reasons why victims are vulnerable and in need of help to cope and recover vary. These could include if someone is young, old, a victim of a hate crime, has learning difficulties or a mental health illness, is a refugee or asylum seeker, or is lesbian, gay, bisexual or transgender. The main factor is that they need some help to move on following their victimization.

After the victim needs assessment is completed, the FIRsT system automatically generates a score that helps to identify the level of risk to the victim, which is graded accordingly as serious, high, medium or low. During the course of the support, the grading of the case can fluctuate in response to the victim's needs at the time. The victim care officer works together with the victim on a coping and recovery plan.
Everything is documented. Every contact, whether it be by telephone, letter or when a victim care officer engages with partner agencies, is recorded in the notes section of FIRsT. Doing this gives a very good history of what support has been given and helps to identify if there’s something that might have been missed.
Additionally, there’s an alert system that helps victim care officers manage their caseloads. If a case has been closed, it can easily be reopened if, for example, there is a court case coming up. The victim might be okay at the time (low score), but as the court date approaches, they might be feeling anxious (high score). The victim care officer can re-engage, look at the needs assessment again and make adjustments.
FIRsT also enables supervisors to monitor victim care officers. It measures the difference based on needs assessment scores between the time a victim first engaged with VCAS and the time they left the service, which demonstrates agency impact.
A cloud-based solution
Because FIRsT is cloud-based, it is accessible from anywhere. The solution was developed by Orcuma Ltd, a software development company that concentrates mainly on the public sector community safety market.

FIRsT was born when Orcuma developers noticed that customer relationship management (CRM) products on the market weren’t flexible enough for some users and were expensive to implement. FIRsT can be tailored to other applications. For example, one of Orcuma’s clients uses it to manage their charity and volunteering case work.
Orcuma uses an IBM Bluemix bare metal server infrastructure to support FIRsT. The simple and speedy provisioning process and dedicated servers ensure compliance. Of utmost importance is that Bluemix provides security of data, uptime and availability without any dropout.
Meet Donald
Donald (not his real name), a VCAS client, is a man in his mid-50s with significant learning difficulties. He lived alone and had a job at a supermarket where he collected shopping carts. Donald used a moped to get himself to and from work.
Donald was targeted by local youths, and over a period of six months he suffered six burglaries and repeated acts of vandalism to his home. Events came to a head when a group of neighborhood kids stole his moped, crashed it into his garage, breaking his windows, and then set it alight outside his address. Donald woke up the next morning to find his garage had been ransacked and his moped destroyed. He was devastated. The loss robbed him of a big piece of his independence, denying him the ability to travel to and from his workplace.
Following this incident, local youths began to taunt Donald in the street, mocking him about the fact that his moped had been destroyed and this had a significant impact upon his feeling of isolation and fear.
Local police invited VCAS to support Donald after he got into an altercation with the kids who were harassing him. He lost his temper and retaliated, making threats in front of shoppers who didn’t realize the history and that Donald was clearly at the end of his tether. The police conducted a risk assessment and identified Donald as being at significant risk of further victimization; his mental health had been severely affected, and he indicated that he had contemplated suicide.
As a result of these concerns, a multi-agency meeting was held and the Victim Care and Advice Service was requested to provide Donald with emotional support and advocacy.
A needs assessment was conducted with Donald, a number of areas of vulnerability were identified, and a recovery plan was agreed. Donald’s needs assessment score was high. His mental state was very poor. VCAS held a multi-agency meeting to decide how to support him. He was taken to a mental health service cafeteria where he was introduced to people.
One key issue was Donald’s accommodation: he clearly needed to move from the area to escape the ongoing anti-social behavior and to be closer to his workplace. Donald’s home was in a chaotic state due to his depression; he had not been tidying up and was hoarding large amounts of property, and the housing providers would not consider moving him until the property was decluttered and cleaned.
VCAS enlisted the support of the local volunteer Police Cadets (young people aged 15-17 years), who spent a considerable amount of time at his home, cleaning and tidying the property. This had a huge positive impact upon Donald, who perhaps for the first time in his life was the center of people’s attention, and he had the chance to meet decent and well-meaning youngsters rather than the criminal element who had abused him in the street. Donald began to take more pride in his appearance. He had his hair cut and no longer appeared disheveled. His self-esteem soared.
Some weeks later, there was an incident at Donald’s place of work, where he was harassed by the youths in the supermarket car park. Donald lost his temper and reacted with verbal abuse and threats. This was witnessed by a number of customers who, not knowing the backstory, made formal complaints to the store’s management, and Donald found himself in front of a disciplinary hearing facing dismissal from work.
Donald’s victim care officer attended his disciplinary hearing and was able to speak eloquently on Donald’s behalf, drawing upon the facts and information recorded in the FIRsT system. Donald did not lose his job, and the store employed more security guards.
Eventually Donald did move house and is now able to walk to work. Although still affected by the impact of the crime, he is in a much better and more positive place physically and mentally. His case has been closed, but should he fall victim to any further crime, the FIRsT system has logged every phone call made to partner agencies, recovery strategies and other significant contacts, which will provide an excellent blueprint for further support.
Donald’s needs assessment score was up and down during the time VCAS was engaged with him. The journey to recovery isn’t a straight line.
Donald is on a good path today. If he were to have further problems, VCAS is now in a position to refer to the case notes to establish the type of support that was most successful for him.
Making a difference
FIRsT enables VCAS to see the differences made in people’s lives over time and demonstrate positive outcomes. The data collected in the system helps victim care officers identify the things that really matter for victims of crime.
Because of the FIRsT system, there is abundant evidence of VCAS’ activity, grading of needs and variable levels of risk. Everything is recorded, which gives the agency confidence in its processes.
The system likewise helps identify existing vulnerabilities that increase the likelihood of people becoming victims in the first place, such as having learning difficulties, suffering a mental illness, rough sleeping or having a drug or alcohol addiction.
VCAS is able to take preemptive steps prior to a crime, which reduces crime overall and improves the well-being of people and society as a whole. An example is support given to help to reduce the risk of becoming a victim of fraud.
Learn more about VCAS and Orcuma in the case study.
The post VCAS improves outcome for crime victims with a cloud-based case management system appeared first on Cloud computing news.
Quelle: Thoughts on Cloud

Removing ANSI Color Characters in Kibana

Removing ANSI escape characters from fields in the Kibana console gives both platform maintainers and application users the ability to fine tune how data is displayed and interpreted for analyzing the current state of applications. Get a walkthrough on using custom field formatters to remove ANSI codes using the aggregated logging framework in the OpenShift Container Platform.
Quelle: OpenShift

How to use your private cloud to optimize and innovate all your cloud apps

Today, IT operations managers and developers expect more from their private clouds. With private cloud, they can achieve better speed to market, scalability, enhanced security, and improved IT infrastructure manageability and flexibility.
Want to learn how to optimize and innovate your private cloud? Join the IBM cloud team for a webcast on July 24 at 11:00 AM Eastern Daylight Time (EDT) as we explore what’s possible with the advanced automation tool IBM Cloud Automation Manager. Learn how you can change the way you work with private clouds and their cloud-based applications to improve delivery, agility, self-service and overall customer experience.
Register for the webinar today.
If you haven’t heard about our solution to manage your multi-cloud environment, here’s a quick overview:
IBM Cloud Automation Manager provides you with a single user interface from which your IT operations teams can effectively create, connect and manage applications and the associated IT infrastructure, ultimately giving your team faster time to value. Your developers can build and create applications within company policy and security while Cloud Automation Manager delivers a seamless end-to-end experience that allows your IT operations to deploy, automate and manage your IT environments.
During this webcast, experts will touch on the following key topics:

How to optimize legacy apps with cloud
How to extend your datacenter to work with cloud services
How to innovate new cloud-native applications

Join the webcast to learn how you can eliminate the pain of configuration, integration and management with Cloud Automation Manager. You will also learn how to create template-based provisioning of your IT services that enable your IT operations teams to see and manage your enterprise applications and define policies that control your specific security and regulatory needs.
Register for the webcast here.
You can also visit our website to learn more about IBM Cloud Automation Manager, and get started at no cost.
The post How to use your private cloud to optimize and innovate all your cloud apps appeared first on Cloud computing news.
Quelle: Thoughts on Cloud