RHEL for Edge: What's new in RHEL 8.4

In November of 2020, we announced a deployment option for Red Hat Enterprise Linux aimed at solving challenges common to edge computing environments. With the release of Red Hat Enterprise Linux (RHEL) 8.4, we are excited to continue the momentum in bringing our customers an operating system experience aimed at simplifying and securing workloads outside the core data center.
Source: CloudForms

Take control of your firewall rules with Firewall Insights

Corporate firewalls typically include a massive number of rules, which accumulate over time as new workloads are added. When rules stack up piecemeal like this, misconfigurations occur that, at best, create headaches for security administrators and, at worst, create vulnerabilities that lead to security breaches. To address this, we have introduced the Firewall Insights module in our Network Intelligence Center, which provides a single console for managing Google Cloud network visibility, monitoring, and troubleshooting.

What is Firewall Insights?

Historically, there hasn't been an easy way to deal with the accumulation of complicated firewall rules. That changed when we created Firewall Insights, which provides metrics reports and insight reports. These two reports contain information about firewall usage and the impact of various firewall rules on your VPC network. Even better, these insights and metrics are integrated into the Google Cloud Console for the VPC firewall and are also available via APIs.

You can use metrics reports to verify that firewall rules are being used appropriately and as intended. With this report you can uncover leftover rules from the past that are no longer actively used, verify that firewall rules allow or deny what is intended, debug dropped connections live, and leverage Cloud Monitoring to discover malicious attempts to access your network. You can use insight reports to identify firewall misconfigurations, detect security attacks, and optimize and tighten your security rules.

Let's take a look at how these reports work.

Metrics Report, a deeper dive

Metrics insights analyze your VPC firewall usage by tracking metrics such as firewall hit counts and time of last hit. Let's check this out with an example. When you navigate to your VPC network and click on Firewall, ensure that you have selected Logs, Hit count, and Last hit in the column display options:

Figure 1.1 – VPC Firewall rules filter options

After this is enabled, you should be able to see your VPC firewall rules with the hit count and last hit metrics. In the image below you can see that if logs are not enabled for a rule, hit count and last hit metrics are not collected. If logs are enabled, these details are collected and shown in the VPC firewalls console.

Figure 1.2 – VPC Firewall rules with hit count, last hit and logs

As you can see in Figure 1.2, each firewall rule that has logging enabled shows its hit count. In the example above, the rule "uc1-db4-deny-http" has been hit 109,154 times, the last time being 2021-03-10 (13:10:00). If you want to dive even deeper, you can click on the hit count, which brings you to the logs viewer page where you can expand the logs and analyze all the details:

Figure 1.3 – Logs viewer

To enable logs for a particular firewall rule, you can edit the rule and turn logs on. By default, metadata is added to firewall logs. If you want to reduce the log size, you can exclude these additional fields; excluding the metadata fields will not impact the functionality of Firewall Insights.

Figure 1.4 – Enabling logs and metadata for a VPC firewall rule

From the Firewall page, you can also multi-select a group of firewall rules and turn on all of their logs at once. Note that firewall rule usage metrics are accurate only for the period of time during which Firewall Rules Logging is enabled.
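Logging can also be enabled programmatically. Below is a minimal sketch using the google-cloud-compute Python client; the project ID is a placeholder, and the field names reflect our reading of the v1 Compute API, so treat this as an illustration rather than an official recipe.

# Sketch: enable logging on an existing VPC firewall rule
# (pip install google-cloud-compute). PROJECT_ID is a placeholder;
# the rule name is taken from the example above.
from google.cloud import compute_v1

PROJECT_ID = "my-project"        # placeholder
RULE_NAME = "uc1-db4-deny-http"

client = compute_v1.FirewallsClient()

# Patch only the log_config: turn logging on and drop the optional
# metadata fields to reduce log size (this does not affect
# Firewall Insights, as noted above).
patch_body = compute_v1.Firewall(
    log_config=compute_v1.FirewallLogConfig(
        enable=True,
        metadata="EXCLUDE_ALL_METADATA",
    )
)

operation = client.patch(
    project=PROJECT_ID,
    firewall=RULE_NAME,
    firewall_resource=patch_body,
)
operation.result()  # wait for the patch to complete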
Insights Report, a deeper dive

The insights report provides an intelligent analysis of the configuration of your firewalls. A report can contain one or more insights. We will now look at examples of the different insight reports, which include:

Shadowed firewall rules
Allow rules with no hit in the last six weeks
Deny rules with hits in the last 24 hours

By definition, a shadowed rule is a VPC firewall rule that is completely overshadowed by one or more firewall rules with higher or equal priority, which we call the shadowing rules. Let's see an example of how to find shadowed rules.

Say you try to ping between two VMs but it fails. Let's take a look at the firewall rules applied to the VM:

Figure 1.5 – Firewall rules enforced on a VM

If we look closely, we can see two rules with the same target tag that are almost identical except for their source IP ranges. The IP range of the deny rule "uc1-app2-deny-all" includes the IP range of the allow rule "uc1-app2-allow-app1". Therefore, "uc1-app2-allow-app1" is shadowed by "uc1-app2-deny-all", which is why the ping between these VMs failed.

To avoid such incidents in the future, Firewall Insights provides you with a convenient list of rules overshadowed by other firewall rules. To see shadowed rules, you can click on the column selector and add "Insights" to the firewall and route details:

Figure 1.6 – Add Insights to the firewall rules enforced on a VM

Once this is applied, you will be able to see shadowed rules here:

Figure 1.7 – Insights enabled for firewall rules enforced on a VM

Alternatively, you can navigate to Network Intelligence > Firewall Insights, where we will see the shadowed rule report:

Figure 1.8 – Shadowed rules

If we click on the insight, we will get more details:

Figure 1.9 – Shadowed firewall rules detail

Going back to the firewall insights card (Figure 1.8), we can also identify that "uc1-db4-allow-app3" is shadowed by a combination of two rules; clicking on that insight would likewise provide all the details.
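To make the shadowing relationship concrete, here is a small, self-contained sketch of the containment check in its simplest form. Real Firewall Insights analysis also accounts for targets, protocols, ports, and combinations of rules; this toy version compares single source ranges only, and the IP ranges are made up for illustration.

# Toy illustration of the shadowing check described above. A rule is
# shadowed when a rule of higher or equal priority (in GCP, a lower
# number means higher priority) matches all of its traffic first.
import ipaddress
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    action: str        # "allow" or "deny"
    priority: int      # lower number = higher priority
    source_range: str  # CIDR of matched sources

def shadows(shadowing: Rule, shadowed: Rule) -> bool:
    """True if `shadowing` takes precedence for all of `shadowed`'s sources."""
    return (
        shadowing.priority <= shadowed.priority
        and ipaddress.ip_network(shadowed.source_range).subnet_of(
            ipaddress.ip_network(shadowing.source_range)
        )
    )

# The two rules from the example above (IP ranges are hypothetical).
deny_all = Rule("uc1-app2-deny-all", "deny", 900, "10.0.0.0/8")
allow_app1 = Rule("uc1-app2-allow-app1", "allow", 1000, "10.10.1.0/24")

if shadows(deny_all, allow_app1):
    print(f"{allow_app1.name} is shadowed by {deny_all.name} "
          f"({deny_all.action} wins)")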
From the Firewall Insights dashboard, let's move on to "allow rules with no hit", where we can see firewall rules that have not logged any hits in the past six weeks. Such rules could be the result of a misconfiguration or leftover rules from past deployments; tighten your security boundaries by identifying and removing them. If we go back to Network Intelligence > Firewall Insights, we can see allow rules with no hit:

Figure 1.10 – Allow rules with no hit

We can drill down to view the full list of rules with no hit. For instance, the bottom rule, "uc2-app1-allow-internet", might be a leftover rule from a past deployment that allowed internet access, which means it has little likelihood of being hit in the future, so you may want to consider removing it from the rule set:

Figure 1.10 – Allow rules with no hit detail

Click on the firewall rule to review all the details, and take a look at the prediction, which is based on the hit pattern we have observed for similar rules in the same organization:

Figure 1.11 – Allow rules with no hit insight details

Let's go back to the Firewall Insights dashboard and move on to the last category, "deny rules with hits in the last 24 hours", which helps you capture the traffic blocked by deny rules. Such traffic could indicate external attacks on your VPC network or compromised VM instances attempting to send traffic out of your network. Let's examine the future hit prediction on a firewall rule. Navigate to Network Intelligence > Firewall Insights and take a look at the deny rules with hits card:

Figure 1.12 – Deny rules with hits

If you click on the "uc4-web-deny-web" rule, you can see the hit count monitoring, and if you click on the hit number you can dive deeper into the logs:

Figure 1.13 – Deny rules with hits detail

Figure 1.13 – Deny rules with hits logs viewer detail

These tips should help you effectively leverage Firewall Insights to gain better control over a massive firewall rule set: automate shadowed rule detection, quickly troubleshoot misconfigured rules, effortlessly pinpoint the over-granting rules hidden in the rule set, and identify the failed attempts to break into your network that were rejected by your firewall rules. To learn more about Firewall Insights, please refer to our documentation. Additionally, to see how we're advancing intelligent automation in network security, check out our recent blog post, "How we're advancing intelligent automation in network security".
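Finally, because these insights are also available via APIs, you can consume them programmatically, for example to flag shadowed rules automatically. Here is a minimal sketch using the google-cloud-recommender Python client; the project ID is a placeholder, and the insight type ID is our assumption of the identifier Firewall Insights registers with the Recommender API.

# Sketch: list Firewall Insights findings through the Recommender API
# (pip install google-cloud-recommender). PROJECT_ID is a placeholder;
# the insight type ID below is an assumption, not verified output.
from google.cloud import recommender_v1

PROJECT_ID = "my-project"  # placeholder
parent = (
    f"projects/{PROJECT_ID}/locations/global/"
    "insightTypes/google.compute.firewall.Insight"
)

client = recommender_v1.RecommenderClient()
for insight in client.list_insights(parent=parent):
    # insight_subtype distinguishes e.g. shadowed rules from
    # allow rules with no hit.
    print(insight.insight_subtype, "-", insight.description)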
Source: Google Cloud Platform

Google Cloud Research Innovators drive scientific advances on the cloud

In the past, it was rare for an astrophysicist to use the same tools as a molecular biologist. But the growth of scientific data and the shift to cloud computing are driving a convergence of computational tools and methods. Techniques such as computer vision can be applied to data as different as satellite imagery and pathology slides. As new discoveries increasingly depend on large-scale data analysis, researchers benefit from sharing technology expertise across otherwise unrelated disciplines.

To encourage the cross-pollination essential to scientific advances, Google Cloud announced the Research Innovators program. Although these researchers come from the physical, social, and biological sciences, they all share one thing in common: they're leveraging the cloud to make scientific breakthroughs. The program supports a global community of researchers using Google Cloud to solve the impossible, and includes world-renowned experts who will collaborate across scientific domains and create real-world solutions, open source projects, and technical documentation, all with expanded support from Google and exclusive access to Google programs and technical resources.

Today, we're excited to announce our inaugural cohort of 31 Google Cloud Research Innovators, spanning 30 institutions and eight countries across both industry and academia. Their projects can be advanced rapidly by cloud-native technologies developed to address needs outside of research. For example, Dr. Benedikt Riedel and the IceCube Collaboration are investigating using Cloud Pub/Sub messaging to reliably perform distributed calculations while taking advantage of preemptible VMs to reduce cost in the field of multi-messenger astrophysics. Similarly, Dr. Teodora Szasz and the MiiE (Messages, Identity, and Inclusion in Education) Lab are using Vision AI to measure the representation of race and gender in children's books through images and text. This can expand our understanding of diversity in content and can help us overcome the structural inequality that pervades society and our daily lives.

Cloud technologies are accelerating sciences from solar to cellular scales.

The range and scale of these projects show how cloud computing offers scientists more than just efficiency, compute power, and speed; these innovative researchers are using the cloud to ask new questions and imagine new solutions that they never could have envisioned before. To get started with Google Cloud, apply for free credits towards your research.
Source: Google Cloud Platform

Improving NCCL performance for cloud ML applications

The cloud is a great option for training deep neural networks because it offers the ability to scale on demand for specialized machine learning (ML) hardware, which provides increased agility. In addition, the cloud makes it easy to get started and provides pay-as-you-go usage models. The cloud also brings the latest GPU technologies to customers as they are developed.

A key barrier to adopting deep neural networks on large datasets is the time and resources required to train them. Since using deep learning models in production requires frequent retraining, minimizing training time becomes critical. Also, to drive higher prediction accuracy, models are getting larger and more complex, and thus need a high volume of compute and storage resources. Price to performance remains a concern for customers, and optimizing inter-node latency is crucial for effective use of the cloud for deep learning workloads.

ML frameworks (such as TensorFlow or PyTorch) use NCCL for distributed inter-node GPU communications. The NVIDIA Collective Communication Library (NCCL) is a commonly used communication library that provides the multi-GPU and multi-node communication integral to making neural networks converge quickly. NCCL provides routines such as all-gather, all-reduce, broadcast, reduce, reduce-scatter, and point-to-point send and receive. These routines are optimized to achieve high bandwidth and low latency within a node and over the network across nodes.

Reducing NCCL message latency is one vital element of delivering high application performance and scalability. We recently introduced several features and tunings that make it easy to run NCCL workloads and achieve optimal performance on Google Cloud. Collectively referred to as 'best practices,' these updates reduce NCCL latency and benefit applications that depend on small messages and collective operations. This blog post demonstrates how to optimize GCP infrastructure to minimize deep learning training times. To achieve this, we'll look at distributed, multi-node synchronous training using NCCL.

Achieve optimal NCCL performance in Google Cloud

1. Use the latest DLVM with NCCL Fast Socket

To maximize NCCL collective communication performance for distributed ML training, we have designed a communication library called NCCL Fast Socket. Developed as an NCCL transport plugin, NCCL Fast Socket introduces optimizations that significantly improve NCCL performance on Google Cloud:

Use of multiple network flows to achieve maximum throughput. NCCL Fast Socket introduces additional optimizations over NCCL's built-in multi-stream support, including better overlapping of multiple communication requests.
Dynamic load balancing of multiple network flows, so NCCL can adapt to changing network and host conditions. With this optimization, straggler network flows will not significantly slow down the entire NCCL collective operation.
Integration with Google Cloud's Andromeda virtual network stack. This increases overall network throughput by avoiding contention in both Andromeda and guest virtual machines (VMs).

Most importantly, the core NCCL library can dynamically load NCCL Fast Socket at run time. Hence, Google Cloud users can take advantage of NCCL Fast Socket without changing or recompiling their applications, ML frameworks (such as TensorFlow or PyTorch), or even the NCCL library itself.
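To illustrate how frameworks sit on top of NCCL, here is a minimal PyTorch sketch of the all_reduce pattern used for gradient aggregation. Nothing in it is Fast Socket-specific: if the plugin is present, NCCL loads it transparently at run time, with no change to code like this.

# Minimal multi-GPU all_reduce over the NCCL backend.
# Launch with, e.g.: torchrun --nproc_per_node=8 allreduce_demo.py
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")  # NCCL for GPU collectives
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    # Each rank contributes a tensor; all_reduce sums them in place.
    # This is the same collective frameworks issue for gradients.
    tensor = torch.ones(1024, device="cuda") * rank
    dist.all_reduce(tensor, op=dist.ReduceOp.SUM)

    if rank == 0:
        print(f"after all_reduce, tensor[0] = {tensor[0].item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()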
Currently, the latest Deep Learning VM (DLVM) image release includes NCCL Fast Socket. It's activated automatically when you use the NCCL library included in DLVM. You can also manually install NCCL Fast Socket by following the instructions here.

2. Use gVNIC

To achieve better network throughput in NCCL, be sure to enable Google Virtual NIC (gVNIC) when creating VM instances. For VMs with high performance and high network throughput requirements, such as those with GPUs used for distributed ML training, we recommend using gVNIC as the default network interface. Currently, gVNIC can support network throughput of up to 100 Gbps, which provides a significant performance boost to NCCL. For detailed instructions on how to use gVNIC, please refer to the gVNIC guide. DLVM also provides images that support gVNIC out of the box.

3. Maximize available bandwidth

Within a VM, GPU communication can take advantage of NVIDIA NVLink, which can achieve higher throughput than networking between VMs. Therefore, we recommend packing as many GPUs as possible into a single VM to maximize the use of NVLink bandwidth. When working with multiple VMs, we recommend using a high vCPU count (96) to maximize available networking bandwidth. You can find more information on the network bandwidth and GPUs documentation page.

4. Use compact placement policies

Certain GPU VM types (such as N1) support compact placement policies. Allocating VMs close to each other results in improved network performance and less interference. Because distributed training can be latency sensitive, be sure to use a VM type that supports compact placement. You can find more information on the placement policy documentation page. (A programmatic sketch combining gVNIC and a compact placement policy appears at the end of this post.)

Examples of improved performance data

NCCL Fast Socket improves performance for both NCCL collective operations and distributed training of ML models. We show some examples below comparing NCCL Fast Socket with stock NCCL. (In our tests, NCCL Fast Socket used the same number of network flows and helper threads.) We used gVNIC driver version v1.0.0.

NCCL All_Reduce

All_Reduce is one of the most important collective operations in NCCL. It's used frequently for gradient aggregation in distributed training of ML models. In the figure, we show the performance of NCCL All_Reduce tests with 16 NVIDIA V100 GPUs on two VMs. From the figure, we can see that NCCL Fast Socket improves All_Reduce performance significantly and consistently across a range of message sizes. (Higher bars indicate better performance in the figure.)

BERT-Large

We also tested the performance improvement of NCCL Fast Socket on a TensorFlow model: BERT-Large. We conducted the test using 64 NVIDIA V100 GPUs on eight VMs. We can see that the use of NCCL Fast Socket increases the training speed of BERT-Large significantly (by around 30%).

NCCL applications on Google Cloud are now faster and easier to run

Running NCCL applications on Google Cloud has never been easier. Applying these best practices (including using NCCL Fast Socket) can improve application performance. To simplify deployment of NCCL Fast Socket, we included it in the DLVM images to get the best out-of-the-box performance for your NCCL workloads on Google Cloud. To learn more, visit our documentation on using higher network bandwidth speeds with Fast Socket.

Special thanks to Chang Lan and Soroush Radpour for their contributions.
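As referenced in the placement policy section above, here is a minimal sketch that creates a compact placement policy and a gVNIC-enabled, high-vCPU VM with the google-cloud-compute Python client. The project, region, zone, resource names, and image family are placeholder assumptions, field names follow our reading of the v1 Compute API, and GPU attachments are omitted for brevity.

# Sketch: compact placement policy + gVNIC VM
# (pip install google-cloud-compute). All names are placeholders.
from google.cloud import compute_v1

PROJECT, REGION, ZONE = "my-project", "us-central1", "us-central1-a"

# 1. Compact placement policy: allocate VMs close to each other.
policy = compute_v1.ResourcePolicy(
    name="nccl-compact",
    group_placement_policy=compute_v1.ResourcePolicyGroupPlacementPolicy(
        collocation="COLLOCATED",
    ),
)
compute_v1.ResourcePoliciesClient().insert(
    project=PROJECT, region=REGION, resource_policy_resource=policy
).result()

# 2. High-vCPU VM with gVNIC as its network interface.
instance = compute_v1.Instance(
    name="nccl-node-0",
    machine_type=f"zones/{ZONE}/machineTypes/n1-standard-96",
    resource_policies=[
        f"projects/{PROJECT}/regions/{REGION}/resourcePolicies/nccl-compact"
    ],
    network_interfaces=[
        compute_v1.NetworkInterface(
            network="global/networks/default",
            nic_type="GVNIC",  # gVNIC instead of the default virtio-net
        )
    ],
    disks=[
        compute_v1.AttachedDisk(
            boot=True,
            auto_delete=True,
            initialize_params=compute_v1.AttachedDiskInitializeParams(
                # hypothetical family; use any gVNIC-capable (e.g. DLVM) image
                source_image="projects/deeplearning-platform-release/global/images/family/common-gpu",
            ),
        )
    ],
)
compute_v1.InstancesClient().insert(
    project=PROJECT, zone=ZONE, instance_resource=instance
).result()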
Source: Google Cloud Platform

Answering Your Questions at DockerCon LIVE 2021

Guest post by Docker Captain Bret Fisher, a DevOps consultant and the creator of the popular Docker Mastery Udemy course. Join us for DockerCon LIVE 2021 on Thursday, May 27. DockerCon LIVE is a free, one-day virtual event. Register at https://dockr.ly/2PSJ7vn

I have the pleasure of hosting many of the live events at DockerCon this year. You may remember my 7+ hour non-stop live stream from last year's DockerCon LIVE 2020, which featured nearly 20 guests.

We’re back!

This year we’re calling them Live Panels. You’ll find them in their own track in the schedule.

If you've never visited one of my live streams before, they tend to be DevOps-focused and as practical and real-world as we can make them. Come ready to ask my guests questions in chat on our selected topics, and we'll do our best to answer as many as we can! You get to guide the conversation with the live stream chat Q&A.

I'm hosting three live panels covering what I think are the three hottest topics in Docker and Cloud Native container tech today for developers and DevOps professionals, so be sure to stop by my streams:

Easy CI With Docker

Thursday, May 27th, 10:45 AM US Pacific (UTC-7)

I'm joined by a panel of CI professionals as we take your questions live for an hour on all things Docker testing! We'll also discuss our opinions on our favorite tools, workflows, and where we think container-based CI is heading.

FutureOps: End-to-end GitOps

Thursday, May 27th, 1:00 PM US Pacific (UTC-7)

Now getting into more of the continuous deployment realm, my guest panel of DevOps professionals will take your questions for an hour on topics like CD, GitOps, and infrastructure-as-code! We'll also discuss our favorite tools and workflows, and guess where IaC and GitOps are heading.

Sysadminless: Running Containers Without Servers

Thursday, May 27th, 3:15 PM US Pacific (UTC-7)

The future of sysadmin is YAML! Or maybe not. I'm joined by a panel of infrastructure professionals as we take your questions live for an hour on all things cloud infra and self-managing container solutions! We'll also discuss our favorite tools, services, and where "sysadminless" might be headed.

I can’t wait to host another DockerCon live stream and I hope you’ll join us on the 27th for tons of sessions and live streams on the future of Docker. You can find me at @BretFisher and bretfisher.com.

Join us for DockerCon LIVE 2021 on Thursday, May 27. DockerCon LIVE is a free, one-day virtual event that is a unique experience for developers and development teams who are building the next generation of modern applications. If you want to learn how to go from code to cloud fast and how to solve your development challenges, DockerCon LIVE 2021 offers engaging live content to help you build, share and run your applications. Register today at https://dockr.ly/2PSJ7vn

Source: https://blog.docker.com/feed/