Power your Azure GPU workstations with flexible GPU partitioning

Today we're sharing the general availability of NVv4 virtual machines in South Central US, East US, and West Europe regions, with additional regions planned in the coming months. With NVv4, Azure is the first public cloud to offer GPU partitioning built on industry-standard SR-IOV technology.

NVv4 VMs feature AMD’s Radeon Instinct MI25 GPU, up to 32 AMD EPYC™ 7002-series vCPUs with clock frequencies up to 3.3 GHz, 112 GB of RAM, 480 MB of L3 cache, and simultaneous multithreading (SMT).

Pay-As-You-Go pricing for Windows deployments is available now. One- and three-year Reserved Instance and Spot pricing for NVv4 VMs will be available on April 1. Support for Linux will be available soon.

Affordable, modern GPU powered virtual desktops in the cloud

As enterprises look to the cloud to provide virtual desktops and workstations in a secure way to a highly mobile workforce, they face the significant challenge of managing cost and performance while meeting user experience expectations. Traditionally, public clouds offered virtual machines with one or more GPUs, which are best suited for the most GPU-intensive workloads that require the full power and resources of a GPU. But for the typical knowledge-worker profile, a full GPU can be overkill. For some of these customers, multi-session virtual desktops like those offered by Windows Virtual Desktop fit the bill by letting concurrent sessions share the GPU dynamically. However, some VDI customers need a dedicated virtual machine (VM) per user, either for performance or isolation reasons. For these kinds of workloads, customers want a scale-down option so they can choose the right GPU size for their requirements.

Our customers need cost-effective VM options, sized appropriately with dedicated GPU resources for each user, ranging from office workers running productivity apps to engineering workstations running GPU-powered workloads such as CAD, gaming, and simulation.

“With the new AMD-powered Workspot cloud desktops on Azure, we now have several perfectly sized cloud workstations for our different workloads. We’ve found the new entry level cloud workstation, using a fraction of the AMD GPU, is just right for our users running Microsoft Office 365 productivity tools and Adobe design tools (Photoshop, Illustrator and InDesign). This fills in an additional much-needed point on the price/performance curve, which allows us to move even more users to the AMD-powered Workspot cloud desktops on Azure.” Andy Knauf, CIO, Mead & Hunt

Pick the right GPU virtual machine size for the VDI user profile

The NVv4 virtual machine series is designed specifically for the cloud virtual desktop infrastructure (VDI) and the desktop-as-a-service (DaaS) markets. We wanted to bring GPU processing power to the masses by putting a slice of the GPU in every desktop in the cloud. NVv4 enables enterprises to provide modern desktops in the cloud, with the ideal balance of price and performance for their workloads.

The following diagram shows how the different VM sizes align with the different VDI user profiles and requirements.

“Based on the application requirements of each engineer, we can dedicate all or a fraction of the AMD GPU to their Workspot workstation on Azure. This finer resolution of control gives us the financial edge we need to move more people to Workspot cloud desktops on Azure and increase our overall productivity.” Eric Quinn, CTO, C&S Companies

Predictable performance and security with hardware partitioning of the GPU

In Azure, the security of the customer's workload is always a top priority. SR-IOV-based GPU partitioning provides a strong, hardware-backed security boundary with predictable performance for each virtual machine. We partition a single AMD Radeon Instinct MI25 GPU and allocate it to up to eight virtual machines. Each virtual machine can access only the GPU resources dedicated to it, and the secure hardware partitioning prevents unauthorized access by other VMs.

“The Azure NVv4 VM series offers ArcGIS Pro users an exceptional graphical user experience. The four NVv4 sizes provide flexibility to accommodate workloads ranging from light GIS editing to 3D manipulation.” Ryan Danzey, Sr. Product Engineer – Performance, ESRI ArcGIS

Designed to work with Windows Virtual Desktop and VDI partners you use today

Customers in the VDI segment have many choices for remote protocols and infrastructure management. We worked closely with key partners to ensure support for NVv4 virtual machines.

Windows Virtual Desktop supports the new NVv4 virtual machines, both with native WVD deployments that use RDP and with solutions delivered by Citrix and VMware, our approved providers.

NVv4 virtual machines support Microsoft Remote Desktop Protocol (RDP), Teradici PCoIP, and HDX 3D Pro. The graphics API support covers DirectX 9 through 12, OpenGL 4.6, and Vulkan 1.1.

Windows Virtual Desktop, Citrix, Teradici, Workspot, and Nutanix Frame are some of the Azure VDI partners who have extensively validated the new NVv4 virtual machines and are ready to offer them to their customers.

"This is exciting news for our Citrix customers who are delivering Citrix Workspaces from the cloud. As we see more customers migrate to the cloud, the release of the NVv4 instance ensures that customers have more options to deliver graphically accelerated  Citrix workloads  on Azure while optimizing costs." – Carisa Stringer, Sr Director Workspace Services Product Marketing

"The new Azure NV_v4 series will give our Xi Frame customers a wider range of GPU options for their virtual desktop and application streaming needs. By enabling virtualized GPUs in the cloud, Azure now delivers a whole new level of value that unlocks a much broader set of use cases."  Carsten Puls, Sr. Director of Xi Frame at Nutanix.

“The flexibility that Azure NVv4 provides to share and access GPU resources as needed is a valuable feature that we see will benefit many Teradici customers. We are excited to be working with Microsoft and AMD to enable more flexible, cost-effective GPU options for virtual desktop and virtual workstation use cases such as AEC.”  Ziad Lammam, Vice President of Product Management at Teradici

“With the new AMD-powered Workspot cloud workstations and the use of industry leading cloud offerings in Azure, ASTI and Workspot are positioned to address the needs of the SMB market for Virtual Desktop Infrastructure in the AEC industry. These new AMD-powered systems will provide the computing power and graphics power of enterprise class systems, that allow an organization to spend less time managing their resources and more time completing projects.  They provide a balance of computing power and graphics performance without costly over provisioning.” Doug Dahlberg, Director of IT Operations, Applied Software (ASTI) – Workspot and Microsoft Partner

Next steps

For more information on topics covered here, see the following documentation:

NVv4 virtual machine documentation.
Virtual machine pricing.
AMD EPYC™ 7002-series.

Source: Azure

Xbox Series X: Microsoft reveals technical specifications

The Xbox Series X's 360 mm² chip packs eight cores and an RDNA2 graphics unit with raytracing. The asymmetric 16 GB of RAM is combined with a 1 TB NVMe PCIe Gen4 SSD. The Microsoft console has a volume of just under seven liters at a height of 30 cm. (Xbox Series X, Microsoft)
Source: Golem

Introduction to Security Contexts and SCCs

With Role-Based Access Control (RBAC), we have an OpenShift-wide tool to determine the actions (or verbs) each user can perform against each object in the API. Rules are defined by combining resources with API verbs into sets called roles, and with role bindings we attribute those rules to users. Once we have those users or service accounts, we can bind them to particular roles to give them access to those actions. For example, a Pod may be able to delete a ConfigMap, but not a Secret, when running under a specific service account. That's an upper-level control-plane feature that doesn't take into account the underlying node permission model, meaning the Unix permission model and some of its newer kernel accouterments.
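As a rough sketch of that ConfigMap-versus-Secret example, a Role that permits deleting ConfigMaps but deliberately omits Secrets, bound to a service account, might look like the following. All names here are invented for illustration:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: configmap-deleter          # hypothetical name
  namespace: demo
rules:
- apiGroups: [""]                  # "" is the core API group
  resources: ["configmaps"]        # note: "secrets" is intentionally absent
  verbs: ["delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: configmap-deleter-binding
  namespace: demo
subjects:
- kind: ServiceAccount
  name: demo-sa                    # hypothetical service account the Pod runs under
  namespace: demo
roleRef:
  kind: Role
  name: configmap-deleter
  apiGroup: rbac.authorization.k8s.io

A Pod running under demo-sa can then delete ConfigMaps in the demo namespace through the API, but any request against Secrets is denied by RBAC.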
So, the container platform is protected by good RBAC practices with respect to the objects it creates, but the node may not be. A Pod may not be able to delete an object in etcd using the API because it's restricted by RBAC, but it may still delete important files on the system, and even stop the kubelet, if programmed to do so. To prevent this scenario, SCCs (Security Context Constraints) come to the rescue.
Linux Processes and Privileges
Before going into deep waters with SCCs, let’s go back in time and take a look at some of the key concepts Linux brings to us regarding processes. A good start is entering the command man capabilities on a Linux terminal. That’s the manual page that contains very important fundamentals to understand the goal behind the SCCs.
The first important distinction we need to make is between privileged and unprivileged processes. While privileged processes have user ID 0, being the superuser or root, unprivileged processes have non-zero user IDs. Privileged processes bypass kernel permission checks. That means the actions a process or thread can perform on operating system objects such as files, directories, symbolic links, pseudo filesystems (procfs, cgroupfs, sysfs, etc.) and even memory objects such as shared memory regions, pipes, and sockets are unlimited and not verified by the system. In other words, the kernel won't check user, group, or other permissions (from the Unix permission model UGO – user, group, and others) before granting access to a specific object on behalf of the process.
If we look at the list of running processes on a Linux system using the command ps -u root, we will find very important processes such as systemd, which has PID 1 and is responsible for bootstrapping user space in most distributions and initializing the most common services. For that it needs unrestricted access to the system.
Unprivileged processes, though, are subject to full permission checking based on process credentials (user ID, group ID, supplementary group list, etc.). The kernel makes an iterative check across each category – user, group, and others – trying to match the user and group credentials of the running process against the target object's permissions in order to grant or deny access. Keep in mind that this is not the service account in OpenShift; this is the system user that runs the container process, if we want to speak containers.
Starting with kernel 2.2, the concept of capabilities was introduced. In order to have more flexibility and enable the use of superuser or root features in a granular way, those super privileges were broken into small pieces that can be enabled or disabled independently. That is what we call capabilities. We can take a deeper look at http://man7.org/linux/man-pages/man7/capabilities.7.html
As an example, let's say we have an application that needs special networking configuration: it has to configure an interface, open a port in the system's firewall, create a NAT rule, and put a new custom route in the system's routing table, but it doesn't need to make arbitrary changes to any file in the system. We can grant CAP_NET_ADMIN instead of running the process as a privileged one.
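Translating that to a Pod manifest, a common pattern is to drop all capabilities and add back only the one that is needed. This is a minimal sketch; the pod and image names are invented:

apiVersion: v1
kind: Pod
metadata:
  name: net-admin-demo                    # hypothetical name
spec:
  containers:
  - name: router
    image: example.com/router:latest      # hypothetical image
    securityContext:
      capabilities:
        drop: ["ALL"]                     # start from nothing...
        add: ["NET_ADMIN"]                # ...and grant only network administration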
Beyond privileges and capabilities we have SELinux and AppArmor, which are both kernel security modules that can be added on top of capabilities to get even more fine-grained security rules by using access control security policies or program profiles. In addition, we have seccomp, a secure computing mode kernel facility that reduces the system calls available to a given process.
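Kubernetes exposes these mechanisms through the same securityContext machinery. Here is a minimal sketch, assuming a cluster recent enough to support the seccompProfile field (Kubernetes 1.19+); the pod name is invented, and in OpenShift the SELinux level is normally allocated per namespace rather than set by hand:

apiVersion: v1
kind: Pod
metadata:
  name: hardened-demo                     # hypothetical name
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault                # use the runtime's default syscall filter
    seLinuxOptions:
      level: "s0:c123,c456"               # example MCS level, for illustration only
  containers:
  - name: app
    image: registry.access.redhat.com/ubi8/ubi
    command: ["sleep", "3600"]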
Finally, adding to all that, we still have interprocess communication, privilege escalation, and access to the host namespaces when we begin to talk about containers. That is out of scope at this point, but…
How does that translate to containers?
That said, we come back to containers and ask: what are containers again? They are processes segregated by namespaces and cgroups, and as such they have all the same security features described above. So how do we create containers with those security features?
Let's first take a look at the smallest piece of software that creates the container process: runc. As its definition on the GitHub page says, it's a tool for spawning and running containers according to the OCI specification. It's the default choice for OCI runtimes, although we have others such as Kata Containers. In order to use runc, we need a file system image and a bundle with the configuration for the process. The short story on the bundle is that we must provide a JSON-formatted specification for the container in which all the configuration is taken into account. Check this part of its documentation: https://github.com/opencontainers/runtime-spec/blob/master/config.md#linux-process
There we have fields such as apparmorProfile, capabilities, or selinuxLabel. We can set the user ID, group ID, and supplementary group IDs. What tool, then, automates the process of getting the file system ready and passing down those parameters for us?
We can use podman, for example, for testing or development, running isolated containers or pods. It allows us to do so with special privileges, as shown below:
Privileged bash terminal:
sudo podman run --privileged -it registry.access.redhat.com/rhel7/rhel /bin/bash
Process ntpd with privilege to change the system clock:
sudo podman run -d --cap-add SYS_TIME ntpd
OK, cool. But when it comes time to run those containers on Kubernetes or OpenShift, how do we configure those capabilities and security features?
Inside the OpenShift platform, CRI-O is the container engine that runs and manages containers. It is compliant with the Kubernetes Container Runtime Interface (CRI), giving the kubelet a standard interface for calling the container engine; all the magic is done by automating runc behind the scenes, while allowing other features to be developed in the engine itself.
Following the workflow above to run a pod in Kubernetes or OpenShift, we first make an API call to Kubernetes asking it to run a particular Pod. It could come from an oc command or from code, for example. The API then processes that request and stores it in etcd; the pod is scheduled to a specific node, since the scheduler watches those events; finally, the kubelet on that node reads the event and calls the container runtime (CRI-O) with all the parameters and options requested to run the pod. This is very much a summary, but the important thing here is that we need to pass parameters down to the API in order to have our Pod configured with the desired privileges. In the example below, a new pod gets scheduled to run on node 1.

What goes into that YAML file in order to request those privileges? Two different objects are implemented in the Kubernetes API: PodSecurityContext and SecurityContext. The first relates to Pods and the second to a specific container; they are part of their respective types, so you can find those fields in the Pod and Container specs of YAML manifests. With that, they can be applied to an entire Pod, no matter how many containers it contains, or to specific containers within that Pod. The SecurityContext settings then take precedence over the PodSecurityContext ones. You can find the security context source code at https://github.com/kubernetes/api/blob/master/core/v1/types.go.
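To make that precedence concrete, here is a small sketch (pod and container names are invented): the pod-level runAsUser applies to every container unless a container sets its own value:

apiVersion: v1
kind: Pod
metadata:
  name: precedence-demo                   # hypothetical name
spec:
  securityContext:                        # PodSecurityContext: applies to all containers
    runAsUser: 1000
  containers:
  - name: inherits-pod-setting
    image: registry.access.redhat.com/ubi8/ubi
    command: ["sleep", "3600"]            # runs as UID 1000
  - name: overrides-pod-setting
    image: registry.access.redhat.com/ubi8/ubi
    command: ["sleep", "3600"]
    securityContext:                      # SecurityContext: takes precedence
      runAsUser: 2000                     # runs as UID 2000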
Here are a few examples of how to configure security contexts for Pods. Below are the first three fields of the SecurityContext object:
type SecurityContext struct {
    // The capabilities to add/drop when running containers.
    // Defaults to the default set of capabilities granted by the container runtime.
    // +optional
    Capabilities *Capabilities `json:"capabilities,omitempty" protobuf:"bytes,1,opt,name=capabilities"`
    // Run container in privileged mode.
    // Processes in privileged containers are essentially equivalent to root on the host.
    // Defaults to false.
    // +optional
    Privileged *bool `json:"privileged,omitempty" protobuf:"varint,2,opt,name=privileged"`
    // The SELinux context to be applied to the container.
    // If unspecified, the container runtime will allocate a random SELinux context for each
    // container. May also be set in PodSecurityContext. If set in both SecurityContext and
    // PodSecurityContext, the value specified in SecurityContext takes precedence.
    // +optional
    SELinuxOptions *SELinuxOptions `json:"seLinuxOptions,omitempty" protobuf:"bytes,3,opt,name=seLinuxOptions"`
    <…>
}

Here is an example of a YAML manifest with capabilities configured in the securityContext field:
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo-4
spec:
  containers:
  - name: sec-ctx-4
    image: gcr.io/google-samples/node-hello:1.0
    securityContext:
      capabilities:
        add: ["NET_ADMIN", "SYS_TIME"]

OK, now what? We have an idea of how to give superpowers to a container or Pod, even though they may be RBAC-restricted. How can we control this behavior?
Security Context Constraints
Finally we get back to our main subject. How can I make sure that a specific Pod or container doesn't request more than it should in terms of process privileges, and not only OpenShift object privileges under its API?
That's the role of Security Context Constraints: to check beforehand whether the system should pass a pod or container configuration request with a privileged or custom security context on to the cluster API, which would end up running a powerful container process. To get a taste of what an SCC looks like, here is an example:
oc get scc restricted -o yaml

allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: true
allowPrivilegedContainer: false
allowedCapabilities: null
apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
  type: MustRunAs
groups:
- system:authenticated
kind: SecurityContextConstraints
metadata:
  annotations:
    kubernetes.io/description: restricted denies access to all host features and requires
      pods to be run with a UID, and SELinux context that are allocated to the namespace. This
      is the most restrictive SCC and it is used by default for authenticated users.
  creationTimestamp: "2020-02-08T17:25:39Z"
  generation: 1
  name: restricted
  resourceVersion: "8237"
  selfLink: /apis/security.openshift.io/v1/securitycontextconstraints/restricted
  uid: 190ef798-af35-40b9-a980-0d369369a385
priority: null
readOnlyRootFilesystem: false
requiredDropCapabilities:
- KILL
- MKNOD
- SETUID
- SETGID
runAsUser:
  type: MustRunAsRange
seLinuxContext:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
users: []
volumes:
- configMap
- downwardAPI
- emptyDir
- persistentVolumeClaim
- projected
- secret

The SCC above is the default one; it has fairly basic permissions and will accept Pod configurations that don't request special security contexts. Just by looking at the names of the fields, we can get an idea of how many features it can verify before letting a workload with containers pass the API and get scheduled.
In conclusion, we have at hand a tool that allows an OpenShift admin to decide, before the Pod gets requested through the API and passed to the container runtime, whether an entire pod can run in privileged mode, have special capabilities, access directories and volumes in the host namespace, or use special SELinux contexts, as well as which IDs the container process can use, among other features.
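As a preview of that relationship with RBAC, access to an SCC is itself granted through a role that carries the use verb on the SCC resource. Here is a minimal sketch, using the prebuilt nonroot SCC; the role name and namespace are invented:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: use-nonroot-scc                   # hypothetical name
  namespace: demo
rules:
- apiGroups: ["security.openshift.io"]
  resources: ["securitycontextconstraints"]
  resourceNames: ["nonroot"]              # the prebuilt nonroot SCC
  verbs: ["use"]

Bound to a service account, this role lets pods running under that account be admitted with the nonroot SCC instead of the default restricted one.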
In the next blog posts we'll explore each field of an SCC and its underlying Linux technology, present the prebuilt SCCs, and examine their relationship with the RBAC system in granting or denying the special security contexts declared under a Pod's or container's spec field. Stay tuned!
Source: OpenShift

Ryzen Mobile 4000: What AMD's Renoir can do

With optimized Zen 2 cores and a Vega graphics unit, plus power-efficient LPDDR4X memory, built on a 7 nm process: never before has a notebook chip been better positioned against Intel than AMD's Renoir, alias Ryzen Mobile 4000. A report by Marc Sauter (AMD Zen, processor)
Source: Golem