Overcoming the challenges of hybrid multicloud IT management

Hybrid cloud environments have become the norm for most businesses. In our latest Voice of the Enterprise: Digital Pulse, Budgets and Outlook 2019 survey, we asked 916 IT professionals to describe their overall IT approach and strategy. Among the respondents, we found that:

62 percent said they now use a hybrid IT environment with integrated on-premises systems and off-premises cloud/hosted resources
17 percent said their IT environment is completely off-premises, distributed across various SaaS, IaaS and PaaS clouds
9 percent are using a hybrid cloud with limited or no interoperability between the on-premises and off-premises environments
4 percent are building an on-premises environment only
Just 8 percent said clouds are not an important part of their IT strategy

It’s now safe to say that interoperable on-premises infrastructure and hybrid multiclouds are common enterprise IT architectures, and they are likely to remain so for several years to come. However, getting these distributed environments to operate efficiently and effectively to better serve the business needs of enterprises is another matter.
Navigating hybrid multicloud management
There is no industry standard for hybrid multicloud architecture or management. Workload placement across multiple distributed execution venues is highly specific to each enterprise and depends on a range of factors, including the value-to-risk ratio of workloads, lifecycle stages, usage patterns, application behavior characteristics, data criticality, data sovereignty, and the price, performance and risk characteristics of the various execution venues. The hybrid offerings from cloud service providers vary as well: each offers a comprehensive, well-engineered hybrid cloud architecture, but the design, deployment and management models differ.
As workloads, data and processes shift across diverse and disparate execution venues (e.g., on-premises infrastructure, managed services, clouds), there will be a need for a new approach to hybrid multicloud management – one that requires a uniform means for provisioning, access control, capacity management, performance analysis, billing and cost control, among others. Enterprises will demand that IT vendors craft a holistic platform to allocate workloads strategically to the best execution venue, and do so while managing business continuity across hybrid IT architecture. This will drive the development of a new generation of cloud management technology we refer to as unified infrastructure management (UIM) platforms.
Tackling next-generation challenges with unified infrastructure management
To tackle the challenges of hybrid IT management, a unified infrastructure management platform needs to be able to answer two fundamental questions and execute upon the findings. The first: “Under what conditions do I put what workloads on what execution venues?” This requires an understanding of workload characteristics and the capabilities of execution venues (beyond cost) to intelligently map workloads to their best execution venues and to migrate, monitor and manage workloads across execution venues.
To manage data and logic placement across distributed architecture, the UIM must also be able to answer and execute upon the second question: “Do I move the logic to the data or the data to the logic?” For example, in the case of core/fog/edge IoT architecture, the issue is how to intelligently and dynamically choose and shift where logic is computed – i.e., in the core (cloud), in the fog (nodes), on the edge (devices) – and how to minimize data in motion.
Such decisions require detailed analysis of many complex variables beyond cost. We believe that next-generation unified infrastructure management platforms will gradually be equipped with various open source technologies to answer such questions and provide a means to execute the following capabilities:

Analyze and compare the economics (price and performance characteristics) of various execution venues.
Analyze workloads to determine their performance characteristics and operational requirements.
Automate the provisioning of compute, storage, network, security, application stacks and data.
Intelligently deploy workloads and services determined by economic analysis as well as any compliance policies required of on-premises infrastructure, managed services, and private and public clouds.
Intelligently redeploy workloads to other execution venues when venue or workload characteristics change.
Interoperate with, build and deploy container and microservices platforms to coordinate cloud services for automated, iterative application and workload deployments.
Manage security, identity authentication and access control for administrators, tenants and user accounts.
Provide financial metering, reporting and chargeback/viewback by cloud, tenant, user, application, compute and other consumption-based services.
Orchestrate events and manage runtime execution and performance of all venues, and enact policies to automate scaling, bursting, high availability and disaster recovery.
Maintain a service library that includes operating system images, databases, middleware, message buses, load balancers and servers.
Control and dynamically allocate network resources in response to the transmission, latency and security requirements of specific data and workloads.

Trusted IT and cloud-enabling technology vendors are now crafting such unified infrastructure management platforms. Going forward, unified infrastructure management platforms may also include and/or integrate with orchestration tools to execute and synchronize business processes that span execution venues, analytics that pave the way for predictive and prescriptive reasoning, and even autonomic self-healing capabilities powered by machine learning and artificial intelligence (AI) technologies that can also expose insights for continuous improvement. Highly valued vendors will be those that embrace this opportunity and can ensure that the advantages and business agility promised by hybrid multicloud IT architectures indeed become reality. Additional insights and improved strategies are required to reap the full range of benefits of a true hybrid IT environment.
Learn more about hybrid and multicloud strategy for the enterprise.
Source: Thoughts on Cloud

OpenShift Scale-CI: Part 2 – Deep Dive

In part one of the series, we saw how Red Hat OpenShift Scale-CI evolved. In this post, we will look at the various components of Scale-CI. OpenShift Scale-CI is not a single tool; it is a framework that orchestrates a collection of tools to help analyze and improve the scalability and performance of OpenShift. It does this by:

Loading thousands of objects onto a large-scale production cluster to stress the control plane (apiserver, controllers, etcd), the kubelet and other system components.
Running various benchmarks, gathering performance data during the run and visualizing the data to identify bottlenecks and tuning opportunities.
Repeating the scale tests on OpenShift deployed on various clouds, including AWS, Azure, OpenStack and GCP, to catch performance and scalability regressions.

Part of the motivation behind building Scale-CI is to onboard and enable other teams to take advantage of the automation, tooling and hardware to see how well their application or component performs at scale, instead of each team building and maintaining its own clusters, infrastructure and tools.
Architecture

Scale-CI comprises the following components:

Scale-CI pipeline: Acts as the orchestrator for all tools to deploy, configure, monitor and diagnose OpenShift. This is the entry point for onboarding workloads, which are then run automatically at scale.
Workloads: Sets up tooling on an OpenShift cluster and runs OpenShift performance and scale workloads.
Scale-CI deploy: Collection of playbooks and scripts to provision and install OpenShift on various cloud platforms including AWS, Azure, GCP and OpenStack. It also supports scaling and upgrading the cluster to the desired payload.
Images: Hosts the container image source files for Scale-CI. Builds are triggered by commits to this repo. In addition, rebuilds are triggered periodically when tools in dependent containers are built and published.
Scale-CI graphshift: Deploys a mutable Grafana instance with performance analysis dashboards for OpenShift.

Scale-CI diagnosis: Running OpenShift at high scale is expensive, and there is often a need to examine a particular configuration, component log or set of metrics after the cluster has been terminated in order to track down an issue seen during a scale test run. This motivated us to create this tool. It helps in debugging issues by capturing the Prometheus database from the running Prometheus pods to the local file system, so the metrics can be examined later by running Prometheus locally against the backed-up database. It also captures OpenShift cluster information, including all the operator-managed components, using must-gather.
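
A minimal sketch of such a capture using standard OpenShift client commands might look like the following; the monitoring namespace, pod name and data path are assumptions that can vary between releases:

# Gather cluster state, including operator-managed components
oc adm must-gather --dest-dir=./must-gather

# Copy the Prometheus TSDB from a running monitoring pod to the local file system
# (assumes the default openshift-monitoring namespace and the prometheus-k8s-0 pod)
oc rsync -n openshift-monitoring -c prometheus prometheus-k8s-0:/prometheus ./prometheus-backup

# Inspect the metrics later by pointing a local Prometheus at the backed-up data
prometheus --storage.tsdb.path=./prometheus-backup/prometheus
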
The Performance and Scalability team at Red Hat has built a number of other tools to help with our work:

Cluster Loader: Deploys large numbers of user-defined objects to a cluster. Build, configure and run Cluster Loader to measure the performance metrics of your OpenShift Container Platform deployment in various cluster states. It is part of both OKD and upstream Kubernetes.
Pbench: A benchmarking and performance analysis framework that runs benchmarks across one or more systems while properly collecting the configuration of those systems, their logs and specified telemetry from various tools (sar, vmstat, perf, etc.). The collected data is shipped to the Pbench server, which is responsible for archiving the resulting tarballs, indexing them and unpacking them for display (see the sketch below).
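
As an illustration only (not the exact automation Scale-CI runs), a typical pbench-agent session registers a tool set, wraps the workload and ships the results; run_scale_test.sh is a hypothetical placeholder for the workload under test:

# Register the default collection tools (sar, vmstat, pidstat, etc.) on this host
pbench-register-tool-set

# Run the workload under pbench so telemetry is collected for its duration
pbench-user-benchmark --config=node-density -- ./run_scale_test.sh

# Ship the resulting tarballs to the configured pbench server for indexing and display
pbench-move-results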

A typical Scale-CI run installs OpenShift on a chosen cloud provider, sets up tooling to run a pbench-agent DaemonSet, runs Conformance (the e2e test suite) to check the sanity of the cluster, scales the cluster up to the desired node count, and runs various scale tests focusing on control plane density, kubelet density, HTTP/router, SDN, storage, logging, monitoring and cluster limits. It also runs a baseline workload that collects configuration and performance data on an idle cluster, so we can see how the product is trending across OpenShift releases. The results are processed and shipped to the Pbench server for analysis and long-term storage, then scraped to generate a machine-readable output (JSON) of the metrics, which is compared with previous runs to pass or fail the job and send a green or red signal.
For large and long-running clusters, components like Prometheus need more disk and more resources such as CPU and memory. Instead of using bigger worker nodes, we create infrastructure nodes with large amounts of disk, CPU and memory using custom MachineSets, and we modify the node selectors to ensure that components including Prometheus, Logging, Router and Registry run on the infrastructure nodes. This is a day-two operation and is needed for large-scale clusters.
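
A minimal sketch of one such day-two step, assuming the custom MachineSet has already provisioned the node and using the documented cluster-monitoring-config override (the router and registry have analogous operator settings):

# Mark the dedicated machine as an infrastructure node
oc label node <infra-node-name> node-role.kubernetes.io/infra=""

# Pin the monitoring stack, including Prometheus, to infrastructure nodes
oc create -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
EOF
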
Adding a new workload to the framework or making changes to the existing jobs is as simple as creating a PR using the sample templates provided in the repositories. The Scale-CI watcher picks up the change after the PR is merged and updates the respective jobs.
 

We spoke about automated OpenShift/Kubernetes scalability testing at KubeCon + CloudNativeCon North America 2018. The slides are available here, and you can also watch the presentation online at https://youtu.be/37naDDcmDo4.
We recently scale tested OpenShift 4.1 ahead of its general availability. Keep an eye out for our next blog, OpenShift Scale-CI: Part 3, which will cover the highlights of the OpenShift 4.1 scalability run. As always, any feedback or contributions are most welcome.
Source: OpenShift

USAA and Google Cloud work together to speed auto claims

When U.S. military personnel and their families want to obtain insurance or financial solutions, one of the first places they turn to is USAA. For nearly 100 years, this San Antonio, Texas-based financial services organization has helped families build their financial security, with a focus on deep relationships with its members. Another focus for USAA has been using technology to deliver easy and convenient digital experiences. In fact, USAA was the first financial services company to introduce mobile deposits, allowing its members to deposit their checks wherever they are. And the company was a pioneer in the use of drones and aerial imagery to assess property damage in hard-to-access areas after major natural disasters, such as hurricanes. Now, in collaboration with Google Cloud, USAA is using machine learning (ML) to speed up the auto claims process, obtaining near-real-time damage estimates from digital images. We’re proud to work with USAA on their digital solutions, helping create faster and more cost-efficient estimates within the claims process.

Traditionally, assessing auto damage claims has been a time-intensive, laborious process. A claims adjuster needs to physically inspect each vehicle, determine the parts that need replacement, and understand the labor required for repair. They then submit a report to the insurer, who must approve the estimate before sharing it with the repair shop and car owner. This process can sometimes take weeks from beginning to end.

Digitizing and moving to the cloud greatly streamlines this process. The appraiser takes photos of the car and uploads them to Google Cloud where, by applying machine learning, they can identify the damage in real time and send that data to USAA and then into the Mitchell Estimating platform. Mitchell’s software searches parts and labor databases, applies the information to their estimate, and returns that estimate directly to USAA. Appraisers will review the estimate and make adjustments as needed. The new, cloud-based process will allow for faster and more cost-efficient estimates.

“This technology now gives claims teams the opportunity to focus more deeply on connecting and providing guidance to our members, during what can often be a trying time,” said Ramon Lopez, USAA VP Innovation.

Early results are promising, as USAA has been able to predict high-level damage across a diverse vehicle set with great accuracy. We’ve also worked with USAA employees across the evolving digital environment, providing direct engineer-to-engineer collaboration with our own ML experts. USAA plans to add even more features to streamline the claims process for its customers, so that eventually, the majority of low-complexity assessments will be touchless, allowing appraisers to focus on more complex cases that best utilize their expertise.

“Innovation at USAA is not about technology for the sake of technology,” said Lopez. “It’s about enabling our businesses to keep pace with member expectations and needs, all while maintaining the level of service our members deserve.”
Source: Google Cloud Platform

Introducing cost-effective incremental snapshots of Azure managed disks in preview

The preview of incremental snapshots of Azure managed disks is now available. Incremental snapshots are a cost-effective point-in-time backup of managed disks. Unlike current snapshots, which are billed for the full size, incremental snapshots are billed only for the delta changes to the disk since the last snapshot. They are always stored on the most cost-effective storage, i.e., standard HDD, irrespective of the storage type of the parent disk. Additionally, for increased reliability, they are stored on zone-redundant storage (ZRS) by default in regions that support ZRS. They cannot be stored on premium storage. If you are using current snapshots on premium storage to scale up virtual machine deployments, we recommend using custom images on standard storage in Shared Image Gallery instead; this will help you achieve greater scale at lower cost.

Incremental snapshots provide a differential capability that is unique to Azure managed disks. It enables customers and independent solution vendors (ISVs) to build backup and disaster recovery solutions for managed disks. It allows you to get the changes between two snapshots of the same disk and copy only the changed data across regions, reducing the time and cost of backup and disaster recovery. For example, you can download the first incremental snapshot as a base blob in another region. For subsequent incremental snapshots, you copy only the changes since the last snapshot to the base blob. After copying the changes, you can take snapshots of the base blob that represent your point-in-time backup of the disk in the other region. You can restore your disk either from the base blob or from a snapshot on the base blob in the other region.
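
The first step of that workflow, exporting a snapshot as a base blob in another region, can be sketched with the CLI as shown below. The storage account and container names are placeholders, the delta-only copies for subsequent snapshots require the pre-release Storage SDKs mentioned later in this post, and the blob copy assumes the destination account credentials are already configured (for example via AZURE_STORAGE_KEY):

# Generate a short-lived read SAS on the incremental snapshot
sas=$(az snapshot grant-access \
-g yourResourceGroupName \
-n yourSnapshotName \
--duration-in-seconds 3600 \
--query accessSas -o tsv)

# Copy the snapshot contents into a page blob in a storage account in the target region
az storage blob copy start \
--account-name yourTargetStorageAccount \
--destination-container backups \
--destination-blob yourDiskName-base.vhd \
--source-uri "$sas"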

Incremental snapshots inherit all the compelling capabilities of current snapshots. They have a lifetime independent of their parent managed disks, so they remain available even if the parent managed disk is deleted. Moreover, they are accessible instantly, meaning you can read the underlying VHD of an incremental snapshot or restore disks from it as soon as it is created.
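
For example, restoring a managed disk directly from an incremental snapshot uses the same CLI call as for full snapshots; the disk and snapshot names below are placeholders:

az disk create \
-g yourResourceGroupName \
-n yourRestoredDiskName \
--source yourSnapshotName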

You can create incremental snapshots by setting the new incremental property to true.

az snapshot create \
-g yourResourceGroupName \
-n yourSnapshotName \
-l westcentralus \
--source subscriptions/yourSubscriptionId/resourceGroups/yourResourceGroupName/providers/Microsoft.Compute/disks/yourDiskName \
--incremental

You can identify incremental snapshots of the same disk by using the SourceResourceId and SourceUniqueId properties of snapshots. SourceResourceId is the Azure Resource Manager (ARM) resource Id of the parent disk. SourceUniqueId is the value inherited from the UniqueId property of the disk. If you delete a disk and then create a disk with the same name, the value of the UniqueId property will change.

az snapshot show \
-g yourResourceGroupName \
-n yourSnapshotName \
--query "[creationData.sourceResourceId]" -o tsv

az snapshot show \
-g yourResourceGroupName \
-n yourSnapshotName \
--query "[creationData.sourceUniqueId]" -o tsv

Availability and pricing

You can now create incremental snapshots and generate a SAS URI for reading the underlying data in the West Central US region via Azure Compute REST API version 2019-03-01. You can also use the latest Azure PowerShell SDK, .NET SDK and CLI to perform these operations. The differential capability is supported via the pre-release versions of the .NET, Python and C++ Storage SDKs only. Please email AzureDisks@microsoft.com to get access to these SDKs. We are going to add support for other SDKs and other regions soon.

The per-GB pricing of incremental snapshots is the same as for current full snapshots. Visit the managed disk pricing page for more details about snapshot pricing.

Getting started

Please email AzureDisks@microsoft.com to get access to the preview. 
Create an incremental snapshot using CLI.
Create an incremental snapshot using PowerShell.

Source: Azure