Azure.Source – Volume 78

Preview | News & updates | Technical content | Azure shows | Events | Customers, partners, and industries

Now in preview

Hybrid storage performance comes to Azure

When it comes to adding a performance tier between compute and file storage, Avere Systems has led the way with its high-performance caching appliance, the Avere FXT Edge Filer. Last week at NAB, attendees got a first look at the new Azure FXT Edge Filer, now with even more performance, memory, and SSD, plus support for Azure Blob. Since Microsoft’s acquisition of Avere last March, we’ve been working to provide an exciting combination of performance and efficiency to support hybrid storage architectures with the Avere appliance technology. We are currently previewing the FXT 6600 model at customer sites, with a second model, the FXT 6400, becoming available at general availability.

News and updates

Want to evaluate your cloud analytics provider? Here are the three questions to ask.

In February, an independent study by GigaOm compared Azure SQL Data Warehouse, Amazon Redshift, and Google BigQuery using the highly recognized TPC-H benchmark. They found that Azure SQL Data Warehouse is up to 14 times faster and costs 94 percent less than other cloud providers. And today, we are pleased to announce that in GigaOm’s second benchmark report, this time with the equally important TPC-DS benchmark, Azure SQL Data Warehouse is again the industry leader. Not Amazon Redshift. Not Google BigQuery. These results prove that Azure is the best place for all your analytics.

Introducing the App Service Migration Assistant for ASP.NET applications

In June 2018, we released the App Service Migration Assessment Tool, designed to help customers quickly and easily assess whether a site can be moved to Azure App Service by scanning an externally accessible (HTTP) endpoint. Today we’re pleased to announce the release of an updated version, the App Service Migration Assistant! The new version helps customers and partners move sites identified by the assessment tool by migrating ASP.NET sites to App Service. Read this blog to learn more about the tool and begin your migration.

Expanding Azure IoT certification service to support Azure IoT Edge devices

In December 2018, Microsoft launched the Azure IoT certification service, a web-based test automation workflow to streamline the certification process through self-serve tools. Now we are taking steps to expand the service to also support Azure IoT Edge device certification. An Azure IoT Edge device comprises three key components: IoT Edge modules, the IoT Edge runtime, and a cloud-based interface. Learn more about these three components in this blog explaining IoT Edge.

Azure Updates

Learn about important Azure product updates, roadmap, and announcements. Subscribe to notifications to stay informed.

Technical content

Smarter, faster, safer: Azure SQL Data Warehouse is simply unmatched

We want to call attention to the exciting news that Azure SQL Data Warehouse has again outperformed other cloud providers in the most recent GigaOm benchmark report. This is the result of relentless innovation and laser-focused execution on providing new features our customers need, all while reducing prices so customers get industry-leading performance at the best possible value. In this blog, we take a closer look at the technical capabilities of these new features and, most importantly, how you can start using them today.

Azure Security Center exposes crypto miner campaign

Azure Security Center discovered a new cryptocurrency mining operation on Azure customer resources. The operation took advantage of an old version of a known open-source CMS with a known RCE vulnerability as the entry point, and then, after using the cron utility for persistence, mined the “Monero” cryptocurrency using a newly compiled binary of the “XMRig” open-source crypto mining tool. Check out our blog for details.

You gotta keep privileges separated

When writing scripts for automation or building out a service, don't run under your own credentials. Doing so makes you a single point of failure for the service. It's also good practice to separate concerns between environments; that way, even if someone accidentally runs a test command against production, it won't have disastrous results. One recommended approach is to use service principals. An Azure service principal is an identity for use with applications, services, and tools to access Azure resources. Using service principals lets us assign specific permissions that are limited in scope to precisely what is required, minimizing the impact if a principal is compromised. This blog explains how.
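
As a rough illustration, a narrowly scoped service principal can be created with the Azure CLI. This is a hedged sketch: the name, role, subscription ID, and resource group below are placeholders rather than values from the post, and you should pick the narrowest role that works for your scenario.

# Create a service principal whose role assignment is scoped to a single resource group
az ad sp create-for-rbac \
  --name "automation-sp" \
  --role "Contributor" \
  --scopes "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"

# The command returns an appId, password, and tenant that the script can use to sign in
az login --service-principal --username <appId> --password <password> --tenant <tenant>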

How do teams work together on an automated machine learning project?

When it comes to executing a machine learning project in an organization, data scientists, project managers, and business leads need to work together to deploy the best models to meet specific business objectives. A central objective of this step is to identify the key business variables that the analysis needs to predict. We refer to these variables as the model targets, and we use the metrics associated with them to determine the success of the project. In this use case, we look at how a data scientist, project manager, and business lead at a retail grocer can leverage automated machine learning and Azure Machine Learning service to reduce product overstock.

How to Use Azure Pipeline Task and Job Conditions

An Azure Pipeline job is a grouping of tasks that run sequentially on the same target. In many cases, you will want to execute a task or a job only if a specific condition has been met. Azure Pipeline conditions let us define the conditions under which a task or job will execute. In this blog, we detail a common situation in which pipeline conditions are helpful, show how to configure the condition, and point to the documentation for more information.

Moving your database to Azure

In this session, we show you how we migrated an on-premises MongoDB database to Azure Cosmos DB and an on-premises SQL Server database to an Azure SQL Database Managed Instance. You’ll learn about data preparation decisions, performing the migration, and ensuring your application has zero downtime while switching over to the cloud-hosted databases.

Azure Stack IaaS – part seven of a series

Most apps get delivered by a team. When your team delivers the app through virtual machines (VMs), it is important to coordinate efforts. Born in the cloud to serve teams from all over the world, Azure and Azure Stack have some handy capabilities to help you coordinate VM operations across your team. In this blog, we look at features such as single sign-on, role-based access, and collaborating with people outside your organization.

How to accelerate DevOps with Machine Learning lifecycle management

DevOps is the union of people, processes, and products to enable the continuous delivery of value to end users. DevOps for machine learning brings that lifecycle management discipline to machine learning, making it easier to manage, monitor, and version models while simplifying workflows and collaboration. Effectively managing the machine learning lifecycle is critical to DevOps success, and the first piece of machine learning lifecycle management is building your machine learning pipeline, or pipelines. We explain how.

How to stay informed about Azure service issues

Azure Service Health helps you stay informed and take action when Azure service issues like outages and planned maintenance affect you. It provides you with a personalized dashboard that can help you understand issues that may be impacting resources in your Azure subscriptions. For any event, you can get guidance and support, share details with your colleagues, and receive issue updates. We’ve posted a new video series to help you learn how to use Azure Service Health and ensure you stay on top of service issues.

How to stay on top of Azure best practices

Optimizing your cloud workloads can seem like a complex and daunting task. We created Azure Advisor, a personalized guide to Azure best practices, to make it easier to get the most out of Azure.

How Skype modernized its backend infrastructure using Azure Cosmos DB

Founded in 2003, Skype has grown to become one of the world’s premier communication services, making it simple to share experiences with others wherever they are. Since its acquisition by Microsoft in 2011, Skype has grown to more than four billion total users, more than 300 million monthly active users, and more than 40 million concurrent users. In a three-part series, we discuss how Skype used Azure Cosmos DB to solve real-world challenges.

Azure shows

Episode 274 – Reliability Engineering | The Azure Podcast

David Blank-Edelman, a Senior Cloud Advocate at Microsoft, gives us some great insight into what customers should be thinking about when it comes to the reliability of their cloud applications.

Using the new Basic Process in Azure DevOps | DevOps Lab

In this episode, Abel chats with Dan Hellem to walk through the details of the new Basic process in Azure DevOps and learn how it works.

Redis Edge on Azure IoT Edge | Internet of Things Show

RedisEdge from Redis Labs is a purpose-built database for the demanding conditions at the IoT edge. It has the ability to ingest millions of writes per second with <1ms latency, has a 5MB footprint, and is available on ARM32, ARM64, and x64 architectures.

Azure Monitor action groups | Azure Friday

Azure Monitor action groups enable you to define a list of actions to execute when an alert is triggered. In this episode, we demonstrate how to configure a Service Health alert to use an action group.
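
For reference, an action group with a single email action can be created from the Azure CLI roughly as follows. This is a sketch with made-up names and addresses, and the exact parameters may vary by CLI version.

# Create an action group that emails the on-call alias when an alert fires (names are examples)
az monitor action-group create \
  --resource-group my-rg \
  --name ops-alerts \
  --short-name opsalert \
  --action email oncall oncall@contoso.com

# The action group can then be attached to a Service Health (activity log) alert rule,
# for example from the Azure portal or with the az monitor activity-log alert commands.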

How to test Azure Functions | Azure Tips & Tricks

In this edition of Azure Tips and Tricks, learn how to test Azure Functions with unit and integration test methods.

Management Groups, Policy, and Blueprints in Azure Governance | Microsoft Mechanics – Azure

The latest on governing Azure subscriptions for Cloud Architects or Ops Managers. Satya Vel, from the Azure Governance Team, demonstrates Microsoft's approach to Azure Governance overall, which now includes more granular control of policy across different apps and departments in your organization with management groups. You'll also see the new Azure Blueprint templates that simplify setting up your environment to meet specific compliance requirements such as ISO, as well as easier tracking of policy changes and their impact. We'll show you how you can now apply governance capabilities across your Azure Kubernetes workloads.

Party with Palermo at the Microsoft MVP Summit | Azure DevOps Podcast

This week Jeffrey Palermo has a special episode for you all! It is recorded live, from the night before the Microsoft MVP Summit, at Jeffrey’s annual “Party with Palermo!” get-together for MVPs.

Episode 6 – AI Forensics and Pharaoh Hounds | AzureABILITY Podcast

AI/machine learning pioneer Andre Magni visits the pod to talk computer intelligence: from Microsoft's AI mission (to amplify human ingenuity with intelligent technology) to data-curation gotchas and modeling pitfalls to identifying dead bodies using AI.

Events

Countdown for Microsoft Build: Things to Do Part 1

Get ready to see the awesome sights of Seattle while you're at Microsoft Build this May, including the Museum of Pop Culture and Wings over Washington.

Microsoft at SAP Sapphire NOW 2019: A trusted path to cloud innovation

In a few weeks, more than 22,000 people from around the globe will converge in Orlando, Florida, May 7-9, for the SAP Sapphire NOW and ASUG Annual Conference. Each year, the event brings together thought leaders across industries to find innovative ways to solve common challenges, unlock new opportunities, and take advantage of emerging technologies that are changing the business landscape as we know it. This year, Microsoft has elevated its presence with engaging in-booth experiences and informative sessions that will educate, intrigue, and inspire attendees as they take the next step in their digital transformation journey.

Customers, partners, and industries

Bitnami Apache Airflow Multi-Tier now available in Azure Marketplace

A few months ago, we released a blog post that provided guidance on how to deploy Apache Airflow on Azure. The template in the blog provided a good quick start solution for anyone looking to quickly run and deploy Apache Airflow on Azure in sequential executor mode for testing and proof of concept study.

Leveraging AI and digital twins to transform manufacturing with Sight Machine

Azure has mastered ingesting and storing manufacturing data with services such as Azure IoT Hub and Azure Data Lake, and now our partner Sight Machine has solved the other huge challenge: data variety. Sight Machine on Azure is a leading AI-enabled analytics platform that enables manufacturers to normalize and contextualize plant floor data in real time. The resulting digital twins allow manufacturers to find new insights, transform operations, and unlock new value.

Azure AI does that?

Whether you’re just starting out in tech, building, managing, or deploying apps, gathering and analyzing data, or solving global issues, anyone can benefit from using cloud technology. In this post we’ve gathered five cool examples of innovative artificial intelligence (AI) to showcase how you can be a catalyst for real change.

Azure Front Door gets WAF support, a new Premium plan for Azure Functions & changes to Azure alerts | Azure This Week – A Cloud Guru

This time on Azure This Week, Lars covers Web Application Firewall support for Azure Front Door, the new Premium plan for Azure Functions for more serverless action, an overhaul of Azure alerts, and the launch of a new series, "Azure Fireside Chats," on A Cloud Guru.

Source: Azure

QnA Maker updates – April 2019

The QnA Maker service lets you easily create and manage a knowledge base from your data, including FAQ pages, support URLs, PDFs, and doc files. You can test and publish your knowledge base and then connect it to a bot using a bot framework sample or template.
Source: Azure

Evaporating a data lake: Otto Group’s lessons learned migrating a Hadoop infrastructure to GCP

Editor's note: Today we're hearing from Google Cloud Platform (GCP) customer Otto Group data.works GmbH, a services organization holding one of the largest retail user data pools in the German-speaking area. They offer e-commerce and logistics SaaS solutions and conduct R&D with sophisticated AI applications for Otto Group subsidiaries and third parties. Read on to learn how Otto Group data.works GmbH recently migrated its on-premises big data Hadoop data lake to GCP and the lessons they learned along the way.

At Otto Group, our business intelligence unit decided to migrate our on-premises Hadoop data lake to GCP. Using managed cloud services for application development, machine learning, data storage and transformation instead of hosting everything on-premises has become popular for tech-focused business intelligence teams like ours. But actually migrating existing on-premises data warehouses and surrounding team processes to managed cloud services brings serious technical and organizational challenges.

The Hadoop data lake and included data warehouses are essential to our e-commerce business. Otto Group BI aggregates anonymized data like clickstreams, user interactions, product information, CRM data, and order transactions of more than 70 online shops of the Otto Group.

On top of this unique data pool, our agile, autonomous, and interdisciplinary product teams—consisting of data engineers, software engineers, and data scientists—develop machine learning-based recommendations, product image recognition, and personalization services. The many online retailers of the Otto Group, such as otto.de and aboutyou.de, integrate our services into their shops to enhance customer experience.

In this blog post, we'll discuss the motivation that drove us to consider moving to a cloud provider, how we evaluated different cloud providers, why we decided on GCP, the strategy we used to move our on-premises Hadoop data lake and team processes to GCP, and what we have learned so far.

Before the cloud: an on-premises Hadoop data lake

We started with an on-premises infrastructure consisting of a Hadoop cluster-based data lake design, as shown below. We used the Hadoop distributed file system (HDFS) to stage click events, product information, transaction and customer data from those 70 online shops, never deleting raw data.

[Figure: Overview of the previous on-premises data lake]

From there, pipelines of MapReduce jobs, Spark jobs, and Hive queries clean, filter, join, and aggregate the data into hundreds of relational Hive database tables at various levels of abstraction. That let us offer harmonized views of commonly used data items in a data hub to our product teams. On top of this data hub, the teams' data engineers, scientists, and analysts independently performed further aggregations to produce their own application-specific data marts.

Our purpose-built open-source Hadoop job orchestrator Schedoscope does the declarative, data-driven scheduling of these pipelines, as well as managing metadata and data lineage.

In addition, this infrastructure used a Redis cache and an Exasol EXAsolution main-memory relational database cluster for key-based lookup in web services and fast analytical data access, respectively. Schedoscope seamlessly mirrors Hive tables to the Redis cache and the Exasol databases as Hadoop processing finishes.

Our data scientists ran their IPython notebooks and trained their models on a cluster of GPU-adorned compute nodes. These models were then usually deployed as dockerized Python web services on virtual machines offered by a traditional hosting provider.

What was good…

This on-premises setup allowed us to quickly grow a large and unique data lake. With Schedoscope's support for iterative, lean-ops rollout of new and modified data processing pipelines, we could operate this data lake with a small team. We developed sophisticated machine learning-driven web services for the Otto Group shops. The shops were able to cluster purchase and return history of customers for fit prediction; get improved search results through intelligent search term expansion; sort product lists in a revenue-optimizing way; filter product reviews by topic and sentiment; and more.

…And what wasn't

However, as the data volume, number of data sources, and services connected to our data lake grew, we ran into various pain points that were hampering our agility, including lack of team autonomy, operational complexity, technology limitations, costs, and more.

Seeing the allure of the cloud

Let's go through each of those pain points, along with how the cloud could help us solve them.

Limited team autonomy: A central Hadoop cluster running dependent data pipelines does not lend itself well to multi-tenancy. Product teams constantly needed to coordinate—in particular with the infrastructure team responsible for operating the cluster. This not only created organizational bottlenecks that limited productivity; it also worked directly against the very autonomy our product teams are supposed to have. The need to share resources meant that teams could not take full responsibility for their services and pipelines from development, to deployment, to monitoring. This created even more pressure on the infrastructure team. Cloud platforms, on the other hand, allow teams to autonomously launch and destroy infrastructure components via API calls, without affecting other teams and without having to pass through a dedicated team managing centrally shared infrastructure.

Operational complexity: Operating Hadoop clusters and compute node clusters as well as external database systems and caches created significant operational complexity. We had to operate and monitor not only our products and data pipelines, but also the Hadoop cluster, operating system, and hardware. The cloud offers managed services for data pipelines, storing data, and web services, so we do not need to operate at the hardware, operating system, and cluster technology level.

Limited tech stack: Technologically, we were limited by the Hadoop offering. While our teams could achieve a lot with Hive, MapReduce, and Spark jobs, we often felt that our teams couldn't use the best technology for the product but had to fit a design into rigid organizational and technology constraints. The cloud offers a variety of managed data stores like BigQuery or Cloud Storage, data processing services like Cloud Dataflow and Cloud Composer, and application platforms like App Engine, plus it's constantly adding new ones. Compared to the Hadoop stack, this could significantly expand the resources for our teams to design the right solution.

Mismatched languages and frameworks: Python machine learning frameworks and web services are usually not run on YARN. Hive and HDFS are not well-suited for interactive or random data access. For a data scientist to reasonably work with data in a Hadoop cluster, Hive tables must be synced to external data stores, adding more complexity. By offering numerous kinds of data stores suitable for analytics, random access, and batch processing, as well as by separating data processing from data storage, cloud platforms make it easier to process and use data in different contexts with different frameworks.

Emerging stream processing: We started tapping into more streaming data sources, but this was at odds with Hadoop's batch-oriented approach. We had to deploy a clustered message broker—Kafka—for persisting data streams. While it is possible to run Spark streaming on YARN and connect to Kafka, we found Flink more suitable as a streaming-native processing framework, which only added another cluster and layer of complexity. Cloud platforms offer managed message brokers as well as managed stream processing frameworks.

Expansion velocity: The traditional enterprise procurement process we had to follow made adding nodes to the cluster time- and resource-consuming. It was common that we had to wait three to four months from RFP and purchase order to delivery and operations. With a cloud platform, infrastructure can be added within minutes by API calls. The challenge with cloud is to set up enterprise billing processes so that varying invoices can be handled every month without the constant involvement of procurement departments. However, this challenge has to be solved only once.

Expansion costs: A natural reaction to slow enterprise procurement processes is to avoid having to go through them too often. Slow processes mean that team members tend to wait and batch demand for new nodes into larger orders. Larger orders not only increase the corporate politics that come along with them, but also reduce the rate of innovation, as large-node order volumes discourage teams from building (and possibly later scratching) prototypes of new and intrinsically immature ideas. The cloud lets us avoid hanging on to infrastructure we no longer need, freeing us from such inhibitions. Moreover, many frameworks offered by cloud platforms support autoscaling and automating expansion, so expansion naturally follows as new use cases arise.

Starting the cloud evaluation process

Given this potential, we started to evaluate the major cloud platforms in April 2017. We decided to move the Otto Group BI data lake to GCP about six months later. We effectively started the migration in January 2018, and finished migrating by February 2019.

Our evaluation included three main areas of focus:

1. Technology. We created a test use case to evaluate provider technology stacks: building customer segments based on streaming web analytics data using customer search terms.

Our on-premises infrastructure team implemented this use case with the managed services of the cloud providers under evaluation (on top of doing its day-to-day business). Our product teams were involved, too, via three multi-day hackathons where they evaluated the tech stacks from their perspective and quickly developed an appetite for cloud technology. Additionally, the infrastructure team kept product teams updated regularly with technology demos and presentations.

As the result of this evaluation, we favored GCP. In particular, we liked:

The variety of managed data stores offered—especially BigQuery and Cloud Bigtable—and their simplicity of operation;
Cloud Dataflow, a fully managed data processing framework that supports event time-driven stream processing as well as batch processing in a unified manner;
Google Kubernetes Engine (GKE), a managed distributed container orchestration system, making deployment of Docker-based web services simple; and
Cloud ML Engine as a managed TensorFlow runtime, and the various GPU and TPU options for machine learning.

2. Privacy. We involved our legal department early to understand the ramifications of moving privacy-sensitive data from our data lake to the cloud.

We now encrypt and anonymize more data fields than was needed on-premises. With the move to streaming data and increased encryption needs, we also ported the existing central encryption clearinghouse to a streaming architecture in the cloud. (The on-premises implementation of the clearinghouse had reached its scaling limit and needed a redesign anyway.)

3. Cost. We did not focus on pricing between different cloud providers. Rather, we compared cloud cost estimates against our current on-premises costs. In this regard, we found it important to not just spec out a comparable Hadoop cluster with virtual machines in the cloud and then compare costs. We wanted managed services to reduce our many pain points, not just migrate these pain points to the cloud.

Instead of comparing a Hadoop cluster against virtual machines in the cloud, we compared our on-premises cluster against native solution designs for the managed services. It was more difficult to come up with realistic cost estimates, since the designs hadn't been implemented yet. But extrapolating from our experiences with our test use case, we were confident that cloud costs would not exceed our on-premises costs, even after applying solid risk margins.

Now that we've finished the migration, we can say that this is exactly what happened: we are not exceeding on-premises costs. This we already consider a big win. We not only have development velocity, but the operational stability of the product teams has increased noticeably and so has the performance of their products. Also, the product teams have focused so far on migrating their products and not yet on optimizing costs, so we expect our cloud costs to go further below on-premises costs as time goes on.

Going to the cloud: Moving the data lake to GCP

There were a few areas to tackle when we started moving our infrastructure to GCP.

1. Security

One early goal was to establish a state-of-the-art security posture. We had to continue to fulfill the corporate security guidelines of Otto Group, while at the same time granting a large degree of autonomy to the product teams to create their own cloud infrastructure and eliminate the collaboration pain points.

As a balance between a high security standard, the restrictiveness it implies, and team autonomy, we came up with the motto of "access frugality." Teams can work freely in their GCP projects. They can independently create infrastructure like GKE clusters or use managed services such as Cloud Dataflow or Cloud Functions as they like. But they are also expected to be restrictive about resources like IAM permissions, external load balancers, and public buckets.

In order to get the teams started with the cloud migration as soon as possible, we took a gradual approach to security. Building our entire security posture before teams could start with the actual migration was not an option. So we agreed on the most relevant security guidelines with the teams, then established the rest during migration. As the migration proceeded, we also started to deploy processes and tools to enforce these guidelines and provide guardrails for the teams.

We came up with the following three main themes that our security tooling should address (see more in the image below):

Cloud security monitoring: This theme is about transparency of cloud resources and configuration. The idea is to protect teams from security issues by detecting them early and, in the best-case scenario, preventing them entirely. At the same time, monitoring must allow for exceptions: teams might consciously want to expose resources such as API endpoints publicly without being bothered by security alerts all the time. The key objective of this theme is to instill a profound security awareness in every team member.

Cloud cost controls: This theme covers financial aspects of the security posture. Misconfigurations can lead to significant unwanted costs, unintentionally—for example, by allowing BigQuery queries to go rogue over large datasets by not forcing the user to provide partition time restrictions, or because of external financial DDoS attacks in an autoscaling environment.

Cloud resource policy enforcement: Security monitoring tools can detect security issues. A consequent next step is to automatically undo obvious misconfigurations as they are detected. As an example, tooling could automatically revert public access on a storage bucket. Again, such tooling must allow for exceptions.

[Figure: Main themes of our security posture]

Since there are plenty of security-related features within Google Cloud products and there is a large variety of open-source cloud security tools available, we didn't want to reinvent the wheel. We decided to make use of GCP's inherent security policy configuration options and tooling where we could, such as organization policies, IAM conditions, and Cloud Security Command Center.

As a tool for cloud security monitoring, we evaluated Security Monkey, developed by Netflix. Security Monkey scans cloud resources periodically and alerts on insecure configurations. We chose it for its maturity and the simple extensibility of the framework with custom watchers, auditors, and alerters. With these, we implemented security checks we didn't find in GCP, mostly around the time-to-live (TTL) of service account keys and setting organizational policies for disallowing public data either in BigQuery datasets or Cloud Storage buckets. We set up three different classes of issues: compliance, security, and cost optimization-related issues.

Security Monkey is used by all product teams here. Team members use the Security Monkey UI to view the identified potential security issues and either justify them right there or resolve the issue in GCP. We also use a whitelisting feature to filter for known configurations, like default service account IAM bindings, to make sure we only see relevant issues. Excessive issues and alerts in a dynamic environment like GCP can be intimidating and quickly become overwhelming.

To improve the transparency of cloud resources, we built several dashboards on top of the data gathered by Security Monkey to visualize the current and historical state of the cloud environment. While adapting Security Monkey to our needs, we found working with the Security Monkey project to be a great experience. We were able to submit several bug fixes and feature pull requests and get them into the master quickly.

We are now shifting our focus from passive cloud resource monitoring towards active cloud resource policy enforcement, where configuration is automatically changed based on detected security issues. Security Monkey, as well as the availability of near real-time audit events on GCP, offers a good foundation for this.

We believe that cloud security cannot be considered a simple project with a deadline and a result; rather, it is an aspect that must be considered during each phase of development.

2. Approaches to data lake migration

There were a few possible approaches to migrating our Hadoop data lake to GCP. We considered lift-and-shift, simply redeploying our Hadoop cluster to the cloud as a first step. While this was probably the simplest approach, we wouldn't get benefits with regard to the pain points we had identified. We'd have to get productivity gains later by rearchitecting yet again to improve team autonomy, operational complexity, and tech advancements.

At the other end of the spectrum, we could port Hadoop data pipelines to GCP managed services, reading and storing data to and from GCP managed data stores, and then turn off the on-premises cluster infrastructure after porting finished. But that would take much longer to see benefits, since the product teams would have to wait until porting ended and all historical data was processed before they could use the new cloud technology.

3. Our approach: Mirror data first, then port pipelines

So we decided on an incremental approach to porting our Hadoop data pipelines while embracing the existence of our on-premises cluster while it was still there.

As a first step, we extended Schedoscope with a BigQuery exporter, which makes it easy to write a Hive table partition to BigQuery efficiently in parallel. We also added the capability to perform additional encryptions via the clearinghouse during export to satisfy the needs of our legal department. We then augmented our on-premises Hadoop data pipelines with additional export steps. In this way, our on-premises Hive data was encrypted and mirrored to BigQuery as soon as it had been computed, with only a little delay.

Second, we exported the historical Hive table partitions over the course of four weeks. As a result, we ended up with a continuously updated mirror of our on-premises data in BigQuery. By summer 2018, our product teams could start porting their web services and models to the cloud, even though the core data pipelines of the data lake were still running on-premises, as shown here:

[Figure: Otto Group's gradual data sync]

In fact, product teams were able to finish migrating essential services—such as customer review sentiment analysis, intelligent product list sorting, and search term expansion—to GCP by the end of 2018, even before all on-premises data pipelines had been ported.

Next, we ported the existing Hadoop data pipelines from Schedoscope to managed GCP services. We went through each data source and no longer staged it to Hadoop but to GCP, either to Cloud Storage for batch sources or to Cloud Pub/Sub for new streaming sources. We then redesigned the data pipelines originating from each data source with GCP managed services, usually Cloud Dataflow, to bring the data to BigQuery. Once the data was in BigQuery, we also used simple SQL transformations and views. Generally, we orchestrate batch pipelines using Cloud Composer, GCP's managed Airflow service. Streaming pipelines are mostly designed as chains of Cloud Dataflow jobs decoupled by Cloud Pub/Sub topics, as shown here:

[Figure: Migrating jobs into GCP managed services]

There is the problem, however, that aggregations still performed on-premises were sometimes dependent on data from sources that already had been migrated. A temporary measure to address this was to create exports from the cloud back to on-premises in order to feed downstream pipelines with required data until these pipelines were migrated, as shown below:

[Figure: The on-premises backport, a temporary measure during migration]

This implies, however, that data sources of similar type might be processed both on-premises and in the cloud. As an example, web tracking of one shop may already have been ported to GCP while another shop's tracking data was still being processed on-premises. Since redesigning data pipelines during migration could involve changes to data models and structures, it was thus possible that similar data types were available heterogeneously in BigQuery: natively processed in the cloud and mirrored to the cloud from the Hadoop cluster.

To deal with this problem, we required our product teams to design their cloud-based services and models such that they could take data from two different heterogeneously modeled data sources. This is not as difficult a challenge as it may seem, though: there is no need to create a unified representation of heterogeneous data for all use cases. Instead, data from two sources can be combined in a use case-specific way.

Finally, on March 1, 2019, all sources and their pipelines had been ported to GCP managed services. We cut off the on-premises exports, disabled the backport of cloud data to on-premises, shut down the cluster, and removed any duplicate logic from the product teams' services introduced during the data porting step. After the migration, our architecture now looks like this:

[Figure: The finished cloud migration]

After the cloud: What have we learned moving to GCP?

Our product teams started getting the benefits of the cloud pretty quickly. We were surprised to learn that the point of no return in our cloud migration process came in the fall of 2018, even though the main data pipelines were still running on-premises. Once our product teams had gained autonomy, we were no longer able to take it back even if we wanted to. Moreover, the product teams had experienced considerably increased productivity gains, as they were free to work at their pace with more technology choices and managed services with high SLAs. Going back to a central on-premises Hadoop cluster environment was no longer an option.

There are a few other key takeaways from our cloud migration:

1. We've moved closer to a decentralized DevOps culture. While moving our data pool and the various web services of the product teams to managed cloud services, we automatically started to develop a DevOps culture, scripting the creation of our cloud infrastructure and treating infrastructure as code. We want to further build on this by creating a blameless culture with shared responsibilities, minimal risks due to manual changes, and reduced delivery times. To reach this goal, we're adding a high degree of automation and increasing knowledge sharing. Product team members no longer rely on a central infrastructure team, but create infrastructure on their own.

What has remained a central task of the infrastructure team, however, is the bootstrapping of cloud environments for the product teams. The required information to create a GCP project is added to a central, versioned configuration file. Such information includes team and project name, cost center, and a few technical topics such as selecting a network in a shared VPC, VPN access to the Otto Group campus, and DNS zones. From there, a continuous integration (CI) pipeline creates the project, assigns it to the correct billing account, and sets up basic permissions for the team's user groups with default IAM policies. This process takes no more than 10 minutes. Teams take over control of the created project and its infrastructure from there and can start work right away.

2. Some non-managed services are still necessary. While we encourage our product teams to make use of GCP's managed cloud services as much as possible, we do host some services ourselves. The most prominent example of such a service is our source code management and CI/CD system that is shared between the product teams.

While we would love to use a hosted service for this, our legal department regards the source code of our data-driven products as proprietary and requires us to manage it ourselves. Consequently, we have set up a GitLab deployment on GKE running in the Europe West region. The configuration of GitLab is fully automated via code to provide groups, repositories, and permissions for each team.

The implications of a self-hosted GitLab deployment are that we have to take care of regular database backups and also have a process for disaster recovery. We have to guarantee a reasonably high availability and have to follow GitLab's lifecycle management for patching or updating system components.

3. Cloud is not one-size-fits-all. We have quickly learned that team autonomy really means team autonomy. While teams often face similar problems, superimposing central tools threatens team productivity by introducing team interdependencies and coordination overhead. Central tools should only be established if they cannot be avoided (see the motivation for hosting our own CI/CD system above) or if collaboration benefits outweigh the coordination overhead introduced (being able to look at the code of other teams, for example).

For example, even though all teams need to deal with the repetitive task of scheduling recurring jobs, we have not set up a central job scheduler such as we did with Schedoscope on-premises. Each team decides on the best solution for their products by either using their own instance of a GCP-managed service like Airflow or even building their own solution like our recently published CLASH tool.

Instead of sharing tooling between teams, we have moved on to sharing knowledge. Teams share their perspectives on those solutions, lessons learned, and best practices in our regular internal "GCP 3D" presentations.

The road ahead

Migrating our Hadoop data lake to the cloud was a bold decision—but we are totally satisfied with how it turned out and how quickly we were able to pull it off. Of course, the freedom of a cloud environment enjoyed by autonomous teams comes with the challenge of global transparency. Security monitoring and cost control are two main areas in which we'll continue to invest.

A further pressing topic for us is metadata management. Not only is data stored using different datastore technologies (or not stored at all in the case of streaming data), data is also spread across the teams' GCP projects. We'll continue to explore how to provide an overview of all the data available and how to ensure data security and integrity.

As a company, one of our core values is excellence in development and operations. With our migration, we've found that moving to a cloud environment has brought us significantly further towards these goals.
Source: Google Cloud Platform

Deploying Grafana for production deployments on Azure

This blog is co-authored by Nick Lopez, Technical Advisor at Microsoft.

Grafana is one of the leading open-source tools for visualizing time-series metrics, and it has quickly become the preferred visualization tool for developers and operations teams monitoring server and application metrics. Grafana dashboards enable operations teams to quickly monitor and react to the performance, availability, and overall health of a service. You can now also use it to monitor Azure services and applications by leveraging the Azure Monitor data source plugin, built by Grafana Labs. This plugin enables you to include all metrics from Azure Monitor and Application Insights in your Grafana dashboards. If you would like to quickly set up and test Grafana with Azure Monitor and Application Insights metrics, we recommend you refer to the Azure Monitor documentation.
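
If you prefer to add the plugin to an existing Grafana server rather than use the Marketplace image, it can typically be installed with the Grafana CLI. This is a brief sketch assuming the Grafana Labs plugin ID and a systemd-managed Grafana service:

# Install the Azure Monitor data source plugin, then restart Grafana to load it
grafana-cli plugins install grafana-azure-monitor-datasource
systemctl restart grafana-server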

The Grafana server image in the Azure Marketplace provides a great QuickStart deployment experience. The image provisions a virtual machine (VM) with a pre-installed Grafana dashboard server, a SQLite database, and the Azure plugin. The default single-VM deployment is great for a proof of concept and testing, but for high availability of monitoring dashboards for your critical applications and services, it's essential to plan for high availability of the Grafana deployment on Azure. The following is a proposed and proven architecture for setting up Grafana for high availability and security on Azure.

Setting up Grafana for production deployments

Grafana Labs recommends setting up a separate, highly available, shared MySQL server when configuring Grafana for high availability. Azure Database for MySQL and Azure Database for MariaDB are managed relational database services based on the community editions of the MySQL and MariaDB database engines. They provide high availability at no additional cost, predictable performance, elastic scalability, automated backups, and enterprise-grade security with secure sockets layer (SSL) support, encryption at rest, advanced threat protection, and VNet service endpoint support. Using a remote configuration database with the Azure Database for MySQL or Azure Database for MariaDB service allows for the horizontal scalability and high availability of Grafana instances required for enterprise production deployments.

Leveraging Bitnami Multi-Tier Grafana templates for production deployments

Bitnami lets you deploy a multi-node, production-ready Grafana solution from the Azure Marketplace with just a few clicks. This solution uses several Grafana nodes with a pre-configured load balancer and Azure Database for MariaDB for data storage. The number of nodes can be chosen at deployment time depending on your requirements. Communication between the nodes and the Azure Database for MariaDB service is also encrypted with SSL to ensure security.

A key feature of Bitnami's Grafana solution is that it comes pre-configured to provide a fault-tolerant deployment. Requests are handled by the load balancer, which continuously tests nodes to check if they are alive and automatically reroutes requests if a node fails. Data (including session data) is stored in the Azure Database for MariaDB and not on the individual nodes. This approach improves performance and protects against data loss due to node failure.

For new deployments, you can launch Bitnami Grafana Multi-Tier through the Azure Marketplace!

Configuring existing installations of Grafana to use Azure Database for MySQL service

If you have an existing installation of Grafana that you would like to configure for high availability, the following steps demonstrate how to configure a Grafana instance to use an Azure Database for MySQL server as the backend configuration database. In this walkthrough, we use an Ubuntu server with Grafana installed and configure Azure Database for MySQL as the remote database for the Grafana setup.

Create an Azure Database for MySQL server with the General Purpose tier, which is recommended for production deployments. If you are not familiar with database server creation, read the QuickStart tutorial to familiarize yourself with the workflow. If you are using the Azure CLI, you can simply set it up using az mysql up.
If you have already installed Grafana on the Ubuntu server, you'll need to edit the grafana.ini file to add the Azure Database for MySQL parameters. As described in the Grafana documentation on database settings, we will focus on the database parameters noted there. Please note: the username must be in the format user@server due to the server identification method of Azure Database for MySQL. Other formats will cause connections to fail.
Azure Database for MySQL supports SSL connections, and for enterprise production deployments it is recommended to always enforce SSL. Additional information about setting up SSL can be found in the Azure Database for MySQL documentation. Most modern installations of Ubuntu already have the necessary Baltimore CyberTrust CA certificate installed in /etc/ssl/certs. If needed, you can download the SSL CA certificate used for Azure Database for MySQL from this location. The SSL mode can be provided in two forms, skip-verify and true. With skip-verify, the certificate provided is not validated, but the connection is still encrypted. With true, the certificate is validated against the Baltimore CA, which is useful for preventing man-in-the-middle attacks. Note that in both cases, Grafana expects the certificate authority (CA) path to be provided.
Next, you have the option to store user sessions in Azure Database for MySQL in the session table. This is configured in the same grafana.ini file under the session section and is useful, for instance, in load-balanced environments to maintain sessions for users accessing Grafana. In the provider_config parameter, we need to include the user@server username, the password, the full server name, and the TLS/SSL mode, which here can be true or skip-verify. Note that this uses the go-sql-driver/mysql driver, where more documentation is available. A sketch of the provisioning command and the resulting grafana.ini sections follows these steps.
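
To tie these steps together, here is a minimal sketch. The server name, resource group, location, database name, user, and password are placeholder values for illustration only, and paths such as the CA certificate location may differ on your system. First, a General Purpose Azure Database for MySQL server can be provisioned with the Azure CLI:

# Provision a General Purpose Azure Database for MySQL server (all names are examples)
az mysql server create \
  --resource-group grafana-rg \
  --name mygrafanadb \
  --location westus2 \
  --admin-user grafanaadmin \
  --admin-password "<strong-password>" \
  --sku-name GP_Gen5_2

The corresponding grafana.ini sections might then look like the following; note the user@server username format and the tls parameter passed to the go-sql-driver/mysql driver:

[database]
type = mysql
host = mygrafanadb.mysql.database.azure.com:3306
name = grafana
user = grafanaadmin@mygrafanadb
password = <strong-password>
ssl_mode = true
ca_cert_path = /etc/ssl/certs/Baltimore_CyberTrust_Root.pem

[session]
provider = mysql
provider_config = grafanaadmin@mygrafanadb:<strong-password>@tcp(mygrafanadb.mysql.database.azure.com:3306)/grafana?tls=true
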
After this is all set, you should be able to start Grafana and verify the status with the commands below:

systemctl start grafana-server
systemctl status grafana-server

If you see any errors or issues, the default path for logging is /var/log/grafana/ where you can confirm what is preventing the startup. The following is a sample error where the username was not provided as user@server but rather just user.

lvl=eror msg="Server shutdown" logger=server reason="Service init failed: Migration failed err: Error 9999: An internal error has occurred. Please retry or report your issues.

Otherwise, you should see the service in an OK status, and the initial startup will create all the necessary tables in the Azure Database for MySQL database.

Key takeaways

The single-VM setup for Grafana is great for a quick start, testing, and proof-of-concept work, but it may not be suitable for production deployments.
For enterprise production deployments of Grafana, separating the configuration database onto a dedicated server enables high availability and scalability.
The Bitnami Grafana Multi-Tier template provides a production-ready template that leverages a scale-out design and security to provision Grafana in a few clicks at no extra cost.
Using managed database services like Azure Database for MySQL for production deployments provides built-in high availability, scalability, and enterprise security for the database repository.

Additional resources

Get started with Bitnami Multi-Tier Solutions on Microsoft Azure

Monitor Azure services and applications using Grafana

Monitor your Azure services in Grafana

Setting up Grafana for high availability

Azure Database for MySQL documentation

Acknowledgments

Special thanks to Shau Phang, Diana Putnam, Anitah Cantele, and the Bitnami team for their contributions to the blog post.
Source: Azure

Understanding GCP service accounts: three common use-cases

If you're building applications on Google Cloud Platform (GCP), you're probably familiar with the concept of a service account, a special Google account that belongs to your application or a virtual machine, and which can be treated as an identity and as a resource. Depending on your use case, there are different ways to manage service accounts and to give them access to resources. In this post we will look at some of those common use cases, and help you determine the appropriate operational model for managing your service accounts.

Use case 1: Web application accessing GCP resources

Imagine your users are accessing a web app to which they are authorized via Cloud Identity-Aware Proxy (IAP). They do not require direct access to the underlying GCP resources—just to the web app that utilizes the GCP resources. The web app uses a service account to gain permissions to access GCP services, for example, Datastore. In this case the service account has a 1:1 map to the web app—it's the identity of the web app. To get started, you create the service account in the GCP project that hosts the web application, and you grant the permissions your app needs to access GCP resources to the service account. Finally, configure your app to use the service account credentials.

Use case 2: Cross-charging BigQuery usage to different cost centers

In this scenario, departmental users query a shared BigQuery dataset using a custom-built application. Because the queries must be cross-charged to the users' cost center, the application runs on a VM with a service account that has the appropriate permissions to make queries against the BigQuery dataset.

Each department has a set of projects that are labelled such that the resources used in that project appear in the billing exports. Each department also has to run the application from their assigned project so that the queries run against BigQuery can be appropriately cross-charged.

To configure this for each of the departments' projects, in each of the projects executing the queries, assign the IAM permissions required to run queries against the BigQuery datasets to the application's service account. For more information on configuring the permissions for this scenario, see this resource.

Use case 3: Managing service accounts used for operational and admin activities

As a system administrator or operator responsible for managing a GCP environment, you want to centrally manage common operations such as provisioning environments, auditing, etc., throughout your GCP environment.

In this case, you'll need to create a variety of service accounts with the appropriate permissions to enable various tasks. These service accounts are likely to have elevated privileges and have permissions granted at the appropriate level in the hierarchy. And like for all service accounts, you need them to follow best practices to prevent them from being exposed to unauthorized users. For example, you should add a project lien to the projects where these operational service accounts are created to help prevent them from being accidentally deleted.

Crazy for service accounts

As you can see from the use cases discussed above, one model does not fit all and you will need to adopt the appropriate operational model to fit your use case. We hope walking through these use cases helps you to think about where you logically should place your service accounts.
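
To make use case 1 concrete, here is a minimal gcloud sketch. The project ID, service account name, and role below are illustrative assumptions rather than values from the post; grant only the roles your application actually needs.

# Create a service account to act as the web app's identity (names are examples)
gcloud iam service-accounts create web-app-sa \
  --display-name "Web app identity" \
  --project my-web-project

# Grant the service account only the access the app requires, e.g. Datastore
gcloud projects add-iam-policy-binding my-web-project \
  --member "serviceAccount:web-app-sa@my-web-project.iam.gserviceaccount.com" \
  --role "roles/datastore.user"

# If the app runs outside GCP, a key file can be created for its credentials;
# on GCP compute services, attaching the service account to the resource is preferable.
gcloud iam service-accounts keys create key.json \
  --iam-account web-app-sa@my-web-project.iam.gserviceaccount.com
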
To learn more about service accounts, try one of the following tutorials to see how to use service account credentials with the GCP compute service of your choice:

Using service accounts with GKE to authenticate to GCP
Using service accounts with Compute Engine instances to authenticate to GCP
Service accounts for App Engine
Source: Google Cloud Platform

The Hallway Track Is Open For Scheduling

The Hallway Track at DockerCon is an innovative space designed to help facilitate those valuable conversations that come from chance hallway encounters. Instead of leaving it to chance, we’ve partnered with e180 to provide a platform that helps you find like-minded people to meet and learn from, discussing topics you are both interested in.

The Hallway Track is open Monday through Thursday, and it’s best to schedule your meetings in advance. Register for DockerCon and then follow these steps to log in and start scheduling your Hallway Tracks today:

Explore the Market – where all participants post knowledge offers of topics they are willing to share, or questions they want to brainstorm.
Pick a topic from the list and/or create your own offers or questions. You don’t have to be an expert to post!
Schedule your Hallway Tracks and meet in person at the Hallway Track Lounge at DockerCon (Lobby, Level 2).

The Hallway Track is your opportunity to meet and share knowledge with other attendees, Docker Staff, Speakers, and Docker Captains. Register for DockerCon today and look out for email instructions to log into the Hallway Track platform.
Arriving for early registration before the Welcome Reception on Monday? The Hallway Track is open from 12:00 to 7:30 pm; make the most of your time and plan a full day of meetings!

Source: https://blog.docker.com/feed/