Visual data ops for Apache Kafka on Azure HDInsight, powered by Lenses

This blog was written in collaboration with Andrew Stevenson, CTO at Lenses.

Apache Kafka is one of the most popular open source streaming platforms today. However, deploying and running Kafka remains a challenge for most. Azure HDInsight addresses this challenge by providing:

Ease-of-use: Quickly deploy Kafka clusters in the cloud and integrate simply with other Azure services.
Higher scale and lower total-cost-of-operations (TCO): With managed disks, compute and storage are separated, enabling you to have 100s of TBs on a cluster.
Enhanced security: Bring your own key (BYOK) encryption, custom virtual networks, and topic level security with Apache Ranger.

But that’s not all – you can now successfully manage your streaming data operations, from visibility to monitoring, with Lenses, an overlay platform now generally available as part of the Azure HDInsight application ecosystem, right from within the Azure portal!

With Lenses, customers can now:

Easily look inside Kafka topics
Inspect and modify streaming data using SQL
Visualize application landscapes

Look inside Kafka topics

A typical production Kafka cluster has thousands of topics. Imagine you want to get a high-level view of all of these topics. You may want to understand the configuration of the various topics, such as the replication or partition distribution. Or you may want to look deeper inside a specific topic, investigating the message throughput and the leader broker.

While many of these insights can be obtained through the Kafka CLI, Lenses greatly simplifies the experience by unifying key insights for topics and brokers in a simple, intuitive visual interface. With Lenses, inspecting your Kafka cluster is effortless.

Inspect and modify streaming data using SQL

What if you want to inspect the data within a Kafka topic and view the messages sent within a certain time frame? Or what if you want to process a subset of that stream and write it back to another Kafka topic? You can achieve both with SQL queries and SQL Processors within the Lenses UI. You can write SQL queries to validate your streaming data and unblock your client organizations faster.

SQL Processors can be deployed and monitored to perform real-time transforms and analytics, supporting all the features you would expect in SQL, like joins and aggregations. You can also configure Lenses to scale out processing with Azure Kubernetes Service (AKS).
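For illustration only, a browsing query and a SQL Processor in Lenses might look roughly like the sketch below. The topic names (payments, payments_large) and the field amount are hypothetical, and the exact Lenses SQL dialect varies by version, so check the Lenses documentation for the precise syntax:

-- Browse the latest messages on a (hypothetical) topic named payments
SELECT * FROM payments LIMIT 100;

-- A SQL Processor that continuously filters the stream and writes the result
-- to another (hypothetical) topic named payments_large
INSERT INTO payments_large
SELECT * FROM payments
WHERE amount > 1000;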

Visualize application landscapes

At the end of the day, you’re trying to build a solution that creates business impact. That solution will be composed of various microservices, data producers, and analytical engines. Lenses gives you easy insight into your application landscape, describing the running processes and the lineage of your data platform.

In the Topology view, running applications are added dynamically, recovered at startup, and displayed together with the topics they use. To help you create end-to-end solutions, Lenses also provides an easy way to deploy connectors from the open source Stream Reactor project, which contains a large collection of Kafka Connect connectors.

Check out the following resources to get started with Lenses on Azure HDInsight:

Create an HDInsight Kafka cluster
Lenses on Azure HDInsight

Source: Azure

ANZ accelerates its digital transformation with Google Cloud smart analytics

In addition to secure, always-on service from their financial services providers, consumers increasingly expect personalized insights as well. As a result, financial services institutions are now leveraging cloud technology solutions to help them bridge from isolated data to rich customer recommendations. In this highly regulated industry, adopting cloud technology while complying with legal and regulatory requirements can be challenging. At Google Cloud, we understand that reducing costs while managing risks comes first, and we’ve been working hard to deliver secure and reliable platform modernization solutions that enable financial institutions to embrace cloud technology, create personalized customer experiences, and transform their businesses.

One of my favorite examples of how we’re driving transformation for financial institutions is our work with Australia and New Zealand Banking Group (ANZ), a company that’s changing the way it draws insights from data to help its Institutional customers make better strategic business decisions. ANZ is Australia’s third largest bank by market capitalization, with its Institutional Banking division operating across 34 markets globally.

Looking for ways to add value to institutional customer relationships, ANZ wanted a tool to help deliver meaningful data insights quickly—insights which would ultimately help customers make informed business decisions on issues like liquidity, risk, and cash management, or strategic calls like store locations, inventory, and market positioning. It made sense for ANZ to leverage Google Cloud’s data analytics capabilities to help achieve this in an efficient, safe, and secure way.

Following a successful proof-of-concept, ANZ used our best-in-class smart analytics and data science technology to analyze aggregated, de-identified data sets and deliver these powerful insights to customers:

BigQuery allowed ANZ to gain access to insights much faster than ever before—reducing the time to analyze a single table from five days to 20 seconds.
Google Cloud Composer orchestrated data movement and transformation, making it easier to manage dependencies and multiple layers of the data pipeline.
Google Kubernetes Engine created a platform to provide customized data services and data visualization experiences for customers.

ANZ is also driving productivity across its organization by leveraging our big data analytics capabilities for heavy computational queries. These capabilities help bankers reimagine daily tasks and processes, automate and accelerate previously manual operations such as analyzing aggregated, de-identified credit card data—something that used to take days for ANZ but now takes seconds—and give bankers meaningful business insights 250 times faster than before.

“Google Cloud improves how we process and use data internally, providing the technology and expertise to help us draw out data insights quickly so we can better serve the needs of our institutional customers,” said ANZ’s Chief Data Officer, Emma Gray. “There’s a genuine sense of collaboration with Google in solving our unique challenges and positioning ANZ and our customers for success today and in the future.”

I’d like to thank ANZ for partnering with us to make banking better for customers around the world. To learn more about how Google Cloud is helping financial services institutions deepen customer relationships, lower costs, and manage risk with secure and reliable data solutions, visit our website.
Source: Google Cloud Platform

Simplify the management of application configurations with Azure App Configuration

We’re excited to announce the public preview of Azure App Configuration, a new service aimed at simplifying the management of application configuration and feature flighting for developers and IT. App Configuration provides a centralized place in Microsoft Azure for users to store all their application settings and feature flags (a.k.a. feature toggles), control access to them, and deliver the configuration data where it is needed.

Eliminate hard-to-troubleshoot errors across distributed applications

Companies across industries are transforming into digital organizations in order to better serve their customers, foster tighter relationships, and respond to competition faster. We have witnessed rapid growth in the number of applications our customers run. Modern applications, particularly those running in a cloud, are typically made up of multiple components and are distributed in nature. Spreading configuration data across these components often leads to hard-to-troubleshoot errors in production. When a company has a large portfolio of applications, these problems multiply quickly.

With App Configuration, you can keep your application settings together so that:

You have a single consolidated view of all configuration data.
You can easily make changes to settings, compare values, and perform rollbacks.
You have numerous options to deliver these settings to your application, including injecting them directly into your compute service (e.g., App Service), embedding them in a CI/CD pipeline, or retrieving them on demand inside your code.

App Configuration allows you to maintain control over the configuration data and handle it with confidence.

Increase release velocity with feature flags

One of the core solutions we provide with App Configuration is feature management. Traditionally, a new application feature needs to go through a series of tests before it can be released, which generally leads to long development cycles. Newer software engineering practices, such as feature management using feature flags, help shorten these cycles by enabling real testing in production while safeguarding application stability. Feature management solves a multitude of developer challenges, especially when building applications for the cloud.

App Configuration provides built-in support for feature management. You can leverage it to remotely control feature availability in your deployed application. While it can be used from any programming language through its REST APIs, the .NET Core and ASP.NET Core libraries offer a complete end-to-end solution out of the box.

Get started now

App Configuration provides a complete turnkey solution for dealing with application settings and feature flags. It’s easy to onboard and use. You can find the complete documentation in the “Azure App Configuration Preview documentation.” Please give it a try and let us know what you think!
Source: Azure

Isolate app integrations for stability, scalability, and speed with an integration service environment

Innovation at scale is a common challenge facing large organizations. A key contributor to the challenge is the complexity in coordinating the sheer number of apps and environments.

Integration tools, such as Azure Logic Apps, give you the flexibility to scale and innovate as fast as you want, on-premises or in the cloud. This is a key capability to have in place when migrating to the cloud, or even if you're cloud native. Integration has often been relegated to something done after the fact. In the modern enterprise, however, application integration has to be done in conjunction with application development and innovation.

An integration service environment is the ideal solution for organizations that are concerned about noisy neighbor issues or data isolation, or that need more flexibility and configurability than the core Logic Apps service offers.

Building upon the existing set of capabilities, we are releasing a number of new, exciting changes that make integration service environments even better, such as:

Faster deployment times by halving the previous provisioning time

Higher throughput limits for an individual Logic App and connectors

An individual Logic App can now run for up to a year (365 days)

Integration service environment for Logic Apps is the next step for organizations who are pursuing integration as part of their core application development strategy. Here’s what an integration service environment can offer:

Direct, secure access to your virtual network resources. Enables Logic Apps to have secure, direct access to private resources, such as virtual machines, servers, and other services in your virtual network, including Azure services with service endpoints and on-premises resources via Azure ExpressRoute or a site-to-site VPN.

Consistent, highly reliable performance. Eliminates the noisy neighbor issue, removing the fear of intermittent slowdowns that can impact business-critical processes, with a dedicated runtime in which only your Logic Apps execute.

Isolated, private storage. Sensitive data subject to regulation is kept private and secure, opening new integration opportunities.

Predictable pricing. Provides a fixed monthly cost for Logic Apps. Each integration service environment includes the free usage of one standard integration account and one enterprise connector. If your Logic Apps execute more than 50 million actions per month, the integration service environment could provide better value.

New to integration service environments for Logic Apps? Watch this Azure Friday introduction video for assistance.

Get started with an integration service environment for Azure Logic Apps today.
Source: Azure

Key causes of performance differences between SQL managed instance and SQL Server

Migrating to a Microsoft Azure SQL Database managed instance provides a host of operational and financial benefits you can only get from a fully managed and intelligent cloud database service. Some of these benefits come from features that optimize or improve overall database performance. After migration many of our customers are eager to compare workload performance with what they experienced with on-premises SQL Server, and sometimes they're surprised by the results. In many cases, you might get better results on the on-premises SQL Server database because a SQL Database managed instance introduces some overhead for manageability and high availability. In other cases, you might get better results on a SQL Database managed instance because the latest version of the database engine has improved query processing and optimization features compared to older versions of SQL Server.

This article will help you understand the underlying factors that can cause performance differences and the steps you can take to make fair comparisons between SQL Server and a SQL Database managed instance.

If you're surprised by the comparison results, it's important to understand what factors could influence your workload and how to configure your test environments to ensure you have a fair comparison. Some of the top reasons why you might experience lower performance on a SQL Database managed instance compared to SQL Server are listed below. You can mitigate some of these by increasing and pre-allocating file sizes or adding cores; however, the others are prerequisites for guaranteed high availability and are part of the PaaS service.

Simple or bulk-logged recovery model

The databases placed on a SQL Database managed instance use the full recovery model to provide high availability and guarantee no data loss. In this scenario, one of the most common reasons why you might get worse performance on a SQL Database managed instance is that your source database uses the simple or bulk-logged recovery model. The drawback of the full recovery model is that it generates more log data than the simple or bulk-logged recovery models, meaning your DML transaction processing under the full recovery model will be slower.

You can use the following query to determine what recovery model is used on your databases:

select name, recovery_model_desc from sys.databases

If you want to compare a workload running on SQL Server and on a SQL Database managed instance, make sure the databases on both sides use the full recovery model for a fair comparison.
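If your source database uses the simple or bulk-logged model, you can switch it to full before running the benchmark. A minimal sketch, assuming a source SQL Server database named MyDatabase (the name is a placeholder):

ALTER DATABASE MyDatabase SET RECOVERY FULL;
-- Take a full backup afterwards so the log chain is established and the
-- database actually operates under the full recovery model.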

Resource governance and HA configuration

SQL Database managed instance has built-in resource governance that ensures 99.99% availability, and guarantees that management operations such as automated backups will be completed even under high workloads. If you don’t use similar constraints on your SQL Server, the built-in resource governance on SQL Database managed instance might limit your workload.

For example, there's an instance log throughput limit (up to 22 MB/s on the general purpose tier and up to 48 MB/s on the business critical tier) that ensures you can't load more data than the instance can back up. In this case, you might see higher INSTANCE_LOG_GOVERNOR wait statistics that don't exist on your SQL Server instance. These resource governance constraints might slow down operations such as bulk load or index rebuild because these operations require higher log rates.

In addition, the secondary replicas in business critical tier instances might slow down the primary database if they can't catch up with the changes and apply them, so you might see additional HADR_DATABASE_FLOW_CONTROL or HADR_THROTTLE_LOG_RATE_SEND_RECV wait statistics.
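To check whether these governance-related waits are significant on your instance, you can inspect the accumulated wait statistics. A simple sketch; the LIKE patterns are used because the exact wait type names can vary slightly between engine versions:

SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'INSTANCE_LOG%'
   OR wait_type LIKE 'HADR%'
ORDER BY wait_time_ms DESC;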

If you're comparing your SQL Server workload running on local SSD storage to the business critical tier, note that the business critical instance is an Always On availability group cluster with three secondary replicas. Make sure that your source SQL Server has a similar HA implementation using Always On availability groups with at least one synchronous-commit replica. If you're comparing the business critical tier with a single SQL Server instance writing to a local disk, this is an unrealistic comparison due to the absence of HA on your source instance. If you are using asynchronous Always On replicas, you will have HA with better performance, but you are trading the possibility of data loss for performance, and you will get better results on the SQL Server instance.

Automated backup schedule

One of the main reasons why you would choose a SQL Database managed instance is the fact that it guarantees you will always have backups of your databases, even under heavy workloads. The databases in a SQL Database managed instance have scheduled full, differential, and log backups. Full backups are taken every seven days, differential backups every twelve hours, and log backups every five to ten minutes. If you have multiple databases on the instance, there's a high chance that at least one backup is currently running.

Since backup operations use some instance resources (CPU, disk, network), they can affect workload performance. Make sure the databases on the system that you compare with the managed instance have similar backup schedules. Otherwise, you might need to accept that you're getting better results on your SQL Server instance because you're trading database recoverability for performance, a trade-off that is not possible on a SQL Database managed instance.

If you're seeing unexpected performance differences, check whether there is an ongoing full or differential backup on either the SQL Database managed instance or the SQL Server instance that could affect performance of the currently running workload, using the following query:

-- Find any backup commands that are currently executing
SELECT r.command, query = a.text, start_time, percent_complete,
       eta = dateadd(second, estimated_completion_time/1000, getdate())
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) a
WHERE r.command IN ('BACKUP DATABASE', 'BACKUP LOG')

If you see a full or differential backup running during a short benchmark, you might pause your workload and resume it once the backup finishes.

Connection and App to Database proximity

The application accessing the databases and executing the benchmark queries must have similar network proximity to the SQL Database managed instance and to the SQL Server instance. If you place your application and SQL Server database in the local environment (or run an app like HammerDB from the same machine where SQL Server is installed), you will get better results on SQL Server compared to the SQL Database managed instance, which runs in a distributed cloud environment farther from the application. To get valid results, make sure that in both cases you're running the benchmark application or query on separate virtual machines in the same region as the SQL Database managed instance. If you're comparing an on-premises environment with the equivalent cloud environment, try to measure bandwidth and latency between the app and the database and ensure they are similar.

A SQL Database managed instance is accessed via proxy gateway nodes that accept client requests and redirect them to the actual database engine nodes. To get results closer to your environment, enable ProxyOverride mode on your instance using the Set-AzSqlInstance PowerShell command, which allows direct access from the client to the nodes currently hosting your SQL Database managed instance.

In addition, due to compliance requirements, a SQL Database managed instance enforces SSL/TLS transport encryption, which is always enabled. Encryption can introduce overhead when there is a large number of queries. If your on-premises environment does not enforce SSL encryption, you will see additional network overhead on the SQL Database managed instance.

Transparent data encryption

The databases on a SQL Database managed instance are encrypted by default using Transparent Data Encryption (TDE). TDE encrypts and decrypts every page that is exchanged with disk storage. This consumes more CPU resources and introduces additional latency when fetching data pages from, and saving them to, disk storage. Make sure that the databases on both the SQL Database managed instance and SQL Server have Transparent Data Encryption either turned on or off, and that database encryption or decryption operations have completed before starting performance testing.

You can use the following query to determine whether the databases are encrypted:

select name, is_encrypted from sys.databases

Another important factor that might affect your performance is an encrypted TempDB. TempDB is encrypted if at least one database on your SQL Server or SQL Database managed instance is encrypted. As a result, you might compare two databases that are not encrypted, but because some other database on the SQL Database managed instance is encrypted (even though it's not involved in the workload), TempDB will also be encrypted. The unencrypted databases will still use the encrypted TempDB, and any query that creates temporary objects or spills to TempDB will be slower. Note that TempDB only gets decrypted once all user databases on the instance are decrypted and the instance restarts. Scaling a SQL Database managed instance to a new pricing tier and back is one way to restart it.
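To see the actual encryption state of every database (including TempDB) and to align encryption on the databases you are testing, you can use queries along these lines. MyDatabase is a placeholder, and on a managed instance TDE is normally left enabled and managed by the service, so treat the ALTER statement as an option for the SQL Server side of the comparison:

-- encryption_state: 1 = unencrypted, 2 = encryption in progress,
-- 3 = encrypted, 5 = decryption in progress
SELECT DB_NAME(database_id) AS database_name, encryption_state
FROM sys.dm_database_encryption_keys;

-- Turn Transparent Data Encryption on (or off) for a specific database
ALTER DATABASE MyDatabase SET ENCRYPTION ON;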

Database engine settings

Make sure that database engine settings such as the database compatibility level, trace flags, system configurations (‘cost threshold for parallelism’, ‘max degree of parallelism’), database scoped configurations (LEGACY_CARDINALITY_ESTIMATOR, PARAMETER_SNIFFING, QUERY_OPTIMIZER_HOTFIXES, etc.), and database settings (AUTO_UPDATE_STATISTICS, DELAYED_DURABILITY) are the same on the SQL Server and the SQL Database managed instance databases.

The following sample queries can help you identify these settings on SQL Server and on an Azure SQL Database managed instance:

select compatibility_level, snapshot_isolation_state_desc, is_read_committed_snapshot_on,
  is_auto_update_stats_on, is_auto_update_stats_async_on, delayed_durability_desc
from sys.databases;
GO

select * from sys.database_scoped_configurations;
GO

dbcc tracestatus;
GO

select * from sys.configurations;

Compare the results of these queries on the SQL Database managed instance and SQL Server, and try to align any differences you identify.
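Where you find differences, standard T-SQL lets you align most of these settings. A hedged sketch; MyDatabase is a placeholder and the values shown are examples, not recommendations:

-- Show advanced options so 'cost threshold for parallelism' becomes configurable
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Align a server-level option (example value only)
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;

-- Align a database scoped configuration in the current database
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATOR = ON;

-- Align the database compatibility level (example value only)
ALTER DATABASE MyDatabase SET COMPATIBILITY_LEVEL = 140;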

Note: The list of trace flags and configurations might be very long, so we recommend filtering them or looking only at the trace flags that you've changed or know affect performance. Some trace flags are pre-configured on a SQL Database managed instance as part of the PaaS configuration and do not affect performance.

You might experiment with changing the compatibility level to a higher value, turning on the legacy cardinality estimator, or enabling the automatic tuning feature on the SQL Database managed instance, which might give you better results than your SQL Server database.
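For example, forcing the last good plan can be enabled per database with a single statement; the sketch below assumes a database named MyDatabase:

-- Enable automatic plan correction (forcing the last good plan)
ALTER DATABASE MyDatabase SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);

-- Verify which automatic tuning options are enabled in the current database
SELECT name, desired_state_desc, actual_state_desc
FROM sys.database_automatic_tuning_options;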

Also note that a SQL Database managed instance might provide better performance even if you align all parameters, because it has the latest improvements and fixes that are not bound to compatibility level, as well as features such as forcing the last good plan, that might improve your workload.

Hardware and environment specification

SQL Database managed instance runs on standardized hardware with pre-defined technical characteristics that are probably different than your environment. Some of the characteristics you might need to consider when comparing your environment with the environment where the SQL Database managed instance is running are:

Number of cores should be the same on both SQL Server and the SQL Database managed instance. Note that a SQL Database managed instance uses 2.3-2.4 GHz processors, which might differ from your processor speed, so it might consume more or less CPU for the same operation. If possible, check whether hyperthreading is used in the SQL Server environment when comparing to the Gen4 and Gen5 hardware generations of a SQL Database managed instance: Gen4 hardware does not use hyperthreading, while Gen5 does. If you are comparing SQL Server running on a bare-metal machine with a SQL Database managed instance or with SQL Server running on a virtual machine, you'll probably get better results on the bare-metal instance.
Amount of memory, including the memory/core ratio (5.1 GB/core on Gen5, 7 GB/core on Gen4). A higher memory/core ratio provides a bigger buffer pool cache and increases the cache hit ratio. If your workload does not perform well on a managed instance with a memory/core ratio of about 5 GB/core, then you probably need to choose a virtual machine with the appropriate memory/core ratio instead of a SQL Database managed instance.
IO characteristics – You need to be aware that performance of the storage system might be very different compared to your on-premises environment. A SQL Database managed instance is a cloud database and relies on Azure cloud infrastructure.

The general purpose tier uses remote Azure Premium disks, where IO performance depends on the file sizes. If you reach the log limit, which depends on the file size, you might notice WRITE_LOG waits and lower IOPS in file statistics. This issue might occur on a SQL Database managed instance if the log files are small and not pre-allocated. You might need to increase the size of some files in the general purpose tier to get better performance, as shown in the sketch after this list (see also the Tech Community article Storage performance best practices and considerations for Azure SQL Managed Instance General Purpose tier).
A SQL Database managed instance does not use instant file initialization, so you might see additional PREEMPTIVE_OS_WRITEFILEGATHER wait statistics because the data files are filled with zero bytes during file growth.

Local or remote storage types – Make sure you're considering local SSD versus remote storage while doing the comparison. The general purpose tier uses remote storage (Azure Premium Storage) that can't match your on-premises environment if it uses local SSD or a high-performance SAN. In this case you would need to use the business critical tier as a target. The general purpose tier can be compared with other cloud databases, like SQL Server on Azure Virtual Machines, that also use remote storage (Azure Premium Storage). In addition, be aware that the remote storage used by a general purpose instance is still different than the remote storage used by a SQL Virtual Machine because:

The general purpose tier uses a dedicated IO resource per database file, which depends on the size of the individual file, while SQL Server on an Azure Virtual Machine uses shared IO resources for all files, where IO characteristics depend on the size of the disk. If you have many small files, you will get better performance on a SQL Virtual Machine, while you can get better performance on a SQL Database managed instance if the usage of files can be parallelized, because there are no noisy neighbors sharing the same IO resources.
SQL Virtual Machines use a read-caching mechanism that improves read speed.
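Because IO limits in the general purpose tier are tied to file size, it is worth checking the current file sizes and, if needed, pre-growing them before testing. A sketch; MyDatabase, the logical file name, and the target size are placeholders:

-- Current size (in MB) of each file in the current database
SELECT name, type_desc, size * 8 / 1024 AS size_mb
FROM sys.database_files;

-- Pre-grow a file to move it into a higher IO/throughput class
ALTER DATABASE MyDatabase
MODIFY FILE (NAME = 'MyDatabase_log', SIZE = 300GB);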

If your hardware specs and resource allocation are different, you might expect different performance results that can be resolved only by changing the service tier or increasing file sizes. If you are comparing a SQL Database managed instance with SQL Server on Azure Virtual Machines, make sure that you choose a virtual machine series with a memory/CPU ratio similar to the SQL Database managed instance, such as the DS series.

Azure SQL Database managed instance provides a powerful set of tools that can help you troubleshoot and improve performance of your databases, in addition to built-in intelligence that could automatically resolve potential issues. Learn more about monitoring and tuning capabilities of Azure SQL Database managed instance in the following article: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-monitoring-tuning-index
Source: Azure