Azure SQL Data Warehouse releases new capabilities for performance and security

As the amount of data stored and queried continues to rise, it becomes increasingly important to have the most price-performant data warehouse. While we’re excited about being the industry leader in both the TPC-H and TPC-DS benchmark reports from GigaOm, we don’t plan to stop innovating on behalf of our customers.

As Rohan Kumar mentioned in his blog on Monday, we’re excited to introduce several new features that will continue to make Azure SQL Data Warehouse the unmatched industry leader in price-performance, flexibility, and security.

To enable customers to keep improving the performance of their applications at no additional cost, we’re announcing preview availability of result-set caching, materialized views, and ordered clustered columnstore indexes.

In addition to price-performance enhancements, we’ve added new capabilities that enable customers to be more agile and flexible. The first is workload importance, which is a new feature that enables users to decide how workloads with conflicting needs get prioritized. Second, our new support for automatic statistics maintenance (auto-update statistics) means that manageability and maintenance of Azure SQL Data Warehouse just got easier and more effective. And finally, we’re also adding support for managing and querying JSON data. Users can now load JSON data directly into their data warehouses and mix it with other relational data, leading to faster and easier insights.

Our last announcement focuses on security and privacy. As you know, deploying data warehousing solutions in the cloud demands sophisticated and robust security. While Azure SQL Data Warehouse already enables an advanced security model to be deployed, today we’re announcing support for Dynamic Data Masking (DDM). DDM allows you to protect private data through user-defined policies, ensuring it’s visible only to those who have permission to see it.

In the sections below, we’ll dive into these new features and the benefits that each provides.

Price-performance

Price-performance is a recurring theme in our releases because it ensures we provide one of the fastest analytics services at incredible value. With the new functionality announced today, we continue to demonstrate our commitment to offering the leading price-performance platform.

Interactive dashboarding with result-set caching (preview)

Interactive dashboards come with predictable and repetitive query patterns. Result-set caching, now available in preview, helps with this scenario as it enables instant query response times while reducing time-to-insight for business analysts and reporting users.

With result-set caching enabled, Azure SQL Data Warehouse automatically caches results from repetitive queries, so subsequent executions return results directly from the persisted cache and skip full query execution. In addition to saving compute cycles, queries satisfied by the result-set cache do not use any concurrency slots and thus do not count against existing concurrency limits. For security reasons, only users with the appropriate security credentials can access the result sets in the cache.
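
As a rough mental model only, result-set caching behaves like a lookup keyed on the query that bypasses execution on a hit. The Python sketch below is not how the service is implemented: `ResultSetCache` and `run_query` are hypothetical names, and matching on the exact query text and clearing the cache when data changes are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple


class ResultSetCache:
    """Toy model of a result-set cache; not the service's implementation."""

    def __init__(self, execute: Callable[[str], List[Row]]) -> None:
        self._execute = execute      # stands in for full query execution
        self._cache: Dict[str, List[Row]] = {}
        self.executions = 0          # how many times the "engine" actually ran

    def query(self, sql: str) -> List[Row]:
        # Assumption: a hit requires the exact same query text.
        if sql not in self._cache:
            self.executions += 1
            self._cache[sql] = self._execute(sql)
        return self._cache[sql]

    def invalidate(self) -> None:
        # Assumption: cached results are dropped when underlying data changes.
        self._cache.clear()


def run_query(sql: str) -> List[Row]:
    # Hypothetical expensive query; returns a fixed result for the demo.
    return [("EMEA", 120), ("APAC", 95)]


DASHBOARD_SQL = "SELECT region, SUM(sales) FROM fact_sales GROUP BY region"

cache = ResultSetCache(run_query)
first = cache.query(DASHBOARD_SQL)
second = cache.query(DASHBOARD_SQL)  # served from the cache, no execution
```

After both calls, `cache.executions` is 1: the repeated dashboard query never reached the hypothetical engine.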

Materialized views to improve performance (preview)

Another new feature that greatly enhances query performance for a wide set of queries is materialized view support, now available in preview. A materialized view improves the performance of complex queries (typically queries with joins and aggregations) while offering simple maintenance operations.

When materialized views are created, the Azure SQL Data Warehouse query optimizer transparently and automatically rewrites user queries to leverage deployed materialized views, leading to improved query performance. Best of all, as data gets loaded into base tables, Azure SQL Data Warehouse automatically maintains and refreshes materialized views, which simplifies view maintenance and management. Because user queries leverage materialized views, they run significantly faster and use fewer system resources. The more complex and expensive the query within the view is, the greater the potential for execution time savings.
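
To make the idea concrete, here is a minimal Python sketch of a pre-computed aggregate that is kept in sync as data is loaded. It illustrates the concept only: `SalesTable` and its methods are hypothetical, and the incremental update shown is an assumption, not a description of how the service refreshes views.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sale = Tuple[str, int]  # (region, amount)


class SalesTable:
    """Toy base table with one pre-aggregated 'view' maintained on load."""

    def __init__(self) -> None:
        self.rows: List[Sale] = []
        self.total_by_region: Dict[str, int] = defaultdict(int)

    def load(self, batch: List[Sale]) -> None:
        for region, amount in batch:
            self.rows.append((region, amount))
            # Keep the stored aggregate current as each row arrives.
            self.total_by_region[region] += amount

    def total_scan(self, region: str) -> int:
        # The query without the view: scan and aggregate every row.
        return sum(amount for r, amount in self.rows if r == region)

    def total_from_view(self, region: str) -> int:
        # The "rewritten" query: a single lookup in the stored result.
        return self.total_by_region.get(region, 0)


table = SalesTable()
table.load([("EMEA", 100), ("APAC", 40)])
table.load([("EMEA", 20)])
```

Both `total_scan` and `total_from_view` return the same answer; the difference is that the second does no aggregation work at query time.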

Fast scans with ordered clustered columnstore indexes (preview)

Columnstore is a key enabler for storing and efficiently querying large amounts of data. For each table, it divides incoming data into row groups and each column of a row group forms a segment on a disk. When querying columnstore indexes, only the column segments that are relevant to user queries are read from the disk. Ordered clustered columnstore indexes further optimize query execution by enabling efficient segment elimination.

Because the data is pre-ordered, you can drastically reduce the number of segments that are read from the disk, leading to faster query processing. Ordered clustered columnstore indexes are now available in preview, and queries containing filters and predicates can greatly benefit from this feature.
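
Segment elimination can be sketched in a few lines of Python. The example assumes each segment records the minimum and maximum value it holds, so a filter only needs to read segments whose range overlaps it. The function names and the tiny segment size are hypothetical and chosen for the demo.

```python
from typing import List, Tuple

SEGMENT_ROWS = 4  # tiny for the demo; real row groups are far larger

Segment = Tuple[int, int, List[int]]  # (min, max, values)


def build_segments(values: List[int]) -> List[Segment]:
    """Split a column into segments and record each segment's min and max."""
    segments = []
    for start in range(0, len(values), SEGMENT_ROWS):
        chunk = values[start:start + SEGMENT_ROWS]
        segments.append((min(chunk), max(chunk), chunk))
    return segments


def segments_read(segments: List[Segment], low: int, high: int) -> int:
    """Count segments whose [min, max] range overlaps the filter [low, high]."""
    return sum(1 for seg_min, seg_max, _ in segments
               if seg_max >= low and seg_min <= high)


unordered = [7, 1, 12, 4, 9, 2, 11, 5, 3, 10, 6, 8]
ordered = sorted(unordered)

# Filter: WHERE value BETWEEN 1 AND 3
unordered_reads = segments_read(build_segments(unordered), 1, 3)
ordered_reads = segments_read(build_segments(ordered), 1, 3)
```

With the unordered column, every segment's range overlaps the filter, so all three are read. Once the column is ordered, only the first segment qualifies.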

Flexibility

As business requirements evolve, the ability to change and adapt solution behavior is one of the key benefits of a modern data warehousing product. The ability to handle and manage heterogeneous data that enterprises have while offering ease of use and management is critical. To support these needs, Azure SQL Data Warehouse is introducing the following new functionalities to help you deal with ever-evolving requirements.

Prioritize workloads with workload importance (general availability)

Running mixed workloads on your analytics solution is often a necessity to effectively and quickly execute business processes. In situations where resources are constrained, the capability to decide which workloads need to be executed first is critical, as it helps with overall solution cost management. For instance, executive dashboard reports may be more important than ad-hoc queries. Workload importance now enables this scenario. Requests with higher importance are guaranteed quicker access to resources, which helps meet predefined SLAs and ensures important requests are prioritized.

Workload classification concept

To define workload priority, various requests must be classified. Azure SQL Data Warehouse supports flexible classification policies that can be set for a SQL query, a database user, database role, Azure Active Directory login, or Azure Active Directory group. Workload classification is achieved using the new CREATE WORKLOAD CLASSIFIER syntax.

The diagram below illustrates the workload classification and importance function:

Workload importance concept

Workload importance is established through classification. Importance influences a requester's access to system resources, including memory, CPU, IO, and locks. A request can be assigned one of five levels of importance: low, below_normal, normal, above_normal, and high. If a request with above_normal importance is scheduled, it gets access to resources before a request with the default normal importance.
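
The ordering described above can be modeled with a small priority queue. This Python sketch is illustrative only: `RequestQueue` and the request names are hypothetical, and breaking ties by arrival order is an assumption rather than documented scheduler behavior.

```python
import heapq
import itertools
from typing import List, Tuple

# The five importance levels named above, ranked lowest to highest.
IMPORTANCE = ["low", "below_normal", "normal", "above_normal", "high"]
RANK = {name: rank for rank, name in enumerate(IMPORTANCE)}


class RequestQueue:
    """Toy queue: higher importance first, arrival order within a level."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._arrival = itertools.count()

    def submit(self, name: str, importance: str = "normal") -> None:
        # Negate the rank because heapq pops the smallest tuple first.
        entry = (-RANK[importance], next(self._arrival), name)
        heapq.heappush(self._heap, entry)

    def next_request(self) -> str:
        return heapq.heappop(self._heap)[2]


queue = RequestQueue()
queue.submit("ad_hoc_query")                         # default: normal
queue.submit("nightly_export", "low")
queue.submit("executive_dashboard", "above_normal")
order = [queue.next_request() for _ in range(3)]
```

Although it was submitted last, `executive_dashboard` is served first, followed by `ad_hoc_query` and then `nightly_export`.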

Manage and query JSON data (preview)

Organizations increasingly have to deal with multiple data sources and heterogeneous file formats, with JSON among the most common alongside CSV files. To speed up time to insight and minimize unnecessary data transformation processes, Azure SQL Data Warehouse now enables support for querying JSON data. This feature is now available in preview.

Business analysts can now use the familiar T-SQL language to query and manipulate documents that are formatted as JSON data. JSON functions, such as JSON_VALUE, JSON_QUERY, JSON_MODIFY, and OPENJSON are now supported in Azure SQL Data Warehouse. Azure SQL Data Warehouse can now effectively support both relational and non-relational data, including joins between the two, while enabling users to use their traditional BI tools, such as Power BI.

Automatic statistics maintenance and update (preview)

Azure SQL Data Warehouse implements a cost-based optimizer to ensure optimal execution plans are being generated and used. For any cost-based optimizer to be effective, column level statistics are needed. When these statistics are stale, there is potential for selecting a non-optimal plan, leading to slower query performance.

Today, we’re extending our existing support for automatic statistics creation by adding the ability to automatically refresh and maintain statistics. As data warehouse tables get loaded and updated, the system can now automatically detect and update out-of-date statistics. With the auto-update statistics capability now available in preview, Azure SQL Data Warehouse delivers full statistics management capabilities while simplifying statistics maintenance processes. You no longer need to manually maintain statistics, which leads to a simplified and more cost-effective data warehouse deployment.

Security

Azure SQL Data Warehouse provides some of the most advanced security and privacy features in the market. This is achieved by using proven SQL Server technology. SQL Server, as the core technology and component of Azure SQL Data Warehouse, has been the least vulnerable database over the last eight years according to the NIST National Vulnerability Database. To expand Azure SQL Data Warehouse's existing security and privacy features, we’re announcing that Dynamic Data Masking (DDM) support is now available in preview.

Protect sensitive data with dynamic data masking (preview)

Dynamic data masking (DDM) enables administrators and data developers to control access to their company’s data, allowing sensitive data to be safe and restricted. It prevents unauthorized access to private data by obscuring the data on-the-fly. Based on user-defined data masking policies, Azure SQL Data Warehouse can dynamically obfuscate data as the queries execute, and before results are shown to users.

Azure SQL Data Warehouse implements the DDM capability directly inside the engine. When creating tables with DDM, policies are stored in the system's metadata and then enforced by the engine as queries get executed. This centralized policy enforcement simplifies the management of data masking rules because access control does not have to be implemented and repeated at the application layer. As various users query tables, policies are automatically honored and applied, protecting sensitive data. DDM comes with flexible policies: you can choose to define a partial mask, which exposes some of the data in the selected columns, or a full mask, which obfuscates the data completely. Azure SQL Data Warehouse also provides built-in masking functions that users can choose from.
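
The difference between a partial mask and a full mask can be shown with a short Python sketch. These helpers are hypothetical and only imitate the behavior described above; they are not the built-in masking functions, and the placeholder strings are assumptions.

```python
def full_mask(value: str) -> str:
    """Hide the whole value behind a fixed placeholder."""
    return "xxxx"


def partial_mask(value: str, prefix: int, padding: str, suffix: int) -> str:
    """Keep the first `prefix` and last `suffix` characters; hide the middle."""
    if prefix + suffix >= len(value):
        # Too short to expose anything safely: fall back to a full mask.
        return full_mask(value)
    return value[:prefix] + padding + value[len(value) - suffix:]


def read_column(value: str, can_unmask: bool) -> str:
    # The stored value is never changed; masking applies only to the result.
    return value if can_unmask else partial_mask(value, 1, "XXX-XXX-", 4)


phone = "555-123-4567"
masked = read_column(phone, can_unmask=False)  # what a restricted user sees
clear = read_column(phone, can_unmask=True)    # what a permitted user sees
```

Because the policy lives in one place (here, `read_column`), callers do not need to repeat masking logic, which mirrors the centralized enforcement described above.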

Next steps

Get started with a free Azure SQL Data Warehouse account.
Learn more about workload management concepts and workload management scenarios.
Learn more about why analytics in Azure is simply unmatched.

Please note that the preview features mentioned in this blog are being rolled out to all regions. Check the version deployed to your instance and review the latest Azure SQL Data Warehouse release notes to learn more. For preview questions, please contact AskADWPreview@microsoft.com.
Source: Azure

Tips and best practices for moving your VMs to Compute Engine

When you’re moving to the cloud, it’s important to remember that a migration is not just a single, giant step. It is a journey that involves many smaller steps. At Google Cloud Platform (GCP), we’ve developed guidance and best practices for migrating VMs to GCP that we’re sharing here. Note that for this guide and blog post, we’re focused specifically on migrating VMs to Google’s Compute Engine. Read on for more on the benefits of GCP and best practices for migrating.

Why migrate to Compute Engine?

As you probably know, all of the VMs you’re migrating require computing resources along with other services that make applications work, such as databases, messaging, and analytics. As you consider where to run these VMs, here are the primary benefits of running them on Compute Engine:

Cost reduction. With sustained use discounts on Compute Engine VMs, costs can be significantly lower than managing hardware or virtual machines in a traditional data center. When migrating from a different cloud to GCP, you can take advantage of those same pricing advantages.

Agility. Most customers see an immediate improvement in agility because you can create virtual machines almost instantly and don’t have to wait for resources to be acquired and provisioned. You can quickly spin up new applications, experiment with them, and turn them off as necessary.

Reduced overhead. Data centers usually require many different vendors, each with their own relationship, billing model, and contracts. Moving to the cloud can significantly reduce that overhead. Your staff no longer have to deal with the management overhead of running a data center and can focus on what makes your business thrive.

Once you’ve picked Compute Engine as your migration target, what are some other things you need to consider for your migration journey? We’ll outline some of them here.

Calculating the costs

Before you move any VMs, you’ll want to calculate the cost of the move. This means evaluating the cost of what you are currently running in your data centers or existing cloud environments. You can learn about cost management and which partner can best meet your needs in GCP’s VM migration center.

Assessing the VMs to migrate

After you have evaluated the cost of the move, you can start looking at which VMs to migrate. In modern enterprises, there are many different kinds of applications running on VMs, and it usually doesn’t make sense to move all of them together at the same time to the cloud. Doing this well often requires a thorough assessment, which GCP currently offers at no cost.

Designing the migration

When you have decided which VMs to move, you need to design your cloud environment before you move anything. The first step is to find out how your current environment compares to GCP. Then you can start planning what your environment should look like on GCP. Below are some of the steps to getting started down your migration path.

Establishing governance

You need to establish who in your company can have permission to create, access, modify, and destroy cloud resources. You must also determine how resources will be paid for. You can find guidance in the IAM best practices documentation.

Creating a network

Before you move any VMs, the network they migrate to must exist. Similar to permissions and accounts, it’s important to create this network in advance, because establishing procedures after applications are in flight can be difficult.

Planning for operations

When you do have your VMs running in the cloud, you need to monitor them, retain logs, and operationally manage them, just as you would in any system. You must think about these operations when you’re doing your advanced planning to make sure there aren’t any surprises after migrating.

Migrating VMs to the cloud

Finally, you should migrate your first VMs. The first migration will serve as your template for future migrations. You will surely refine your process as you do further migrating, but it’s important to record everything you do in the first migration in particular.

Velostrata, Google Cloud’s migration tool, gives users a way to migrate VMs to Google Cloud Platform quickly, safely, and at scale. Velostrata uses streaming technology to reduce migration time, provides right-sizing recommendations before you migrate to help you choose appropriate instance types, and provides built-in testing and rollback (when needed). Velostrata is also free to use for customers migrating to GCP.

These tips offer a quick look at what you should think about before migrating VMs to the cloud. For much more detailed guidance, check out this guide to best practices for migrating VMs to GCP.
Source: Google Cloud Platform

Google Cloud networking in-depth: What’s new with Cloud DNS

Editor’s note: At Google Cloud Next ’19 we announced several additions to our networking portfolio, including new features for Cloud DNS. This blog post takes a deep dive into those additions. Read on to learn more.

Google Cloud DNS is a scalable, reliable, and managed authoritative Domain Name System (DNS) service that translates requests for domain names like www.google.com into IP addresses like 74.125.29.101. Running on the same high-performing, low-latency, and high-availability infrastructure as Google, Cloud DNS is a cost-effective and easy way to make your applications and services available to your enterprise users.

In the past couple of months, we’ve launched many Cloud DNS networking features to make it easier for you to deploy and connect services on your private GCP networks. Today, let’s dive deeper into what we announced and how these new features can help you easily publish and manage millions of DNS zones and records from a simple user interface.

Private zones perform internal DNS resolution for your private GCP networks

Cloud DNS private zones (GA) provide an easy-to-manage internal DNS solution for your private GCP networks, eliminating the need to provision and manage additional software and resources. Because DNS queries for private zones are restricted to a private network, no one else can access your internal network information.

Cloud DNS private zones offer flexibility in your configurations by allowing multiple zones to be attached to a single VPC network. Additionally, support for split horizons allows you to have a private zone share the same name as a public zone while resolving to different IP addresses in each zone.

DNS peering allows one network to forward DNS requests to another network

When GCP networks are peered, they do not automatically share private DNS zones, DNS policies, or even internal DNS records. Cloud DNS peering, currently in beta, provides a second method for sharing DNS data. You can configure all or a portion of the DNS namespace to be sent from one VPC to another and, once there, it will respect the DNS policies or matching zones defined in the peered network.

You can use DNS peering to connect multiple VPCs to your on-prem DNS service by setting up a hub-and-spoke model for your VPCs. Here, the hub VPC uses DNS forwarding to perform the hybrid connection, and the spoke VPCs use DNS peering to connect to the hub VPC.

Cloud DNS can now export query logs and monitoring metrics from private zones to Stackdriver

With DNS logging and monitoring (beta), you can now view DNS logs for private zones in Stackdriver, and export them to any destination that Stackdriver Logging export supports. Logged queries can come from Compute Engine virtual machines, Google Kubernetes Engine containers in the same VPC network or peering zones, or they can come from on-premises clients via inbound DNS forwarding.

Monitoring metrics report on the DNS response types and the number of each response seen. Together, these logs and metrics can help debug your DNS configuration or support DNS security analysis tools, so you can identify threats within your private networks.

Keeping up with Cloud DNS

Together, Cloud DNS private zones, peering, and logging improve the flexibility of your private cloud architecture, while providing you visibility into your private DNS traffic. Let us know how you plan to use these new networking services, and what capabilities you’d like to have in the future. You can learn more about GCP’s cloud networking portfolio online and reach us at gcp-networking@google.com.
Source: Google Cloud Platform