Large-Scale Analysis of DNS Query Logs Reveals Botnets in the Cloud

This post was co-authored by Tomer Teller, Senior Security Program Manager, Azure Security.

The arms race between data security professionals and cybercriminals continues at a rapid pace. More than ever, attackers exploit compute resources for malicious purposes by deploying malware, known as “bots”, in virtual machines running in the cloud. Even a conservative estimate suggests that at least 1 in every 10,000 machines is part of a known Botnet.

To better protect VMs in the cloud, Azure Security Center (ASC) applies a novel supervised Machine Learning model for high-precision Botnet detection based on analysis of DNS query logs. This model achieves 95% precision and 43% recall and can detect Botnets before they are reported by antimalware companies.

Communication patterns between Botnets and their CnC server

Bots are controlled by the attacker (the Botmaster) using one or more Command and Control (CnC) servers. The compromised machines (Bots, also called Zombies) collectively form a Botnet. A typical Botnet structure is illustrated in the following figure.

Historically, a CnC server was assigned a static IP address making it very easy to take down or blacklist. To avoid detection, Botmasters responded by creating more complex bot/CnC communication patterns.

For example, attackers developed methods, such as Fast-Flux (Mehta, 2014), that use domain names to locate the CnC server which frequently changes its IP address. In addition, domain generation algorithms (DGA) are also used by various families of malware, with Conficker being perhaps the most notorious example. The DGA pattern works by periodically generating a large number of domain names that can be used by bots as connection points to the CnC servers.
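To make the DGA pattern concrete, here is a toy illustration (not Conficker's actual algorithm, and the hashing scheme is purely an assumption for demonstration): candidate domains are derived deterministically from the date, so bots and the Botmaster can compute the same rendezvous list independently, without any communication.

```python
import hashlib
from datetime import date

def toy_dga(seed_date: date, count: int = 5) -> list[str]:
    """Toy domain generation algorithm: derive a daily list of
    pseudo-random candidate domains from the date. Bots and the
    Botmaster compute the same list without coordinating."""
    domains = []
    for i in range(count):
        digest = hashlib.md5(f"{seed_date.isoformat()}-{i}".encode()).hexdigest()
        # Use the first 12 hex characters as the domain label.
        domains.append(digest[:12] + ".com")
    return domains

print(toy_dga(date(2017, 3, 1)))
```

Because every bot derives the identical list for a given day, the Botmaster only needs to register one of the generated names to regain control of the whole network, while defenders must anticipate all of them.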

More recently, social networks and other user-generated content sites are being exploited by Botmasters to pass information without ever establishing a direct link with a CnC server. Security professionals can therefore no longer rely on simple rule-based approaches to detect these complex communication patterns.

Opportunity to detect Botnets in the cloud

One of the more common applications of machine learning in the cybersecurity domain is anomaly detection. The idea is that a compromised machine exhibits anomalous behavior. While this assumption is usually correct, the converse seldom holds: anomalous behavior does not necessarily indicate compromise. Therefore, such techniques achieve low precision and produce many false alarms.

Cloud providers such as Microsoft possess a unique opportunity to detect Botnet activity with much greater accuracy by applying large-scale machine learning over multiple data sources, as well as a combined view of all the VM logs. Unlike most other systems, which analyze data from each machine in isolation, our approach can effectively uncover patterns that are typical of Botnets.

Gathering the data

We collect DNS query and response data from Azure VMs. The logs contain around 50 TB of data per day and include information such as the query name, the queried domain name server, the DNS response, and other DNS logging information.

In addition to DNS query and response data, we also use a Microsoft automated machine-readable feed of threat intelligence (TI). The feed includes information about IP addresses of devices which are likely to be part of a Botnet as well as the IP addresses and domains of known CnC servers.

To achieve optimal results, we model Botnet detection as a 2-class supervised learning problem. That is, we classify whether a VM (on a specific date) is part of a Botnet based on that VM’s DNS query log. VM instances are labeled as possibly participating in a Botnet based on the following criteria:

The IP address of the VM appears in the TI Botnet feed of that same day.
The VM issued a DNS query with a domain known to belong to a CnC.
The VM received a DNS response to an issued query and the resulting mapped IP is a known IP address of a CnC.
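The three criteria above can be sketched as a simple predicate. This is an illustrative sketch only; the parameter names and the set-based feed representation are assumptions, not ASC's actual schema.

```python
def label_vm(vm_ip, queried_domains, resolved_ips,
             ti_bot_ips, ti_cnc_domains, ti_cnc_ips):
    """Label one VM instance (a VM on a specific day) as a possible
    Botnet participant if any of the three criteria holds. All TI
    inputs are sets taken from the threat intelligence feed of that
    same day."""
    return (vm_ip in ti_bot_ips                        # criterion 1: VM IP in TI Botnet feed
            or bool(queried_domains & ti_cnc_domains)  # criterion 2: queried a known CnC domain
            or bool(resolved_ips & ti_cnc_ips))        # criterion 3: a response resolved to a CnC IP
```

Labels produced this way are per-(VM, day) pairs, which matches the instance definition used for feature extraction below.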

Feature extraction

The VM instances represent a VM on a specific day and are labeled as possible participants of a botnet based on the TI feed of that day. In our problem, feature extraction is difficult; the number of domains accessed by a given VM can be very large and the total number of possible domains is massive. Hence, the domain space is huge and relatively dense.

Moreover, since the model is used continuously, it needs to identify Botnets even when they query domains unseen during training. Based on communication patterns with CnC servers, our features should capture the insights laid out in the following table.

Rare domains: Domain names of CnC servers are rare, since they are seldom requested by legitimate users.

Young domains: When a domain generation algorithm (DGA) is used, the CnC server frequently acquires new domain names, so they tend to be recently registered. We use a massive, daily updated data feed to map domain names to their registration dates.

Domains idiosyncratic to Botnets: Bots controlled by the same CnC server issue DNS queries that are similar to one another yet different from everyone else's.

Non-existent domain responses: When DGA is used, bots query many non-existent domains before they find the actual domain of their CnC server for that time.

To efficiently generate the features for each instance, we make two passes over the dataset. In the first pass, we generate a Reputation Table (RT) which maps each domain on a given day to:

Rareness scores
Youngness scores
Botnet idiosyncratic scores

In the second pass, we calculate the features for each instance based on the reputation scores of the domains it queried. The RT is calculated as follows:

To generate features for a VM on a given day, we create two sets, Dnx and Dx, containing the non-existent and existent domains queried by the VM, respectively. We produce the feature vectors by summing up the corresponding values in the reputation table for each domain in Dnx and Dx separately.
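A minimal sketch of this feature construction, assuming the reputation table maps each domain to a (rareness, youngness, idiosyncrasy) triple. The names and tuple layout are illustrative, not the production schema.

```python
def feature_vector(dx, dnx, reputation):
    """Sum the (rareness, youngness, idiosyncrasy) reputation scores
    over the existent (Dx) and non-existent (Dnx) domains a VM queried,
    producing one aggregate feature per score per set."""
    def totals(domains):
        sums = [0.0, 0.0, 0.0]
        for d in domains:
            for i, score in enumerate(reputation.get(d, (0.0, 0.0, 0.0))):
                sums[i] += score
        return sums
    return totals(dx) + totals(dnx)  # six features for the classifier
```

Summing per set keeps the feature dimensionality fixed and small even though the underlying domain space is huge, which is exactly the property the text calls out as difficult.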

Similarly, we do the same for the name-servers being queried (i.e., the DNS server that eventually resolved the DNS query). The latter features help identify legitimate scenarios in which rare subdomains of a non-malicious zone are accessed, e.g., rare subdomains of Yahoo.com. For classification, we used Apache Spark’s Gradient-Boosted Trees with default parameters.

Experimental evaluation

We use the Microsoft TI feed to generate the labels for our daily Azure VM instances. However, these labels are not perfect; Botnets can still remain undetected for quite some time. Hence, the labels we produced based on the feed are not comprehensive. This makes our evaluation setting different from that of a standard classification problem, since our goal is not to perfectly match labels that can be extracted based on the feed. This would simply duplicate information which is already available in the daily feed. Instead, our goal is to find compromised VMs before they appear in the TI feed.

With this in mind, we trained our model on a week of data from early June 2016. We let the model classify instances from late June and produce our “ground truth” for evaluation based on labels generated from the TI feed looking forward in time one week (into July). We report the accuracy of our model in the following confusion matrix.

From the matrix, we learn that the model classified 432 (411+21) instances from the test set as being Botnets. Of these, 95% (411) eventually appear in the TI (Interflow) feed within a week; hence the model achieves 95% precision and 43% recall. Note that the 5% apparent false positives may still be Botnets that have not yet appeared in the feed, so they require further investigation.
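As a quick arithmetic check of these figures (the false-negative count below is an assumption chosen to match the reported 43% recall, since the full matrix is not reproduced here):

```python
tp, fp = 411, 21   # true and false positives from the confusion matrix
fn = 545           # assumed: implied by 411 / (411 + fn) ~= 0.43

precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In other words, about 95% of flagged VMs are later confirmed by the feed, while roughly 43% of all eventually-confirmed Botnet VMs are caught ahead of time.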

Conclusions

We present a novel supervised ML model for Botnet detection based on DNS logs. We generate the labels for the supervised model based on a threat intelligence feed provided by anti-malware vendors. We show that the model is able to identify with high accuracy the VMs that are part of a botnet well before they become part of the TI feed. This new Botnet detection feature will reduce the risk of Azure VMs becoming infected with malware.
Source: Azure

Announcing Support for Multi-member Consortium Blockchain Networks on Azure

As your blockchain application development efforts and pilots mature, your requirements for the underlying consortium network will change: you will need to easily and securely create and deploy networks across multiple regions, with members drawn from organizations that sit within separate administrative and trust boundaries. We are excited to announce an expansion of our blockchain support on Azure, making it the first public cloud to enable multi-member consortium blockchain networks, addressing enterprise scenarios that require deploying a private network across Azure regions, subscriptions, and Azure Active Directory (Azure AD) tenants.

As we work with customers, we see scenarios divide into three common topologies:

1. Single organization, multiple subscriptions: This is a common topology when divisions in an organization do not trust each other, for example when one division is auditing another. Each division has its own footprint, but they are physically separated in different subscriptions under the same Azure AD tenant.

2. Multiple organizations, private: This is the true consortium scenario, where each organization has its own footprint, and subscriptions, Azure AD tenants, and regions are all different. Given enterprise IT requirements, the services deployed must not be publicly accessible on the internet, even though communication occurs across organizations.

3. Multiple organizations, public-facing: Similar to the above topology, but for industries, enterprises, or scenarios where IT requirements allow or require the deployed services to be publicly accessible over the internet. This simplifies the network connectivity requirements for the distributed system.

Today, we are releasing a set of solution templates in the Azure Marketplace that address the first two topologies, configuring multi-region, multi-member Ethereum Consortium Blockchain Networks through a simple multi-step process in the Azure Portal or on the command line. The first template deploys and configures the footprint for the initial consortium member (or region), while the second template deploys, connects, and configures additional members (or regions) to form the overarching private network. If you are still experimenting, we suggest using the simpler single-subscription deployment solution released in November.

These solution templates are designed to make it easier and quicker to deploy and configure a multi-member consortium Ethereum network with minimal Azure and Ethereum knowledge. With a handful of user inputs and a single-click deployment, each member can provision their network footprint, using Microsoft Azure compute, networking, and storage services across the globe. Each member's network footprint consists of a set of load-balanced transaction nodes with which an application or user can interact to submit transactions, a set of mining nodes to record transactions, and a VPN gateway. A subsequent connection step connects the gateways to create a fully configured multi-member blockchain network.

A multi-member network architecture is illustrated below.

It is important to note that the template does not dictate to whom a member connects. The consortium should determine the network connectivity model, whether hub and spoke or mesh.
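A small illustration of why this choice matters: the number of gateway-to-gateway connections the consortium must manage grows linearly for hub and spoke but quadratically for a full mesh. The function below is a sketch of that count, not part of the templates themselves.

```python
def vpn_connections(members: int, topology: str) -> int:
    """Number of gateway-to-gateway connections needed to link all
    consortium members under each connectivity model."""
    if topology == "hub-and-spoke":
        return members - 1                     # every spoke connects to one hub
    if topology == "mesh":
        return members * (members - 1) // 2    # every pair connects directly
    raise ValueError(f"unknown topology: {topology}")

print(vpn_connections(5, "hub-and-spoke"), vpn_connections(5, "mesh"))
# → prints 4 and 10 for a five-member consortium
```

Hub and spoke is simpler to operate but makes the hub a single point of failure; a mesh avoids that at the cost of many more connections as the consortium grows.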

For more information about the solution, you can visit our guided walkthrough.

As with our first blockchain solution, rather than spending hours building out and configuring the infrastructure and networking across organizations, we have automated these time-consuming pieces to allow you to focus on building out the consortium and your production pilots.

Let us know if you have any questions, feedback, or additional requests once you try out these new blockchain solutions. We are excited to help you expand your blockchain deployments to true multi-party topologies.
Source: Azure

Introducing Backup Pre-Checks for Backup of Azure VMs

Over the past couple of weeks, we have announced multiple enhancements for backup and recovery of both Windows and Linux Azure Virtual Machines that reinforce Azure Backup’s cloud-first approach of backing up critical enterprise data in Azure. Enterprise production environments in Azure are becoming increasingly dynamic and are characterized by frequent VM configuration changes (such as network or platform related updates) that can adversely impact backup. Today, we are taking a step to enable customers to monitor the impact of configuration changes and take steps to ensure the continuity of successful backup operations. We are excited to announce the preview of Backup Pre-Checks for Azure Virtual Machines.

Backup Pre-Checks, as the name suggests, check your VMs’ configurations for issues that can adversely affect backups, aggregate this information so that you can view it directly from the Recovery Services Vault dashboard, and provide recommendations for corrective measures to ensure successful file-consistent or application-consistent backups, wherever applicable. All of this comes without any infrastructure to deploy and at no additional cost.

Backup Pre-Checks run as part of the scheduled backup operations for your Azure VMs and complete with one of the following states:

Passed: This state indicates that your VM’s configuration is conducive to successful backups and that no corrective action needs to be taken.
Warning: This state indicates one or more issues in the VM’s configuration that might lead to backup failures, and provides recommended steps to ensure successful backups. Not having the latest VM Agent installed, for example, can cause backups to fail intermittently and falls in this class of issues.
Critical: This state indicates one or more critical issues in the VM’s configuration that will lead to backup failures, and provides the required steps to ensure successful backups. A network issue caused by an update to the NSG rules of a VM, for example, will fail backups because it prevents the VM from communicating with the Azure Backup service; it falls in this class of issues.

Value proposition

Identify and monitor VM configuration issues at scale: With the aggregated view of Backup Pre-Check status across all VMs on the Recovery Services Vault, you can keep track of how many VMs need corrective configuration changes to ensure successful backups.
Resolve configuration issues more efficiently: Use the Backup Pre-Check states to prioritize which VMs need configuration changes. Address the “Critical” Backup Pre-Check status for your VMs first, using the specified required steps, and ensure their successful backups before addressing the “Warning” Backup Pre-Check states.
Automated execution: You don’t need to maintain or apply separate schedules for Backup Pre-Checks; they are integrated with existing backup schedules, so they execute automatically and gather the latest VM configuration information at the same cadence as the backups.

Getting started

Follow the steps below to start resolving any issues reported by Backup Pre-Checks for Virtual Machine backups on your Recovery Services Vault.

Click on the ‘Backup Pre-Check Status (Azure VMs)’ tile on the Recovery Services Vault dashboard.
Click on any VM with a Backup Pre-Check status of either Critical or Warning. This opens the VM details blade.
Click on the blade notification on the top of the blade to reveal the configuration issue description and remedial steps.

Related links and additional content

Learn more about preparing your VMs for successful backups 
Need help? Reach out to Azure Backup forum for support
Tell us how we can improve Azure Backup by contributing new ideas and voting up existing ones
Follow us on Twitter @AzureBackup for the latest news and updates
New to Azure Backup? Sign up for a free Azure trial subscription

Source: Azure

Azure AD B2C Access Tokens now in public preview

We are excited to announce that you can now have greater control over your web APIs when you secure them using Azure AD B2C. Today, we are enabling the public preview for using access tokens with your web APIs. This is a powerful feature that many of you have been asking for. It makes it possible to create web APIs that can be accessed by different client applications. You can even grant permissions to your API on an app-to-app basis. By having more control over who can access your API, you will be able to develop apps with tighter security.

Getting started

Create the web API

Go to the Azure AD B2C Settings blade in your Azure AD B2C tenant and add a new application. Give your application a name, set ‘Include web app / web API’ to ‘YES’, and enter a ‘Reply URL’ and an ‘App ID URI’. After creating your web API, click on the application, and then ‘Published scopes’. In this blade, you can add the scopes, or permissions, that a client application can request. The ‘user_impersonation’ permission is available by default.

Create the client application

Inside the ‘Applications’ blade, register a new application. After creating it, select ‘API access’ and click the ‘Add’ button. In the next blade, select the API and the permissions you would like to grant your client application. By default, applications are granted the ability to access the user’s profile via the “openid” permission and to generate refresh tokens via the “offline_access” permission. These can be removed if you do not want your client application to have this functionality.

Acquiring an access token

Making a request to Azure AD B2C for an access token is similar to the way requests are made for ID tokens. The main difference is the value of the “scope” parameter, which contains the specific resource and the permissions your app is requesting.
For example, to access the “read” permission for the resource application with an App ID URI of “https://B2CBlog.onmicrosoft.com/notes”, the scope in your request would be “https://B2CBlog.onmicrosoft.com/notes/read”. Below is an example of an authorization code request with the following scopes: “https://B2CBlog.onmicrosoft.com/notes/read”, “openid”, and “offline_access”.

https://login.microsoftonline.com/B2CBlog.onmicrosoft.com/oauth2/v2.0/authorize?p=<yourPolicyId>&client_id=<appID_of_your_client_application>&nonce=anyRandomValue&redirect_uri=<redirect_uri_of_your_client_application>&response_type=code&scope=https%3A%2F%2FB2CBlog.onmicrosoft.com%2Fnotes%2Fread+openid+offline_access

If you would like to learn more about this feature or try it out using our samples, please check out our documentation. Keep your great feedback coming on UserVoice and Twitter (@azuread). If you have questions, get help using Stack Overflow (use the ‘azure-ad-b2c’ tag).
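A sketch of building such a request URL programmatically. The placeholder values stand in for your own policy, app, and redirect settings; note that the standard-library `urlencode` handles the percent-encoding of the scope (and turns the spaces between scopes into `+` signs) exactly as in the example above.

```python
from urllib.parse import urlencode

# Placeholder values; substitute your own policy ID, app ID, and redirect URI.
tenant = "B2CBlog.onmicrosoft.com"
params = {
    "p": "yourPolicyId",
    "client_id": "appId_of_your_client_application",
    "nonce": "anyRandomValue",
    "redirect_uri": "redirect_uri_of_your_client_application",
    "response_type": "code",
    # Scopes are space-separated; urlencode converts the spaces to '+'.
    "scope": "https://B2CBlog.onmicrosoft.com/notes/read openid offline_access",
}
url = (f"https://login.microsoftonline.com/{tenant}/oauth2/v2.0/authorize?"
       + urlencode(params))
print(url)
```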
Source: Azure

Lighting up network innovation

Driving up network speed, reducing cost, saving power, expanding capacity, and automating management become crucial when you run one of the world’s largest cloud infrastructures. Microsoft has invested heavily in optical technology to meet these needs for its Azure network infrastructure. The goal is to provide faster, less expensive, and more reliable service to customers, and at the same time, enable the networking industry to benefit from this work. We’ve been collaborating with industry leaders to develop optical solutions that add more capacity for metro area, long-haul, and even undersea cable deployments. We have integrated these optical solutions within network switches and manage them through Software Defined Networking (SDN).

Our goal was to provide 500 percent additional optics capacity at one-tenth the power, in a fraction of the previous footprint, and at a lower cost than what’s possible with traditional systems. Microsoft chose to sidestep the chicken-and-egg problem and create demand for 100 Gbps optics in a stagnant ecosystem that could not otherwise meet it. In this blog, we explain the improvements we’ve made and where we’re boldly heading next.

Optical innovation leadership

We began thinking about how to more efficiently move network traffic between cloud datacenters, both within metro areas and over long distances around the world. We homed in on fiber optics, or “optics,” as an area where we could innovate, and decided to invest in our own optical program to integrate all optics into our network switching platforms.

What do we mean when we talk about optics? Optics is the means for transmitting network traffic between our global datacenters. Copper cabling has been the traditional means of carrying data and is still a significant component of server racks within the datacenter. However, moving beyond the rack at high bandwidth (for example, 100 Gbps and more) requires optical technologies. Optical links, light over fiber, replace copper wires to extend the reach and bandwidth of network connections.

Optical transmitters “push” pulses of light through fiber optic cables, converting high-speed electrical transmission signals from a network switch to optical signals over fiber. Optical receivers convert the signals back to electrical at the far end of the cable. This makes it possible to interconnect high-speed switching systems tens of kilometers apart in metro areas, or thousands of kilometers apart between regional datacenters.

To connect devices within the datacenter, each device has its own dedicated fiber. Since the light’s optical wavelength, or color, is isolated by the fiber, the color used to make the connections can be reused on every connection. By using a single color, optic manufacturers can improve costs of high-volume manufacturing. However, single-color has a high fiber cost, particularly as distances increase beyond 500 meters. Although this cost is manageable in intra-datacenter implementations where distances are shorter, the fiber used for inter-datacenter connections, metro and long-haul, is much more expensive.

Cost

We focused on optics because the cost can be 10x the cost of the switch port and even more for ultra-long haul. We began by looking for partners to collaborate on ultra-high integration of optics into new commodity switching platforms to break this pattern. Simultaneously, we developed open line systems (OLS) for both metro and ultra-long-haul transmission solutions to accept the cost-optimized optical sources. Microsoft partnered with several networking suppliers, including Arista, Cisco, and Juniper, to integrate these optics with substantially reduced power, a very small footprint, and much lower cost to create a highly interoperable ecosystem.

Figure 1. Traditional closed line system

In the past, suppliers have tried to integrate optics directly into switches, but these attempts didn’t include SDN capabilities. SDN is what enables network operators to orchestrate the optical sources and line system with switches and routers. By innovating with the OLS concept, including interfaces to the SDN controller, we can successfully build optics directly into commodity switches and make them fully interoperable. By integrating optics into the switch, we can easily manage and automate the entire solution at a large scale.

With recent OLS advances, we’re also able to achieve 70 percent more spectral efficiency for ultra–long-haul connections between our datacenters in distant regions. We have drastically cut costs with this approach, more than doubling the capacity between datacenters by combining tighter channel spacing with new modulation techniques that offer 150–200 Gbps per channel.
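As a back-of-envelope check, assuming the 37.5 GHz flex-grid channel spacing referenced in the technical papers later in this post, the quoted per-channel rates translate into spectral efficiencies of roughly 4 to 5.3 bits/s/Hz:

```python
spacing_hz = 37.5e9                      # assumed flex-grid channel spacing
for rate_bps in (150e9, 200e9):          # per-channel data rates quoted above
    print(f"{rate_bps / 1e9:.0f} Gbps over 37.5 GHz -> "
          f"{rate_bps / spacing_hz:.2f} bits/s/Hz")
```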

Figure 2. Open line system (OLS)

Power

In cloud-scale datacenters, power usage is a major consideration and can limit overall capacity of the solution. As such, we developed a new type of inexpensive 100 Gbps colored optic that fits into a tiny industry standard QSFP28 package to cover distances within metro area and the metro OLS that’s needed to support it. This solution completely replaces expensive long-haul optics for distances up to 80 km.

Due to miniaturization, integration, and innovations in both ultra–long-haul and metro optics, network operators can take advantage of these new approaches and use a fraction of the power (up to 10x less power) while giving customers up to 500 percent more capacity. We’ve expanded capacity and spectral efficiencies at lower overall cost in both capital expenditure and operating expenses than our current systems.

Space

The physical equipment necessary to connect datacenters between regions takes a large amount of space in some of the most expensive datacenter realty. Optical equipment often dominates limited rack space. The equipment necessary to connect datacenters within a region can also require a large amount of space and can limit the number of servers that can be deployed.

By integrating the metro optics and long-haul optics into commodity switching platforms, we’ve reduced the total space needed for optical equipment to just a few racks. In turn, this creates space for more switching equipment and more capacity. By miniaturizing optics, we’ve reduced the overall size of the metro switching equipment to half of its previous footprint, while still offering 500 percent more capacity.

Figure 3. Inphi’s ColorZ® product—large-scale integration of metro-optimized optics into a standard switch-pluggable QSFP28 package

Automation

Microsoft is focused on simplicity and efficiency in monitoring and maintenance in cloud datacenters. We recognized that a further opportunity to serve the industry lay in full automation of our optical systems to provide reliable capacity at scale and on demand.

Monitoring for legacy systems can’t distinguish optical defects from switching defects. This can result in delays in diagnosing and repairing hardware failures. For the optical space, this has historically been a manual process.

We saw that we could solve this problem by fully integrating optics into commodity switches and making them accessible with our SDN monitoring and automation tooling. By driving an open and optimized OLS model for optical networking equipment, we’ve ensured that the proper interfaces are present to integrate optical operations into SDN orchestration. Now automation can quickly mitigate defects across all networking layers, including service repair, with end-to-end work flow management. The industry benefits from this because optics monitoring and mitigations can now keep pace with cloud scale and growth patterns.

Industry impact

Microsoft has incorporated all these technologies into the Azure network, but the industry at large will benefit. For example, findings from ACG Research show that the Microsoft metro solution will result in a more than 65 percent reduction in Total Cost of Ownership. In addition, the research demonstrates power savings of more than 70 percent over 5 years.

Several of our partners are making available the building blocks of the Microsoft implementation of open optical systems. For example:

Cisco and Arista provide the integration of ultra–long-haul optics into their cloud switching platforms.
If your switches don’t support optical integration, several suppliers offer dense, ultra–long-haul solutions that enable disaggregation of optics from the OLS in the form of pizza boxes.
ADVA Optical Networking provides open metro OLS solutions that support Inphi ColorZ® optics and several other turnkey alternatives.
Most ultra–long-haul line systems have supported International Telecommunication Union–defined alien wavelengths (optical sources) for quite some time. Talk to your supplier for additional details.

If you’re interested in the deep, technical details behind these innovations, you can read the following technical papers:

Interoperation of Layer-2/3 Modular Switches with 8QAM/16QAM Integrated Coherent Optics over 2000 km Open Line System
Demonstration and Performance Analysis of 4 Tb/s DWDM Metro-DCI System with 100G PAM4 QSFP28 Modules
Transmission Performance of Layer-2/3 Modular Switch with mQAM Coherent ASIC and CFP2-ACOs over Flex-Grid OLS with 104 Channels Spaced 37.5 GHz.
Open Undersea Cable Systems for Cloud Scale Operation

Opening new frontiers of innovation

As these innovations in optics demonstrate, Microsoft is developing unique networking solutions and opening our advances for the benefit of the entire industry. Microsoft is working with our partners to bring even more integration, miniaturization, and power savings into future 400 Gbps interconnects that will power our network.

Read more

To read more posts from this series please visit:

Networking innovations that drive the cloud disruption
SONiC: The networking switch software that powers the Microsoft Global Cloud
How Microsoft builds its fast and reliable global network

Source: Azure

Analyze your data with Application Insights Analytics

Analytics is a powerful search tool that lets you analyze large volumes of any JSON or CSV data. You can run a wide range of queries, including statistical and machine learning algorithms. You might be familiar with it as part of Application Insights, but you can also apply it to any stream of NoSQL data.

For example, let’s suppose you receive a data feed about flights. You could automate a daily analysis of route popularity and congestion. Analytics can run complex queries, including joins, aggregations, and statistical functions, to extract the necessary results. You can view the results in the range of charts available in Analytics. Or you could have Power BI run the queries each day, plot the results on maps, and present them on a website.

Analytics was originally designed as the powerful analysis tool for web application telemetry from Application Insights. But in this blog, we’ll focus on applying it to a separate stream of data.

Set up

To analyze your data with Analytics, you need an account in Microsoft Azure.

Sign in to the portal and set up a Storage resource in Azure. This is where you will put your data before it’s ingested into Analytics.

Create an Application Insights resource. Then navigate from there to the Analytics page.

Define your data source

Before you analyze some data, you need to tell Analytics about its format. Like we said, we’ll skip the Application Insights material for now, and go straight to defining our own data source.

This opens a wizard where you name the data source and define its schema. You can do that either by providing an explicit schema or by uploading a small sample of your data – the latter is usually easier.

In the flight data example, the files are in CSV format. The sample data file includes headers, and the schema is automatically inferred from it. You get the opportunity to update the inferred data types and field names if necessary.

Ingest data

Once you’ve defined a schema, you can upload data files as often as you like. Data files of hundreds of MB are easily handled by Analytics.

To ingest the data, it’s easiest to automate the process with a short script. The script uploads the data to Azure storage, and then notifies Analytics to ingest it. There’s a sample in the import documentation.

Run queries

Here’s a query to look for the top 10 destination airports by number of routes.

airlineRoutes
| summarize count() by Destination_airport
| top 10 by count_ desc
| render piechart

Result:

The query language is powerful but easy to learn, and has a piped model in which each operator performs one task – much easier to work with than the nested SELECTs of SQL.
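To see the appeal of the piped model, here is the same shape of query sketched outside Kusto, in plain Python over a hypothetical slice of the destination-airport column (the data values are illustrative):

```python
from collections import Counter

# Hypothetical Destination_airport values from the flight feed.
rows = ["ATL", "ATL", "ATL", "ORD", "ORD", "DFW"]

# summarize count() by Destination_airport | top 10 by count_ desc
top10 = Counter(rows).most_common(10)
print(top10)  # → [('ATL', 3), ('ORD', 2), ('DFW', 1)]
```

Each stage of the Kusto pipe maps to one step here: the grouped count becomes `Counter`, and the "top N by count desc" becomes `most_common(10)`.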

Join multiple tables

Analytics can ingest multiple sources and your queries can run joins over them.

For example, wouldn’t it be much nicer to see the airports identified by their full names instead of their codes? Let’s add a new data source, airportsData, which maps each airport code to its name and other information. Now we can perform a join on the tables:

airlineRoutes
| summarize routeCount = count()
by airportCode = destination_airport // rename field for join
| top 10 by routeCount desc
| join kind=inner ( airportsData ) on airportCode
| project routeCount, airportName // name column assumed from airportsData
| render piechart
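The effect of this join can be sketched in Python: aggregate route counts, then enrich each code from the airport table. The sample data and the `airportName` field are invented for illustration:

```python
from collections import Counter

routes = [{"destination_airport": c} for c in ["JFK", "JFK", "LHR"]]
airports = {"JFK": "John F. Kennedy International", "LHR": "London Heathrow"}

# summarize routeCount = count() by airportCode
counts = Counter(r["destination_airport"] for r in routes)

# inner join on airportCode, then project the fields we want
joined = [
    {"airportName": airports[code], "routeCount": n}
    for code, n in counts.most_common(10)
    if code in airports  # kind=inner: codes with no match are dropped
]
print(joined)
```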

Augmenting Application Insights telemetry

Analytics is primarily the query tool of Application Insights, which monitors the health and usage of your web applications. One reason for importing your own data is to augment that telemetry. For example, to make telemetry reports more readable, query URLs can be translated to page names.
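Such a translation is a simple lookup against an imported mapping table. A hedged Python sketch, where the mapping entries and the fallback behavior are assumptions:

```python
import re

# Hypothetical mapping table, ingested as a custom data source.
page_names = {
    "/products/index": "Product catalog",
    "/checkout/cart": "Shopping cart",
}

def friendly_name(url):
    """Translate a telemetry URL to a readable page name.

    Strips scheme, host, and query string, then looks the path up in
    the mapping; unmapped paths fall back to the raw path."""
    path = re.sub(r"^https?://[^/]+", "", url).split("?")[0]
    return page_names.get(path, path)

print(friendly_name("https://contoso.com/checkout/cart?item=7"))  # Shopping cart
```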

Get started today

Analytics can be applied to your data today. Read the detailed how-to here.

Whether you want to enrich your data or analyze your application’s log data, you can easily add a new data source and start ingesting. With high-volume ingestion, you can now apply the power of the Analytics query language to your own custom data.

As always, feel free to send us your questions or feedback by using one of the following channels:

· Try Application Analytics

· Suggest ideas and vote in Application Insights ideas

· Join the conversation at the Application Insights Community
Source: Azure

Azure SQL hybrid data movement

Hybrid environments that mix on-premises data centers with the public cloud are becoming increasingly popular. They give businesses greater flexibility and more data deployment options. For example, a company can host business-critical or sensitive data in on-premises data centers and deploy less-critical data or test and development environments in the public cloud. A hybrid cloud environment also helps large companies migrate their on-premises data centers to the cloud in multiple stages without disrupting the business.

Moving data around efficiently in a hybrid cloud environment is critical and challenging. In this blog, we introduce options for different data movement scenarios built on top of on-premises SQL Server, Azure SQL VMs, and Azure SQL Databases:

Migrate data from on-premises SQL Server to Azure
Replicate data for business continuity
Replicate data to scale out read-only workload
Replicate data to refresh dev-test environment
Distribute reference data or multi-master
Backup and restore data
Migrate cold data from on-premises SQL Server to Azure
Move data into data warehouse
Move data into big data platform
Move data from other data platforms

We are going to mention the following technologies and tools in this blog:

Export and import .bacpac files
bcp
Transactional replication, including peer-to-peer transaction replication
Merge replication
SQL Server backup and restore, including managed backup and file snapshot backup
Always On availability groups
Data Migration Assistant (DMA)
Azure SQL Data Sync
SQL Server Integration Services (SSIS)
Azure SQL Database copy
Azure Data Factory (ADF)
SQL Server Migration Assistant (SSMA)
Attunity CDC for SSIS
SQL Server Stretch Database

The goal of this blog is to help you choose the right technologies and tools for different scenarios. Implementation details and step-by-step instructions are not covered here; instead, we provide links to related resources.

Migrate data from on-premises SQL Server to Azure

When you migrate existing data from on-premises SQL Server databases to Azure, there are a few key factors you should measure and consider:

Azure SQL Database (PaaS) or Azure SQL VM (IaaS) – which is the better option? This is out of scope for today’s topic. See “Choose a cloud SQL Server option: Azure SQL (PaaS) Database or SQL Server on Azure VMs (IaaS)” for more details.
How many databases are you going to migrate? How large are they?
How much downtime can your service or application afford without significant business impact?

Azure SQL Databases

If you can afford some downtime, or if you are performing a test migration, you can use a bacpac file to migrate your databases to Azure SQL Database. See the blog “Migrating from SQL Server to Azure SQL Database using Bacpac Files” for detailed instructions.

When you migrate databases with bacpac files, especially large ones, plan for sufficient application downtime. Depending on the database size, the downtime can run to hours.

When you cannot afford to take your databases out of production during the migration, consider using transactional replication as the migration solution (with Azure SQL Database as a push subscriber). See “Migration from SQL Server to Azure SQL Database Using Transactional Replication” and “Replication to SQL Database” for details, including the limitations of transactional replication.

See “SQL Server database migration to SQL Database in the cloud” for more about migration to Azure SQL Databases.

Azure SQL VM

If you decide to migrate and host your data in an Azure SQL VM, you have several more options, including creating an Always On replica, backup/restore, and so on. See “Migrate a SQL Server database to SQL Server in an Azure VM” for more details.

You can also use DMA (Data Migration Assistant) to migrate on-premises SQL Server databases to an Azure SQL VM. DMA migrates not only data but also other server objects such as logins, users, and roles, and it can detect compatibility issues before the migration. See the “Data Migration Assistant (DMA)” documentation for more details.

Replicate data for business continuity

Disruptive events can happen on any data platform and cause data loss or make your databases and applications unavailable. The ability to recover quickly from data loss or downtime is important for business continuity, especially for business-critical databases and applications.

Azure SQL Databases

Azure SQL Database automatically maintains more than one copy of each database to ensure high availability (99.99% availability SLA). To prevent or mitigate business disruption in the event of a data center outage, you can either create an Active Geo-Replication secondary or restore the database from a geo-redundant backup.

Active Geo-Replication provides minimal downtime and data loss during a data center outage. It can also be used to scale out read-only workloads (discussed in the next scenario); however, the replicas introduce extra cost. Consider this option for business-critical databases. See “Overview: SQL Database Active Geo-Replication” for more details.

Alternatively, you can recover your database from a geo-redundant backup only when necessary. This approach incurs a longer recovery time and more data loss, but provides business continuity during a data center outage at lower cost. See “Recover an Azure SQL database using automated database backups” for more details.

You can find more details about Azure SQL Database business continuity in “Overview of business continuity with Azure SQL Database”.

Azure SQL VM

For a SQL VM, you can set up an Always On availability group or a failover cluster instance to manage downtime during VM reboots or outages.

See “High availability and disaster recovery for SQL Server in Azure Virtual Machines” for more details.

SQL VM as DR solution for on-premises SQL Server

To create a DR site without building a data center in another region, you can extend on-premises Availability Groups to Azure by provisioning one or more Azure SQL VMs and then adding them as replicas to your on-premises Availability Group. See section “Hybrid IT: Disaster recovery solution” in “High availability and disaster recovery for SQL Server in Azure Virtual Machines” for more details.

Replicate data to scale out read-only workload

In many systems, certain applications only need to read information from the databases. For example, in an information publishing service, only the publisher needs to update the data, while all subscribers only read it. To offload the primary database, you can replicate data and redirect read-only workloads to other replicas.

Azure SQL Databases

In addition to providing business continuity in the event of a disaster, Active Geo-Replication can be used to offload read-only workloads, such as reporting jobs, to the secondary databases. If you only intend to use the secondary databases for load balancing, you can create them in the same region.

See “Overview: SQL Database Active Geo-Replication” for more details about Active Geo-replication.

SQL VMs and on-premises SQL Server

To scale out a SQL VM or on-premises SQL Server, you can build readable Always On replicas. Consider building the replica in the same region unless the read traffic originates in a different region.

Replicate data to refresh dev-test environment

Before a system upgrade or the deployment of a new system, you may want to test it with a copy of the production data. Building a separate dev-test environment lets you run the test without impacting your production environment.

Azure SQL Databases

To create a copy of the live production environment in Azure SQL Database for a dev-test environment, you can use database copy.

If you want to replicate a snapshot of the production environment with older data within the retention period (35 days for Standard and Premium; 7 days for Basic), you can restore the database to the point in time you want.
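These retention windows are easy to sanity-check programmatically. A small sketch, using the retention figures quoted above (current service limits may differ):

```python
from datetime import datetime, timedelta

# Point-in-time restore retention per tier, as described above.
RETENTION_DAYS = {"Basic": 7, "Standard": 35, "Premium": 35}

def is_restorable(tier, restore_point, now=None):
    """Return True if restore_point still falls within the tier's window."""
    now = now or datetime.utcnow()
    window = timedelta(days=RETENTION_DAYS[tier])
    return now - window <= restore_point <= now

now = datetime(2017, 6, 1)
print(is_restorable("Basic", datetime(2017, 5, 20), now))     # False (12 days back)
print(is_restorable("Standard", datetime(2017, 5, 20), now))  # True
```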

Azure SQL VM and on-premises SQL Server

To replicate data from Azure SQL Database to on-premises or to an Azure SQL VM, you can export the database into a bacpac file and import it into SQL Server running in an Azure VM or on-premises. If you only want to replicate specific tables instead of the whole database, you can run SqlPackage in an Azure VM. See “Export an Azure SQL database or a SQL Server database to a BACPAC file” for more details.
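If you script the export, the SqlPackage invocation can be assembled as below. This is a sketch only: the flag names follow SqlPackage’s documented /Action:Export mode, but verify them against your installed version; passwords and other connection details are omitted:

```python
def sqlpackage_export_args(server, database, user, bacpac_path):
    """Build a SqlPackage command line for a bacpac export.

    Flag names assume SqlPackage's /Action:Export mode; check them
    against your installed version before relying on this."""
    return [
        "sqlpackage",
        "/Action:Export",
        f"/SourceServerName:{server}",
        f"/SourceDatabaseName:{database}",
        f"/SourceUser:{user}",
        f"/TargetFile:{bacpac_path}",
    ]

args = sqlpackage_export_args(
    "myserver.database.windows.net", "mydb", "admin_user", "mydb.bacpac")
print(" ".join(args))
```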

Distributing reference data/Multi-master

International ISVs and corporations usually have clients or branches in different countries or regions. To reduce the performance impact of network latency, they want certain business reference data, such as SKUs and user information, distributed to Azure SQL or SQL Server databases. In a typical scenario, a central database hosts all reference data and distributes it to the different clients or branches. The clients or branches can also update the reference data locally and push the changes back to the central database.

Azure SQL Data Sync can be used to implement this data distribution between on-premises SQL Server, Azure SQL VMs, and Azure SQL databases, either uni-directionally or bi-directionally. See “Getting Started with Azure SQL Data Sync (Preview)” to learn more about Azure Data Sync.

Azure Data Sync is currently only available in the old Azure portal; it will be available in the new Azure portal soon. See the blog “Azure Data Sync Update” for more details.

On-premises or Azure SQL VMs to Azure SQL Databases

When you design and implement such a distributed multi-master system, especially across the internet, consider sharding the database and syncing data only when necessary to reduce latency.

If the central database is hosted on-premises or in an Azure SQL VM, you can use transactional replication to distribute the data. Azure SQL databases can be configured as push subscribers that replicate data from the publisher, the central database. Transactional replication replicates data in one direction only.

Between on-premises or Azure SQL VMs

If all your data is hosted in on-premises SQL Server or Azure SQL VMs and you need to sync data bi-directionally, then besides Azure SQL Data Sync you can set up either peer-to-peer replication or merge replication.

If you can design your application or service so that certain rows are modified at only one node, peer-to-peer replication is recommended. If the application requires sophisticated conflict detection and resolution capabilities, use merge replication. Merge replication introduces a more complex topology and higher maintenance cost compared to other sync and replication technologies; only use it when the others can’t solve your problem.
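The trade-off can be illustrated with a toy conflict resolver. Merge replication must apply some policy, such as last-writer-wins, when the same row changes at two nodes; peer-to-peer replication sidesteps the problem by ensuring each row is modified at only one node. A purely illustrative Python sketch (real merge replication resolvers are configurable and far richer):

```python
def merge_rows(row_a, row_b):
    """Resolve a conflicting row pair by last-writer-wins on timestamp.

    Each row is a dict with a 'modified' timestamp. This is a toy
    policy for illustration, not SQL Server's actual resolver."""
    return row_a if row_a["modified"] >= row_b["modified"] else row_b

# The same SKU row was updated at two branches with different prices.
a = {"sku": "X1", "price": 10, "modified": 100}
b = {"sku": "X1", "price": 12, "modified": 170}
print(merge_rows(a, b)["price"])  # 12 -- the later write wins
```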

Peer-to-peer replication and merge replication do not work with Azure SQL Database, either as publisher or as subscriber.

See “Peer to peer transactional replication” and “Merge replication” for more details.

Backup and restore data

Backing up your databases is essential for protecting your data: it allows you to recover from accidental corruption or deletion. We always recommend storing backup files separately from the databases, and a hybrid environment makes this easy to implement.

Azure SQL Databases

Azure SQL Database automatically backs up your databases at no additional charge. You can restore a database to any point in time within the retention period (7 days for Basic, 35 days for Standard and Premium). All backup files are replicated in multiple copies, including to a different region. See “Learn about SQL Database backups” for more details.

If your business requires longer backup retention, you can configure long-term retention to keep backups for up to 10 years. See “Storing Azure SQL Database Backups for up to 10 years” for more details.

If you want to restore a database to on-premises SQL Server, or store backups in your own local file system or on another cloud platform, you can export the database into a bacpac file. You can use Azure Automation to schedule the export periodically; a sample script is provided in “Export an Azure SQL database or a SQL Server database to a BACPAC file”.

Azure SQL VMs

If you are running SQL Server on an Azure VM, we recommend backing up your databases to Azure storage (Backup to URL). This feature is supported in SQL Server 2012 SP1 CU2 and later versions.

You can also configure Managed Backup to enable automated backup management. The feature is available in SQL Server 2014 and later; in SQL Server 2016 it supports custom schedules. See “Backup and Restore for SQL Server in Azure Virtual Machines” for more details.

To ensure the availability of backup files in the event of a disaster or data center outage, we recommend using GRS or RA-GRS storage for the backup files, which automatically replicates them to a different region. See “Azure Storage replication” for more details.

If you are running SQL Server 2016 and host your data files in Azure storage, another option is file snapshot backup, which provides near-instantaneous backups and rapid restores for database files stored in Azure storage. See “File-Snapshot Backups for Database Files in Azure” for more details.

On-premises SQL Server

We always recommend storing backup files in a different location from the database files. If you are running SQL Server 2014 or later on-premises, you can use the Backup to URL feature to back up your databases or transaction logs directly to Azure storage. Note that, due to network latency, backing up to and restoring from Azure storage may deliver lower throughput than the same operation against local disk. See “SQL Server Backup and Restore with Windows Azure Blob Storage Service” for more details.

Migrate cold data from on-premises SQL Server to Azure

As database sizes grow rapidly, managing and storing historical or cold data efficiently becomes a significant problem. You can use Stretch Database to migrate cold data from on-premises SQL Server to Azure while keeping it online. See “Introduction to Stretch Database” for more details.

Move data into data warehouse

Companies periodically move data from OLTP systems to OLAP systems or data warehouses for analysis and reporting. The process usually includes extracting data from the source, transforming it, and loading it into the target data warehouse (ETL).
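The ETL steps can be sketched in a few lines. This toy example uses in-memory SQLite to stand in for both the OLTP source and the warehouse; the table and column names are invented:

```python
import sqlite3

# Source OLTP database with a tiny orders table.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER)")
src.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1250), (2, 990)])

# Extract
rows = src.execute("SELECT id, amount_cents FROM orders").fetchall()

# Transform: cents -> dollars
transformed = [(oid, cents / 100.0) for oid, cents in rows]

# Load into the warehouse fact table
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE fact_orders (id INTEGER, amount_dollars REAL)")
dw.executemany("INSERT INTO fact_orders VALUES (?, ?)", transformed)

total = dw.execute("SELECT SUM(amount_dollars) FROM fact_orders").fetchone()[0]
print(total)
```

Tools like SSIS and ADF industrialize exactly this flow: connectors for extract, transformation pipelines, and bulk load into the warehouse.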

The data warehouse and OLAP system can be built on top of Azure SQL Databases, Azure SQL VMs or Azure SQL Data Warehouse.

Most on-premises SQL Server customers use SSIS (SQL Server Integration Services) to load data into their data warehouses: it extracts data from the OLTP system, transforms it, and loads it into the warehouse. SSIS can also be used in a hybrid environment, which reduces the cost for existing SSIS customers of building a new ETL process. If you are loading data into a SQL VM or SQL Databases, we recommend running SSIS on an Azure VM. See the blog “Running SSIS on Azure VM (IaaS) – Do more with less money” for more information.

If you are moving data into Azure SQL Data Warehouse, you can also use ADF (Azure Data Factory) or bcp as loading tools. See “SQL Data Warehouse Migrate Your Data” and “Use Azure Data Factory with SQL Data Warehouse” for details.

Move data into Azure Big Data Stores for Advanced Analytics

To build a big data advanced analytics solution in Azure, users load online transactional data and other reference data from various sources into a modern multi-platform data warehouse. This usually means staging the data in Azure Blob storage or Azure Data Lake, performing transformations with tools such as Hive, Pig, or Spark, and finally loading the “cooked” data into Azure SQL Data Warehouse for BI and reporting.

If you want to move your data into Azure and build an advanced analytics solution on top of it, you can use ADF (Azure Data Factory), a fully managed data integration service that orchestrates the movement and transformation of data. See “Introduction to Azure Data Factory Service, a data integration service in the cloud” for more details about ADF.

If you are an existing SSIS user, SSIS is another option for loading data into big data stores such as Azure Blob storage and Azure Data Lake Store. You will need the Azure Feature Pack to load data into Azure.

Move data from other data platforms

If you want to migrate databases from other DBMSs (Oracle, MySQL, etc.) to Azure SQL Database or an Azure SQL VM, SSMA (SQL Server Migration Assistant) is a helpful migration tool. See “SQL Server Migration Assistant” for more details.

If you want to continuously move data from other DBMSs or other data formats, such as flat files, to Azure SQL Database or an Azure SQL VM, or you need to transform data before loading it into Azure, you can use SSIS. Attunity CDC for SSIS and SQL Server CDC for Oracle by Attunity provide an end-to-end operational data replication solution. For more information, see “Attunity CDC for SSIS” and “SQL Server 2012 CDC for Oracle – a Review of One Implementation”.

Summary

In this blog, we discussed how to choose the right technologies and tools for different hybrid data movement scenarios. This is just a starting point and general guidance; you will still need to evaluate the solutions against your business needs.

If you have any further questions, please post them in the MSDN forum.

If you have any feedback on Azure SQL Databases or Azure SQL VMs, please submit it at https://feedback.azure.com.
Source: Azure

Azure SQL hybrid data movement

As is getting more and more popular today, many companies are choosing to deploy a hybrid environment using a mix of on-premises data center and public cloud. It gives the businesses greater flexibility and more data deployment options. For example, a company can host business critical or sensitive data in on-premises data centers and deploy less-critical data or test and development environment in the public cloud. A hybrid cloud environment will also help large companies to migrate on-premises data center to cloud in multiple stages without interfering with the business.

Moving data around efficiently in a hybrid cloud environment is critical and challenging. In this blog, we are going to introduce options in different data movement scenarios built on top of on-premises SQL Server, Azure SQL VMs and Azure SQL Databases:

Migrate data from on-premises SQL Server to Azure
Replicate data for business continuity
Replicate data to scale out read-only workload
Replicate data to refresh dev-test environment
Distribute referencing data or multi-master
Backup and restore data
Migrate cold data from on-premises SQL Server to Azure
Move data into data warehouse
Move data into big data platform
Move data from other data platforms

We are going to mention the following technologies and tools in this blog:

Export and import .bacpac files
bcp
Transactional replication, including peer-to-peer transaction replication
Merge replication
SQL Server backup and restore, including managed backup and file snapshot backup
Always On availability groups
Data Migration Assistant (DMA)
Azure SQL Data Sync
SQL Server Integration Services (SSIS)
Azure SQL Database copy
Azure Data Factory (ADF)
SQL Server Migration Assistant (SSMA)
Attunity CDC for SSIS
SQL Server Stretch Database

The goal of this blog is to help you to choose the right technologies and tools to implement different scenarios. Implementation details and step by step instructions will not be covered in this blog, however we will provide links to related resources.

Migrate data from on-premises SQL Server to Azure

When you are migrating existing data from on-premises SQL Server databases to Azure, there are a few key facts you should measure and consider:

Azure SQL Databases (PaaS) or Azure SQL VM (IaaS), which is the better option? This is out of scope of our topic today. Please see “Choose a cloud SQL Server option: Azure SQL (PaaS) Database or SQL Server on Azure VMs (IaaS)” for more details.
How many databases are you going to migrate? How large are they?
How much downtime can your service or application afford without significant business impact?

Azure SQL Databases

If you can afford some downtime, or if you are performing a test migration, you can use bacpac to migrate your databases to Azure SQL Databases. See the blog of “Migrating from SQL Server to Azure SQL Database using Bacpac Files” for detailed instructions.

When you migrate databases, especially large databases using bacpac, plan for a long enough application downtime. Depending on the database size, the downtime can be hours.

When you cannot afford to remove your databases from production during the migration, you can consider using transaction replication as the migration solution (SQL Azure Database as push subscriber). See the “Migration from SQL Server to Azure SQL Database Using Transactional Replication” and “Replication to SQL Database” for details, including the limitations of transaction replication.

See “SQL Server database migration to SQL Database in the cloud” for more about migration to Azure SQL Databases.

Azure SQL VM

If you decide to migrate and host your data in Azure SQL VM, you will have several more options, including creating a Always On replica, backup/restore, etc. See “Migrate a SQL Server database to SQL Server in an Azure VM” for more details.

You can also use DMA (Data Migration Assistant) to migrate on-premises SQL Server databases to Azure SQL VM. DMA can migrate not only data, but also other server objects like logins, users and roles. DMA can also be used to detect compatibility issues before the migration. See the document of “Data Migration Assistant (DMA)” for more details.

Replicate data for business continuity

Disruptive events can happen in any data platform and cause data loss or your databases and application to become unavailable. Capability of fast recovery from data loss or database downtime is important for business continuity, especially for business-critical databases and applications.

Azure SQL Databases

Azure SQL Databases automatically have more than one copy created to ensure the high availability (99.99% availability SLA). To prevent or mitigate business discontinuity in event of a data center outage, you can either create Active Geo-replication or restore database from geo-redundant backup.

Active geo-replication will provide minimum downtime and data loss during the data center outage. It can also be used to scale out read-only workloads (will discuss in next scenario); however, this will introduce extra cost to have active geo-replication replicas. Consider this option for business-critical databases. See “Overview: SQL Database Active Geo-Replication” for more details about active geo-replication.

You can also recover your database from geo-redundant backup only when necessary. It will introduce longer recovery time and more data loss. It provides business continuity during data center outage with lower cost. See “Recover an Azure SQL database using automated database backups” for more details:

You can find more details about SQL Azure Databases business continuity in “Overview of business continuity with Azure SQL Database”.

Azure SQL VM

For SQL VM, you can setup Always On availability group or failover cluster instance to manage the downtime during VM reboot or outage.

See “High availability and disaster recovery for SQL Server in Azure Virtual Machines” for more details.

SQL VM as DR solution for on-premises SQL Server

To create a DR site without building a data center in another region, you can extend on-premises Availability Groups to Azure by provisioning one or more Azure SQL VMs and then adding them as replicas to your on-premises Availability Group. See section “Hybrid IT: Disaster recovery solution” in “High availability and disaster recovery for SQL Server in Azure Virtual Machines” for more details.

Replicate data to scale out for read-only workload

In many systems, certain applications only need to read the information from the databases. For example, in the information publishing service, only the publisher need to update the data, and all subscribers only need to read the data. To offload the primary database, you can replicate data and redirect read-only workload to other replicas.

Azure SQL Databases

In addition of providing business continuity in event of disaster, Active Geo-replication also can be used to offload read-only workloads such as reporting jobs to the secondary databases. If you only intend to use the secondary databases for load balancing, you can create the secondary databases in the same region.

See “Overview: SQL Database Active Geo-Replication” for more details about Active Geo-replication.

SQL VMs and on-premises SQL Server

To scale out SQL VM or on-premises SQL Server, you can build readable Always On replicas. Consider to build the replica in the same region unless the read traffic is going to happen in a different region.

Replicate data to refresh dev-test environment

Before system upgrade or deployment of a new system, you may want to test it using a copy of the production data. Building a separate dev-test environment will help you to run the test without impact on your production environment.

Azure SQL Databases

To create a replication of the live production environment in Azure DB for dev-test environment, you can use database copy.

If you want to replicate a snapshot of production environment with old data within retention period (35 days for Standard and Premium; 7 days of Basic), you can restore the database to the point in time you want.

Azure SQL VM and on-premises SQL Server

To replicate data from Azure Databases to on-premises or Azure SQL VM, you can export the database into bacpac and import into SQL Server running in Azure VM or on-premises. If you only want to replicate specific tables instead of the whole database, you can run SqlPackage in Azure VM. See “Export an Azure SQL database or a SQL Server database to a BACPAC file” for more details.

Distributing referencing data/Multi-master

International ISVs and corporates usually have clients or branches in different countries or regions. To reduce the performance impact from network latency, they want some business referencing data, like SKUs, user information, etc., distributed to Azure SQL or SQL Server databases. In a typical scenario, a central database will host all reference data and distribute it to different clients or branches. The clients or branches can also update the reference data locally and push the change back to the central database.

Azure SQL Data Sync can be used to implement the data distribution between on-premises SQL Server, Azure SQL VM and Azure SQL databases, in uni-direction or bi-direction. See “Getting Started with Azure SQL Data Sync (Preview)” to learn more about Azure Data Sync.

Azure Data Sync is now only available in old Azure portal. It will be available in new Azure portal very soon. See the blog of “Azure Data Sync Update” for more details.

On-premises or Azure SQL VMs to Azure SQL Databases

When you are design and implement such a distributed multi-master system, especially cross internet, you should consider to shard the database and only sync data when necessary to reduce the latency.

If the central database is hosted on-premises or in Azure SQL VM, you can use transactional replication to distribute data. Azure SQL databases can be configured as push subscribers and replicate data from the publisher, the central database. Transaction replication can only replicate data in one-way.

Between on-premises or Azure SQL VMs

If all your data in hosted in on-premises SQL Server or Azure SQL VM, and you need to sync data in bi-direction, except using Azure SQL Data Sync, you can setup either Peer-to-peer replication or Merge replication.

If you can design your application or service to ensure certain rows will be modified only in one node, peer-to-peer replication is recommended. If the application requires sophisticated conflict detection and resolution capacities, use merge replication. Merge replication introduces more complex topology and higher maintenance cost comparing to other sync and replication technologies. Only use Merge replication when other technologies can’t solve your problem.

Peer-to-peer replication and Merge replication will not work on Azure SQL Database, as neither publisher nor subscriber.

See Peer to peer transaction replication and Merge replication for more details.

Backup and restore data

Backing up your database is essential for protecting your data. It allows you to recover data from accidental corruption or deletion. We always recommend to store backup files separately from the databases. The hybrid environment will allow you to implement this easily.

Azure SQL Databases

Azure SQL Databases automatically backup the databases at no additional charge. You can restore the database to any point in time during the retention (7 days for Basic and 35 days for Standard and Premium). All backup files are replicated to multiple copies also to a different region. See “Learn about SQL Database backups” for more details.

If your business requires longer retention for backup, you can configure long term retention backup to keep the backup up to 10 years. See “Storing Azure SQL Database Backups for up to 10 years” for more details.

If you want to restore the database to on-premises SQL Server or store the backup in your own local file system or other cloud platforms, you can export the database into a bacpac file. You can use Azure automation to schedule the backup periodically. The sample script is provided in the following document: “Export an Azure SQL database or a SQL Server database to a BACPAC file”.

Azure SQL VMs

If you are running SQL Server on Azure VM, we recommend you to backup your databases to Azure storage (Backup to URL). This feature is supported in SQL Server 2012 SP1 CU2 or later version.

You can also configure the Managed Backup to enable automated backup management. The feature is available in SQL Server 2014 or later version. In SQL Server 2016, it supports custom schedule. See “Backup and Restore for SQL Server in Azure Virtual Machines” for more details.

To ensure the availability of backup files in event of disaster or data center outage, we recommend you use GRS or RA-GRS storage to store the backup files. It will automatically replicate your backup files to a different region. See “Azure Storage replication” for more details.

If you are running SQL Server 2016 and host your data files in Azure storage, another option is file-snapshot backup, which provides near-instantaneous backups and rapid restores for database files stored in Azure storage. See “File-Snapshot Backups for Database Files in Azure” for more details.
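Assuming the database files already live in Azure blobs and a credential for the target container exists, a file-snapshot backup is requested with the FILE_SNAPSHOT option (the names below are placeholders):

```sql
BACKUP DATABASE [MyDatabase]
TO URL = 'https://mystorageaccount.blob.core.windows.net/backups/MyDatabase.bak'
WITH FILE_SNAPSHOT;
```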

On-premises SQL Server

We always recommend storing backup files in a different location from the database files. If you are running SQL Server 2014 or later on-premises, you can use the Backup to URL feature to back up a database or transaction log directly to Azure storage. However, due to network latency, backing up to and restoring from Azure storage may deliver lower throughput than the same operation against a local disk. See “SQL Server Backup and Restore with Windows Azure Blob Storage Service” for more details.

Migrate cold data from on-premises SQL Server to Azure

As database sizes grow rapidly, managing and storing historical or cold data efficiently becomes a significant problem. You can use Stretch Database to migrate cold data from an on-premises SQL Server to Azure while keeping the data online. See “Introduction to Stretch Database” for more details.
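As a sketch, once the instance and database have been enabled for Stretch (for example, through the wizard), individual tables can be stretched with T-SQL; the table name below is a placeholder:

```sql
-- Allow the instance to use Stretch Database (SQL Server 2016+)
EXEC sp_configure 'remote data archive', 1;
RECONFIGURE;

-- Start migrating the table's cold rows to Azure
ALTER TABLE dbo.ColdOrders
    SET (REMOTE_DATA_ARCHIVE = ON (MIGRATION_STATE = OUTBOUND));
```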

Move data into data warehouse

Companies periodically move data from OLTP systems to OLAP systems or data warehouses for analysis and reporting. The process usually involves extracting data from the source, transforming it, and loading it into the target data warehouse (ETL).

The data warehouse or OLAP system can be built on top of Azure SQL Database, Azure SQL VMs, or Azure SQL Data Warehouse.

Most on-premises SQL Server customers use SSIS (SQL Server Integration Services) to load data into a data warehouse. It extracts data from the OLTP system, transforms it, and loads it into the warehouse. SSIS can also be used in a hybrid environment, which reduces the cost of building a new ETL process for existing SSIS customers. If you are loading data into a SQL VM or Azure SQL Database, we recommend running SSIS on an Azure VM. See the blog post “Running SSIS on Azure VM (IaaS) – Do more with less money” for more information.

If you are moving data into Azure SQL Data Warehouse, you can also use ADF (Azure Data Factory) or bcp as the loading tool. See “SQL Data Warehouse Migrate Your Data” and “Use Azure Data Factory with SQL Data Warehouse” for details.
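For example, a bcp load of a character-delimited file into a data warehouse table might look like the following; the server, database, table, file, and credentials are placeholders:

```
bcp dbo.FactSales in factsales.csv -S myserver.database.windows.net ^
    -d MyDataWarehouse -U loaduser -P <password> -q -c -t ","
```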

Move data into Azure Big Data Stores for Advanced Analytics

To build a big data advanced analytics solution in Azure, users need to load online transactional data and other reference data from various sources into a modern, multi-platform data warehouse. This usually means staging the data in Azure Blob storage or Azure Data Lake, performing transformation activities with tools such as Hive, Pig, or Spark, and finally loading the “cooked” data into Azure SQL Data Warehouse for BI and reporting.

If you want to move your data into Azure and build an advanced analytics solution on top of it, you can use ADF (Azure Data Factory), a fully managed data integration service that orchestrates the movement and transformation of data. See “Introduction to Azure Data Factory Service, a data integration service in the cloud” for more details about ADF.

If you are an existing SSIS user, SSIS is another option for loading data into big data stores such as Azure Blob storage and Azure Data Lake Store. You will need the Azure Feature Pack to load data into Azure.

Move data from other data platforms

If you want to migrate databases from other DBMSs (Oracle, MySQL, etc.) to Azure SQL Database or an Azure SQL VM, you can use SSMA (SQL Server Migration Assistant) as a helpful migration tool. See “SQL Server Migration Assistant” for more details.

If you want to continuously move data from other DBMSs, or from other data formats such as flat files, to Azure SQL Database or an Azure SQL VM, or you need to perform transformations before loading the data into Azure, you can use SSIS. Attunity CDC for SSIS and SQL Server CDC for Oracle by Attunity provide end-to-end operational data replication solutions. For more information, see “Attunity CDC for SSIS” and “SQL Server 2012 CDC for Oracle – a Review of One Implementation”.

Summary

In this blog post, we discussed how to choose the right technologies and tools for different hybrid data movement scenarios. This is just a starting point and general guidance; you will still need to evaluate the different solutions against your business needs.

If you have any further questions, please post them in the MSDN forum.

If you have any feedback on Azure SQL Database or Azure SQL VMs, please submit it at https://feedback.azure.com.
Source: Azure

Announcing Azure Service Fabric 5.5 and SDK 2.5

Customers around the world are delivering their mission critical business applications as always-on, scalable, and distributed services built using Azure Service Fabric. Last week we rolled out Azure Service Fabric 5.5 to Azure clusters in 26 regions across the world. Today, we’re excited to announce the release of version 2.5 of the Azure Service Fabric SDK and the corresponding 5.5 release of the Azure Service Fabric runtime and standalone Windows Server installer.

If you're using Visual Studio 2017, the Service Fabric tools are built in, so you'll only need to install the Microsoft Azure Service Fabric SDK. If you're using Visual Studio 2015, install the Microsoft Azure Service Fabric SDK and Tools.*

Get the new standalone package for Windows Server.

This release has a number of great new features along with the usual bug fixes and optimizations. Here are a few highlights of this release, in no particular order:

Support for compressed application packages for faster image store upload

Previously, application packages were always a directory structure. While this format was simple to edit, it could result in application packages that were quite large, which is problematic when copying them to and registering them with a Service Fabric cluster, especially over slower connections or across larger distances. In this release, we have added support for compressing packages prior to uploading them to the cluster.
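As a sketch, if your SDK version exposes the compression option, the copy step might look like the following PowerShell; the cluster endpoint and package paths are placeholders:

```powershell
Connect-ServiceFabricCluster -ConnectionEndpoint 'mycluster.westus.cloudapp.azure.com:19000'

# -CompressPackage compresses the package as part of the copy to the image store
Copy-ServiceFabricApplicationPackage -ApplicationPackagePath '.\MyAppPkg' `
    -ImageStoreConnectionString 'fabric:ImageStore' `
    -ApplicationPackagePathInImageStore 'MyAppPkg' `
    -CompressPackage
```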

Improved upgrade behavior to catch additional errors during upgrade and improve deployment safety

In this release, we’ve increased the default health check duration between upgrade domains so that the automated upgrade rollback function has a chance to catch a wider range of errors. This makes upgrades slightly longer, but much safer by default.

We also improved the health evaluation of entities by checking that they have at least one report from their system authority component. This ensures that the health store view is consistent with the state of the system as viewed by the authority components, adding to even greater upgrade safety.

ASP.NET Core integration

Integration with ASP.NET Core is now fully supported in both stateless and stateful Reliable Services, available as add-on NuGet packages. These packages allow you to easily bootstrap an ASP.NET Core web application in a stateless or stateful service using either Kestrel or WebListener. The integration also features custom Service Fabric middleware designed to help handle service resolution when connecting to an ASP.NET Core Service Fabric service. Learn more about ASP.NET Core in Service Fabric.

Refresh application debug mode in Visual Studio 2015 (Preview this release)

In conjunction with ASP.NET Core integration support, we’ve added a new application debug mode to the Service Fabric Application project. Refresh Mode allows you to quickly iterate between writing code and debugging and supports edit and refresh for ASP.NET Core services, so you can now develop ASP.NET Core services in Service Fabric the same way you would outside of Service Fabric.
Note that Refresh Mode is a preview feature in this release. Refresh Mode will also be available in Visual Studio 2017 soon.

.NET Core support using csproj project system in Visual Studio 2017

Service Fabric services for .NET Core now support the new simplified .csproj project system in Visual Studio 2017. Migrating existing xproj projects to csproj is also supported, but it is a one-way migration.

For more details on these features and others, along with bug fixes and known issues, please see the detailed release notes.

*Note that there is a known issue causing occasional failures when following WebPI links in Google Chrome. If you run into this, either try the link in another browser or launch the WebPI client directly and search for Service Fabric.
Source: Azure

Azure Resource Manager template reference now available

We have published new documentation for creating Azure Resource Manager templates. The documentation includes reference content that presents the JSON syntax and property values you need when adding resources to your templates.

If you are new to Resource Manager and templates, see Azure Resource Manager overview for an introduction to the terms and concepts of Azure Resource Manager.

Simplify template creation by copying JSON directly into your template

The template reference documentation helps you understand what resource types are available, and what values to use in your template. It includes the API version number to use for each resource type, and all the valid properties. You simply copy the provided JSON into the resources section of your template, and edit the values for your scenario.

The property tables describe the available values.
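For example, a minimal storage account resource copied from the reference and edited for a specific scenario might look like this (the name, API version, and SKU below are illustrative):

```json
{
  "type": "Microsoft.Storage/storageAccounts",
  "apiVersion": "2016-01-01",
  "name": "mystorageaccount",
  "location": "[resourceGroup().location]",
  "sku": {
    "name": "Standard_GRS"
  },
  "kind": "Storage",
  "properties": {}
}
```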

Find a resource type

You can easily navigate through the available types in the left pane. However, if you know the resource type, you can go directly to it with the following URL format:

https://docs.microsoft.com/azure/templates/{provider-namespace}/{resource-type}

For example, the SQL database reference content is available at:

https://docs.microsoft.com/azure/templates/microsoft.sql/servers/databases

Please give us your feedback

The template reference content represents a new type of documentation for docs.microsoft.com. As you use it to build your templates, let us know how it can be improved. Please provide feedback about your experience.
Source: Azure