Anomaly detection using streaming analytics & AI

An organization’s ability to quickly detect and respond to anomalies is critical to success in a digitally transforming culture. Google Cloud customers can strengthen this ability by using rich artificial intelligence and machine learning (AI/ML) capabilities in conjunction with an enterprise-class streaming analytics platform. We refer to this combination of fast data and advanced analytics as real-time AI. There are many applications for real-time AI across businesses, including anomaly detection, video analysis, and forecasting. In this post, we walk through a real-time AI pattern for detecting anomalies in log files. By analyzing and extracting features from network logs, we helped a telecommunications (telco) customer build a streaming analytics pipeline to detect anomalies. We also discuss how you can adapt this pattern to meet your organization’s real-time needs.

How anomaly detection can help your business

Anomaly detection allows companies to identify, or even predict, abnormal patterns in unbounded data streams. Whether you are a large retailer identifying positive buying behaviors, a financial services provider detecting fraud, or a telco identifying and mitigating potential threats, behavioral patterns that provide useful insights exist in your data.

Enabling real-time anomaly detection for a security use case

For telco customers, protecting their wireless networks from security threats is critical. By 2022, mobile data traffic is expected to reach 77.5 exabytes per month worldwide, a compound annual growth rate of 46%. This explosion of data increases the risk of attacks from unknown sources and is driving telco customers to look for new ways to detect threats, such as machine learning techniques.

A signature-based pattern has been the primary technique used by many customers. In a signature-based pattern, network traffic is investigated by comparing it against repositories of signatures extracted from malicious objects. Although this technique works well for known threats, it is difficult to detect new attacks because no pattern or signature is available. In this blog, we walk through building a machine learning-based network anomaly detection solution by highlighting the following key components:

- Generating synthetic data to simulate production volume using Dataflow and Pub/Sub.
- Extracting features and making real-time predictions using Dataflow.
- Training and normalizing data using BigQuery ML’s built-in k-means clustering model.
- De-identifying sensitive data using Dataflow and Cloud DLP.

Figure 1: Reference Architecture for a Real-Time Anomaly Detection Solution

Generating synthetic NetFlow logs using Dataflow and Pub/Sub

Let’s start with synthetic data generation, which maps to the Ingest/Trigger section in figure 1. For illustrative purposes, we simulated NetFlow logs by using an open source data generator pipeline.

Figure 2: Pipeline Publishing Synthetic NetFlow Log Data at 250k msg/sec

In figure 2, you can see that this pipeline is publishing simulated data at 250k elements/sec.

Extracting features and tokenizing sensitive data using Dataflow and Cloud DLP

We have open sourced an anomaly detection pipeline that ingests and aggregates 150 GB of data in each 10-minute window. First, we find the subnet of the destination IP, dstSubnet. After that, to extract some basic features, we aggregate data by both destination subnet and subscriber ID. Using Apache Beam transforms, this pipeline first converts JSON messages to a Row type and then aggregates the data using the schema.
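Here is a minimal sketch of that aggregation step, assuming a hypothetical, reduced NetFlow schema with subscriberId, dstSubnet, txBytes, and rxBytes fields; the open-sourced pipeline extracts more fields and features than shown here:

```java
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.transforms.Group;
import org.apache.beam.sdk.transforms.JsonToRow;
import org.apache.beam.sdk.transforms.Max;
import org.apache.beam.sdk.transforms.Min;
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;
import org.joda.time.Duration;

public class FeatureExtraction {

  // Hypothetical, reduced NetFlow schema; the real logs carry many more fields.
  static final Schema NETFLOW_SCHEMA =
      Schema.builder()
          .addStringField("subscriberId")
          .addStringField("dstSubnet")
          .addInt64Field("txBytes")
          .addInt64Field("rxBytes")
          .build();

  static PCollection<Row> extractFeatures(PCollection<String> jsonMessages) {
    return jsonMessages
        // Convert raw JSON NetFlow messages into schema-aware Rows.
        .apply("JsonToRow", JsonToRow.withSchema(NETFLOW_SCHEMA))
        // Collect features over 10-minute fixed windows.
        .apply("Window10m",
            Window.<Row>into(FixedWindows.of(Duration.standardMinutes(10))))
        // Group by subscriber and destination subnet, then compute
        // per-key aggregates with the built-in schema combiners.
        .apply("AggregateFeatures",
            Group.<Row>byFieldNames("subscriberId", "dstSubnet")
                .aggregateField("txBytes", Sum.ofLongs(), "totalTxBytes")
                .aggregateField("txBytes", Min.ofLongs(), "minTxBytes")
                .aggregateField("txBytes", Max.ofLongs(), "maxTxBytes")
                .aggregateField("rxBytes", Sum.ofLongs(), "totalRxBytes"));
  }
}
```

Each output Row holds the key fields and the aggregated feature values for one subscriber/subnet pair in one window; these per-key aggregates are the feature vectors used downstream.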
As you can see from the feature extraction snippet, you can extract sample features by using built-in Apache Beam Java SDK aggregation transforms, such as the Min, Max, and ApproximateUnique functions.

The subscriberId values in our dataset might contain PII, such as IMSI numbers. To avoid storing PII as plain text in BigQuery, we used Cloud DLP to de-identify the IMSI numbers. We picked deterministic encryption, where data can be de-identified (tokenized) and re-identified (de-tokenized) using the same CryptoKey. To minimize the frequency of calls to the Cloud DLP service and to stay within the DLP message size limit (0.5 MB), we built a microbatch approach using Apache Beam’s state and timer API, buffering requests and emitting them based on a batch size and an event-time trigger, as shown in the sketch below. To handle the concurrent requests generated at our data volume, we increased the default Cloud DLP API quota to 40,000 API calls per minute.
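This buffering can be written as a stateful DoFn. The following is a minimal sketch under assumed settings (the 500-element batch size, 30-second event-time flush, and plain string payloads are all hypothetical); a downstream step would fold each emitted batch into a single Cloud DLP de-identify request:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.state.BagState;
import org.apache.beam.sdk.state.CombiningState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

// Buffers elements per key and emits a microbatch when either the batch
// is full or the event-time flush timer fires.
public class MicrobatchFn extends DoFn<KV<String, String>, Iterable<String>> {

  private static final int MAX_BATCH_SIZE = 500;                             // hypothetical
  private static final Duration FLUSH_INTERVAL = Duration.standardSeconds(30); // hypothetical

  @StateId("buffer")
  private final StateSpec<BagState<String>> bufferSpec = StateSpecs.bag();

  @StateId("count")
  private final StateSpec<CombiningState<Integer, int[], Integer>> countSpec =
      StateSpecs.combining(Sum.ofIntegers());

  @TimerId("flush")
  private final TimerSpec flushSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

  @ProcessElement
  public void process(
      @Element KV<String, String> element,
      @StateId("buffer") BagState<String> buffer,
      @StateId("count") CombiningState<Integer, int[], Integer> count,
      @TimerId("flush") Timer flush,
      OutputReceiver<Iterable<String>> out) {
    // (Re)arm the event-time flush deadline for this key.
    flush.offset(FLUSH_INTERVAL).setRelative();
    buffer.add(element.getValue());
    count.add(1);
    if (count.read() >= MAX_BATCH_SIZE) {
      emit(buffer, count, out); // batch is full: emit one DLP-sized request
    }
  }

  @OnTimer("flush")
  public void onFlush(
      @StateId("buffer") BagState<String> buffer,
      @StateId("count") CombiningState<Integer, int[], Integer> count,
      OutputReceiver<Iterable<String>> out) {
    // Deadline passed: emit whatever is buffered, even a partial batch.
    emit(buffer, count, out);
  }

  private void emit(
      BagState<String> buffer,
      CombiningState<Integer, int[], Integer> count,
      OutputReceiver<Iterable<String>> out) {
    List<String> batch = new ArrayList<>();
    buffer.read().forEach(batch::add); // materialize before clearing state
    if (!batch.isEmpty()) {
      out.output(batch);
    }
    buffer.clear();
    count.clear();
  }
}
```

Beam’s GroupIntoBatches transform packages the same pattern; writing the DoFn by hand, as here, makes the batch-size check and the event-time flush explicit.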
Figure 3: Anomaly detection pipeline in Dataflow

Train and normalize data using BigQuery ML

We then train and normalize data in BigQuery, shown as the Store component in figure 1. To handle a large daily data volume (20 TB), we used ingestion-time partitioned tables with clustering on the subscriberId and dstSubnet fields. Storing data in a partitioned table lets us quickly select training data by filtering on a range of days (e.g. 10 days) and a group of subscribers (e.g. users from organization X).

We used the k-means clustering algorithm in BigQuery ML to train a model and create clusters. Because BigQuery ML trains models using standard SQL, we automated the overall model creation and training process with stored procedures and scheduled queries. We were able to create a k-means clustering model in less than 15 minutes on a terabyte-scale dataset. After experimenting with multiple cluster sizes, our model evaluation suggested using four clusters. Finally, we normalized the data by computing a normalized distance for each cluster.

Real-time outlier detection using Dataflow

The final step in our journey is to detect outliers, step 4 in the reference architecture in figure 1. To detect outliers in real time, we extended the same pipeline used for feature extraction. First, we feed the normalized data to the pipeline as a side input. With the normalized data available in the pipeline, we find the nearest centroid by calculating the distance between each centroid and the input vector. Lastly, to find outliers, we calculate how far the input vector is from the nearest centroid. If the distance is more than three standard deviations above the mean, we output that data point as an outlier to a BigQuery table.

To test whether our model can successfully detect an anomaly, we manually published an outlier message. For the subscriber ID ‘000000000000000’, we used higher transmission (150000 bytes) and receiving (40000 bytes) volumes than usual. When we query the outlier table in BigQuery, we can see that the subscriber ID is stored in a de-identified format, as expected, because of the chosen Cloud DLP transformation. To retrieve the original data in a secure Pub/Sub subscription, we use a data re-identification pipeline. As shown in the figure, our original outlier subscriber ID (00000000000000000) is successfully re-identified.

Insights applied

The insights identified in an advanced, real-time pipeline are only as good as the improvements they enable within an organization. To make these insights actionable, you can enable dashboards for data storytelling, alerts for exception-based management, and actions for process streamlining or automatic mitigation. For anomaly detection, the anomalies identified can be made immediately available in Looker as dashboard visualizations, or used to trigger an alert or action when an anomalous condition is met. For example, an action can create a ticket in a ticketing system for additional investigation and tracking.

Figure 4: Looker dashboard to monitor outliers and take actions

Summary

Real-time AI solutions have the biggest impact when approached with the end goal in mind (How will this help us meet our business goals?) and the flexibility to adapt as needs change (How do we quickly evolve as our goals, learnings, and environment change?). Whether you are a security team looking to better identify the unknown or a retailer hoping to better spot positive buying trends, Google Cloud has the tools required to turn that business need into a solution. In this blog, we showed you how to build a secure, real-time anomaly detection solution using Dataflow, BigQuery ML, and Cloud DLP. Because anomaly detection based on a well-defined probability distribution may not fully cover adversarial use cases, it’s important to perform further analysis to confidently identify any security risks. If you’d like to give it a try, refer to this GitHub repo for a reference implementation.
Source: Google Cloud Platform

The future of cloud as supply chain for new telco services

If there’s anything we’ve learned so far in 2020, it’s that no one can predict day to day what is going to happen next. Therefore, we believe technology should be there to help developers and operators build for agility and manage for change. This is especially true for our customers in the telecommunications industry. Communications service providers (CSPs) have seen their network resiliency and service delivery put to the test as students, enterprise users, and consumers have all shifted rapidly to digital-only engagements to learn, conduct business, and stay connected. The trick for CSPs, however, is enabling agility and accelerating innovation across a globally distributed network without having to manage the underlying complexity.

This topic and more are explored by Jennifer Lin, VP of Product Management, and Chen Goldberg, Senior Director of Engineering, in “The Future of Tech” podcast hosted by Avishai Sharlin, Division President of Amdocs Technology, a Google Cloud partner and a leading provider of software and services to communications and media companies. They discussed the rise of 5G/edge data-driven services, the evolution of open-source technologies, and the role Anthos can play in driving application modernization and speed of innovation among telcos.

Cloud’s role in the supply chain for new 5G/edge services

One of the hot topics was 5G/edge computing and the role it can play in helping CSPs deliver more personalized, data-driven services to customers. However, telco networks have grown rather complex over the years. To unlock the potential of these technologies, CSPs may need to build more intelligent automation into their networks and remove some of that complexity.

“This move from IT monolithic systems from single vendors to 5G/edge data-driven services is about delivering a customer experience,” said Lin. “The pace at which we can move in [the] cloud as the supply chain for new services is phenomenal.”

Evolution of open source and how Anthos speeds telco innovation

There has also been an enormous amount of change and development in the application layer over the past few years. Open-source technologies like Docker and Kubernetes helped developers innovate faster by making systems more composable and portable. However, according to Goldberg, some things were still missing, and that is what drove the creation of Anthos.

“We went from building a product like Google Kubernetes Engine, which was just the container orchestration manager experience, to something like Anthos [because] we have seen that just the portability of workload[s] is not enough,” said Goldberg. “Our customers actually want us to take control and give them a managed experience wherever they build. That really gives them that engineering velocity.”

Furthermore, a few years ago the industry was still missing a platform that would let developers and IT not only build but also manage applications consistently across on-premises data centers, cloud environments, and the edge. Such a solution is key because many CSPs still have a large percentage of their data residing on premises, where it will likely continue to live. Anthos for Telecom was therefore also developed to help CSPs more easily manage day-two operations for applications that run across mixed deployment environments.

To learn more, we invite you to tune in to the full conversation on “The Future of Tech” podcast.
And for additional information on how Google Cloud is working with strategic partners like Amdocs to deliver solutions that help CSPs modernize core OSS/BSS systems, harness data and analytics, and monetize 5G/edge, check out the Google Cloud Next ‘20 OnAir session, “Accelerating telecommunications growth.”
Source: Google Cloud Platform

Docker’s sessions at KubeCon 2020

In a few weeks, August 17-20, lots of us at Docker in Europe had been looking forward to hopping on the train down to Amsterdam for KubeCon + CloudNativeCon Europe. But like every other event since March, this one is virtual, so we will all be joining remotely from home. Most of the sessions are pre-recorded with live Q&A, the format we used at DockerCon 2020. As a speaker, I really enjoyed this format at DockerCon: we got the opportunity to clarify points and answer extra questions during the talk. It will be rather different from the normal KubeCon experience with thousands of people at the venue, though!

Our talks

Chris Crone has been closely involved with the CNAB (Cloud Native Application Bundle) project since its launch in late 2018. He will be talking about how to Simplify Your Cloud Native Application Packaging and Deployments, and will explain why CNAB is a great tool for developers. Packaging entire applications into self-contained artifacts is really useful, an extension of packaging up a single container. The tooling, especially Porter, has been making a lot of progress recently, so this talk is for you if you heard about CNAB before and are wondering what has been happening, or if you are new to CNAB.

On the subject of putting new things in registries, Silvin Lubecki and Djordje Lukic from our Paris team will be giving a talk about storing absolutely anything in a container registry: Sharing is Caring! Push Your Cloud Application to a Container Registry. The movement to put everything into container registries is taking off. Once registries were just for container images; now Helm charts and many other cloud native artifacts are being pushed to them as well, though there are some difficulties, which Silvin and Djordje will help you out with.

I am giving a talk about working in security: How to Work in Cloud Native Security, Demystifying the Security Role. Have you ever wanted to work in security? It is a really interesting field with a real shortage of people. If you are working in tech or about to start, I will talk about how to get into the field; it is surprisingly accessible and fascinating.

Since the last KubeCon, Docker, Microsoft, Amazon, and many others have been working on a new version of Notary, the CNCF project that provides tooling for signing container images. With Steve Lasker from Microsoft and Omar Paul from Amazon, we will cover the current progress and the roadmap in the project’s Intro and Update session.

Finally, I will be in the open CNCF meeting and public Q&A, which will be held live, along with Chris Aniszczyk, Liz Rice, Saad Ali, Michelle Noorali, Sheng Liang, and Katie Gamanji. Come along and ask questions about the CNCF!

What about Docker Captains?

In addition, don’t miss the talks from the Docker Captains. Lee Calcote is talking about the intricacies of service mesh performance and giving an introduction to the CNCF SIG Network. Adrian Mouat will be talking at the Cloud Native Security Day on day 0 of the conference, on Image Provenance and Security in Kubernetes.
Source: https://blog.docker.com/feed/