Monitoring feature attributions: How Google saved one of the largest ML services in trouble

An emergency in the largest MLOps at Google

Claudiu Gruia is a software engineer at Google who works on machine learning (ML) models that recommend content to billions of users daily. In October 2019, Claudiu was notified by an alert from a monitoring service: a specific model feature (let us call it F1) had dropped in importance. The importance of a feature is measured using the concept of Feature Attribution, the influence of the feature on the model's predictions. This drop in importance was associated with a large drop in the model's accuracy.

[Figure: The attribution (feature importance) of feature F1 dropped suddenly]

In response to the alert, Claudiu quickly retrained the model, and two other features (F4 and F6 below) rose in importance, effectively substituting for F1 and eliminating the drop in model quality. Had it not been for the alert and Claudiu's quick fix, the user experience of a large consumer service would have suffered.

[Figure: After retraining, features F4 and F6 covered the loss of F1]

Monitoring without the ground truth

So what happened? F1 was a feature generated by a separate team. On further investigation, it was found that an infrastructure migration had caused F1 to significantly lose coverage and, consequently, its attribution across examples.

The easiest way to detect this kind of model failure is to track one or more model quality metrics (e.g., accuracy) and alert the developer if a metric drops below a threshold. Unfortunately, most model quality metrics rely on comparing the model's predictions to "ground truth" labels, which may not be available in real time. For instance, in tasks such as fraud detection, credit lending, or estimating conversion rates for online ads, the ground truth for a prediction may lag by days, weeks, or months. In the absence of ground truth, ML engineers at Google rely on proxy measures of model quality degradation, derived from the two observables that are available: model inputs and predictions. There are two main measures:

- Feature Distribution monitoring: detecting skew and drift in the feature distributions
- Feature Attribution monitoring: detecting skew and drift in the feature importance scores

In the recent post Monitor models for training-serving skew with Vertex AI, we explored the first measure, Feature Distribution monitoring, which detects skew and anomalies in the features themselves at serving time (in comparison to training or some other baseline). In the rest of this post, we discuss the second measure, Feature Attribution monitoring, which has also been used successfully to monitor large ML services at Google.

Feature Attributions monitoring

Feature Attributions are a family of methods for explaining a model's prediction on a given input by attributing it to the features of that input. The attributions are proportional to the contribution of each feature to the prediction. They are typically signed, indicating whether a feature pushes the prediction up or down. Finally, the attributions across all features are required to add up to the model's prediction score.

[Photo by Dlanglois, CC BY-SA 3.0]

Feature Attributions have been used successfully in industry, and at Google, to improve model transparency, debug models, and assess model robustness. Prominent algorithms for computing feature attributions include SHAP, Integrated Gradients, and LIME. Each algorithm offers a slightly different set of properties. For an in-depth technical discussion, refer to our AI Explanations Whitepaper.
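To make the additivity property concrete, here is a minimal Python sketch (not Vertex AI code) for a toy linear model, where attribution methods such as Integrated Gradients and SHAP reduce to weight * (input value - baseline value). The weights, input, and baseline below are made up for illustration; the point is that the attributions are signed, one per feature, and add up to the difference between the prediction on the input and the prediction on the baseline.

    import numpy as np

    # A toy linear model: prediction = w . x + b (weights and bias are made up).
    w = np.array([2.0, -1.5, 0.5])   # weights for features [f1, f2, f3]
    b = 0.1

    def predict(x: np.ndarray) -> float:
        return float(w @ x + b)

    def attribute(x: np.ndarray, baseline: np.ndarray) -> np.ndarray:
        # For a linear model, Integrated Gradients (and SHAP) reduce to
        # attribution_i = w_i * (x_i - baseline_i): signed, one value per feature.
        return w * (x - baseline)

    x = np.array([1.0, 2.0, 3.0])     # an input to explain
    baseline = np.zeros_like(x)       # reference input (here, all zeros)

    attr = attribute(x, baseline)
    print(attr)                       # attributions per feature: [2.0, -3.0, 1.5]
    # Completeness: attributions add up to prediction(x) - prediction(baseline).
    assert np.isclose(attr.sum(), predict(x) - predict(baseline))

For real models, the attributions would of course come from one of the attribution methods named above applied to the deployed model, not from hand-written weights.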
[Figure: An example of feature attributions]

Monitoring Feature Attributions

While feature distribution monitoring is a handy tool, it suffers from the following limitations: (1) feature drift scores do not convey the impact the drift has on the model's predictions; (2) there is no unified drift measure that works across different feature types and representations (numeric, categorical, images, embeddings, etc.); and (3) feature drift scores do not account for drift in the correlations between features.

To address this, on September 10th, Vertex Model Monitoring added new functionality to monitor feature attributions. In contrast to feature distribution monitoring, the key idea is to monitor the contribution of each feature to the prediction (i.e., its attribution) during serving and to report any significant drift relative to training (or some other baseline). This has several notable benefits:

- Drift scores correspond to impact on predictions. A large change in a feature's attribution by definition means that the feature's contribution to the prediction has changed. Since the prediction is equal to the sum of the feature contributions, large attribution drift is usually indicative of large drift in the model's predictions. (There may be false positives if the attribution drifts across features cancel out, leading to negligible prediction drift; for more discussion of false positives and false negatives, see Note #1.)
- Uniform analysis units across feature representations. Feature attributions are always numeric, regardless of the underlying feature type. Moreover, due to their additive nature, attributions to a multi-dimensional feature (e.g., an embedding) can be reduced to a single numeric value by adding up the attributions across dimensions. This allows standard univariate drift detection methods to be used for all feature types (see the sketch after this list).
- Account for feature interactions. Attributions account for a feature's contribution to the prediction, both individually and via interactions with other features. Thus, the distribution of a feature's attributions may change even when the marginal distribution of the feature does not, if its correlation with the features it interacts with changes.
- Monitor feature groups. Since attributions are additive, we can add up the attributions to related features to obtain an attribution for a feature group. For instance, in a house pricing model, we can combine the attributions to all features pertaining to the location of the house (e.g., city, school district) into a single value. This group-level attribution can then be tracked to monitor for changes in the feature group.
- Track importances across model updates. Monitoring attributions across model retrainings helps in understanding how the relative importance of a feature changes when the model is retrained. For instance, in the example mentioned at the beginning, we noticed that features F4 and F6 stepped up in importance after retraining.
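As a rough illustration of the univariate drift check referenced in the list above (and of the average-absolute-attribution summary discussed under "Design choices" later in this post), here is a sketch that is not the Vertex Model Monitoring implementation: it compares each feature's mean absolute attribution between a training baseline and a serving window, and flags features whose shift exceeds a per-feature threshold. The feature names, thresholds, and synthetic attribution values are assumptions for illustration.

    import numpy as np

    def mean_abs_attribution(attributions: np.ndarray) -> np.ndarray:
        """Per-feature average |attribution| over a batch of explained examples."""
        return np.abs(attributions).mean(axis=0)

    def attribution_drift(train_attr: np.ndarray,
                          serving_attr: np.ndarray,
                          thresholds: dict[str, float],
                          feature_names: list[str]) -> dict[str, float]:
        """Return features whose mean-absolute-attribution shift exceeds its threshold."""
        baseline = mean_abs_attribution(train_attr)
        current = mean_abs_attribution(serving_attr)
        drift = np.abs(current - baseline)   # in units of the prediction score
        return {name: float(d)
                for name, d in zip(feature_names, drift)
                if d > thresholds[name]}

    # Toy data: rows = explained examples, columns = features (made-up numbers).
    rng = np.random.default_rng(0)
    train = rng.normal(loc=[1.0, -0.5, 0.2], scale=0.1, size=(1000, 3))
    serving = rng.normal(loc=[0.1, -0.5, 0.9], scale=0.1, size=(1000, 3))  # F1 collapsed, F3 grew

    alerts = attribution_drift(train, serving,
                               thresholds={"F1": 0.3, "F2": 0.3, "F3": 0.3},
                               feature_names=["F1", "F2", "F3"])
    print(alerts)   # expect F1 and F3 to be flagged, F2 to stay quiet

Because attributions are always numeric, the same check works unchanged whether the underlying feature is numeric, categorical, or an embedding whose per-dimension attributions have been summed into one value.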
Enabling the service

Vertex Model Monitoring now supports Feature Attributions. Once a prediction endpoint is up and running, you can turn on skew or drift detection for both Feature Distribution and Feature Attributions by running a single gcloud command like the one sketched below; there is no need for any pre-processing or extra setup tasks.
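The exact command is not reproduced in this text, so the following is a hedged sketch assembled from the parameters listed just below. It assumes the beta gcloud command group for Vertex AI model monitoring jobs that was available at the time, and the project, region, endpoint ID, email address, feature names, and thresholds are placeholders; confirm the command and flag syntax against the Model Monitoring documentation.

    gcloud beta ai model-monitoring-jobs create \
      --project=PROJECT_ID \
      --region=us-central1 \
      --display-name="my-monitoring-job" \
      --emails="you@example.com" \
      --endpoint=ENDPOINT_ID \
      --prediction-sampling-rate=0.5 \
      --feature-thresholds="age=0.3,cigsPerDay=0.3" \
      --feature-attribution-thresholds="age=0.3,cigsPerDay=0.3"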
Here are the key parameters:

- emails: the email addresses to which you would like monitoring alerts to be sent
- endpoint: the ID of the prediction endpoint to be monitored
- prediction-sampling-rate: the fraction of incoming prediction requests that are logged and analyzed for monitoring purposes
- feature-thresholds: the input features to monitor with Feature Distribution monitoring, along with the alerting threshold for each feature
- feature-attribution-thresholds: the input features to monitor with Feature Attribution monitoring, along with the alerting threshold for each feature

You can also use the Console UI to set up monitoring when creating a new Endpoint.

[Figure: Using the Console UI to set up Feature Attribution and Feature Distribution monitoring]

For detailed instructions on how to set up monitoring, please refer to the documentation. After enabling it, you will see alerts on the console like the one below whenever any feature attribution skew or drift is detected, and you will also receive an email. The Ops engineer can then take appropriate corrective action.

[Figure: The feature attribution of "cigsPerDay" has crossed the alert threshold]

Design choices

Lastly, we go over two important technical considerations involved in designing feature attribution monitoring.

Selecting the prediction class for attribution. In the case of classification models, feature attributions are specific to an input and a prediction class. When monitoring a distribution of inputs, which prediction class should be used for computing attributions? We recommend using the class that is considered the prediction decision for the input. For multiclass models, this is usually the class with the largest score (i.e., the "argmax" class). In some cases there is a specific protagonist class (for example, the "fraud" class in a fraud prediction model) whose score is consumed by downstream applications. In such cases, it is reasonable to always use the protagonist class for attributions.

Comparing attribution distributions. There are several choices for comparing distributions of attributions, including distribution divergence metrics (e.g., Jensen-Shannon divergence) and various statistical tests (e.g., the Kolmogorov-Smirnov test). Here, we use a relatively simple method: comparing the average absolute value of the attributions. This value captures the magnitude of each feature's contribution. Since attributions are in units of the prediction score, the difference in average absolute attribution can also be interpreted in units of the prediction score. A large difference typically translates into a large impact on the prediction.

Next steps

To get started with Feature Attribution monitoring, try it out by following the Model Monitoring documentation. Also, Marc Cohen created a great notebook for learning how to use the functionality in an end-to-end scenario. By incorporating Vertex Model Monitoring and Explainable AI with these best practices, you can experience and learn how to build and operate Google-scale production ML systems that support mission-critical businesses and services.

Note #1: When Feature Attribution monitoring produces false positives and false negatives

Feature Attribution monitoring is a powerful tool, but it also has some caveats: it can produce false positives and false negatives, as illustrated by the following cases. Thus, when you apply the method to a production system, consider using it in combination with other methods, such as Feature Distribution monitoring, for a better understanding of the behaviour of your ML models.

[False negative] Univariate drift in attributions may fail to capture multivariate drift in features when the model has no interactions. Example: consider a linear model y = x1 + ... + xn. Here, univariate drift in attributions will be proportional to univariate drift in features. Thus, attribution drift would be tiny if univariate drift in features is tiny, regardless of any multivariate drift.

[False negative] Drift in features that are unimportant to the model but affect model performance may not show up in the attribution space. Example: consider a task y = x1 XOR x2 and a model y_hat = x1. Let's say the training distribution is an equal mix of <1, 0> and <0, 0>, while the production distribution is an equal mix of <1, 1> and <0, 1>. While feature x2 has zero attribution (and therefore zero attribution drift), the drift in x2 has a massive impact on model performance.

[False positive] Drift in important features may not always affect model performance. Example: let's say that in the XOR example, the production distribution consists of just <1, 0>. While there is large drift in the input feature x1, it does not affect performance.

Note #2: Combining Feature Distribution and Feature Attributions

By combining both Feature Distribution and Feature Attribution monitoring, we can obtain deeper insights into what changes might be affecting the model, and some potential directions to investigate, by cross-referencing the observations from the two monitoring methods.

Related article: Monitor models for training-serving skew with Vertex AI
Source: Google Cloud Platform

Docker Captain Take 5 – Francesco Ciulla

Docker Captains are select members of the community who are both experts in their field and passionate about sharing their Docker knowledge with others. “Docker Captains Take 5” is a regular blog series where we get a closer look at our Captains and ask them the same broad set of questions, ranging from what their best Docker tip is to whether they prefer cats or dogs (personally, we like whales and turtles over here). Today, we’re interviewing Francesco Ciulla, who joined the Docker Captains program last month. He is a DevOps consultant based in Rome.

How/when did you first discover Docker?

It was around 2015. I was very curious and started researching. The funny thing is that I didn’t have any online presence at the time, so I was just studying on my own, trying to figure out how it worked.

And now I know many Docker Captains, like Bret Fisher, Michael Irwin, and Gianluca Arbezzano. It’s amazing!

What is your favorite Docker command?

This is a nice question! I think I will go with “docker compose up --build”; it’s exactly what you need to test your work in your development environment. Another one is docker exec, which is very handy.

What is your top tip you think other people don’t know for working with Docker?

Many think that Docker is simply something you install on your machine, but it is much more: it is an entire service platform that offers complete end-to-end support for containerizing your applications. 

It also has the best community out there, and that makes a real difference in how you use a product: it’s not only about the features it offers, but also about the countless answers that the people who use Docker exchange with each other.

What’s the coolest Docker demo you have done/seen ?

Personally I did a Docker Demo for the Google Developer Group Memphis about one year ago, and it was my first live webinar as a speaker! I totally loved that and I got many questions at the end!

What have you worked on in the past 6 months that you’re particularly proud of?

A few months ago, I became a developer advocate for TinyStacks, a company that helps you deploy your Docker application on AWS while leaving you in control of what you have deployed, and I started writing a series of articles and making a series of videos with them. It is an incredible experience. Our YouTube channel is where I create technical content: https://www.youtube.com/watch?v=5NUAZSvWAo0

What do you anticipate will be Docker’s biggest announcement next year?

Docker Desktop becoming even more powerful and lightweight, with improved UI commands and better container monitoring and control.

What do you think is going to be Docker’s biggest challenge next year?

Supporting the new business model and answering all the questions that will come from all the companies and individuals using Docker.

What are some personal goals for the next year with respect to the Docker community?

I would like to participate in the next DockerCon as a speaker. This has been one of my dreams, and now I see it as within reach. Another goal would be to invite Docker executives onto my YouTube channel; I have created a new format that I think would fit this!

What talk would you most love to see at DockerCon 2022?

I would like to see an in-person panel with multiple guest speakers, basically what I did during the last Docker Community All-Hands, but in person.

Looking to the distant future, what is the technology that you’re most excited about and that you think holds a lot of promise?

I am very excited about Web 3.0 and about how it will revolutionize how we use the internet and our devices.

Rapid fire questions…

What new skill have you mastered during the pandemic?

Online presence and how to make videos

Cats or Dogs?

Cats

Salty, sour or sweet?

Salty

Beach or mountains?

Mountains

Your most often used emoji?

  , of course
Source: https://blog.docker.com/feed/

AWS IoT Device Defender now supports Detect alarm verification states

With AWS IoT Device Defender, customers can now verify an alarm based on their investigation of detected behavioral anomalies. They can verify an alarm as true positive, benign positive, false positive, or unknown, and provide a description of the verification. This lets users, such as a security or operations team, manage alarms and improve response times.
Source: aws.amazon.com

AWS Storage Gateway simplifies tape management in Tape Gateway

AWS Storage Gateway now makes it easier and faster for you to search, view, and manage the virtual tapes you store in AWS with Tape Gateway. On the Tapes page of the Storage Gateway management console, you can now quickly search for tapes using common filters such as tape barcode and status. You simply select the desired filter from the drop-down menu and quickly narrow your search to the relevant set of tapes, saving you time and management effort. For example, to delete archived tapes that have exceeded your defined retention period, you can use the Archived filter, select the desired time period, and delete all tapes matching your specified filter with a few clicks.
Source: aws.amazon.com

Amazon RDS for Oracle now supports sqlnet.ora client parameters for the Oracle Native Network Encryption (NNE) option

Amazon Relational Database Service (Amazon RDS) for Oracle now supports four new customer-modifiable sqlnet.ora client parameters for the Oracle Native Network Encryption (NNE) option. Amazon RDS for Oracle already supports server parameters that define encryption properties for incoming sessions. These client parameters apply to outgoing connections, such as those used by database links.
Source: aws.amazon.com