Press freedom: Linksunten versus those at the top right

The media platform Linksunten.indymedia.org is still banned, but attorney Angela Furmaniak is challenging the ban in court. In an interview with Golem.de, she explains how the Verfassungsschutz, Germany's domestic intelligence agency, is entangled in the ban and what consequences it has for press freedom. An interview by Moritz Tremmel (Interview, Data Protection)
Source: Golem

Kubernetes security audit: What GKE and Anthos users need to know

Kubernetes reached an important milestone recently: the publication of its first-ever security audit! Sponsored by the Cloud Native Computing Foundation (CNCF), this security audit reinforces what has been apparent to us for some time now: Kubernetes is a mature open-source project for organizations to use as their infrastructure foundation.

While every audit will uncover something, this report found only a relatively small number of significant vulnerabilities that need to be addressed. "Despite many important findings, we did not see fundamental architectural design flaws, or critical vulnerabilities that should cause pause when adopting Kubernetes for high-security workloads or critical business functions," said Aaron Small, Product Manager, Google Cloud and member of the Security Audit Working Group. Further, Kubernetes has an established vulnerability reporting, response, and disclosure process, which is staffed with senior developers who can triage and take action on issues.

Performing this security audit was a big effort on the part of the CNCF, which has a mandate to improve the security of its projects via its Best Practices Badge Program. To take Kubernetes through this first security audit, the Kubernetes Steering Committee formed a working group, developed an RFP, worked with vendors, and reviewed and finally published the report. You can get your hands on the full report on the Working Group's GitHub page, or read the highlights in the CNCF blog post.

Kubernetes security for GKE and Anthos users

Clocking in at 241 pages, the final report is thorough and interesting, and we encourage you to read it. But what if you're just interested in what this report means for Google Cloud's managed platforms, Google Kubernetes Engine (GKE) and Anthos? If you're not going to read the whole thing, here's the gist of the report and the takeaways for Google Cloud customers.

GKE makes it easy for you to follow recommended configurations

The report lays out a list of recommended actions for cluster administrators, including using RBAC, applying a Network Policy, and limiting access to logs, which may contain sensitive information. The report also calls out Kubernetes' default settings. In GKE, we've been actively changing these over time, including turning off ABAC and basic authentication by default, to make sure new clusters you create are more secure. To apply the recommended configurations in GKE, and to see which have already been applied for you, check out the GKE hardening guide.

It's not all up to you

The threat model assessed the security posture of eight major components, but because of the GKE shared responsibility model, you don't have to worry about all of them. GKE is responsible for providing updates to vulnerabilities for the eight components listed in the report, while you as the user are responsible for upgrading nodes and for configuration related to workloads. You don't even need to upgrade nodes if you leave node auto-upgrade enabled.

Kubernetes and GKE security are only going to keep getting better

With more eyes on this shared, open-source technology, more well-hidden bugs are likely to be found and remediated. The Kubernetes community dedicated significant time and resources to this audit, emphasizing that security is truly a top priority. With open audits like the one performed by the CNCF, it's easier for researchers (or your team) to understand the real threats, and to spend their time further researching or remediating the most complex issues.
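As an illustration of one of the recommended configurations mentioned above (applying a Network Policy), here is a minimal sketch of a baseline deny-all-ingress policy applied with the official Kubernetes Python client. The namespace and policy name are illustrative assumptions, not values from the report.

```python
# Minimal sketch: apply a default-deny-ingress NetworkPolicy, one of the
# hardening steps the report recommends. Assumes the official `kubernetes`
# Python client and a kubeconfig with access to the target cluster.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

policy = client.V1NetworkPolicy(
    metadata=client.V1ObjectMeta(name="default-deny-ingress"),  # hypothetical name
    spec=client.V1NetworkPolicySpec(
        pod_selector=client.V1LabelSelector(),  # empty selector matches all pods
        policy_types=["Ingress"],               # no ingress rules -> deny all ingress
    ),
)

client.NetworkingV1Api().create_namespaced_network_policy(
    namespace="default", body=policy
)
```

Note that on GKE, Network Policy enforcement must be enabled on the cluster for such a policy to take effect.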
And when issues do arise, as we've seen multiple times with recent vulnerabilities, the upstream Kubernetes Product Security Committee is on top of it, quickly responding and providing fixes to the community. Finally, since GKE is an official distribution, we pick up patches as they become available in Kubernetes and make them available automatically for the control plane, masters, and nodes. Masters are automatically upgraded and patched, and if you have node auto-upgrade enabled, your node patches will be applied automatically too. You can track the progress in addressing the vulnerabilities surfaced by this report in the issue dashboard.

If you want to dig deeper, check out the full report, available on GitHub. Thanks again to the Kubernetes Security Audit Working Group, the CNCF, Trail of Bits and Atredis Partners for the amazing work they did to complete this in-depth assessment! To learn more about trends in container security here at Google Cloud, be sure to follow our Exploring container security blog series.
Source: Google Cloud Platform

How to quickly solve machine learning forecasting problems using Pandas and BigQuery

Time-series forecasting problems are ubiquitous throughout the business world. For example, you may want to predict the probability that some event will happen in the future, or forecast how many units of a product you'll sell over the next six months. Forecasting like this can be posed as a supervised machine learning problem. Like many machine learning problems, the most time-consuming part of forecasting can be setting up the problem, constructing the input, and feature engineering. Once you have created the features and labels that come out of this process, you are ready to train your model.

A common approach to creating features and labels is to use a sliding window, where the features are historical entries and the label(s) represent entries in the future. As any data scientist who works with time series knows, this sliding window approach can be tricky to get right.

[Figure: A sliding window on an example dataset. Each window represents a feature vector for the dataset, and the label(s) are one or more points in the future.]

Below is a good workflow for tackling forecasting problems:

1. Create features and labels on a subsample of data using Pandas and train an initial model locally
2. Create features and labels on the full dataset using BigQuery
3. Utilize BigQuery ML to build a scalable machine learning model
4. (Advanced) Build a forecasting model using Recurrent Neural Networks in Keras and TensorFlow

In the rest of this blog, we'll use an example to provide more detail on how to build a forecasting model using the above workflow. (The code is available on AI Hub.)

First, train locally

Machine learning is all about running experiments. The faster you can run experiments, the more quickly you get feedback, and thus the faster you can arrive at a Minimum Viable Model (MVM). It's beneficial, then, to first work on a subsample of your dataset and train locally before scaling out your model using the entire dataset.

Let's build a model to forecast the median housing price week-by-week for New York City. We spun up a Deep Learning VM on Cloud AI Platform and loaded our data from nyc.gov into BigQuery. Our dataset goes back to 2003, but for now let's just use prices beginning in 2011.

Since our goal is to forecast future prices, let's create sliding windows that accumulate historical prices (features) and a future price (label). Our source table contains two columns: a date and a median price.

[Figure: The entire dataset plotted over time.]

To create our features, we'll pick a historical window size (for example, one year) that will be used to forecast the median home price six months out. To do this, we have implemented a reusable function based on Pandas that allows you to easily generate time-series features and labels; a sketch of the idea appears below. Feel free to use this function on your own dataset. After running create_rolling_features_label, a feature vector of length 52 (plus the date features) is created for each example, representing the features before the prediction date.

[Figure: The create_rolling_features_label function creates rolling windows for the features and label. In this case, the features consist of 52 weeks, and the label is a week six months into the future.]

Once we have the features and labels, the next step is to create a training and test set.
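The notebook's actual create_rolling_features_label isn't reproduced in this post, so the following is a minimal sketch of the idea, assuming a DataFrame sorted by date with date and med_sales_price columns; the column names and exact signature are illustrative assumptions.

```python
# Minimal sketch of a rolling-window feature/label builder, in the spirit of
# create_rolling_features_label. Column names and the signature are
# assumptions, not the notebook's actual code.
import pandas as pd

def create_rolling_features_label(df, window_size=52, horizon=26, labels_size=1):
    """df: DataFrame sorted by date with columns ['date', 'med_sales_price'].

    Returns one row per window: window_size historical prices as features,
    followed by labels_size future prices starting `horizon` steps ahead
    of the last feature."""
    prices = df["med_sales_price"].to_numpy()
    dates = df["date"].to_numpy()
    # Number of complete (features, label) windows that fit in the series.
    n_windows = len(df) - window_size - horizon - labels_size + 2
    rows = []
    for start in range(n_windows):
        label_start = start + window_size + horizon - 1
        row = {
            "features_start": dates[start],
            "features_end": dates[start + window_size - 1],
            "label_date": dates[label_start],
        }
        # Historical prices, oldest on the left (price_back_52 ... price_back_1).
        for i in range(window_size):
            row[f"price_back_{window_size - i}"] = prices[start + i]
        # Future price(s) as the label(s).
        for j in range(labels_size):
            row[f"price_ahead_{j + 1}"] = prices[label_start + j]
        rows.append(row)
    return pd.DataFrame(rows)
```

With weekly data, window_size=52 and horizon=26 reproduce the setup described above: one year of history predicting the median price six months out.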
In time-series problems, it's important to split the data temporally, so that you are not leaking future information (which would not be available at prediction time) into the trained model. The training set (blue in the plot) consists of rows where the label occurs before the split date ('2015-12-30'), while the test set (green) consists of rows where the label falls after this date.

In practice, you may want to scale your data using z-normalization, or detrend it to reduce seasonality effects; differencing can help remove trend information as well. Now that we have features and labels, this simply becomes a traditional supervised learning problem, and you can use your favorite ML library to train a model, for example a simple linear regressor in sklearn.

Scale our model

Let's imagine we want to put our model into production and run it automatically every week, using batch jobs, to get a better idea of future prices. Let's also imagine we may want to forecast day-by-day rather than week-by-week.

Our data is stored in BigQuery, so let's use the same logic that we used in Pandas to create features and labels, but run it at scale using BigQuery. We have developed a generalized Python function that builds a SQL string that lets you do this with BigQuery. We pass in the table name that contains our data, the value name that we are interested in, the window size (which is the input sequence length), the horizon of how far ahead in time we skip between our features and our labels, and the labels_size (which is the output sequence length). labels_size is equal to 1 here because, for now, we are only modeling sequence-to-one, even though this data pipeline can handle sequence-to-sequence. Feel free to write your own sequence-to-sequence model to take full advantage of the data pipeline!

We can then execute the SQL string scalable_time_series in BigQuery. A sample of the output shows that each row is a different sequence. For each sequence, we can see the time ranges of the features and the labels. For the features, the timespan is 52 weeks, which is the window_size, and for the labels it is one week, which is the labels_size.

Looking at the same sampled rows, we can see how the training data is laid out. We have a column for each timestep of the previous price, starting with the farthest back in time on the left and moving forward. The last column is the label, the price one week ahead.

Now we have our data, ready for training, in a BigQuery table. Let's take advantage of BigQuery ML and build a forecasting model using SQL; the statements are sketched below.

The CREATE MODEL statement builds a linear regression model using our 52 past price features to predict our label, price_ahead_1. This creates a BQML model in our bqml_forecasting dataset.

We can check how our model performed by calling TRAINING_INFO. This shows the training run index, the iteration index, the training and eval loss at each iteration, the duration of the iteration, and the iteration's learning rate. Our model is training well, since the eval loss keeps getting smaller with each iteration.

We can also evaluate our trained model by calling EVALUATE. This shows common evaluation metrics that we can use to compare our model with other models and find the best choice among all of our options.

Lastly, machine learning is all about prediction; the training is just a means to an end. We can get our predictions with ML.PREDICT, where the label's name comes back prefixed with predicted_.
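The post's original SQL isn't reproduced here; below is a minimal sketch of what the BigQuery ML statements might look like, run from Python with the google-cloud-bigquery client. The dataset, table, and column names (bqml_forecasting.price_model, the training and eval tables, price_ahead_1) follow the naming used in the text, but the exact schema is an assumption.

```python
# Minimal sketch of the BigQuery ML workflow described above, using the
# google-cloud-bigquery client. Table and model names are assumptions based
# on the naming in the text (dataset bqml_forecasting, label price_ahead_1).
from google.cloud import bigquery

client = bigquery.Client()

# 1. Train a linear regression model on the 52 historical price columns.
client.query("""
    CREATE OR REPLACE MODEL `bqml_forecasting.price_model`
    OPTIONS(model_type='linear_reg', input_label_cols=['price_ahead_1']) AS
    SELECT * EXCEPT(features_start, features_end, label_date)
    FROM `bqml_forecasting.training_data`
""").result()

# 2. Inspect per-iteration training and eval loss.
training_info = client.query(
    "SELECT * FROM ML.TRAINING_INFO(MODEL `bqml_forecasting.price_model`)"
).to_dataframe()

# 3. Compute standard evaluation metrics on held-out data.
evaluation = client.query("""
    SELECT * FROM ML.EVALUATE(MODEL `bqml_forecasting.price_model`,
        (SELECT * FROM `bqml_forecasting.eval_data`))
""").to_dataframe()

# 4. Batch-predict: the label comes back as predicted_price_ahead_1.
predictions = client.query("""
    SELECT predicted_price_ahead_1
    FROM ML.PREDICT(MODEL `bqml_forecasting.price_model`,
        (SELECT * FROM `bqml_forecasting.latest_features`))
""").to_dataframe()
```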
Now, let's imagine that we want to run this model every week. We can easily create a batch job that is automatically executed using a scheduled query.

Of course, if we want to build a more custom model, we can use TensorFlow or another machine learning library, while using this same data engineering approach to create the features and labels that are read into our custom machine learning model. This technique could improve performance.

To use an ML framework like TensorFlow, we'll need to write the model code and also get our data into the right format to be read into our model. We can make a slight modification to the query we used for BigQuery ML so that the data is amenable to the CSV file format. For this example, imagine you wanted to build a sequence-to-sequence model in TensorFlow that can handle variable-length features. One approach would be to aggregate all the features into a single column named med_sales_price_agg, separated by semicolons. The features (if we have more than just this one in the future) and the label are all separated by a comma.

We'll execute the query in BigQuery and make a table each for training and evaluation. These then get exported to CSV files in Cloud Storage.

[Figure: The header and first row of one of the exported CSV files, with comments added.]

When reading the data into our model using tf.data, we specify these delimiters to correctly parse the data; a sketch of this parsing step appears at the end of this post. Please check out our notebook on AI Hub for an end-to-end example showing how this would work in practice, and how to submit a training job on Google Cloud AI Platform. For model serving, the model can be deployed on AI Platform or directly in BigQuery.

Conclusion

That's it! The workflow we shared allows you to automatically and quickly set up any time-series forecasting problem. Of course, this framework can also be adapted to a classification problem, like using a customer's historical behavior to predict the probability of churn, or to identify anomalous behavior over time. Regardless of the model you build, these approaches let you quickly build an initial model locally, then scale to the cloud using BigQuery.

Learn more about BigQuery and AI Platform.
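As promised, here is a minimal sketch of parsing the aggregated CSV rows with tf.data. The file path and exact shapes are assumptions; the semicolon-joined med_sales_price_agg column followed by a comma-separated label matches the format described above.

```python
# Minimal sketch: parse CSV rows of the form "<p1>;<p2>;...;<p52>,<label>"
# with tf.data. The file path and batch size are hypothetical.
import tensorflow as tf

def parse_row(line):
    # Split the comma-separated columns: aggregated features, then the label.
    med_sales_price_agg, label = tf.io.decode_csv(
        line, record_defaults=[[""], [0.0]])
    # Split the semicolon-separated price sequence and convert to floats.
    features = tf.strings.to_number(
        tf.strings.split(med_sales_price_agg, ";"), out_type=tf.float32)
    return features, label

dataset = (
    tf.data.TextLineDataset("gs://your-bucket/train.csv")  # hypothetical path
    .skip(1)  # skip the header row
    .map(parse_row, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(32)  # for truly variable-length sequences, use padded_batch instead
)
```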
Source: Google Cloud Platform