Blogs, week of Feb 27th

Here’s what RDO enthusiasts have been blogging about in the last couple of weeks. I particularly encourage you to read Julie’s excellent writeup of the OpenStack Pike PTG last week in Atlanta. And have a look at my video series from the PTG for other engineers’ perspectives.

OpenStack Pike PTG: OpenStack Client – Tips and background for interested contributors by jpichon

Last week I went off to Atlanta for the first OpenStack Project Teams Gathering, for a productive week discussing all sorts of issues and cross-project concerns with fellow OpenStack contributors.

Read more at http://tm3.org/eb

SDN with Red Hat OpenStack Platform: OpenDaylight Integration by Nir Yechiel, Senior Technical Product Manager at Red Hat

OpenDaylight is an open source project under the Linux Foundation with the goal of furthering the adoption and innovation of software-defined networking (SDN) through the creation of a common, industry-supported platform. Red Hat is a Platinum Founding member of OpenDaylight and part of a community whose participants run the gamut from individual contributors to large network companies, making it a powerful and innovative engine that can cover many use cases.

Read more at http://tm3.org/e8

Installing TripleO Quickstart by Carlos Camacho

This is a brief recipe on how to manually install TripleO Quickstart on a remote 32GB RAM box without dying in the attempt.

Read more at http://tm3.org/ea

RDO Ocata released by jpena

The RDO community is pleased to announce the general availability of the RDO build for OpenStack Ocata for RPM-based distributions, CentOS Linux 7 and Red Hat Enterprise Linux. RDO is suitable for building private, public, and hybrid clouds. Ocata is the 15th release from the OpenStack project, which is the work of more than 2500 contributors from around the world (source).

Read more at http://tm3.org/e9

OpenStack Project Team Gathering, Atlanta, 2017 by Rich Bowen

Over the last several years, OpenStack has conducted OpenStack Summit twice a year. One of these occurs in North America, and the other one alternates between Europe and Asia/Pacific.

Read more at http://tm3.org/e0

Setting up a nested KVM guest for developing & testing PCI device assignment with NUMA by Daniel Berrange

Over the past few years the OpenStack Nova project has gained support for managing VM usage of NUMA, huge pages, and PCI device assignment. One of the more challenging aspects of this is the availability of hardware to develop and test against. In an ideal world it would be possible to emulate everything we need using KVM, enabling developers and test infrastructure to exercise the code without needing access to bare metal hardware supporting these features.

Read more at http://tm3.org/e1

ANNOUNCE: libosinfo 1.0.0 release by Daniel Berrange

NB, this blog post was intended to be published back in November last year, but got forgotten in draft stage. Publishing now in case anyone missed the release…

Read more at http://tm3.org/e2

Containerizing Databases with Kubernetes and Stateful Sets by Andrew Beekhof

The canonical example for Stateful Sets with a replicated application in Kubernetes is a database.

Read more at http://tm3.org/e3

Announcing the ARA 0.11 release by dmsimard

We’re on the road to version 1.0.0 and we’re getting closer: introducing the release of version 0.11!

Read more at http://tm3.org/e4
Source: RDO

Optimizing rolling feature engineering for time series data

In this blog post, I want to talk about how data scientists can efficiently perform certain types of feature engineering at scale. Before we dive into sample code, I will briefly set the context of how telemetry data gets generated and why businesses are interested in using such data.

To get started, we know that these days machines are instrumented with multiple built-in sensors that record various measurements while the machines are in operation. These machines thus end up generating a lot of telemetry data, which can be put to use once it is transferred off the machines and stored in a centralized repository. Businesses hope to use this amassed data to answer questions like, “When is a machine likely to fail?” or, “When does a spare part for a machine need to be re-ordered?” Eventually this could help them reduce the time and costs incurred in ad hoc maintenance activities.

After having built many models, I have noticed that the raw telemetry data generated by the various sensors adds very little value on its own. Sensors by design generate data at regular time intervals, so the data consists of multiple time series that can be sorted by time for each machine to build meaningful additional features. Data scientists like me therefore end up enhancing the dataset by performing additional feature engineering on this raw sensor data.

The most common features I begin with are rolling aggregates, built out using my preferred statistical programming language on a sample dataset. Here are some code snippets showing how I would generate rolling aggregates over a specific window size using R and Python, for machines that record voltage, rotation, pressure, and vibration measurements by date. These snippets can be run in any local R/Python IDE, within a Jupyter notebook, or within an Azure ML Studio environment.

R

library(dplyr) # provides %>%, arrange, group_by, mutate, select, filter
library(zoo)   # provides rollapply

telemetrymean <- telemetry %>%
    arrange(machineID, datetime) %>%
    group_by(machineID) %>%
    mutate(voltmean = rollapply(volt, width = 3, FUN = mean, align = "right", fill = NA, by = 3),
           rotatemean = rollapply(rotate, width = 3, FUN = mean, align = "right", fill = NA, by = 3),
           pressuremean = rollapply(pressure, width = 3, FUN = mean, align = "right", fill = NA, by = 3),
           vibrationmean = rollapply(vibration, width = 3, FUN = mean, align = "right", fill = NA, by = 3)) %>%
    select(datetime, machineID, voltmean, rotatemean, pressuremean, vibrationmean) %>%
    filter(!is.na(voltmean)) %>%
    ungroup()

Python

import pandas as pd

# Pivot each measurement into one column per machine, resample to 3-hour
# means, then stack back into long format
temp = []
fields = ['volt', 'rotate', 'pressure', 'vibration']
for col in fields:
    temp.append(pd.pivot_table(telemetry,
                               index='datetime',
                               columns='machineID',
                               values=col).resample('3H', closed='left', label='right').mean().unstack())
telemetry_mean_3h = pd.concat(temp, axis=1)
telemetry_mean_3h.columns = [i + 'mean_3h' for i in fields]
telemetry_mean_3h.reset_index(inplace=True)

For a description of the end-to-end use case, please review the R code and Python code.

Once my R/Python code has been tested in the local environment on a small dataset and deemed fit, I need to move it into a production environment, which means considering how to scale the same computation to a much larger dataset while remaining efficient. I have noticed that it is often more efficient to perform such large-scale computations against indexed data using some form of SQL query. Here is how I translated the code originally written in R/Python into SQL.

Sample SQL code

select rt.datetime, rt.machineID, rt.voltmean, rt.rotatemean, rt.pressuremean, rt.vibrationmean
from
(select avg(volt) over(partition by machineID order by machineID, datetime rows 2 preceding) as voltmean,
        avg(rotate) over(partition by machineID order by machineID, datetime rows 2 preceding) as rotatemean,
        avg(pressure) over(partition by machineID order by machineID, datetime rows 2 preceding) as pressuremean,
        avg(vibration) over(partition by machineID order by machineID, datetime rows 2 preceding) as vibrationmean,
        row_number() over (partition by machineID order by machineID, datetime) as rn,
        machineID, datetime
from telemetry) rt
where rt.rn % 3 = 0 and rt.voltmean is not null
order by rt.machineID, rt.datetime

For more details please review the SQL code.
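For comparison, the same sliding-window logic can also be expressed directly in pandas with groupby and rolling. The following is a minimal sketch on synthetic data (the machine IDs and values are made up for illustration); it mirrors the SQL above: a 3-row rolling mean per machine, keeping every third row and dropping incomplete windows.

```python
import pandas as pd
import numpy as np

# Synthetic telemetry: 2 machines x 6 hourly readings (illustrative only)
rng = pd.date_range("2017-01-01", periods=6, freq="h")
telemetry = pd.DataFrame({
    "datetime": list(rng) * 2,
    "machineID": [1] * 6 + [2] * 6,
    "volt": np.arange(12, dtype=float),
})

# Rolling mean over 3 rows per machine, like "rows 2 preceding" in the SQL
telemetry = telemetry.sort_values(["machineID", "datetime"])
telemetry["voltmean"] = (
    telemetry.groupby("machineID")["volt"]
             .rolling(window=3).mean()
             .reset_index(level=0, drop=True)
)

# Keep every third row (rn % 3 == 0) and drop incomplete windows
telemetry["rn"] = telemetry.groupby("machineID").cumcount() + 1
result = telemetry[(telemetry["rn"] % 3 == 0) & telemetry["voltmean"].notna()]
print(result[["machineID", "datetime", "voltmean"]])
```

The same pattern extends to the other measurements (rotate, pressure, vibration) by repeating the rolling step per column.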

Based on my experience with predictive maintenance use cases, SQL-based rolling feature engineering is best suited for time-ordered data partitioned by machine. For on-premises scenarios, SQL Server R Services also lets R enthusiasts run their data wrangling, model building, and even scoring code from right within SQL Server. Overall, this ends up being more efficient because there is no data movement and the computation scales well.

However, there are many other ways of operationalizing this type of feature engineering at scale. For example, R Server on HDInsight combines the functionality of R with the power of Hadoop and Spark, and Azure Data Lake Analytics now supports running R on petabytes of data. The power of these platforms can be put towards transforming raw sensor data into meaningful data that can be leveraged by machine learning applications to provide value back to the business.
Source: Azure

Connect Tableau to an Azure Analysis Services server

With Azure Analysis Services, you can connect to your servers by using Power BI, Excel, and many third-party client tools. In this post, we’ll focus on how to connect to your server from Tableau Desktop.

Before getting started, you’ll need:

A data model deployed to an Azure Analysis Services server – see Creating your first data model in Azure Analysis Services.
Tableau Desktop
The latest MSOLAP.7 provider

In Tableau Desktop 10.1, under Connect, click To a Server > Microsoft Analysis Services.

In the connection dialog, in Server, enter the name of your Azure Analysis Services server. Then select Use a specific username and password, and then type the organizational user name, for example nancy@adventureworks.com, and password.

In the Data Source tab, select the database and cube/model or perspective, and then click on Sheet 1.

The Tableau workbook is now connected to your Azure Analysis Services server. You will see the fields from your model listed under dimensions and measures on the side. You can drag and drop those fields to the sheet to start building out your visuals.

Learn more about Azure Analysis Services.
Source: Azure

AWS Direct Connect enables Link Aggregation Group for additional AWS regions

We are excited to announce support for 1G and 10G Link Aggregation Groups (LAG). Customers in the AWS GovCloud (US), Europe (Frankfurt), Europe (London), Europe (Ireland), Asia Pacific (Tokyo), Asia Pacific (Singapore), and Asia Pacific (Sydney) regions can start using LAG to link existing connections on the same AWS device, or to request new connections. Customers who wish to purchase multiple ports but treat them as a single managed connection can now use our LAG feature to do just that. In addition to ordering and managing bundles, customers can now see when their ports fall on the same router, so they can manage their network availability.
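For customers who automate their Direct Connect setup, a LAG can also be requested programmatically via the CreateLag API operation. Below is a minimal sketch using boto3; the location code and LAG name are placeholders, and the dry_run flag (my own addition for illustration) lets you inspect the request without touching AWS.

```python
def create_10g_lag(location, lag_name, number_of_connections=2, dry_run=True):
    """Sketch: bundle new 10G connections into a Link Aggregation Group.

    location and lag_name are placeholder values; set dry_run=False (with
    AWS credentials configured) to actually call the Direct Connect API.
    """
    params = {
        "location": location,                        # Direct Connect location code
        "numberOfConnections": number_of_connections,
        "connectionsBandwidth": "10Gbps",            # or "1Gbps"
        "lagName": lag_name,
    }
    if dry_run:
        return params
    import boto3  # real API call path; requires credentials and boto3
    return boto3.client("directconnect").create_lag(**params)

# Inspect the request parameters without making an API call
print(create_10g_lag("EqDC2", "example-lag"))
```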
Source: aws.amazon.com